date:20101114

Robert Haas  writes:
> Looking through the code, it appears to me that we'd need to do the
> following (not necessarily in this order):

Don't forget

9. Unload loadable modules that do not exist according to the new
database's catalogs; eg we don't want postgis trying to run when
its supporting tables don't exist in the selected database.
10. Somehow persuade remaining loadable modules to discard whatever
state they have that might be database-local.

We don't have workable APIs for either of those operations ATM.

I believe also that there are probably race conditions in several of
the steps you listed; in particular there is certainly a risk involved
in changing the database-we-advertise-being-connected-to versus a
concurrent DROP DATABASE.  Maybe that's easily soluble, but I'm not sure
about it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-14 Thread Pavel Stehule

Hello

2010/11/15 Itagaki Takahiro :
> On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro
>  wrote:
>> Postgres supports ARRAY data types well, but there are some
>> more array functions in the SQL standard. Also, the standard
>> has MULTISET data type, that is an unordered array.
>
> Here is a WIP patch for multiset function supports. Note that multiset
> types are not added in the patch; it just adds functions and syntax.
> Arguments or result types of those functions are anyarray rather than
> anymultiset. The result type is always flatten into on-dimensional
> array because some functions requires per-element operations; I'm not
> sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
> it is treated as trim_array( 9 elements array, 2 elements ) in the patch.
>
> The SQL standard defines special syntax for multiset. I added four
> unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
> (I don't like such ad-hoc syntax, but it is the standard...)
> Some of the new expressions are just syntactic sugar for existing
> other expressions or new functions. For example, "$1 MEMBER OF $2" is
> expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".
>
> I have not researched the spec yet enough, especially NULLs in collections.
> I'll continue to check the details.
>
> BTW, some of the intermediate products to implement those features might
> be useful if exported. like array_sort() and array_unique(). If there is
> demand, I'll add those functions, too.
>
> Any comments for the approach or detailed features?
>

I has not a standard, so I can't to speak about conformance with
standard, but I must to say, so these functionality should be
extremely useful for plpgsql programming. You can see samples a some
functions on the net for array sorting, for reduce a redundant values
via SQL language. I am sure, so implementation in C will be much
faster. Maybe can be useful to implement a searching on sorted array.
You can hold a flag if multiset is sorted or not.

Regards

Pavel Stehule

> === New features ===
> - [FUNCTION] cardinality(anyarray) => integer
> - [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
> - [FUNCTION] element(anyarray) => anyelement
> - [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
> - [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
> - [SYNTAX] $1 IS A SET  --> multiset_is_a_set($1)
> - [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 -->
> multiset_union($1, $2, all?)
> - [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 -->
> multiset_intersect($1, $2, all?)
> - [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 -->
> multiset_except($1, $2, all?)
> - [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
> - [AGGREGATE] fusion(anyarray) => anyarray
> - [AGGREGATE] intersection(anyarray) => anyarray
>
> --
> Itagaki Takahiro
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

Greg Stark  writes:
> Fwiw I think he's right that sum(int2) should perhaps be redefined to
> return int8. As it stands all it would take is a 64k rows to
> potentially overflow. It's not super likely but it is plausible and
> the performance penalty to use int8 wouldn't be super big either.

It's not unreasonable.  I think the performance penalty for int8 was
higher when that choice was made than it is now --- and in particular,
on a 64-bit machine it's now pretty much negligible.

On the other hand, you can always execute sum(foo::int4) if you need a
wider sum, just like the escape hatch if any of the other datatype
choices aren't working for you.  It's not clear that we should force a
performance loss on people who don't need it (and I can't offhand recall
*ever* hearing a complaint about sum(int2) overflowing...)

I believe what the OP was arguing for was not so much this sort of
marginal tinkering as inventing a more general notion of "integer type"
that would avoid the whole issue.  The problem with that is that we have
to work within the set of types specified by the SQL standard.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] changing MyDatabaseId

I've spent a few hours pouring over the source code with
coarse-toothed comb, trying to figure out just exactly what might
break if we changed MyDatabaseId after backend startup time, or in
other words, allowed a backend to unbind from the database to which it
was originally bound and rebind to a new one.  This is related to the
periodic conversations we've had about a built-in connection pooler,
and/or maintaining a pool of worker threads that could be used to
service parallel query, replication sets, etc.  What follows is not
meant to be a concrete design proposal; it's basic research that may
lead to a proposal at some time in the future.  Still, comments are
welcome.

For the use cases that seem interesting, it is sufficient to think
about changing MyDatabaseId when there is no transaction in progress,
or perhaps, when there is a transaction in progress which does that
and nothing else.  This means that code that simply uses MyDatabaseId
in the course of going about its duties is not going to blow up.
However, anything that stores the current database ID someplace is a
hazard, as is any code that stores information which was obtained
using the current database ID.  Looking through the code, it appears
to me that we'd need to do the following (not necessarily in this
order):

1. Blow away all of our catalog caches as if we've received a sinval
reset.  Since most of our caches are hooked into the catalog cache via
invalidation hooks, this should trigger cascading invalidations of
other caches that need it, at least in most cases.  (Further looking
is probably needed to identify any cases where a
sinval-reset-equivalent is insufficient.)
2. Blow away absolutely everything from the relcache.  It appears to
me that we'd need to repeat the phase 2 and 3 initialization steps.
We can't even keep the entries for shared catalogs, because the
descriptors that formrdesc() coughs up may not exactly match what we
read from the pg_class entry.  It's tempting to propose adding enough
additional shared catalogs to make the relcache entries for shared
catalogs independent of any particular database's contents, but right
now they are not.
3. Reinitialize the relmapper.
4. Release all locks on objects in the old database; and our shared
lock on the database itself.  Take a new shared lock on the new
database.
5. Flush out any backend-local statistics that have been gathered but
not yet sent to the statistics collector.
6. Update our entry in the PgBackendStatus array so pg_stat_activity
sees the new database assignment.
7. Change MyProc->databaseId while holding ProcArrayLock in LW_EXCLUSIVE mode.
8. Update MyDatabaseTableSpace.

I haven't benchmarked this (so perhaps I should STFU) but #2 and maybe
#1 and #3 sounds like the most painful part of this activity.  Still,
there's not much help for it that I can see.  The best you can hope
for is to keep around entries to the shared catalogs, and that's not
going to get you terribly far, and at least ATM even that is unsafe.
I think maybe the thing to do is try to quantify how much time is
being spent in each part of the backend startup process - like connect
to the database lots and lots of times with individual connections,
run a trivial SELECT against a table, and then disconnect and repeat.
The problem is that (unless someone knows how to do magic tricks with
oprofile or dtrace) you're only going to find out the amount of time
actually spent executing each portion of the code.  It won't tell you
how much overhead you have from fork() itself, let alone how much
faster the initialization steps would have run if they had been
running on a warmed-up process address space rather than one that
might well need to fault in a lot of page-table entries in its first
few moments of life.  But it might still provide some useful data.
Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-14 Thread Itagaki Takahiro

On Mon, Nov 15, 2010 at 12:41, Shigeru HANADA  wrote:
> No, SQL/MED would not support indexing foreign tables, at least in
> first version.  Because it would be difficult to use common row id for
> various FDWs.

I think the reason is the SQL standard never mention about indexes.
It is not a specific issue for SQL/MED.

> To support indexing foreign tables might need to change
> common structure of index tuple to be able to hold virtual row-id, not
> ItemPointerData.

I'm not sure we actually need foreign indexes because the query text
sent to another server is same whether the foreign table has indexes.
Of course, foreign indexes might be useful to calculate costs to scan
foreign tables, but the cost also comes from non-index conditions.

I think foreign table and foreign index are a model for row-based
databases, including postgres. But other DBs might have different
cost models. So, it would be better to encapsulate such operations in FDW.

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-14 Thread Shigeru HANADA

On Fri, 12 Nov 2010 08:27:54 -0800
Eric Davies  wrote:
> Thank you for the time estimate and the interface discussion. It 
> sounds like the PostgreSQL SQL/MED code will be very useful when it 
> is done. Our product provides read-only access to files, so 
> updates/inserts/deletes aren't an issue for us.
> 
> One thing that is not clear to me is indexing support. Will it be 
> possible to index a SQL/MED table as if it were a regular table?

No, SQL/MED would not support indexing foreign tables, at least in
first version.  Because it would be difficult to use common row id for
various FDWs.  To support indexing foreign tables might need to change
common structure of index tuple to be able to hold virtual row-id, not
ItemPointerData.

Instead, FDW can handle expressions which are parsed from WHERE clause
and JOIN condition of original SQL, and use them to optimize scanning. 
For example, FDW for PostgreSQL pushes some conditions down to remote
side to decrease result tuples to be transferred.  I hope this idea
helps you.

>  What 
> would be the equivalent of Informix's row ids?

Answer to the second question would be "ItemPointerData".  It consists
of a block number and an offset in the block, and consume 6 bytes for
each tuple.  With this information, PostgreSQL can access to a data
tuple directly.  Actual definition is:

typedef struct ItemPointerData
{
BlockIdData ip_blkid;
OffsetNumber ip_posid;
} ItemPointer;

Does Informix uses common row-id (AFAIK it's 4 bytes integer) for
both of virtual tables and normal tables?

Regards,
--
Shigeru Hanada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] MULTISET and additional functions for ARRAY

2010-11-14 Thread Itagaki Takahiro

On Fri, Nov 12, 2010 at 00:02, Itagaki Takahiro
 wrote:
> Postgres supports ARRAY data types well, but there are some
> more array functions in the SQL standard. Also, the standard
> has MULTISET data type, that is an unordered array.

Here is a WIP patch for multiset function supports. Note that multiset
types are not added in the patch; it just adds functions and syntax.
Arguments or result types of those functions are anyarray rather than
anymultiset. The result type is always flatten into on-dimensional
array because some functions requires per-element operations; I'm not
sure how we should treat trim_array( 2D 3x3 array, 2 elements ). So,
it is treated as trim_array( 9 elements array, 2 elements ) in the patch.

The SQL standard defines special syntax for multiset. I added four
unreserved keywords for them; A, MEMBER, MULTISET, and SUB.
(I don't like such ad-hoc syntax, but it is the standard...)
Some of the new expressions are just syntactic sugar for existing
other expressions or new functions. For example, "$1 MEMBER OF $2" is
expanded to "$1 = ANY ($2)" and "$1 IS A SET" to "multiset_is_a_set($1)".

I have not researched the spec yet enough, especially NULLs in collections.
I'll continue to check the details.

BTW, some of the intermediate products to implement those features might
be useful if exported. like array_sort() and array_unique(). If there is
demand, I'll add those functions, too.

Any comments for the approach or detailed features?

=== New features ===
- [FUNCTION] cardinality(anyarray) => integer
- [FUNCTION] trim_array(anyarray, nTrimmed integer) => anyarray
- [FUNCTION] element(anyarray) => anyelement
- [SYNTAX] $1 MEMBER OF $2 --> $1 = ANY ($2)
- [SYNTAX] $1 SUB MULTISET OF $2 --> $1 <@ $2
- [SYNTAX] $1 IS A SET  --> multiset_is_a_set($1)
- [SYNTAX] $1 MULTISET UNION [ALL | DISTINCT] $2 -->
multiset_union($1, $2, all?)
- [SYNTAX] $1 MULTISET INTERSECT [ALL | DISTINCT] $2 -->
multiset_intersect($1, $2, all?)
- [SYNTAX] $1 MULTISET EXCEPT [ALL | DISTINCT] $2 -->
multiset_except($1, $2, all?)
- [AGGREGATE] collect(anyelement) => anyarray --> same as array_agg()
- [AGGREGATE] fusion(anyarray) => anyarray
- [AGGREGATE] intersection(anyarray) => anyarray

-- 
Itagaki Takahiro

multiset-20101115.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] contrib: auth_delay module

On Thu, Nov 4, 2010 at 10:04 AM, Robert Haas  wrote:
> On Thu, Nov 4, 2010 at 6:35 AM, Stephen Frost  wrote:
>> * Jan Urbański (wulc...@wulczer.org) wrote:
>>> On 04/11/10 14:09, Robert Haas wrote:
>>> > Hmm, I wonder how useful this is given that restriction.
>>>
>>> As KaiGai mentined, it's more to make bruteforcing difficult (read: tmie
>>> consuming), right?
>>
>> Which it would still do, since the attacker would be bumping up against
>> max_connections.  max_connections would be a DOS point, but that's no
>> different from today.  Other things could be put in place to address
>> that (max # of connections from a given IP or range could be implemented
>> using iptables, as an example).
>>
>> 5 second delay w/ max connections at 100 would mean max of 20 attempts
>> per second, no?  That's alot fewer than 100*(however many attempts can
>> be done in a second).  Doing a stupid while true; psql -d blah; done
>> managed to get 50 successful ident auths+no-db-found errors done in a
>> second on one box here.  5000 >> 20, and I wasn't even trying.
>
> OK.  I was just asking.  I don't object to it if people think it's
> useful, especially if they are looking at it as "I would actually use
> this on my system" rather than "I can imagine a hypothetical person
> using this".

I haven't heard anyone say "yes, I would actually use this on my
system"?  Any takers?

If we're to commit this, then the patch needs to add a new file
authdelay.smgl, fill it in with appropriate contents, and update
contrib.sgml and filelist.sgml accordingly.  I also note that the
patch offers the ability to log superuser logins.  Since that seems a
bit off-topic for a contrib module called auth_delay, and since we
already have a GUC called log_connections, I'm inclined to propose
removing that part.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

On Sun, Nov 14, 2010 at 9:16 PM, Greg Stark  wrote:
> On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina  wrote:
>> SUM(int2) => int4
>> SUM(int4) => int8
>> SUM(int8) => numeric
>>
>> Some weaknesses:
>>
>> SUM, of any precision, assumes that the precision being accumulated
>> into (which is also the return-precision) is enough to avoid overflow.
>> This is generally the case, but there's no reason why it *must* be
>> true. I'm especially looking at the int2 to int4 conversion. One could
>> imagine a more interesting scenario where overflow behavior could
>> occur much more easily.
>
> Fwiw I think he's right that sum(int2) should perhaps be redefined to
> return int8. As it stands all it would take is a 64k rows to
> potentially overflow. It's not super likely but it is plausible and
> the performance penalty to use int8 wouldn't be super big either.
>
> int4 doesn't seem like as realistic a problem since it would take 4
> billion rows and the performance penalty for using numeric would be
> much higher.

I think you could get the same effect by removing sum(int2)
altogether, which I wouldn't lose any sleep over.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fix for cube picksplit function

On Fri, Oct 29, 2010 at 2:59 PM, Alexander Korotkov
 wrote:
> Gist picksplit method implementation in cube contrib module contains a bug
> in Guttman's split algorithm implementation. It was mentioned before but it
> is still not fixed yet. Current implementation doesn't cause incorrect
> behavior, but index becomes very bad, because each index scan require to
> read significant part of index.

It looks to me like this can be safely back-patched, because it
doesn't affect the contents of index entries already on disk, just the
manner in which we construct new ones.

So I've committed this and back-patched it all the way back to 8.1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Explain analyze getrusage tracking

2010-11-14 Thread Greg Stark

This is an update to my earlier patch to add getrusage resource
tracking to EXPLAIN ANALYZE.

With this patch you get something like:

  QUERY PLAN
--
 Seq Scan on i  (cost=0.00..6919.44 rows=262144 width=101) (actual
time=17.240..1123.751 rows=262144 loops=1)
   Resources: sys=210.000ms user=430.000ms read=33.6MB
   Buffers: shared read=4298
 Total runtime: 1548.651 ms
(4 rows)

The main change is to make it work under Windows. At least I think the
changes should make it work under Windows, I haven't been able to test
it. Actually I'm not to happy with the way I did it, I would be more
inclined to hack the getrusagestub,h definition of struct rusage to
have an instr_time in it so that we can use the same macros directly.
But that's more changes than I would be happy making without being
able to compile them to test them.


-- 
greg
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
***
*** 110,115  static void ExplainJSONLineEnding(ExplainState *es);
--- 110,116 
  static void ExplainYAMLLineStarting(ExplainState *es);
  static void escape_json(StringInfo buf, const char *str);
  static void escape_yaml(StringInfo buf, const char *str);
+ static double normalize_memory(double amount, char **unit, int *precision);
  
  
  
***
*** 142,147  ExplainQuery(ExplainStmt *stmt, const char *queryString,
--- 143,150 
  			es.costs = defGetBoolean(opt);
  		else if (strcmp(opt->defname, "buffers") == 0)
  			es.buffers = defGetBoolean(opt);
+ 		else if (strcmp(opt->defname, "resource") == 0)
+ 			es.rusage = defGetBoolean(opt);
  		else if (strcmp(opt->defname, "format") == 0)
  		{
  			char	   *p = defGetString(opt);
***
*** 368,373  ExplainOnePlan(PlannedStmt *plannedstmt, ExplainState *es,
--- 371,378 
  		instrument_option |= INSTRUMENT_TIMER;
  	if (es->buffers)
  		instrument_option |= INSTRUMENT_BUFFERS;
+ 	if (es->rusage)
+ 		instrument_option |= INSTRUMENT_RUSAGE;
  
  	/*
  	 * Use a snapshot with an updated command ID to ensure this query sees
***
*** 1098,1103  ExplainNode(PlanState *planstate, List *ancestors,
--- 1103,1199 
  			break;
  	}
  
+ 	/* Show resource usage from getrusage */
+ 	if (es->rusage && es->format == EXPLAIN_FORMAT_TEXT)
+ 	{
+ 		const struct rusage *usage = &planstate->instrument->rusage;
+ 
+ 		bool has_rusage = (!INSTR_TIME_IS_ZERO_TV(usage->ru_stime) || 
+ 		   !INSTR_TIME_IS_ZERO_TV(usage->ru_utime) || 
+ 		   usage->ru_inblock > 0 || 
+ 		   usage->ru_oublock > 0);
+ 		bool has_verbose_rusage = (usage->ru_minflt   > 0 || 
+    usage->ru_majflt   > 0 ||
+    usage->ru_nswap> 0 || 
+    usage->ru_msgsnd   > 0 || 
+    usage->ru_msgrcv   > 0 || 
+    usage->ru_nsignals > 0 || 
+    usage->ru_nvcsw> 0 || 
+    usage->ru_nivcsw   > 0);
+ 
+ 		if (has_rusage || (es->verbose && has_verbose_rusage))
+ 		{
+ 			appendStringInfoSpaces(es->str, es->indent *2);
+ 			appendStringInfoString(es->str, "Resources:");
+ 			
+ 			if (!INSTR_TIME_IS_ZERO_TV(usage->ru_stime)) 
+ 			{
+ double stime = INSTR_TIME_GET_DOUBLE_TV(usage->ru_stime);
+ appendStringInfo(es->str, " sys=%.3fms", stime * 1000);
+ 			}
+ 			
+ 			if (!INSTR_TIME_IS_ZERO_TV(usage->ru_utime)) 
+ 			{
+ double utime = INSTR_TIME_GET_DOUBLE_TV(usage->ru_utime);
+ appendStringInfo(es->str, " user=%.3fms", utime * 1000);
+ 			}
+ 			
+ 			if (usage->ru_inblock > 0)
+ 			{
+ double inblock;
+ char *units;
+ int prec;
+ inblock = normalize_memory((double)usage->ru_inblock * 512, &units, &prec);
+ appendStringInfo(es->str, " read=%.*f%s", prec, inblock, units);
+ 			}
+ 			if (usage->ru_oublock > 0)
+ 			{
+ double oublock;
+ char *units;
+ int prec;
+ oublock = normalize_memory((double)usage->ru_oublock * 512, &units, &prec);
+ appendStringInfo(es->str, " written=%.*f%s", prec, oublock, units);
+ 			}
+ 			if (es->verbose)
+ 			{
+ if (usage->ru_minflt > 0)
+ 	appendStringInfo(es->str, " minflt=%ld", usage->ru_minflt);
+ if (usage->ru_majflt > 0)
+ 	appendStringInfo(es->str, " majflt=%ld", usage->ru_majflt);
+ if (usage->ru_nswap > 0)
+ 	appendStringInfo(es->str, " nswap=%ld", usage->ru_nswap);
+ if (usage->ru_msgsnd > 0)
+ 	appendStringInfo(es->str, " msgsnd=%ld", usage->ru_msgsnd);
+ if (usage->ru_msgrcv > 0)
+ 	appendStringInfo(es->str, " msgrcv=%ld", usage->ru_msgrcv);
+ if (usage->ru_nsignals > 0)
+ 	appendStringInfo(es->str, " nsignals=%ld", usage->ru_nsignals);
+ if (usage->ru_nvcsw > 0)
+ 	appendStringInfo(es->str, " nvcsw=%ld", usage->ru_nvcsw);
+ if (usage->ru_nivcsw > 0)
+ 	appendStringInfo(es->str, " nivcsw=%ld", usage->ru_nivcsw);
+ 			}
+ 			appendStringInfoChar(es->str, '

Re: [HACKERS] Comparison with "true" in source code

2010-11-14 Thread Itagaki Takahiro

On Mon, Nov 15, 2010 at 11:13, Robert Haas  wrote:
>> I added an additional cleanup to 'header_mode' in ecpg; I changed the type
>> from bool to char to hold 'h' or 'c'. Do you think it is reasonable?
>
> I looked at this but found that part a bit too clever for its own good.
>
> So committed the rest, plus an additional one-line change to psql's
> print.c to avoid making the two accesses to format->wrap_right_pointer
> inconsistent with each other.

Thanks!

-- 
Itagaki Takahiro

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

2010-11-14 Thread Greg Stark

On Sun, Nov 14, 2010 at 11:15 AM, Daniel Farina  wrote:
> SUM(int2) => int4
> SUM(int4) => int8
> SUM(int8) => numeric
>
> Some weaknesses:
>
> SUM, of any precision, assumes that the precision being accumulated
> into (which is also the return-precision) is enough to avoid overflow.
> This is generally the case, but there's no reason why it *must* be
> true. I'm especially looking at the int2 to int4 conversion. One could
> imagine a more interesting scenario where overflow behavior could
> occur much more easily.

Fwiw I think he's right that sum(int2) should perhaps be redefined to
return int8. As it stands all it would take is a 64k rows to
potentially overflow. It's not super likely but it is plausible and
the performance penalty to use int8 wouldn't be super big either.

int4 doesn't seem like as realistic a problem since it would take 4
billion rows and the performance penalty for using numeric would be
much higher.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Comparison with "true" in source code

On Wed, Nov 3, 2010 at 9:45 PM, Itagaki Takahiro
 wrote:
> On Wed, Nov 3, 2010 at 2:19 AM, Michael Meskes  wrote:
>> On Mon, Nov 01, 2010 at 12:17:02PM +0900, Itagaki Takahiro wrote:
>>> There are some "== true" in the codes, but they might not be safe
>>> because all non-zero values are true in C. Is it worth cleaning up them?
>
> Here is a proposed cleanup that replaces "boolean == true" with "boolean".
> I didn't touch "== false" unless they are not in pairs of comparisons
> with true because comparison with false is a valid C code.
>
> Note that I also changed "boolean != true" in pg_upgrade,
> but I didn't change ones in xlog.c because it might check
> corrupted fields in control files.
>
>>> src/interfaces/ecpg/preproc/ecpg.c(310):
>>>    ptr2ext[3] = (header_mode == true) ? 'h' : 'c';
>> I actually see no reason why these variables are not defined as bool instead 
>> of
>> int, so I changed this. Hopefully I found all of them.
>
> I added an additional cleanup to 'header_mode' in ecpg; I changed the type
> from bool to char to hold 'h' or 'c'. Do you think it is reasonable?

I looked at this but found that part a bit too clever for its own good.

So committed the rest, plus an additional one-line change to psql's
print.c to avoid making the two accesses to format->wrap_right_pointer
inconsistent with each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

On Sun, Nov 14, 2010 at 12:13 PM, Tom Lane  wrote:
> I wrote:
>> I still think it's worth looking into whether the bug can be dodged
>> by shortening the eval calls.
>
> In fact, that does seem to work; I'll commit a patch after testing a
> bit more.
>
> We still need someone to add the missing build dependencies so that
> make -j is trustworthy again.

Yes, please.  This is currently failing for me:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -Werror  -bundle -multiply_defined suppress -o
ascii_and_mic.so ascii_and_mic.o -L../../../../../../src/port
-L/opt/local/lib -Wl,-dead_strip_dylibs  -Werror  -bundle_loader
../../../../../../src/backend/postgres^M
ld: file not found: ../../../../../../src/backend/postgres
collect2: ld returned 1 exit status
make[3]: *** [ascii_and_mic.so] Error 1
make[2]: *** [all-ascii_and_mic-recurse] Error 2
make[1]: *** [all-backend/utils/mb/conversion_procs-recurse] Error 2
make[1]: *** Waiting for unfinished jobs

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls

On Sun, Nov 14, 2010 at 7:27 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> It might be even better to mention that the reason why we couldn't
>> forward the fsync request is that the fsync request queue is full.
>> I'm not sure exactly how to phrase that.  I thought about:
>
>> fsync request queue is full
>
>> But that seems not to answer the "so what" question.  There is an
>> example like this in the docs:
>
>> could not forward fsync request (fsync request queue is full)
>
>> ...but I'm not sure I like that.
>
> Well, that example is meant to cover cases where you have to assemble a
> couple of independently created phrases.  In this case I'd suggest
>        could not forward fsync request because request queue is full

Sounds good to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls

On Sun, Nov 14, 2010 at 7:19 PM, Greg Smith  wrote:
>> But if this is generating a lot of log data or adding a lot of
>> overhead, then you have bigger problems anyway:
>>
>> +               elog(DEBUG1, "Unable to forward fsync request, executing
>> directly");
>>
>
> The argument against this log line even existing is that if the field is
> added to pg_stat_bgwriter, that's probably how you want to monitor this data
> anyway.

I'll remove it if you really want it gone, but personally I think it's
useful to have.  I've more than once had to debug a problem given a
PostgreSQL log file with the debug level cranked up and not a whole
lot else.  Rare events that cause performance to tank are worth
logging, IMHO.

> I started out touching code that called it just "sync", but then crossed to
> other code that called it "fsync", and made the external UI use that name.
>  No objections to sorting that out within my patch so it's consistent.

OK, I'll do that before I commit it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+


Marti Raudsepp wrote:

PostgreSQL's default settings change when built with Linux kernel
headers 2.6.33 or newer. As discussed on the pgsql-performance list,
this causes a significant performance regression:
http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php

NB! I am not proposing to change the default -- to the contrary --
this patch restores old behavior.


Following our standard community development model, I've put this patch 
onto our CommitFest list:  
https://commitfest.postgresql.org/action/patch_view?id=432 and assigned 
myself as the reviewer.  I didn't look at this until now because I 
already had some patch development and review work to finish before the 
CommitFest deadline we just crossed.  Now I can go back to reviewing 
other people's work.


P.S. There is no pgsql-patch list anymore; everything goes through the 
hackers list now.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CommitFest 2010-11: Call for Reviewers

Despite some jerk submitting three brand new patches at the last minute 
by himself, the November CommitFest is now closed for new submissions 
and ready for review!  Only 7 of the 40 patches that need review have a 
reviewer assigned so far.  Reviewers, mark down a patch you want to look 
at in the CommitFest app:  
https://commitfest.postgresql.org/action/commitfest_view?id=8 or ask to 
be assigned one on the pgsql-rrreviewers list, if you want to work on 
something but don't have a preference for which patch.  There are a few 
larger patches in this round, but the majority of those already have a 
reviewer attached to them.  Many of the remaining patches are pretty 
approachable even if you're relatively new to reviewing PostgreSQL 
code.  See the usual for details:


http://wiki.postgresql.org/wiki/Reviewing_a_Patch
http://wiki.postgresql.org/wiki/RRReviewers

--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_bgwriter broken?

Greg Smith  writes:
> The way you've explained it now, I see why it's not worth adding a 
> specific regression test for this in the future, and what else I should 
> have checked when I ran into my problem (that other system stats views 
> were also working or not).

FWIW, I wouldn't object to a test that somehow exercised the specific
functionality of the view; I was just dubious that there was any value
in the sort of test you were suggesting.  We found ways to exercise the
table-counter stuff in a sufficiently reproducible way for the stats
test --- can you do something similar here?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls

Robert Haas  writes:
> It might be even better to mention that the reason why we couldn't
> forward the fsync request is that the fsync request queue is full.
> I'm not sure exactly how to phrase that.  I thought about:

> fsync request queue is full

> But that seems not to answer the "so what" question.  There is an
> example like this in the docs:

> could not forward fsync request (fsync request queue is full)

> ...but I'm not sure I like that.

Well, that example is meant to cover cases where you have to assemble a
couple of independently created phrases.  In this case I'd suggest
could not forward fsync request because request queue is full
or, if you think there might sometime be a need to have a strerror
variant, ie
could not forward fsync request: %m
then maybe this would make the most sense:
could not forward fsync request: request queue is full

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls


Robert Haas wrote:

I think this one could be removed:

+   if (n > 0)
+   elog(DEBUG1,"Absorbing %d fsync requests",n);
  


Right; that one is just there to let you know the patch is working, and 
how much the background writer does for you per pass, mainly for the 
purpose of reviewing the patch.




But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+   elog(DEBUG1, "Unable to forward fsync request, executing 
directly");
  


The argument against this log line even existing is that if the field is 
added to pg_stat_bgwriter, that's probably how you want to monitor this 
data anyway.  I don't think there's actually much value to it, which is 
one reason I didn't worry too much about matching style guidelines, 
translation, etc.  You've found the two things that I think both need to 
go away before commit, but I left them in because I think they're 
valuable for testing the patch itself does what it's supposed to.



Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the "f" in some places?
  


I started out touching code that called it just "sync", but then crossed 
to other code that called it "fsync", and made the external UI use that 
name.  No objections to sorting that out within my patch so it's consistent.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Instrument checkpoint sync calls


Magnus Hagander wrote:

I think that it's reasonable for the sort of people who turn log_checkpoints
on to also get the sync summary line, thus it being logged at LOG level.



In that case, wouldn't it make more sense to add a couple of more
fields to the existing line? Easier to post-process the logfile that
way...
  


It might.  One trade-off is that if you're looking at the sync write 
detail, the summary comes out in a similar form.  And it was easy to put 
in here--I'd have to return some new data out of the sync phase call in 
order for that to show up in the main log.  If there's general buy-in on 
the idea, I could do all of that.




I personally think that if you're
already making an fsync call and have log_checkpoints on, the additional
overhead of also timing that fsync is minimal even on platforms where timing
is slow (I don't have such a system to test that assumption however)...

It sounds like it should be - but that should be possible to measure, no?
  


What I was alluding to is that I know gettimeofday executes fast on my 
Linux system here, so even if I did measure the overhead and showed it's 
near zero that doesn't mean it will be so on every platform.  The "how 
long does it take to find out the current time on every supported 
PostgreSQL platform?" question is one I'd like to have an answer to, but 
it's hard to collect properly.  All I know is that I don't have any 
system where it's slow to properly test again here.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us

Re: [HACKERS] pg_stat_bgwriter broken?


Tom Lane wrote:

If the cause was what I suggested, most likely the other stats views
were broken too --- were you able to pass the regression tests in that
state?
  


I was having a lot of problems with the system that included regression 
test issues, and yelped about this one earlier than I normally would 
because it was blocking work I wanted to get submitted today.  Sorry 
about the noise, and thanks for the info about why this was happening.  
The way you've explained it now, I see why it's not worth adding a 
specific regression test for this in the future, and what else I should 
have checked when I ran into my problem (that other system stats views 
were also working or not).


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Spread checkpoint sync

Final patch in this series for today spreads out the individual 
checkpoint fsync calls over time, and was written by myself and Simon 
Riggs.  Patch is based against a system that's already had the two 
patches I sent over earlier today applied, rather than HEAD, as both are 
useful for measuring how well this one works.  You can grab a tree with 
all three from my Github repo, via the "checkpoint" branch: 
https://github.com/greg2ndQuadrant/postgres/tree/checkpoint


This is a work in progress.  While I've seen this reduce checkpoint 
spike latency significantly on a large system, I don't have any 
referencable performance numbers I can share yet.  There are also a 
couple of problems I know about, and I'm sure others I haven't thought 
of yet  The first known issues is that it delays manual or other 
"forced" checkpoints, which is not necessarily wrong if you really are 
serious about spreading syncs out, but it is certainly surprising when 
you run into it.  I notice this most when running createdb on a busy 
system.  No real reason for this to happen, the code passes that it's a 
forced checkpoint down but just doesn't act on it yet.


The second issue is that the delay between sync calls is currently 
hard-coded, at 3 seconds.  I believe the right path here is to consider 
the current checkpoint_completion_target to still be valid, then work 
back from there.  That raises the question of what percentage of the 
time writes should now be compressed into relative to that, to leave 
some time to spread the sync calls.  If we're willing to say "writes 
finish in first 1/2 of target, syncs execute in second 1/2", that I 
could implement that here.  Maybe that ratio needs to be another 
tunable.  Still thinking about that part, and it's certainly open to 
community debate.  The thing to realize that complicates the design is 
that the actual sync execution may take a considerable period of time.  
It's much more likely for that to happen than in the case of an 
individual write, as the current spread checkpoint does, because those 
are usually cached.  In the spread sync case, it's easy for one slow 
sync to make the rest turn into ones that fire in quick succession, to 
make up for lost time.


There's some history behind this design that impacts review.  Circa 8.3 
development in 2007, I had experimented with putting some delay between 
each of the fsync calls that the background writer executes during a 
checkpoint.  It didn't help smooth things out at all at the time.  It 
turns out that's mainly because all my tests were on Linux using ext3.  
On that filesystem, fsync is not very granular.  It's quite likely it 
will push out data you haven't asked to sync yet, which means one giant 
sync is almost impossible to avoid no matter how you space the fsync 
calls.  If you try and review this on ext3, I expect you'll find a big 
spike early in each checkpoint (where it flushes just about everything 
out) and then quick response for the later files involved.


The system this patch originated to help fix was running XFS.  There, 
I've confirmed that problem doesn't exist, that individual syncs only 
seem to push out the data related to one file.  The same should be true 
on ext4, but I haven't tested that myself.  Not sure how granular the 
fsync calls are on Solaris, FreeBSD, Darwin, etc. yet.  Note that it's 
still possible to get hung on one sync call for a while, even on XFS.  
The worst case seems to be if you've created a new 1GB database table 
chunk and fully populated it since the last checkpoint, on a system 
that's just cached the whole thing so far.


One change that turned out be necessary rather than optional--to get 
good performance from the system under tuning--was to make regular 
background writer activity, including fsync absorb checks, happen during 
these sync pauses.  The existing code ran the checkpoint sync work in a 
pretty tight loop, which as I alluded to in an earlier patch today can 
lead to the backends competing with the background writer to get their 
sync calls executed.  This squashes that problem if the background 
writer is setup properly.


What does properly mean?  Well, it can't do that cleanup if the 
background writer is sleeping.  This whole area was refactored.  The 
current sync absorb code uses the constant WRITES_PER_ABSORB to make 
decisions.  This new version replaces that hard-coded value with 
something that scales to the system size.  It now ignores doing work 
until the number of pending absorb requests has reached 10% of the 
number possible to store (BgWriterShmem->max_requests, which is set to 
the size of shared_buffers in 8K pages, AKA NBuffers).  This may 
actually postpone this work for too long on systems with large 
shared_buffers settings; that's one area I'm still investigating.


As far as concerns about this 10% setting not doing enough work, which 
is something I do see, you can always increase how often absorbing 
happens by decreasing bgwr

Re: [HACKERS] Count backend self-sync calls

On Sun, Nov 14, 2010 at 6:10 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> With those changes, I think this is committable, and will do so,
>> barring objections.
>
> Obey message style guidelines, please, especially if you're going
> to promote any of those to ereports.

The only new message would be the one Greg has as:

Unable to forward fsync request, executing directly

For that, we could just go with:

could not forward fsync request

(Lower case, avoid use of unable, telegram style with program as
implicit subject.)

It might be even better to mention that the reason why we couldn't
forward the fsync request is that the fsync request queue is full.
I'm not sure exactly how to phrase that.  I thought about:

fsync request queue is full

But that seems not to answer the "so what" question.  There is an
example like this in the docs:

could not forward fsync request (fsync request queue is full)

...but I'm not sure I like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] SSI update

2010-11-14 Thread David Fetter

That it's not ready for commit this minute does not mean that it
shouldn't be in the CF this month.  Delaying the first review of the
patch until the next CF pretty much ensures that we'll miss 9.1 with
it, so please add to the current CF :)

Cheers,
David.
On Sat, Nov 13, 2010 at 02:15:40PM -0600, Kevin Grittner wrote:
> Circumstances have conspired to leave me with very little time to
> work on the SSI patch during the last few weeks.  I'm still convinced
> that the work mentioned in this post is necessary to have a
> commit-quality patch:
>  
> http://archives.postgresql.org/pgsql-hackers/2010-10/msg01754.php
>  
> I also think it's going to be very desirable to convert the conflict
> pointers used in the papers (and in the patch so far) to lists, to
> eliminate one source of false positives and allow more aggressive
> clean-up of transactions.  I we implement the logic from the above
> post and *then* convert the pointers to lists, it seems like more
> work than implementing the lists first.  Therefore, I'm planning on
> doing the lists first.
>  
> Anyone who thinks that's a bad idea, please speak up soon, as I'm
> starting coding on that today.
>  
> There's no way there will be a patch implementing this in time for
> the 2010-11 CF.  A couple months ago there were a couple people who
> said they'd be willing to look at this between CFs, so I'm hoping
> that their schedules still permit them to do that after this
> unfortunate delay.
>  
> -Kevin
> 
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread David Fetter

On Sun, Nov 14, 2010 at 11:02:08PM +0100, Yeb Havinga wrote:
> On 2010-11-14 21:06, Marko Tiikkaja wrote:
> >On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote:
> >>On 2010-11-14 19:35, Robert Haas wrote:
> >>>On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
> >>>   wrote:
> In my opinion, all of these should have the same effect:
> DELETE all rows
> from "foo".  Any other option means we're going to have
> trouble predicting
> how a query is going to behave.
> >>>I think it's clear that's the only sensible behavior.
> >>What if CTE's ever get input parameters?
> >
> >What about input parameters?
> With input parameters there is a clear link between a CTE and a
> caller. If a CTE is called more than once, it must be executed more
> than once, e.g. (notation t:x means cte has parameter x)
> 
> WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *)
> SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2));
> runs the cte two times, hence two new rows in foo.

I think we can worry about that if we ever have run-time functions
done as WITH, but I think they'd be a *much* better fit for DO.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-14 Thread Andrew Dunstan




On 11/14/2010 05:15 PM, Tom Lane wrote:

Josh is ignoring the proposal that is on the table and seems actually
workable, which is to consult the visibility map during index-only
scans.  For mostly-static tables this would save trips to the heap for
very little extra I/O.  The hard part is to make the VM reliable, but
that is not obviously harder than making separately-stored hint bits
reliable.


I thought we had agreement in the past that this was the way we should 
proceed.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls

Robert Haas  writes:
> With those changes, I think this is committable, and will do so,
> barring objections.

Obey message style guidelines, please, especially if you're going
to promote any of those to ereports.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Count backend self-sync calls

On Sun, Nov 14, 2010 at 5:07 PM, Greg Smith  wrote:
> The patch also adds some logging to the innards involved here, to help with
> understanding problems in this area.  I don't think that should be in the
> version committed as is.  May want to drop the logging level or make it
> disabled in regular builds, since it is sitting somewhere it generates a lot
> of log data and adds overhead.

I think this one could be removed:

+   if (n > 0)
+   elog(DEBUG1,"Absorbing %d fsync requests",n);

But if this is generating a lot of log data or adding a lot of
overhead, then you have bigger problems anyway:

+   elog(DEBUG1, "Unable to forward fsync request, executing 
directly");

I'm inclined to change that to an ereport(), but otherwise it seems
OK.  Also, how about referring to this as buffers_backend_fsync
consistently throughout, instead of dropping the "f" in some places?

With those changes, I think this is committable, and will do so,
barring objections.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

Daniel Farina  writes:
> There are other ways one might be able to attack the performance part
> of the problem, but consider the loss of information about the type
> from int(2|4|8) to numeric when composing a series of sums: we know
> the value produced fits the abstract notion of an Integer, but we lose
> that information. The same could be said of SUM(x::numeric(1000,0))
> yielding an unrestricted numeric, rather than one of scale 0. Not only
> is there more information available to the user, but the optimizer
> should be able to benefit from that information as well. However, for
> an arbitrary user-defined operator to take advantage it would seem to
> me that there needs to be a hook where some reasoning can take place
> over its input types and subsequently determine the return prototype
> at that call site.

Yeah, this has been discussed before.  The problem is that it's too much
work for too little reward.  There are too many different possible
behaviors for functions, and nobody is going to go through and write
a subtype-inference helper function for every function in the system,
or even any usefully large fraction of the functions in the system.
What's more, the subtype definitions have largely been predetermined
for us by SQL, and predetermined in such a way that knowing the subtype
isn't really all that exciting.  Is anybody willing to put in
months of work to teach the system that sum(numeric(7,0)) yields
numeric(something,0) rather than numeric(something,something)?
I doubt it.  The return on that knowledge would be too small, and
there are too many cases where you couldn't deduce anything anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

Greg Stark  writes:
> On Sun, Nov 14, 2010 at 8:52 PM, Josh Berkus  wrote:
>> For example, imagine if the hint bits were moved to a separate per-table
>> bitmap outside the table instead of being stored with each row, as the
>> current FSM is.

> How many times do we have to keep going around the same block?

> We *already* have separate bitmap outside the table for transaction
> commit bits. It's the clog.

> The only reason the hint bits exist is to cache that so we don't need
> to do extra I/O to check tuple visibility. If the hint bits are moved
> outside the table then they serve no purpose whatsover. Then you have
> an additional I/O to attempt to save an additional I/O.

Well, not quite.  The case this could improve is index-only scans:
you could go (or so he hopes) directly from the index to the hint
bits given the TID stored by the index.  A clog lookup is not possible
without XMIN & XMAX, which we do not keep in index entries.

But I'm just as skeptical as you are about this being a net win.
It'll pessimize too much other stuff.

Josh is ignoring the proposal that is on the table and seems actually
workable, which is to consult the visibility map during index-only
scans.  For mostly-static tables this would save trips to the heap for
very little extra I/O.  The hard part is to make the VM reliable, but
that is not obviously harder than making separately-stored hint bits
reliable.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Instrument checkpoint sync calls

2010-11-14 Thread Magnus Hagander

On Sun, Nov 14, 2010 at 22:37, Greg Smith  wrote:
> Attached patch adds some logging for each individual fsync call made during
> a checkpoint, along with a summary at the end.  You need to have the
> following to see all of the detail:
>
> log_checkpoints=on
> log_min_messages=debug1
>
> And here's a sample:
>
> LOG:  checkpoint starting: immediate force wait
> DEBUG:  checkpoint sync: file=1 time=1.946000 msec
> DEBUG:  checkpoint sync: file=2 time=0.666000 msec
> DEBUG:  checkpoint sync: file=3 time=0.004000 msec
> LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
> LOG:  checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log file(s)
> added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, total=0.003 s
>
> I think that it's reasonable for the sort of people who turn log_checkpoints
> on to also get the sync summary line, thus it being logged at LOG level.

In that case, wouldn't it make more sense to add a couple of more
fields to the existing line? Easier to post-process the logfile that
way...


>  The detail on individual syncs might go to DEBUG2 or lower instead of
> DEBUG1 where I put it, that part I don't have a strong opinion on.  It's at
> DEBUG1 to make testing the patch without a gigantic amount of log data also
> coming in easier.
>
> Right now the code is written such that all the calls that grab timing
> information are wrapped around "ifdef DEBUG_FSYNC", which is a variable set
> to 1 that could be a compile-time option like DEBUG_DEADLOCK, to allow
> turning this code path off at build time.  I personally think that if you're
> already making an fsync call and have log_checkpoints on, the additional
> overhead of also timing that fsync is minimal even on platforms where timing
> is slow (I don't have such a system to test that assumption however).  And
> I've seen enough value in troubleshooting nasty checkpoint sync problems
> using this patch to feel it's worth having even if it does add some
> overhead.

It sounds like it should be - but that should be possible to measure, no?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

On Sun, Nov 14, 2010 at 1:25 PM, Robert Haas  wrote:
> Like Tom, I'm not sure this is really a type-system problem.  This
> sounds like a complaint that operations on "numeric" are much slower
> than operations on "int4" and "int8", even for values that could be
> represented by either type.  I think that's a valid complaint, but I
> don't see how changing the type system would help.  I think what you'd
> need to is optimize the existing numeric type, or provide a new
> numeric-ish type with optimizations for dealing with
> small-to-medium-sized integers.

There are other ways one might be able to attack the performance part
of the problem, but consider the loss of information about the type
from int(2|4|8) to numeric when composing a series of sums: we know
the value produced fits the abstract notion of an Integer, but we lose
that information. The same could be said of SUM(x::numeric(1000,0))
yielding an unrestricted numeric, rather than one of scale 0. Not only
is there more information available to the user, but the optimizer
should be able to benefit from that information as well. However, for
an arbitrary user-defined operator to take advantage it would seem to
me that there needs to be a hook where some reasoning can take place
over its input types and subsequently determine the return prototype
at that call site.

I am not trying to suggest that there is no way that one could attack
the details around SUM I have mentioned without changing the type
system, only that they are limitations I encountered and worked around
where a more powerful type system came to mind as an avenue for a
simpler/more proper solution.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Peter Eisentraut

On tor, 2010-11-11 at 19:35 +0200, Marko Tiikkaja wrote:
> I apologize, I had misunderstood what you are suggesting.  But now  
> that I do, it seems to be an even worse idea to go your way.  Based on  
> my research, I'm almost certain that the SQL standard says that the  
> execution order is deterministic if there is at least one DML  
> statement in the WITH list.
> 
> Can anyone confirm this?

SQL:2008 doesn't allow any DML in the WITH list.

SQL:2011 has the "combined data store and retrieval" feature that was
discussed in another thread which basically implements the same thing.
They apparently avoid the whole issue by allowing only one data change
delta table per query.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Count backend self-sync calls

The attached patch adds a new field to pg_stat_bgwriter, counting the 
number of times backends execute their own fsync calls.  Normally, when 
a backend needs to fsync data, it passes a request to the background 
writer, which then absorbs the call into its own queue of work to do.  
However, under some types of heavy system load, the associated queue can 
fill.  When this happens, backends are forced to do their own fsync 
call.  This is potentially much worse than when they do a regular write.


The really nasty situation is when the background writer is busy because 
it's executing a checkpoint.  In that case, it's possible for the 
backend fsync calls to start competing with the ones the background 
writer is trying to get done, causing the checkpoint sync phase to 
execute slower than it should.  I've seen the sync phase take over 45 
minutes on a really busy server once it got into this condition, where 
hundreds of clients doing their own backend fsync calls were fighting 
against the checkpoint fsync work.  With this patch, you can observe 
that happening as an upwards spike in 
pg_stat_bgwriter.buffers_backend_sync, which as documented is an 
inclusive subset of the total shown in buffers_backend.


While it takes a busier system than I can useful show how to simulate 
here to show a really bad situation, I'm able to see some of these 
unabsorbed backend fsync calls when initializing a pgbench database, to 
prove they happen in the lab.  The attached test program takes as its 
input a pgbench scale counter.  It then creates a pgbench database 
(deleting any existing pgbench database, so watch out for that) and 
shows the values accumulated in pg_stat_bgwriter during that period.  
Here's an example, using the script's default scale of 100 on a server 
with 8GB of RAM and fake fsync (the hard drives are lying about it):


-[ RECORD 1 ]+-
now  | 2010-11-14 16:08:41.36421-05
...
Initializing pgbench
-[ RECORD 1 ]+--
now  | 2010-11-14 16:09:46.713693-05
checkpoints_timed| 0
checkpoints_req  | 0
buffers_checkpoint   | 0
buffers_clean| 0
maxwritten_clean | 0
buffers_backend  | 654716
buffers_backend_sync | 90
buffers_alloc| 803

This is with default sizing for memory structures.  As you increase 
shared_buffers, one of the queues involved here increases 
proportionately, making it less likely to run into this problem.  That 
just changes it to the kind of problem I've only seen on a larger system 
with a difficult to simulate workload.  The production system getting 
hammered with this problem (running a web application) that prompted 
writing the patch had shared_buffers=4GB at the time.


The patch also adds some logging to the innards involved here, to help 
with understanding problems in this area.  I don't think that should be 
in the version committed as is.  May want to drop the logging level or 
make it disabled in regular builds, since it is sitting somewhere it 
generates a lot of log data and adds overhead.  It is nice for now, as 
it lets you get an idea how much fsync work *is* being absorbed by the 
BGW, as well as showing what relation is suffering from this issue.  
Example of both those things, with the default config for everything 
except log_checkpoints (on) and log_min_messages (debug1):


DEBUG:  Absorbing 4096 fsync requests
DEBUG:  Absorbing 150 fsync requests
DEBUG:  Unable to forward fsync request, executing directly
CONTEXT:  writing block 158638 of relation base/16385/16398

Here 4096 is the most entries the BGW will ever absorb at once, and all 
90 of the missed sync calls are logged so you can see what files they 
came from.


As a high-level commentary about this patch, I'm not sure what most end 
users will ever do with this data.  At the same time, I wasn't sure what 
a typical end user would do with anything else in pg_stat_bgwriter 
either when it was added, and it turns out the answer is "wait for 
people who know the internals to write things that monitor it".  For 
example, Magnus has updated recent versions of the Munin plug-in for 
PostgreSQL to usefully graph pg_stat_bgwriter data over time.  As I have 
some data to suggest checkpoint problems on Linux in particular are 
getting worse as total system memory increases, I expect that having a 
way to easily instrument for this particular problem will be 
correspondingly more important in the future too.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us


diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dbf966b..f701b8f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -264,8 +264,10 @@ postgres: user database host 
  
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 09574c3..8766cc

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Yeb Havinga


On 2010-11-14 21:06, Marko Tiikkaja wrote:

On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote:

On 2010-11-14 19:35, Robert Haas wrote:

On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
   wrote:
In my opinion, all of these should have the same effect: DELETE all 
rows
from "foo".  Any other option means we're going to have trouble 
predicting

how a query is going to behave.

I think it's clear that's the only sensible behavior.

What if CTE's ever get input parameters?


What about input parameters?
With input parameters there is a clear link between a CTE and a caller. 
If a CTE is called more than once, it must be executed more than once, 
e.g. (notation t:x means cte has parameter x)


WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *)
SELECT (SELECT * FROM t(1)), (SELECT * FROM t(2));
runs the cte two times, hence two new rows in foo.

But what about
WITH t:x AS (INSERT INTO foo VALUES(x) RETURNING *)
SELECT (SELECT t(1)), (SELECT t(1));
it would be strange to expect a single row in foo here, since the only 
thing different from the previous query is a constant value.


Though I like the easyness of "run exactly once" for uncorrelated cte's, 
I still have the feeling that it somehow mixes the expression and 
operational realm. In logic there's a difference between a proposition 
and an assertion. With "run exactly once", stating a proposition is made 
synonymous to asserting it. That makes syntactic operations or rewriting 
of writable CTEs hard, if not impossible. For instance, variable 
substitution in the second example makes a CTE without parameters:

WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *),
t'' AS AS (INSERT INTO foo VALUES(1) RETURNING *),
SELECT (SELECT t'), (SELECT t'');

since t' and t'' are equal,

WITH t' AS (INSERT INTO foo VALUES(1) RETURNING *)
SELECT (SELECT t'), (SELECT t');

A syntactic operation like this on the query should not result in a 
different operation when it's run. Hence two new rows in foo are still 
expected, but the "run exactly once" dictates one new row for that query.


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

Robert Haas  writes:
> On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga  wrote:
>> What if CTE's ever get input parameters?

> Then they'd be functions, which we already have.

If you mean something like

prepare foo(int) as
with x as (delete from tab where id = $1 returning *)
insert into log_table select * from x;

I don't see that the parameter makes things any less well-defined.

If you mean a parameter in the sense of an executor parameter passed
in from a surrounding nestloop, that'd scare me too --- but I thought
we were going to disallow wCTEs except at the top level of a query,
so the case wouldn't arise.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-14 Thread Greg Stark

On Sun, Nov 14, 2010 at 8:52 PM, Josh Berkus  wrote:
> For example, imagine if the hint bits were moved to a separate per-table
> bitmap outside the table instead of being stored with each row, as the
> current FSM is.

How many times do we have to keep going around the same block?

We *already* have separate bitmap outside the table for transaction
commit bits. It's the clog.

The only reason the hint bits exist is to cache that so we don't need
to do extra I/O to check tuple visibility. If the hint bits are moved
outside the table then they serve no purpose whatsover. Then you have
an additional I/O to attempt to save an additional I/O.

The only difference between the clog and your proposal is that the
clog is two bits per transaction and your proposal is 4 bits per
tuple. The per-tuple idea guarantees that the extra I/O will be very
localized which isn't necessarily true for the clog but the clog is
small enough that it probably is true anyways. And even if there's no
I/O the overhead to consult the clog/per-table fork in memory is
probably significant.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread David Fetter

On Sun, Nov 14, 2010 at 08:07:22PM +0200, Marko Tiikkaja wrote:
> On 2010-11-14 8:01 PM +0200, I wrote:
> >In my opinion, all of these should have the same effect: DELETE all rows
> >from "foo".
> 
> Since the example wasn't entirely clear on this one: in my opinion
> the DML should also only be executed once.  So:
> 
> WITH t AS (INSERT INTO foo VALUES (0) RETURNING *)
> SELECT 1 FROM t t1, t t2;
> 
> would only insert one row in any case.

Right :)

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Instrument checkpoint sync calls

Attached patch adds some logging for each individual fsync call made 
during a checkpoint, along with a summary at the end.  You need to have 
the following to see all of the detail:


log_checkpoints=on
log_min_messages=debug1

And here's a sample:

LOG:  checkpoint starting: immediate force wait
DEBUG:  checkpoint sync: file=1 time=1.946000 msec
DEBUG:  checkpoint sync: file=2 time=0.666000 msec
DEBUG:  checkpoint sync: file=3 time=0.004000 msec
LOG:  checkpoint sync: files=3 longest=1.946000 msec average=0.872000 msec
LOG:  checkpoint complete: wrote 3 buffers (0.1%); 0 transaction log 
file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.002 s, 
total=0.003 s


I think that it's reasonable for the sort of people who turn 
log_checkpoints on to also get the sync summary line, thus it being 
logged at LOG level.  The detail on individual syncs might go to DEBUG2 
or lower instead of DEBUG1 where I put it, that part I don't have a 
strong opinion on.  It's at DEBUG1 to make testing the patch without a 
gigantic amount of log data also coming in easier.


Right now the code is written such that all the calls that grab timing 
information are wrapped around "ifdef DEBUG_FSYNC", which is a variable 
set to 1 that could be a compile-time option like DEBUG_DEADLOCK, to 
allow turning this code path off at build time.  I personally think that 
if you're already making an fsync call and have log_checkpoints on, the 
additional overhead of also timing that fsync is minimal even on 
platforms where timing is slow (I don't have such a system to test that 
assumption however).  And I've seen enough value in troubleshooting 
nasty checkpoint sync problems using this patch to feel it's worth 
having even if it does add some overhead.


I'm a little concerned about log_checkpoints changing on me in the 
middle of the execution of a checkpoint, which would cause some problems 
here.  Not sure if that's something I should explicitly code for, given 
that all I think it will do is screw up one of the timing results.  It 
does seem a risk from the last minute self-review I just did of the code.


I'll give a sample program that stresses the system, generating slow 
timing results and other types of bad behavior, along with the next 
patch I submit here shortly.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Supportwww.2ndQuadrant.us


diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 1219fcf..c72da72 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -20,6 +20,7 @@
 
 #include "catalog/catalog.h"
 #include "miscadmin.h"
+#include "portability/instr_time.h"
 #include "postmaster/bgwriter.h"
 #include "storage/fd.h"
 #include "storage/bufmgr.h"
@@ -29,6 +30,11 @@
 #include "utils/memutils.h"
 #include "pg_trace.h"
 
+/*
+ * For now just leave this on all the time, eventually this will
+ * need to be a full compiler flag like DEBUG_DEADLOCK
+ */
+#define DEBUG_FSYNC	1
 
 /* interval for calling AbsorbFsyncRequests in mdsync */
 #define FSYNCS_PER_ABSORB		10
@@ -927,6 +933,15 @@ mdsync(void)
 	PendingOperationEntry *entry;
 	int			absorb_counter;
 
+#ifdef DEBUG_FSYNC
+	/* Statistics on sync times */
+	int processed = 0;
+	instr_time sync_start,sync_end,diff;
+	double elapsed;
+	double longest = 0;
+	double total_elapsed = 0;
+#endif
+
 	/*
 	 * This is only called during checkpoints, and checkpoints should only
 	 * occur in processes that have created a pendingOpsTable.
@@ -1069,9 +1084,32 @@ mdsync(void)
 seg = _mdfd_getseg(reln, entry->tag.forknum,
 			  entry->tag.segno * ((BlockNumber) RELSEG_SIZE),
    false, EXTENSION_RETURN_NULL);
+
+#ifdef DEBUG_FSYNC
+if (log_checkpoints)
+	INSTR_TIME_SET_CURRENT(sync_start);
+#endif
 if (seg != NULL &&
 	FileSync(seg->mdfd_vfd) >= 0)
+{
+
+#ifdef DEBUG_FSYNC
+	if (log_checkpoints)
+	{
+		INSTR_TIME_SET_CURRENT(sync_end);
+		diff=sync_end;
+		INSTR_TIME_SUBTRACT(diff, sync_start);
+		elapsed = (double) INSTR_TIME_GET_MICROSEC(diff) / 1000.0;
+		if (elapsed > longest)
+			longest = elapsed;
+		total_elapsed+=elapsed;
+		processed++;
+		/* Reduce the log level here to DEBUG2 in final patch? */
+		elog(DEBUG1, "checkpoint sync: file=%d time=%f msec", processed, elapsed);
+	}
+#endif
 	break;		/* success; break out of retry loop */
+	}
 
 /*
  * XXX is there any point in allowing more than one retry?
@@ -1113,6 +1151,13 @@ mdsync(void)
 			elog(ERROR, "pendingOpsTable corrupted");
 	}			/* end loop over hashtable entries */
 
+#ifdef DEBUG_FSYNC
+	/* Report on sync performance metrics */
+	if (log_checkpoints && (processed > 0))
+		elog(LOG, "checkpoint sync: files=%d longest=%f msec average=%f msec",
+			processed, longest, total_elapsed / processed);
+#endif
+
 	/* Flag successful completion of mdsync */
 	mdsync_in_progress = false;
 }

-- 
Se

Re: [HACKERS] pg_stat_bgwriter broken?

Greg Smith  writes:
> Tom Lane wrote:
>> Worksforme.  You probably need a full recompile and/or initdb

> Yeah, sorry about the noise.  This went away after some more intensive 
> rebuilding.  I think I still want to add some regression testing of this 
> view as suggested.  If that had been there, I'd have been a lot more 
> confident it was my mistake, and not something that just slipped through 
> without being tested.

[ shrug... ]  That's one of the simplest views in the entire system.
It's very hard to conceive of a bug that would affect it and not other
views, *especially* such a bug that would be exposed by a test that
doesn't actually exercise the functionality, such as you're suggesting
here.

If the cause was what I suggested, most likely the other stats views
were broken too --- were you able to pass the regression tests in that
state?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

On Sun, Nov 14, 2010 at 1:51 PM, Yeb Havinga  wrote:
> On 2010-11-14 19:35, Robert Haas wrote:
>>
>> On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
>>   wrote:
>>>
>>> In my opinion, all of these should have the same effect: DELETE all rows
>>> from "foo".  Any other option means we're going to have trouble
>>> predicting
>>> how a query is going to behave.
>>
>> I think it's clear that's the only sensible behavior.
>
> What if CTE's ever get input parameters?

Then they'd be functions, which we already have.  As Tom recently
pointed out, you can even make them temporary with an explicit pg_temp
schema qualification.  Perhaps someday we'll have lambda-expressions,
but I have no reason to believe that they'll use any of the wCTE
syntax.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

On Sun, Nov 14, 2010 at 2:27 PM, Daniel Farina  wrote:
> On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane  wrote:
>> Daniel Farina  writes:
>>> Here are some weaknesses in the SUM aggregate that run up against the
>>> type system. Maybe they'll help crystallize some discussion:
>>
>>> SUM(int2) => int4
>>> SUM(int4) => int8
>>> SUM(int8) => numeric
>>
>>> Some weaknesses:
>>
>>> SUM, of any precision, assumes that the precision being accumulated
>>> into (which is also the return-precision) is enough to avoid overflow.
>>
>> This is not a flaw of the type system, it's just an implementation
>> choice in the SUM() aggregates.  We could easily have chosen wider
>> accumulation and/or result types.
>
> That's true, but there are downsides to escalating the precision so
> aggressively.
>
> The case I was thinking about in particular involves composition of
> SUM. If one can assume that a relation has int4s and that will never
> overflow an int8 (as is done now), I don't see a great way to optimize
> the following case without special exceptions in the optimizer for
> particular aggregates known a-priori. Here's what would happen now:
>
> SELECT SUM(x::int8)::numeric
> FROM (SELECT SUM(x::int4)::int8 AS x
>      FROM rel
>      GROUP BY y) some_name;
>
> Could be rendered, by this assumption, as:
>
> SELECT SUM(x::int8)::int8
> (same FROM clause)
>
> (Why would anyone write a query like this? Views. Possibly inlined SQL
> UDFs, too.)
>
> This can be measurably faster. It also more properly constrains the
> result type, as numeric can also handle non-integer quantities.
>
> I should have underscored that a positive aspect of having a
> type-class like facility that allows declaration things like this
> hypothetical Integer when backed by concrete types that might support
> a superset of functionality.

Like Tom, I'm not sure this is really a type-system problem.  This
sounds like a complaint that operations on "numeric" are much slower
than operations on "int4" and "int8", even for values that could be
represented by either type.  I think that's a valid complaint, but I
don't see how changing the type system would help.  I think what you'd
need to is optimize the existing numeric type, or provide a new
numeric-ish type with optimizations for dealing with
small-to-medium-sized integers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_bgwriter broken?


Tom Lane wrote:

Worksforme.  You probably need a full recompile and/or initdb


Yeah, sorry about the noise.  This went away after some more intensive 
rebuilding.  I think I still want to add some regression testing of this 
view as suggested.  If that had been there, I'd have been a lot more 
confident it was my mistake, and not something that just slipped through 
without being tested.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_stat_bgwriter broken?

Greg Smith  writes:
> I'm behind on my list mail so maybe this has been mentioned already, but 
> when I just tried pg_stat_bgwriter from a build against today's HEAD I 
> got this:

> pgbench=# select count(*) FROM pg_stat_bgwriter;
> ERROR:  did not find '}' at end of input node

Worksforme.  You probably need a full recompile and/or initdb.
This is a typical symptom when someone adds a field to a parse node
type and you have code or stored rules that haven't been updated.
(Usually though we try to bump catversion when an initdb is needed
because of such a change.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in plpython's Python Generators

2010-11-14 Thread Oleg Bartunov


Thanks, Jan.

I tested the patch and it looks fine.

Oleg
On Sun, 14 Nov 2010, Jan Urbaski wrote:


On 24/10/10 00:32, Jan UrbaЪЪski wrote:

On 21/10/10 20:48, Alvaro Herrera wrote:

... and presumably somebody can fix the real bug that Jean-Baptiste hit,
too.


AFAICS the error comes from PLy_function_handler disconnecting from SPI
after calling into the Python code and then going ahead and reading the
result from the iterator.


Here's a patch with a fix for that bug.

Cheers,
Jan



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: extensible enums

2010-11-14 Thread Josh Berkus


>> I'd say put it on and mark it with an [E].  We could use some more
>> [E]asy items for that list.
> 
> We don't need to add marginally-useful features just because they're
> easy.  If it doesn't have a real use-case, the incremental maintenance
> cost of more code is a good reason to reject it.

I'll bite.

Use-case:

1) DBA adds "Department Role" enum, with set
{'Director','Secretary','Staff','Support','Temporary','Liason'}.

2) 3-person data entry team updates all employee records with those roles.

3) First summary report is run.

4) Manager points out that "Liason" is misspelled.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_wal_senders must die

Heikki Linnakangas  writes:
> On 13.11.2010 17:07, Tom Lane wrote:
>> Robert Haas  writes:
>>> Come to think of it, I'm not really sure I understand what protects
>>> SetLatch() against memory ordering hazards.  Is that actually safe?
>> 
>> Hmm ... that's a good question.  It certainly *looks* like it could
>> malfunction on machines with weak memory ordering.

> Can you elaborate?

Weak memory ordering means that stores into shared memory initiated by
one processor are not guaranteed to be observed to occur in the same
sequence by another processor.  This implies first that the latch code
could malfunction all by itself, if two processes manipulate a latch at
about the same time, and second (probably much less likely) that there
could be a malfunction involving a process that's waited on a latch not
seeing the shared-memory status updates that another process did "before"
setting the latch.

This is not at all hypothetical --- my first attempt at rewriting the
sinval signaling code, a couple years back, failed on PPC machines in
the buildfarm because of exactly this type of issue.

The quick-and-dirty way to fix this is to attach a spinlock to each
latch, because we already have memory ordering sync instructions in
the spinlock primitives.  Doing better would probably involve developing
a new set of processor-specific primitives --- which would be pretty
easy for the processors where we have gcc inline asm, but it would take
some research for the platforms where we're relying on magic OS-provided
subroutines.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

2010-11-14 Thread Darren Duncan


Daniel Farina wrote:

On Sat, Nov 13, 2010 at 7:54 PM, Darren Duncan  wrote:

You don't have to kludge things by implementing arrays as blobs for example;
you can implement them as relations instead.  Geospatial types can just be
tuples. Arrays of structured types can just be relations with an attribute
per type attribute.  Arrays of simple types can just be unary relations.


In practice, my guess is the performance of these approaches would be
terrible for a number of workloads. I don't agree that arrays having
their own storage implementation is a kludge: there are even operators
like unnest that can be used to turn them back into relations.


I'm not discounting this at all.  The data model can formally define types in 
one way and a DBMS can implement various well known cases in specialized ways 
that are more efficient.  Arrays are a good example.  But with other cases they 
still get a default implementation.  If arrays are flexible enough, then 
different arrays could be implemented differently, eg some as blobs and some as 
relations; the former could be better for small or simple arrays and the latter 
for large or complex arrays.



I have long thought (but never really gave voice to) there being value
having first-class relation values, but I would also say that's
another kettle of fish. I also want closures, and don't think that's
completely nuts.


I think that first-class functions are important at least, if not full-blown 
closures.  You can define generic relational restrictions or summaries or 
whatever with such; eg, the general case of a WHERE is a function that takes a 
relation and a function as input, and results in a relation; more restricted 
cases of WHERE can be defined with simpler functions that take 2 relation inputs 
instead.



You may want to learn more about this, first. Postgres's type system,
while relatively simple, is not as ill-conceived or primitive as you
seem to assume it is. There are also a lot of important system
details, like expanding/replacing the typmod facility that only allows
Postgres 32 bits of type-augmenting data (such as the (1) in the type
numeric(1)).


I'm not saying that Pg's system is primitive et al.  The thread starter said it 
needed an overhaul, so indicating there are deficiencies, and I'm suggesting 
some effective ways to do that.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rethinking hint bits WAS: Protecting against unexpected zero-pages: proposal

2010-11-14 Thread Josh Berkus


> If we got rid of hint bits, we'd need workarounds for the ensuing
> massive performance loss.  There is no reason whatsoever to imagine
> that we'd come out ahead in the end.

Oh, there's no question that we need something which serves the same
purpose as the existing hit bits.  But there's a lot of question about
whether our existing implementation is optimal.

For example, imagine if the hint bits were moved to a separate per-table
bitmap outside the table instead of being stored with each row, as the
current FSM is.  Leaving aside the engineering required for this (which
would be considerable, especially when it comes to consistency and
durability), this would potentially allow solutions to the following issues:

* Index-only access
* I/O associated with hint bit setting
* Vacuum freezing old-cold data
* Page-level CRCs
* Rsyncing tables for replication

Alternately, we could attack this by hint bit purpose.  For example, if
we restructured the CLOG so that it was an efficient in-memory index
(yes, I'm being handwavy), then having the XID-is-visible hint bits
might become completely unnecessary.  We could then also improve the
visibility map to be reliable and include "frozen" bits as well.

Overall, what I'm pointing out is that our current implementation of
hint bits is blocking not one by several major features and causing our
users performance pain.  It's time to look for an implementation which
doesn't have the same problems we're familiar with.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] pg_stat_bgwriter broken?

I'm behind on my list mail so maybe this has been mentioned already, but 
when I just tried pg_stat_bgwriter from a build against today's HEAD I 
got this:


pgbench=# select count(*) FROM pg_stat_bgwriter;
ERROR:  did not find '}' at end of input node

Can someone confirm if this broke recently, or is it just me?  Last time 
I would have tested this myself was a few weeks ago.


Regardless, I was thinking of adding some basic sanity checking on this 
view, that runs from the regression tests to catch this class of problem 
in the future.  It's kind of sloppy that this and the bgwriter counter 
reset aren't tested as part of "make check".  I think these two would 
always produce stable results:


postgres=# SELECT count(*) FROM pg_stat_bgwriter;
count
---
1
postgres=# SELECT pg_stat_reset_shared('bgwriter');
pg_stat_reset_shared
--

(1 row)

And the first one of those is similarly broken on my install.  Thoughts 
on whether adding those to the regression tests would be a worthwhile 
patch?  I'll do the work, just thinking out loud about the concept.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] max_wal_senders must die

2010-11-14 Thread Heikki Linnakangas


On 13.11.2010 17:07, Tom Lane wrote:

Robert Haas  writes:

Come to think of it, I'm not really sure I understand what protects
SetLatch() against memory ordering hazards.  Is that actually safe?


Hmm ... that's a good question.  It certainly *looks* like it could
malfunction on machines with weak memory ordering.


Can you elaborate?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Marko Tiikkaja


On 2010-11-14 8:51 PM +0200, Yeb Havinga wrote:

On 2010-11-14 19:35, Robert Haas wrote:

On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
   wrote:

In my opinion, all of these should have the same effect: DELETE all rows
from "foo".  Any other option means we're going to have trouble predicting
how a query is going to behave.

I think it's clear that's the only sensible behavior.

What if CTE's ever get input parameters?


What about input parameters?


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] a new problem in MERGE

Boxuan Zhai wrote:
I have plan to fix the above two bugs together. (in fact, I have
already started coding in merge_v202 edition). My question is how
should I make my update be consistent with yours. Is it possible for
you to give me an edition that I can work on?

I just got this reconciled with HEAD again. There have been two changes
I made in the code you'll eventually want in your working copy:

1) Fix NIL/NULL confusion:
https://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

2) Update ExecMerge to accept and pass through an oldtuple value. This
is needed to make the code compatible with the PostgreSQL git HEAD after
the changes made in the 2009-09 CommitFest. Bit rot updates made:
https://github.com/greg2ndQuadrant/postgres/commit/be03bd201720f42a666f7e356ec8507c1357f502

I'm not sure if how I did (2) is correct for all cases, but at least the
code compiles again now and the server will start. Attached is an
updated patch that applies to HEAD as of right now, and that code has
been pushed to
https://github.com/greg2ndQuadrant/postgres/tree/merge-unstable with the
changes rebased to be the last two commits.

It fails "make installcheck" on my system. But as the initial diffs I
looked at relate to enums and such, I don't think that's a problem with
your patch. Will investigate further here with some of my own patches
I'm working on today. Hopefully this is enough to unblock what you were
looking for more details from me about.

--
Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD

merge-v203-20101114.gz
Description: GNU Zip compressed data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

On Sun, Nov 14, 2010 at 7:47 AM, Tom Lane  wrote:
> Daniel Farina  writes:
>> Here are some weaknesses in the SUM aggregate that run up against the
>> type system. Maybe they'll help crystallize some discussion:
>
>> SUM(int2) => int4
>> SUM(int4) => int8
>> SUM(int8) => numeric
>
>> Some weaknesses:
>
>> SUM, of any precision, assumes that the precision being accumulated
>> into (which is also the return-precision) is enough to avoid overflow.
>
> This is not a flaw of the type system, it's just an implementation
> choice in the SUM() aggregates.  We could easily have chosen wider
> accumulation and/or result types.

That's true, but there are downsides to escalating the precision so
aggressively.

The case I was thinking about in particular involves composition of
SUM. If one can assume that a relation has int4s and that will never
overflow an int8 (as is done now), I don't see a great way to optimize
the following case without special exceptions in the optimizer for
particular aggregates known a-priori. Here's what would happen now:

SELECT SUM(x::int8)::numeric
FROM (SELECT SUM(x::int4)::int8 AS x
  FROM rel
  GROUP BY y) some_name;

Could be rendered, by this assumption, as:

SELECT SUM(x::int8)::int8
(same FROM clause)

(Why would anyone write a query like this? Views. Possibly inlined SQL
UDFs, too.)

This can be measurably faster. It also more properly constrains the
result type, as numeric can also handle non-integer quantities.

I should have underscored that a positive aspect of having a
type-class like facility that allows declaration things like this
hypothetical Integer when backed by concrete types that might support
a superset of functionality.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Yeb Havinga


On 2010-11-14 19:35, Robert Haas wrote:

On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
  wrote:

In my opinion, all of these should have the same effect: DELETE all rows
from "foo".  Any other option means we're going to have trouble predicting
how a query is going to behave.

I think it's clear that's the only sensible behavior.

What if CTE's ever get input parameters?

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POSIX shared memory redux

On Sun, Nov 14, 2010 at 11:06 AM, Tom Lane  wrote:
> Martijn van Oosterhout  writes:
>> The only real solution seems to me to be to keep a small SysV shared
>> memory segment for the locking and allocate the rest of the shared
>> memory some other way.
>
> Yeah, that's been discussed.  It throws all the portability gains out
> the window.  It might get you out from under the need to readjust a
> machine's SHMMAX setting before you can use a large amount of shared
> memory, but it's not clear that's enough of a win to be worth the
> trouble.

One of the things that would be really nice to be able to do is resize
our shm after startup, in response to changes in configuration
parameters.  That's not so easy to make work, of course, but I feel
like this might be going in the right direction, since POSIX shms can
be resized using ftruncate().

> The other direction that we could possibly go is to find some other way
> entirely of interlocking access to the data directory.  If for example
> we could rely on a file lock held by the postmaster and all backends,
> we could check that instead of having to rely on a shmem behavior.
> The killer objection to that so far is that file locking is unreliable
> in some environments, particularly NFS.  But it'd have some advantages
> too --- in particular, in the NFS context, the fact that the lock is
> visible to would-be postmasters on different machines might be thought
> a huge safety improvement over what we do now.

I've never had a lot of luck making filesystem locks work reliably,
but I don't discount the possibility that I was doing it wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

On Sun, Nov 14, 2010 at 1:01 PM, Marko Tiikkaja
 wrote:
> In my opinion, all of these should have the same effect: DELETE all rows
> from "foo".  Any other option means we're going to have trouble predicting
> how a query is going to behave.

I think it's clear that's the only sensible behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] HOT updates in index-less tables

2010-11-14 Thread Heikki Linnakangas


On 14.11.2010 00:29, Robert Haas wrote:

On Sat, Nov 13, 2010 at 12:13 PM, Tom Lane  wrote:

Hannu Krosing  writes:

On Sat, 2010-11-13 at 10:51 -0500, Tom Lane wrote:

If a table has no indexes, we will always decide that any same-page
update operation is a HOT update, since obviously it isn't modifying
any indexed columns.  But is there any benefit to doing so?



If we do the in-page "mini vacuum" even without HOT, then there should
be no benefit from index-less HOT updates.


AFAICS we do: heap_update marks the page as prunable whether it's a HOT
update or not.  The only difference between treating the update as HOT vs
not-HOT is that if there was more than one HOT update, the intermediate
tuples could be completely reclaimed by page pruning (ie, their line
pointers go away too).  With not-HOT updates, the intermediate line
pointers would have to remain in DEAD state until vacuum, since page
pruning wouldn't know if there were index entries pointing at them.
But that seems like a pretty tiny penalty.


I'm not at all convinced that's a tiny penalty.


Me neither. It's a tiny penalty when you consider one update, but if you 
repeatedly update the same tuple, you accumulate dead line pointers 
until the next real vacuum runs. With HOT updates, you reach a steady 
state where page pruning is all you need. Then again, if you're 
repeatedly updating a row in a table with no indexes, presumably it's a 
very small table or you would create an index on it. And frequently 
autovacuuming a small index is quite cheap too.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in plpython's Python Generators

2010-11-14 Thread Marti Raudsepp

On Sun, Oct 24, 2010 at 01:32, Jan Urbański  wrote:
> The error handling in plpython is well-known to be a mess, hence the
> confusing error message that OP got. Another annoying thing is that SPI
> calls are not done in a subtransaction, which means you can't trap
> errors with a try/catch Python construct. I'm currently trying to get
> try/catch to work and might end up cleaning up error handling as well...
> but this can take me some time, so maybe someone should plug this
> particular hole right away.

While on this topic, maybe you can chime in on the pgsql-docs list for
my documentation patches to document the current (broken) behavior?

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Marko Tiikkaja


On 2010-11-14 8:01 PM +0200, I wrote:

In my opinion, all of these should have the same effect: DELETE all rows
from "foo".


Since the example wasn't entirely clear on this one: in my opinion the 
DML should also only be executed once.  So:


WITH t AS (INSERT INTO foo VALUES (0) RETURNING *)
SELECT 1 FROM t t1, t t2;

would only insert one row in any case.


Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Marko Tiikkaja


On 2010-11-14 5:28 PM +0200, Hitoshi Harada wrote:

2010/11/14 Marko Tiikkaja:

.. and a wild patch appears.


Could you update wiki on this feature if you think we've reached the consensus?


You're probably referring to
http://archives.postgresql.org/pgsql-hackers/2010-11/msg00660.php
which was unfortunately just me talking too soon.  There still doesn't 
appear to be a consensus on the difference (if any) between these queries:


WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1LIMIT 0; -- unreferenced CTE

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 FROM t LIMIT 0; -- referenced, but not read

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 FROM t LIMIT 1; -- referenced, but only partly read

WITH t AS (DELETE FROM foo RETURNING *)
SELECT 1 FROM t t1, t t2; -- referenced, read multiple times

In my opinion, all of these should have the same effect: DELETE all rows 
from "foo".  Any other option means we're going to have trouble 
predicting how a query is going to behave.


As far as I know, we do have a consensus that the order of execution 
should be an implementation detail, and that the statements should 
always be executed in the exact same snapshot (i.e. no CID bump between).



Also, wrapping up the discussion like pros&  cons on the different
execution models helps not only the advance discussions but also
reviews of this patch.


Do you mean between the "execute in order, bump CID" and "execute in 
whatever order but to completion" behaviours?



Regards,
Marko Tiikkaja

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

I wrote:
> I still think it's worth looking into whether the bug can be dodged
> by shortening the eval calls.

In fact, that does seem to work; I'll commit a patch after testing a
bit more.

We still need someone to add the missing build dependencies so that
make -j is trustworthy again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] add label to enum syntax

2010-11-14 Thread David E. Wheeler

On Nov 14, 2010, at 7:42 AM, Andrew Dunstan wrote:

> It's fairly unscientific and inconclusive, and the discussion seems to have 
> died. I think since Tom and I did most of the work on this our voices should 
> count a little louder :-) , so I'm going to go with his suggestion of VALUE, 
> unless there are loud squawks.

+1

Maybe update the doc mentions of "label," then.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Bug in plpython's Python Generators

2010-11-14 Thread Jan Urbański

On 24/10/10 00:32, Jan Urbański wrote:
> On 21/10/10 20:48, Alvaro Herrera wrote:
>> ... and presumably somebody can fix the real bug that Jean-Baptiste hit,
>> too.
> 
> AFAICS the error comes from PLy_function_handler disconnecting from SPI
> after calling into the Python code and then going ahead and reading the
> result from the iterator.

Here's a patch with a fix for that bug.

Cheers,
Jan
*** src/pl/plpython/expected/plpython_setof.out
--- src/pl/plpython/expected/plpython_setof.out	2010-11-14 17:38:39.0 +0100
***
*** 31,36 
--- 31,44 
  		return self.icontent
  return producer(count, content)
  $$ LANGUAGE plpythonu;
+ CREATE FUNCTION test_setof_spi_in_iterator() RETURNS SETOF text AS
+ $$
+ for s in ('Hello', 'Brave', 'New', 'World'):
+ plpy.execute('select 1')
+ yield s
+ plpy.execute('select 2')
+ $$
+ LANGUAGE plpythonu;
  -- Test set returning functions
  SELECT test_setof_as_list(0, 'list');
   test_setof_as_list 
***
*** 107,109 
--- 115,126 
   
  (2 rows)
  
+ SELECT test_setof_spi_in_iterator();
+  test_setof_spi_in_iterator 
+ 
+  Hello
+  Brave
+  New
+  World
+ (4 rows)
+ 
*** src/pl/plpython/plpython.c
--- src/pl/plpython/plpython.c	2010-11-14 17:38:39.0 +0100
***
*** 1000,1012 
  		}
  
  		/*
! 		 * Disconnect from SPI manager and then create the return values datum
! 		 * (if the input function does a palloc for it this must not be
! 		 * allocated in the SPI memory context because SPI_finish would free
! 		 * it).
  		 */
- 		if (SPI_finish() != SPI_OK_FINISH)
- 			elog(ERROR, "SPI_finish failed");
  
  		if (proc->is_setof)
  		{
--- 1000,1010 
  		}
  
  		/*
! 		 * Cannot disconnect from SPI yet, because in case of a SRF returning a
! 		 * Python iterator we will use PyIter_Next() which calls back into
! 		 * Python code, which can contain calls to SPI functions. In that case
! 		 * we will disconnect from SPI after the iterator is exhausted.
  		 */
  
  		if (proc->is_setof)
  		{
***
*** 1064,1074 
--- 1062,1085 
  			(errcode(ERRCODE_DATA_EXCEPTION),
  		  errmsg("error fetching next item from iterator")));
  
+ /* Disconnect from the SPI manager before returning. */
+ if (SPI_finish() != SPI_OK_FINISH)
+ 	elog(ERROR, "SPI_finish failed");
+ 
  fcinfo->isnull = true;
  return (Datum) NULL;
  			}
  		}
  
+ 		/*
+ 		 * Disconnect from SPI manager and then create the return values datum
+ 		 * (if the input function does a palloc for it this must not be
+ 		 * allocated in the SPI memory context because SPI_finish would free
+ 		 * it).
+ 		 */
+ 		if (SPI_finish() != SPI_OK_FINISH)
+ 			elog(ERROR, "SPI_finish failed");
+ 
  		plerrcontext.callback = plpython_return_error_callback;
  		plerrcontext.previous = error_context_stack;
  		error_context_stack = &plerrcontext;
*** src/pl/plpython/sql/plpython_setof.sql
--- src/pl/plpython/sql/plpython_setof.sql	2010-11-14 17:38:39.0 +0100
***
*** 35,40 
--- 35,50 
  return producer(count, content)
  $$ LANGUAGE plpythonu;
  
+ CREATE FUNCTION test_setof_spi_in_iterator() RETURNS SETOF text AS
+ $$
+ for s in ('Hello', 'Brave', 'New', 'World'):
+ plpy.execute('select 1')
+ yield s
+ plpy.execute('select 2')
+ $$
+ LANGUAGE plpythonu;
+ 
+ 
  
  -- Test set returning functions
  SELECT test_setof_as_list(0, 'list');
***
*** 51,53 
--- 61,65 
  SELECT test_setof_as_iterator(1, 'list');
  SELECT test_setof_as_iterator(2, 'list');
  SELECT test_setof_as_iterator(2, null);
+ 
+ SELECT test_setof_spi_in_iterator();

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

Robert Haas  writes:
> On Nov 14, 2010, at 10:44 AM, Tom Lane  wrote:
>> I still think it's worth looking into whether the bug can be dodged
>> by shortening the eval calls.  But if not, I think I'd vote for
>> reverting.  Maybe we could revisit this in a couple of years.

> +1.  The current master branch fails to build on my (rather new) Mac
> with make -j2.

I complained of the same thing, but AFAICS that's not a make bug;
it's a missing build dependency, which could be fixed if we choose to
keep this infrastructure.  It probably ought to be fixed even if we
don't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

On Nov 14, 2010, at 10:44 AM, Tom Lane  wrote:
> I still think it's worth looking into whether the bug can be dodged
> by shortening the eval calls.  But if not, I think I'd vote for
> reverting.  Maybe we could revisit this in a couple of years.

+1.  The current master branch fails to build on my (rather new) Mac with make 
-j2.  I could upgrade my toolchain but it seems like more trouble than it's 
worth, not to mention a possible obstacle to new users and developers.

...Robert
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

2010-11-14 Thread Andrew Dunstan




On 11/14/2010 10:44 AM, Tom Lane wrote:

  And on the fourth hand, what we're buying here is pretty marginal for
developers and of no interest whatever for users.

I still think it's worth looking into whether the bug can be dodged
by shortening the eval calls.  But if not, I think I'd vote for
reverting.  Maybe we could revisit this in a couple of years.


+1

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POSIX shared memory redux

Martijn van Oosterhout  writes:
> The only real solution seems to me to be to keep a small SysV shared
> memory segment for the locking and allocate the rest of the shared
> memory some other way.

Yeah, that's been discussed.  It throws all the portability gains out
the window.  It might get you out from under the need to readjust a
machine's SHMMAX setting before you can use a large amount of shared
memory, but it's not clear that's enough of a win to be worth the
trouble.

The other direction that we could possibly go is to find some other way
entirely of interlocking access to the data directory.  If for example
we could rely on a file lock held by the postmaster and all backends,
we could check that instead of having to rely on a shmem behavior.
The killer objection to that so far is that file locking is unreliable
in some environments, particularly NFS.  But it'd have some advantages
too --- in particular, in the NFS context, the fact that the lock is
visible to would-be postmasters on different machines might be thought
a huge safety improvement over what we do now.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System

Daniel Farina  writes:
> Here are some weaknesses in the SUM aggregate that run up against the
> type system. Maybe they'll help crystallize some discussion:

> SUM(int2) => int4
> SUM(int4) => int8
> SUM(int8) => numeric

> Some weaknesses:

> SUM, of any precision, assumes that the precision being accumulated
> into (which is also the return-precision) is enough to avoid overflow.

This is not a flaw of the type system, it's just an implementation
choice in the SUM() aggregates.  We could easily have chosen wider
accumulation and/or result types.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Improved parallel make support

Dave Page  writes:
> On Sat, Nov 13, 2010 at 8:13 PM, Peter Eisentraut  wrote:
>> Well, it looks like $(eval) is pretty broken in 3.80, so either we
>> require 3.81 or we abandon this line of thought.

> 3.81 might be a problem for Solaris - unless I pay for a support
> contract from Oracle, I'm not going to get any updates from them,
> which means I'll have to install a custom build. Now that's no biggie
> for me, but it does see to raise the bar somewhat for users that might
> want to build from source.

For another data point, I find make 3.80 in OS X 10.4, while 10.5 and
10.6 have 3.81.  10.4 is certainly behind the curve, but Apple still
seem to be releasing security updates for it.

I was about to draw an analogy to flex -- we are now requiring a version
of flex that's roughly contemporaneous with make 3.81.  However, we
don't require flex to build from a tarball, so on second thought that
situation isn't very comparable.  Moving the goalposts for make would
definitely affect more people.

On the third hand, gmake is very very easy to install: if you're
capable of building Postgres from source, it's hard to believe that
gmake should scare you off.  (I've installed multiple versions on my
ancient HPUX dinosaur, and it's never been any harder than ./configure,
make, make check, make install.)

And on the fourth hand, what we're buying here is pretty marginal for
developers and of no interest whatever for users.

I still think it's worth looking into whether the bug can be dodged
by shortening the eval calls.  But if not, I think I'd vote for
reverting.  Maybe we could revisit this in a couple of years.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] add label to enum syntax

2010-11-14 Thread Andrew Dunstan




On 10/29/2010 01:47 AM, Pavel Golub wrote:

Hello, Alvaro.

You wrote:

AH>  Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010:


Forgot link to poll:
http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

AH>  Hah, there are 17 votes as of right now, no option is below 23% and no
AH>  option is above 29%.


Yeah, right now 42 votes:

  VALUE 26%
  LABEL 26%
  Just ADD 'newlabel' 24%
  ELEMENT 21%
  MEMBER 2%



It's fairly unscientific and inconclusive, and the discussion seems to 
have died. I think since Tom and I did most of the work on this our 
voices should count a little louder :-) , so I'm going to go with his 
suggestion of VALUE, unless there are loud squawks.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] wCTE behaviour

2010-11-14 Thread Hitoshi Harada

2010/11/14 Marko Tiikkaja :
> On 2010-11-12 8:25 PM +0200, I wrote:
>>
>> I'm going to take some time off this weekend to get a patch with this
>> behaviour to the next commitfest.
>
> .. and a wild patch appears.
>
> This is almost exactly the patch from 2010-02 without
> CommandCounterIncrement()s.  It's still a bit rough around the edges and
> needs some more comments, but I'm posting it here anyway.
>
> This patch passes all regression tests, but feel free to try to break it,
> there are probably ways to do that.  This one also has the "always run DMLs
> to completion, and exactly once" behaviour.
>

Could you update wiki on this feature if you think we've reached the consensus?
http://wiki.postgresql.org/wiki/WriteableCTEs

Also, wrapping up the discussion like pros & cons on the different
execution models helps not only the advance discussions but also
reviews of this patch.

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] POSIX shared memory redux

2010-11-14 Thread Martijn van Oosterhout

On Sat, Nov 13, 2010 at 08:07:52PM -0500, Tom Lane wrote:
> "A.M."  writes:
> > The goal of this work is to address all of the shortcomings of previous 
> > POSIX shared memory patches as pointed out mostly by Tom Lane.
> 
> It seems like you've failed to understand the main shortcoming of this
> whole idea, which is the loss of ability to detect pre-existing backends
> still running in a cluster whose postmaster has crashed.  The nattch
> variable of SysV shmem segments is really pretty critical to us, and
> AFAIK there simply is no substitute for it in POSIX-land.

I've been looking and there really doesn't appear to be. This is
consistant as there is nothing else in POSIX where you can determine
how many other people have the same file, pipe, tty, etc open.

I asked a few people for ideas and got answers like: just walk through
/proc and check. Apart from the portability issues, this won't work if
there are different user-IDs in play.

The only real solution seems to me to be to keep a small SysV shared
memory segment for the locking and allocate the rest of the shared
memory some other way. If all backends map the SysV memory before the
other way, then you can use the non-existance of the SysV SHM to
determine the non-existance of the other segment.

Quite a bit more work, ISTM.

Haveva nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle

signature.asc
Description: Digital signature

Re: [HACKERS] Refactoring the Type System

On Fri, Nov 12, 2010 at 4:07 PM, Jeff Davis  wrote:
> I think the best we'll do is be able to hack on some of the things that
> we actively want and have clear use cases for, such as type interfaces.
> We might have to give up on some of the more ambitious ideas that
> involve propagating interesting information through the type inference
> system; or having any real type that wasn't declared with CREATE TYPE.
> Consider that right now we bundle the element type information along
> with the array _value_.

Here are some weaknesses in the SUM aggregate that run up against the
type system. Maybe they'll help crystallize some discussion:

SUM(int2) => int4
SUM(int4) => int8
SUM(int8) => numeric

Some weaknesses:

SUM, of any precision, assumes that the precision being accumulated
into (which is also the return-precision) is enough to avoid overflow.
This is generally the case, but there's no reason why it *must* be
true. I'm especially looking at the int2 to int4 conversion. One could
imagine a more interesting scenario where overflow behavior could
occur much more easily.

SUM always promotes types upwards in precision, and is unable to keep
types of the smallest possible precision should SUM expressions be
composed.

SUM is unable to maintain any supplementary information about
precision, i.e. say anything interesting about the typmod, which
defeats or makes impossible many interesting optimizations.

I think that a type-interface system moves towards solving the first
couple of problems, since SUM can return some abstract type such as
"Integer" and use that to promote more aggressively (avoiding
overflow) or keep representations small (avoiding unnecessary
promotion) at run-time. It might require Integer to be an abstract,
non-storable data type, though, so the current CREATE TYPE is not
going to make life easy.

The third problem is slightly different...it might require some
user-pluggable code to be called as part of semantic analysis. I have
felt the idea of making a Postgres Type a more robust first-class data
type and somehow being able to attach a function to another
function/aggregate that is responsible for getting called during
semantic analysis and returning the proper signature roll around in my
head, but it might be the whispers of cthulu, too.

fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Refactoring the Type System