Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-15 Thread Jaime Casanova
On Sun, May 15, 2011 at 6:03 PM, Jaime Casanova  wrote:
> On Thu, May 5, 2011 at 10:59 AM, Tom Lane  wrote:
>> Magnus Hagander  writes:
>>>> So even if people don't believe in the rationale behind the patch,
>>>> would allowing it harm anything at this point?
>>
>>> Adding it for the sake of upgrades seems very far fetched.
>>
>>> Adding it for the sake of giving a better error message seems like a
>>> very good idea. But in that case, the client side code to actually
>>> give a better error message should be included from the start, IMHO.
>>
>> What's not apparent to me is how we'll even get to this check; if
>> there's a mismatch, won't the database system identifier comparison
>> fail first in most scenarios?
>>
>
> that's why i didn't propose that to begin with... but thinking on
> that, we can use it to add a message in pg_basebackup, maybe just a
> warning if we are taking a basebackup from an incompatible system...
>
> but for that i will need to add xlog_internal.h and postgres.h to
> pg_basebackup and use the "#define FRONTEND 1" hack we have in
> pg_resetxlog
>

attached, comments?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 6be5a14..2235c7f 100644
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
*** The commands accepted in walsender mode
*** 1315,1321 
  
   
Requests the server to identify itself. Server replies with a result
!   set of a single row, containing three fields:
   
  
   
--- 1315,1321 
  
   
Requests the server to identify itself. Server replies with a result
!   set of a single row, containing four fields:
   
  
   
*** The commands accepted in walsender mode
*** 1356,1361 
--- 1356,1372 



+ 
+   
+   
+xlogversion
+   
+   
+   
+Current version of xlog page format.
+   
+   
+   
  

   
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 0831b1b..7e7354b 100644
*** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
--- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
*** libpqrcv_connect(char *conninfo, XLogRec
*** 114,120 
  		"the primary server: %s",
  		PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 3 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
--- 114,120 
  		"the primary server: %s",
  		PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) != 4 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 470e6d1..392cf94 100644
*** a/src/backend/replication/walsender.c
--- b/src/backend/replication/walsender.c
*** IdentifySystem(void)
*** 279,289 
  	char		sysid[32];
  	char		tli[11];
  	char		xpos[MAXFNAMELEN];
  	XLogRecPtr	logptr;
  
  	/*
! 	 * Reply with a result set with one row, three columns. First col is
! 	 * system ID, second is timeline ID, and third is current xlog location.
  	 */
  
  	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
--- 279,291 
  	char		sysid[32];
  	char		tli[11];
  	char		xpos[MAXFNAMELEN];
+ 	char		xlp_magic[7];
  	XLogRecPtr	logptr;
  
  	/*
! 	 * Reply with a result set with one row, four columns. First col is
! 	 * system ID, second is timeline ID, third is current xlog location
! 	 * and fourth is XLOG_PAGE_MAGIC (WAL version)
  	 */
  
  	snprintf(sysid, sizeof(sysid), UINT64_FORMAT,
*** IdentifySystem(void)
*** 295,303 
  	snprintf(xpos, sizeof(xpos), "%X/%X",
  			 logptr.xlogid, logptr.xrecoff);
  
  	/* Send a RowDescription message */
  	pq_beginmessage(&buf, 'T');
! 	pq_sendint(&buf, 3, 2);		/* 3 fields */
  
  	/* first field */
  	pq_sendstring(&buf, "systemid");	/* col name */
--- 297,307 
  	snprintf(xpos, sizeof(xpos), "%X/%X",
  			 logptr.xlogid, logptr.xrecoff);
  
+ 	snprintf(xlp_magic, sizeof(xlp_magic), "%u", XLOG_PAGE_MAGIC);
+ 
  	/* Send a RowDescription message */
  	pq_beginmessage(&buf, 'T');
! 	pq_sendint(&buf, 4, 2);		/* 4 fields */
  
  	/* first field */
  	pq_sendstring(&buf, "systemid");	/* col name */
*** IdentifySystem(void)
*** 325,341 
  	pq_sendint(&buf, -1, 2);
  	pq_sendint(&buf, 0, 4);
  	pq_sendint(&buf, 0, 2);
  	pq_endmessage(&buf);
  
  	/* Send a DataRow message */
  	pq_beginmessage(&buf, 'D');
! 	pq_sendint(&buf, 3, 2);		/* # of columns */
  	pq_sendint(&buf, strlen(sysid), 4); /* col1 len */
  	pq_sendbytes(&buf, (char *) &sysid, strlen(sysid));
  	pq_sendint(&buf, strlen(tli), 4);	/* col2 len */
  	pq_sendbytes(&buf, (char *) tli, strlen(tli))

[HACKERS] Isolation checks under MSVC

2011-05-15 Thread Andrew Dunstan


I've committed a bunch of changes both in the Postgres code and the 
buildfarm code to enable running the isolation checks under MSVC. 
There's one hurdle that I haven't overcome: the code tries to call 
"./isolationtester" and Windows barfs on it. I think we need to remove 
that way of doing things, and instead use a full path for such commands 
like we do elsewhere.


For now I have disabled the test on the one MSVC member I had enabled it 
for, to clear the buildfarm error. I had hoped to get this completed 
before I leave for pgcon in the morning, but now it will have to wait a 
week before I am able to test it again, as I won't have a suitable 
machine with me in Ottawa.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cache estimates, cache access cost

2011-05-15 Thread Greg Smith

Cédric Villemain wrote:

http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache
  


This rebases easily to make Cedric's changes move to the end; I just 
pushed a version with that change to 
https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone 
wants a cleaner one to browse.  I've attached a patch too if that's more 
your thing.


I'd recommend not getting too stuck on the particular hook Cédric has 
added here to compute the cache estimate, which uses mmap and mincore to 
figure it out.  It's possible to compute similar numbers, albeit less 
accurate, using an approach similar to how pg_buffercache inspects 
things.  And I even once wrote a background writer extension that 
collected this sort of data as it was running the LRU scan anyway.  
Discussions of this idea seem to focus on how the "what's in the cache?" 
data is collected, which as far as I'm concerned is the least important 
part.  There are multiple options, some work better than others, and 
there's no reason that can't be swapped out later.  The more important 
question is how to store the data collected and then use it for 
optimizing queries.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


diff --git a/contrib/Makefile b/contrib/Makefile
index 6967767..47652d5 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -27,6 +27,7 @@ SUBDIRS = \
 		lo		\
 		ltree		\
 		oid2name	\
+		oscache	\
 		pageinspect	\
 		passwordcheck	\
 		pg_archivecleanup \
diff --git a/contrib/oscache/Makefile b/contrib/oscache/Makefile
new file mode 100644
index 000..8d8dcc5
--- /dev/null
+++ b/contrib/oscache/Makefile
@@ -0,0 +1,15 @@
+# contrib/oscache/Makefile
+
+MODULE_big = oscache
+OBJS = oscache.o
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/oscache
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/oscache/oscache.c b/contrib/oscache/oscache.c
new file mode 100644
index 000..1ad7dc2
--- /dev/null
+++ b/contrib/oscache/oscache.c
@@ -0,0 +1,151 @@
+/*-
+ *
+ * oscache.c
+ *
+ *
+ * Copyright (c) 2011, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  contrib/oscache/oscache.c
+ *
+ *-
+ */
+/* { POSIX stuff */
+#include  /* exit, calloc, free */
+#include  /* stat, fstat */
+#include  /* size_t, mincore */
+#include  /* sysconf, close */
+#include  /* mmap, mincore */
+/* } */
+
+/* { PostgreSQL stuff */
+#include "postgres.h" /* general Postgres declarations */
+#include "utils/rel.h" /* Relation */
+#include "storage/bufmgr.h"
+#include "catalog/catalog.h" /* relpath */
+/* } */
+
+PG_MODULE_MAGIC;
+
+void		_PG_init(void);
+
+float4 oscache(Relation, ForkNumber);
+
+/*
+ * Module load callback
+ */
+void
+_PG_init(void)
+{
+	/* Install hook. */
+	OSCache_hook = &oscache;
+}
+
+/*
+ * oscache process the os cache inspection for the relation.
+ * It returns the percentage of blocks in OS cache.
+ */
+float4
+oscache(Relation relation, ForkNumber forkNum)
+{
+	int  segment = 0;
+	char *relationpath;
+	char filename[MAXPGPATH];
+	int fd;
+	int64  total_block_disk = 0;
+	int64  total_block_mem  = 0;
+
+	/* OS things */
+	int64 pageSize  = sysconf(_SC_PAGESIZE); /* Page size */
+	register int64 pageIndex;
+
+	relationpath = relpathperm(relation->rd_node, forkNum);
+
+	/*
+	 * For each segment of the relation
+	 */
+	snprintf(filename, MAXPGPATH, "%s", relationpath);
+	while ((fd = open(filename, O_RDONLY)) != -1)
+	{
+		// for stat file
+		struct stat st;
+		// for mmap file
+		void *pa = (char *)0;
+		// for calloc file
+		unsigned char *vec = (unsigned char *)0;
+		int64  block_disk = 0;
+		int64  block_mem  = 0;
+
+		if (fstat(fd, &st) == -1)
+		{
+			close(fd);
+			elog(ERROR, "Can not stat object file : %s",
+filename);
+			return 0;
+		}
+
+		/*
+		* if file ok
+		* then process
+		*/
+		if (st.st_size != 0)
+		{
+			/* number of block in the current file */
+			block_disk = st.st_size/pageSize;
+
+			/* TODO We need to split mmap size to be sure (?) to be able to mmap */
+			pa = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0);
+			if (pa == MAP_FAILED)
+			{
+close(fd);
+elog(ERROR, "Can not mmap object file : %s, errno = %i,%s\nThis error can happen if there is not enough space in memory to do the projection. Please mail ced...@2ndquadrant.fr with '[oscache] ENOMEM' as subject.",
+	filename, errno, strerror(errno));
+return 0;
+			}
+
+			/* Prepare our vector containing all blocks information */
+			vec = calloc(1, (st.st_size+pageSize-1)/pageSize);
+			if ((void *)0 == vec)
+			{
+munmap(pa, st.st_size);
+close(fd);
+elog(ERROR

Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 10:13 PM, Jaime Casanova  wrote:
> On Sun, May 15, 2011 at 9:01 PM, Robert Haas  wrote:
>> On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova  
>> wrote:
>>> still, we have a problem... because we are happily ignoring correctly
>>> created casts...
>>> at least, we should document that casts on domains are ignored and
>>> that we should use the base types instead, maybe even a warning or a
>>> notice when issuing the CREATE CAST command using domains...
>>>
>>> making the user think everything is fine when it's not is not a good idea
>>
>> +1.
>>
>
> ok, i will make a patch for this... btw, why is it that we allow
> creating those casts at all? or should we deny them?

Before you write the patch... we should probably try to agree on which
of the various options you mention makes most sense.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Jaime Casanova
On Sun, May 15, 2011 at 9:01 PM, Robert Haas  wrote:
> On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova  wrote:
>> still, we have a problem... because we are happily ignoring correctly
>> created casts...
>> at least, we should document that casts on domains are ignored and
>> that we should use the base types instead, maybe even a warning or a
>> notice when issuing the CREATE CAST command using domains...
>>
>> making the user think everything is fine when it's not is not a good idea
>
> +1.
>

ok, i will make a patch for this... btw, why is it that we allow
creating those casts at all? or should we deny them?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training



Re: [HACKERS] Extension Packaging

2011-05-15 Thread Robert Haas
On Thu, May 12, 2011 at 3:29 AM, Dimitri Fontaine
 wrote:
> Robert Haas  writes:
>>> Okay, how we add a "revision" key to the control file and extrevision to
>>> the pg_extension catalog. Its type can be "TEXT" and is optional for use
>>> by extensions.
>>
>> How would pg_extension.extrevision be kept up to date?  AFAICS, the
>> whole point is that you might swap out the shared libraries without
>> doing anything at the SQL level.
>
> Well, ALTER EXTENSION UPDATE could be taught to check for control file
> changes only when there's no upgrade script.  I think it already did
> that for some fields, like require and comment, but it's no longer the
> case.
>
> Still, I would think that it should be possible to update some metadata
> of the extension without running an SQL upgrade script.

The point is that something like RPM is not going to run *any* SQL
command.  It's just going to replace the files in the filesystem.  If
the view can pull that info from some file on the fly, then it can be
pretty much guaranteed to be up-to-date and accurate.  Anything else
seems hit or miss.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova  wrote:
> still, we have a problem... because we are happily ignoring correctly
> created casts...
> at least, we should document that casts on domains are ignored and
> that we should use the base types instead, maybe even a warning or a
> notice when issuing the CREATE CAST command using domains...
>
> making the user think everything is fine when it's not is not a good idea

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Jaime Casanova
On Sun, May 15, 2011 at 1:53 PM, Robert Haas  wrote:
>
> So let's think about some harder scenarios.
>
> Given two types T1 and T2, and two domains D1 over T1 and D2 over T2,
> and a cast from a value of type D1 to type D2, then:
>

ok. a few fair questions, thanks

> (1) If there is an implicit cast from D1 to D2 and an implicit cast
> from T1 to T2, then presumably we should use the cast from D1 to D2,
> since it's more specific.  Or if none of the available casts involve
> domains, but there is a cast of some sort from T1 to T2, then it seems
> clear to use that one.  But what if we instead have a cast from D1 to
> T2 and a cast from T1 to D2?  Which one should we prefer?  Why?
>

ok, this is the specific problem i said not to touch in my patch...
but, IMHO, we should use T1->D2 on the basis that that is the final
type the user wants...

> (2) What happens if there are casts T1 -> T2 and D1 -> D2, but the D1
> -> D2 cast is explicit, while the T1 -> T2 cast is on-assignment?
> Should we use the D1->D2 cast when the context is explicit and the
> T1->T2 when the context is on-assignment?  That seems confusing.
> Alternatively, we could decide that the on-assignment cast always
> beats the explicit cast, even in an explicit-cast context.  But that
> amounts to ignoring the D1->D2 cast altogether - is that what we want?
>

confusing, yes. still, imho, we should use the casts based on context
as we always do...

> (3) What happens if one or both of T1 or T2 are themselves domains
> over some other types T3 and T4, respectively?  Now there are nine
> possible pairings of types (any of D1, T1, T3 with any of D2, T2, T4)
> and in each case the available cast could have one of three contexts.
> How do we decide which cast wins?  Also, we might now need 9 probes
> into pg_cast to find all the relevant casts, whereas the current code
> needs just one - even in the previous scenario, we might need 4 probes
> instead of one.  That'll be slower - is it worth it?  What's the
> performance penalty in an artificially constructed worst case?
>
> (4) What happens if T1 is a domain over T2?  Then we have another
> option - cast D1 to D2 by smashing it to its base type twice (first to
> T1, then to T2), and the re-applying any domain constraint on D2.  Is
> that preferable to applying a cast from D1 to T1 and then casting to
> D2?  Is it preferable to applying a cast from D1 to T2 and then
> re-applying the domain constraint on D2?  Is it preferable to a cast
> directly from D1 to D2?  And in each case, does the context of the
> cast matter?
>

these two are very good questions and i can't see a "right" answer for them

> I'm not throwing these questions out to be flip or to make life
> difficult - I'm just saying they need to be thought about, and the
> answers don't seem obvious (to me).
>

still, we have a problem... because we are happily ignoring correctly
created casts...
at least, we should document that casts on domains are ignored and
that the base types should be used instead, maybe even issue a warning
or a notice when the CREATE CAST command uses domains...

making the user think everything is fine when it's not is not a good idea

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training



Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM

2011-05-15 Thread Jaime Casanova
On Thu, May 5, 2011 at 10:59 AM, Tom Lane  wrote:
> Magnus Hagander  writes:
>>> So even if people don't believe in the rationale behind the patch,
>>> would allowing it harm anything at this point?
>
>> Adding it for the sake of upgrades seems very far fetched.
>
>> Adding it for the sake of giving a better error message seems like a
>> very good idea. But in that case, the client side code to actually
>> give a better error message should be included from the start, IMHO.
>
> What's not apparent to me is how we'll even get to this check; if
> there's a mismatch, won't the database system identifier comparison
> fail first in most scenarios?
>

that's why i didn't propose that to begin with... but thinking about
it, we can use it to add a message in pg_basebackup, maybe just a
warning if we are taking a base backup from an incompatible system...

but for that i will need to add xlog_internal.h and postgres.h to
pg_basebackup and use the "#define FRONTEND 1" hack we have in
pg_resetxlog

> I'm also wondering why send WAL version number and not, say, catalog
> version number, if there's some idea that we need more tests than the
> system identifier comparison.
>

well... catversion is not that informative, we change it for a lot of
reasons, not only catalog structure changes... so we can't swear that
xlog records will be incompatible just because catversion changes...

but yes, we need to know if the catalog structure has changed, maybe we
can change XLOG_PAGE_MAGIC when that happens?

> Given reasonable answers to these questions, I'd not object to putting
> in additional error testing.  I concur with Magnus that the patch should
> actually provide those tests, and not just put in an unused field.
>

actually, now is when we can play with that API at will; when/if we
make online upgrades work, we will be stuck with whatever we have
made. before that, we know it won't affect anybody

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: PostgreSQL support and training



[HACKERS] cache estimates, cache access cost

2011-05-15 Thread Cédric Villemain
Hello

cache estimation and cache access cost are currently not accounted for
explicitly: they have an associated cost but no constants (other
than effective_cache_size, which has very limited usage).

Every I/O cost is built from a derivation of seq_page_cost,
random_page_cost and the number of pages. Formulas are used in some
places to raise or lower the cost, to take caching and
data alignment into account.

There are:

 * an estimate of the pages we will find in the PostgreSQL buffer cache
 * an estimate of the pages we will find in the operating system cache

and they can be computed for:

 * a first access
 * repeated accesses

We currently don't distinguish between the two cache areas (there are
more cache areas, but we don't care here) and we 'prefer' to estimate
repeated accesses instead of the first one.

There is also a point related to cost estimates being abrupt: for
example, once a sort goes over work_mem, its cost jumps because page
accesses are accounted for.

The current cost estimates are already very good; most of our
queries run well without those famous 'HINT's, and the planner provides
the best plan in most cases.

But I believe that we now need more tools to improve cost
estimation even further.
I would like to propose some ideas, not all of them mine; the
topic has been in the air for a long time, and probably everything has
already been said (at least around a beer or a Pepsi)

Adding a new GUC "cache_page_cost":
- allows costing a page access when the page is estimated to be in cache
- allows costing a sort that exceeds work_mem but should not hit disk
- allows using random_page_cost for what it is really meant for.
(I was tempted by a GUC "write_page_cost" but I am unsure about this one
at this stage)

Adding 2 columns to pg_class, "oscache_percent" and "pgcache_percent"
(or similar names): they store stats about the percentage of
a relation in each cache.
- The usage would be to estimate the cost of the first access to pages,
then use the Mackert and Lohman formula for subsequent accesses. The
latter only provides a way to estimate the cost of re-reading.
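
For reference, the Mackert and Lohman estimate as described in the costsize.c comments (with T = pages in the relation, N = tuples, s = selectivity, b = buffer pages available, and PF = pages fetched):

```latex
PF =
\begin{cases}
\min\!\left(\dfrac{2TNs}{2T+Ns},\; T\right) & \text{if } T \le b \\[1.5ex]
\dfrac{2TNs}{2T+Ns} & \text{if } T > b \text{ and } Ns \le \dfrac{2Tb}{2T-b} \\[1.5ex]
b + \dfrac{\left(Ns - \frac{2Tb}{2T-b}\right)(T-b)}{T} & \text{if } T > b \text{ and } Ns > \dfrac{2Tb}{2T-b}
\end{cases}
```

As noted, this models the cost of re-reading within one scan; it says nothing about whether the very first access is cached, which is what the new columns would cover.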

It is hard to advocate here with a real expected performance gain, other
than: we will have more options for more precise planner decisions, and
we may reduce the number of reports of bad planning. (Improving cache
estimation is also on the TODO list.)

--

I've already hacked the core a bit for that and added the 2 new columns
with hooks to update them. ANALYZE OSCACHE updates one of them, and a
plugin can be used to provide the estimate (so how it's filled is not
important; most OSes have solutions to estimate it accurately, if
anyone wonders).
It is as-is for a POC, probably not clean enough to go to a commitfest,
and not expected to go there before some consensus is reached.
http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache

-- 

Hacking costsize.c is ... dangerous, I would say. It is easy to break
something which already works so well. Changing only one cost function
is not enough to keep a good balance.
The performance farm should help here ... and the full cycle for 9.2 too.

Comments ?
-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL: Expertise, Training and Support



Re: [HACKERS] Urgent!

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 4:57 PM, Gelman  wrote:
> I need to be Unsubscribed!  ign...@verizon.net

There's a link for that here:

http://archives.postgresql.org/pgsql-hackers/

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Urgent!

2011-05-15 Thread Gelman

I need to be Unsubscribed!  ign...@verizon.net



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-15 Thread Cédric Villemain
2011/5/15 Robert Haas :
> On Fri, May 6, 2011 at 5:31 PM, Greg Smith  wrote:
>> I think that all the complexity with CRCs etc. is unlikely to lead anywhere
>> too, and those two issues are not completely unrelated.  The simplest,
>> safest thing here is the right way to approach this, not the most
>> complicated one, and a simpler format might add some flexibility here to
>> reload more cache state too.  The bottleneck on reloading the cache state is
>> reading everything from disk.  Trying to micro-optimize any other part of
>> that is moving in the wrong direction to me.  I doubt you'll ever measure a
>> useful benefit that overcomes the expense of maintaining the code.  And you
>> seem to be moving to where someone can't restore cache state when they
>> change shared_buffers.  A simpler implementation might still work in that
>> situation; reload until you run out of buffers if shared_buffers shrinks,
>> reload until you're done with the original size.
>
> I don't think there's any need for this to get data into
> shared_buffers at all.  Getting it into the OS cache oughta be plenty
> sufficient, no?
>
> ISTM that a very simple approach here would be to save the contents of
> each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
> buffers on startup.

+1
It is just an evolution of the current process, if I understood the
explanations of the latest patch correctly.

>We could worry about additional complexity, like
> using fincore to probe the OS cache, in a follow-on patch.  While
> reloading only 8GB of maybe 30GB of cached data on restart would not
> be as good as reloading all of it, it would be a lot better than
> reloading none of it, and the gymnastics required seems substantially
> less.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL: Expertise, Training and Support



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Robert Haas
On Sun, May 15, 2011 at 3:26 AM, Jaime Casanova  wrote:
> Obviously it should run the cast from timestamp to int, why it will
> run a cast from a domain?

So let's think about some harder scenarios.

Given two types T1 and T2, and two domains D1 over T1 and D2 over T2,
and a cast from a value of type D1 to type D2, then:

(1) If there is an implicit cast from D1 to D2 and an implicit cast
from T1 to T2, then presumably we should use the cast from D1 to D2,
since it's more specific.  Or if none of the available casts involve
domains, but there is a cast of some sort from T1 to T2, then it seems
clear to use that one.  But what if we instead have a cast from D1 to
T2 and a cast from T1 to D2?  Which one should we prefer?  Why?

(2) What happens if there are casts T1 -> T2 and D1 -> D2, but the D1
-> D2 cast is explicit, while the T1 -> T2 cast is on-assignment?
Should we use the D1->D2 cast when the context is explicit and the
T1->T2 when the context is on-assignment?  That seems confusing.
Alternatively, we could decide that the on-assignment cast always
beats the explicit cast, even in an explicit-cast context.  But that
amounts to ignoring the D1->D2 cast altogether - is that what we want?

(3) What happens if one or both of T1 or T2 are themselves domains
over some other types T3 and T4, respectively?  Now there are nine
possible pairings of types (any of D1, T1, T3 with any of D2, T2, T4)
and in each case the available cast could have one of three contexts.
How do we decide which cast wins?  Also, we might now need 9 probes
into pg_cast to find all the relevant casts, whereas the current code
needs just one - even in the previous scenario, we might need 4 probes
instead of one.  That'll be slower - is it worth it?  What's the
performance penalty in an artificially constructed worst case?

(4) What happens if T1 is a domain over T2?  Then we have another
option - cast D1 to D2 by smashing it to its base type twice (first to
T1, then to T2), and the re-applying any domain constraint on D2.  Is
that preferable to applying a cast from D1 to T1 and then casting to
D2?  Is it preferable to applying a cast from D1 to T2 and then
re-applying the domain constraint on D2?  Is it preferable to a cast
directly from D1 to D2?  And in each case, does the context of the
cast matter?

I'm not throwing these questions out to be flip or to make life
difficult - I'm just saying they need to be thought about, and the
answers don't seem obvious (to me).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] performance-test farm

2011-05-15 Thread Josh Berkus
On 5/12/11 7:19 PM, Lou Picciano wrote:
> Josh My Man! How are you?!! 
> 
> 
> Is this the one?: http://planetdrizzle.org/ 

Since that's their blog feed, here are some durable links:

Testing tool:
http://docs.drizzle.org/testing/dbqp.html

Random query generator:
https://launchpad.net/randgen

However, looking at those now I'm not seeing response time as part of
the test, which is of course critical for us. Also, their test results
are diff-based, which is (as we know all too well) fragile.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: [HACKERS] patch for new feature: Buffer Cache Hibernation

2011-05-15 Thread Robert Haas
On Fri, May 6, 2011 at 5:31 PM, Greg Smith  wrote:
> I think that all the complexity with CRCs etc. is unlikely to lead anywhere
> too, and those two issues are not completely unrelated.  The simplest,
> safest thing here is the right way to approach this, not the most
> complicated one, and a simpler format might add some flexibility here to
> reload more cache state too.  The bottleneck on reloading the cache state is
> reading everything from disk.  Trying to micro-optimize any other part of
> that is moving in the wrong direction to me.  I doubt you'll ever measure a
> useful benefit that overcomes the expense of maintaining the code.  And you
> seem to be moving to where someone can't restore cache state when they
> change shared_buffers.  A simpler implementation might still work in that
> situation; reload until you run out of buffers if shared_buffers shrinks,
> reload until you're done with the original size.

I don't think there's any need for this to get data into
shared_buffers at all.  Getting it into the OS cache oughta be plenty
sufficient, no?

ISTM that a very simple approach here would be to save the contents of
each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those
buffers on startup.  We could worry about additional complexity, like
using fincore to probe the OS cache, in a follow-on patch.  While
reloading only 8GB of maybe 30GB of cached data on restart would not
be as good as reloading all of it, it would be a lot better than
reloading none of it, and the gymnastics required seems substantially
less.
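
To make the idea concrete, here is a minimal sketch (not PostgreSQL code; the
function name and file path are illustrative assumptions) of issuing the
POSIX_FADV_WILLNEED hint at startup for a file whose buffer contents were
saved at shutdown:

```c
#define _POSIX_C_SOURCE 200112L

/* Sketch only: ask the kernel to prefetch a saved file into the OS page
 * cache.  posix_fadvise(POSIX_FADV_WILLNEED) returns immediately; the
 * kernel schedules the read-ahead asynchronously. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

static int
prewarm_file(const char *path)
{
    struct stat st;
    int         fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0)
    {
        close(fd);
        return -1;
    }

    /* Hint covers the whole file; no blocking on actual I/O here. */
    if (posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED) != 0)
    {
        close(fd);
        return -1;
    }

    close(fd);
    return 0;
}
```

The point of the sketch is that the startup side needs no knowledge of
shared_buffers at all; it only walks the files recorded at shutdown.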

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Reducing overhead of frequent table locks

2011-05-15 Thread Robert Haas
On Sat, May 14, 2011 at 1:33 PM, Jeff Janes  wrote:
> Would that risk be substantially worse than it currently is?  If a
> backend goes into the tank while holding access shared locks, it will
> still block access exclusive locks until it recovers.  And those
> queued access exclusive locks will block new access shared locks from
> other backends.   How much is risk magnified by the new approach,
> going from "any backend holding the lock is tanked" to "any process at
> all is tanked"?

I think that's a pretty substantial increase in risk.  Consider that
there may be 100 backends out there, one of which holds a relevant
lock.  Needing to wait for all of them to do something instead of just
one is quite different.

Also, quite apart from the possibility of hanging altogether, the
latency would probably be increased quite a bit, and not in a very
predictable fashion.

I have the impression that most of the problem comes from fighting
over CPU cache lines.  If that's correct, it may not be important to
avoid shared memory access per se; it may be good enough to arrange
things so that the shared memory which is accessed is *typically* not
being accessed by other backends.
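
As an illustration of that arrangement (a sketch only, not PostgreSQL's actual
structures; the 64-byte line size and slot layout are assumptions), the usual
trick is to pad per-backend slots so no two of them share a cache line:

```c
/* Sketch only: give each backend its own cache-line-sized slot so that
 * one backend's writes don't invalidate the line that other backends
 * happen to be reading. */
#include <stdint.h>

#define CACHE_LINE_SIZE 64
#define MAX_BACKENDS    8       /* illustrative */

typedef struct
{
    uint64_t    nlocks;         /* per-backend fast-path lock count */
    char        pad[CACHE_LINE_SIZE - sizeof(uint64_t)];  /* fill the line */
} PaddedSlot;

static PaddedSlot slots[MAX_BACKENDS];

/* Each backend touches only its own slot, so the shared memory it writes
 * is typically not in any other backend's cache. */
static void
bump(int backend_id)
{
    slots[backend_id].nlocks++;
}
```

Shared memory is still being accessed; it's just laid out so the common case
never contends on the same line.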

> What I'd considered playing with in the past is having
> LockMethodLocalHash hang on to an Access Shared lock even after
> locallock->nLocks == 0, so that re-granting the lock would be a purely
> local operation.  Anyone wanting an Access Exclusive lock and not
> immediately getting it would have to send out a plea (via SINVA?) for
> other processes to release their locallock->nLocks == 0 locks.  But
> this would suffer from the same problem of tanked processes.

Yeah.  I have thought about this, too, but as with Noah's suggestion,
I think this would make the risk of things hanging up substantially
worse than it is now.   A backend that, under the present code,
wouldn't be holding an AccessShareLock at all, would now be holding
one that you'd have to convince it to release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Jaime Casanova
On Sun, May 15, 2011 at 2:13 AM, Darren Duncan  wrote:
>
>  ('1800-01-01 00:00:00'::timestamp)::int
>
> Now, since all values of a DOMAIN are also values of the base type the
> DOMAIN is defined as being a subset of, then the sub-expression within the
> parenthesis denotes a value that is both a timestamp and a datetime at the
> same time.
>
> So, if a generic "CAST(timestamp as int)" is already defined, and you define
> a "CAST(datetime as int)", then what should the above code (correct for
> misspelling) do, or should it fail?
>

Obviously it should run the cast from timestamp to int; why would it
run a cast from a domain?
The other way around should be allowed, though... a cast from datetime
to int should first look for a cast function using the domain, and if
it doesn't find one, fall back to the base type.

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL



Re: [HACKERS] DOMAINs and CASTs

2011-05-15 Thread Darren Duncan

Jaime Casanova wrote:
> On Sat, May 14, 2011 at 8:42 PM, Darren Duncan  wrote:
>> First of all, what if "cast(timestamp as int)" was already defined?  Which
>> cast then would you expect to be invoked here?
>>
>>  '1800-01-01 00:00:00'::int
>
> i will expect an error in that case... what you're doing there is
> casting an "unknown" to integer, for that to be valid you need an
> intermediate cast to timestamp or in my case to datetime

Sorry, my bad; I meant to say (might be slightly misspelled):

  ('1800-01-01 00:00:00'::timestamp)::int

Now, since all values of a DOMAIN are also values of the base type the DOMAIN is 
defined as being a subset of, then the sub-expression within the parenthesis 
denotes a value that is both a timestamp and a datetime at the same time.


So, if a generic "CAST(timestamp as int)" is already defined, and you define a 
"CAST(datetime as int)", then what should the above code (correct for 
misspelling) do, or should it fail?


-- Darren Duncan
