Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Sun, May 15, 2011 at 6:03 PM, Jaime Casanova wrote: > On Thu, May 5, 2011 at 10:59 AM, Tom Lane wrote: >> Magnus Hagander writes: So even if people don't believe in the rationale behind the patch, would allowing it harm anything at this point? >> >>> Adding it for the sake of upgrades seems very far fetched. >> >>> Adding it for the sake of giving a better error message seems like a >>> very good idea. But in that case, the client side code to actually >>> give a better error message should be included from the start, IMHO. >> >> What's not apparent to me is how we'll even get to this check; if >> there's a mismatch, won't the database system identifier comparison >> fail first in most scenarios? >> > > that's why i didn't propose that to begin with... but thinking on > that, we can use it to add a message in pg_basebackup, maybe just a > warning if we are taking a basebackup from an incompatible system... > > but for that i will need to add xlog_internal.h and postgres.h to > pg_basebackup and use the "#define FRONTEND 1" hack we have in > pg_resetxlog > attached, comments? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 6be5a14..2235c7f 100644 *** a/doc/src/sgml/protocol.sgml --- b/doc/src/sgml/protocol.sgml *** The commands accepted in walsender mode *** 1315,1321 Requests the server to identify itself. Server replies with a result ! set of a single row, containing three fields: --- 1315,1321 Requests the server to identify itself. Server replies with a result ! set of a single row, containing four fields: *** The commands accepted in walsender mode *** 1356,1361 --- 1356,1372 + + + +xlogversion + + + +Current version of xlog page format. + + + diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c index 0831b1b..7e7354b 100644 *** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c --- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c *** libpqrcv_connect(char *conninfo, XLogRec *** 114,120 "the primary server: %s", PQerrorMessage(streamConn; } ! if (PQnfields(res) != 3 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); --- 114,120 "the primary server: %s", PQerrorMessage(streamConn; } ! if (PQnfields(res) != 4 || PQntuples(res) != 1) { int ntuples = PQntuples(res); int nfields = PQnfields(res); diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 470e6d1..392cf94 100644 *** a/src/backend/replication/walsender.c --- b/src/backend/replication/walsender.c *** IdentifySystem(void) *** 279,289 char sysid[32]; char tli[11]; char xpos[MAXFNAMELEN]; XLogRecPtr logptr; /* ! * Reply with a result set with one row, three columns. First col is ! * system ID, second is timeline ID, and third is current xlog location. */ snprintf(sysid, sizeof(sysid), UINT64_FORMAT, --- 279,291 char sysid[32]; char tli[11]; char xpos[MAXFNAMELEN]; + char xlp_magic[7]; XLogRecPtr logptr; /* ! * Reply with a result set with one row, four columns. First col is ! * system ID, second is timeline ID, third is current xlog location ! * and fourth is XLOG_PAGE_MAGIC (WAL version) */ snprintf(sysid, sizeof(sysid), UINT64_FORMAT, *** IdentifySystem(void) *** 295,303 snprintf(xpos, sizeof(xpos), "%X/%X", logptr.xlogid, logptr.xrecoff); /* Send a RowDescription message */ pq_beginmessage(&buf, 'T'); ! pq_sendint(&buf, 3, 2); /* 3 fields */ /* first field */ pq_sendstring(&buf, "systemid"); /* col name */ --- 297,307 snprintf(xpos, sizeof(xpos), "%X/%X", logptr.xlogid, logptr.xrecoff); + snprintf(xlp_magic, sizeof(xlp_magic), "%u", XLOG_PAGE_MAGIC); + /* Send a RowDescription message */ pq_beginmessage(&buf, 'T'); ! pq_sendint(&buf, 4, 2); /* 4 fields */ /* first field */ pq_sendstring(&buf, "systemid"); /* col name */ *** IdentifySystem(void) *** 325,341 pq_sendint(&buf, -1, 2); pq_sendint(&buf, 0, 4); pq_sendint(&buf, 0, 2); pq_endmessage(&buf); /* Send a DataRow message */ pq_beginmessage(&buf, 'D'); ! pq_sendint(&buf, 3, 2); /* # of columns */ pq_sendint(&buf, strlen(sysid), 4); /* col1 len */ pq_sendbytes(&buf, (char *) &sysid, strlen(sysid)); pq_sendint(&buf, strlen(tli), 4); /* col2 len */ pq_sendbytes(&buf, (char *) tli, strlen(tli))
[HACKERS] Isolation checks under MSVC
I've committed a bunch of changes both in the Postgres code and the buildfarm code to enable running the isolation checks under MSVC. There's one hurdle that I haven't overcome: the code tries to call "./isolationtester" and Windows barfs on it. I think we need to remove that way of doing things, and instead user a full path for such commands like we do elsewhere. For now I have disabled the test on the one MSVC member I had enabled it for, to clear the buildfarm error. I had hoped to get this completed before I leave for pgcon in the morning, but now it will have to wait a week before I am able to test it again, as I won't have a suitable machine with me in Ottawa. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] cache estimates, cache access cost
Cédric Villemain wrote: http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache This rebases easily to make Cedric's changes move to the end; I just pushed a version with that change to https://github.com/greg2ndQuadrant/postgres/tree/analyze_cache if anyone wants a cleaner one to browse. I've attached a patch too if that's more your thing. I'd recommend not getting too stuck on the particular hook Cédric has added here to compute the cache estimate, which uses mmap and mincore to figure it out. It's possible to compute similar numbers, albeit less accurate, using an approach similar to how pg_buffercache inspects things. And I even once wrote a background writer extension that collected this sort of data as it was running the LRU scan anyway. Discussions of this idea seem to focus on how the "what's in the cache?" data is collected, which as far as I'm concerned is the least important part. There are multiple options, some work better than others, and there's no reason that can't be swapped out later. The more important question is how to store the data collected and then use it for optimizing queries. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us diff --git a/contrib/Makefile b/contrib/Makefile index 6967767..47652d5 100644 --- a/contrib/Makefile +++ b/contrib/Makefile @@ -27,6 +27,7 @@ SUBDIRS = \ lo \ ltree \ oid2name \ + oscache \ pageinspect \ passwordcheck \ pg_archivecleanup \ diff --git a/contrib/oscache/Makefile b/contrib/oscache/Makefile new file mode 100644 index 000..8d8dcc5 --- /dev/null +++ b/contrib/oscache/Makefile @@ -0,0 +1,15 @@ +# contrib/oscache/Makefile + +MODULE_big = oscache +OBJS = oscache.o + +ifdef USE_PGXS +PG_CONFIG = pg_config +PGXS := $(shell $(PG_CONFIG) --pgxs) +include $(PGXS) +else +subdir = contrib/oscache +top_builddir = ../.. +include $(top_builddir)/src/Makefile.global +include $(top_srcdir)/contrib/contrib-global.mk +endif diff --git a/contrib/oscache/oscache.c b/contrib/oscache/oscache.c new file mode 100644 index 000..1ad7dc2 --- /dev/null +++ b/contrib/oscache/oscache.c @@ -0,0 +1,151 @@ +/*- + * + * oscache.c + * + * + * Copyright (c) 2011, PostgreSQL Global Development Group + * + * IDENTIFICATION + * contrib/oscache/oscache.c + * + *- + */ +/* { POSIX stuff */ +#include /* exit, calloc, free */ +#include /* stat, fstat */ +#include /* size_t, mincore */ +#include /* sysconf, close */ +#include /* mmap, mincore */ +/* } */ + +/* { PostgreSQL stuff */ +#include "postgres.h" /* general Postgres declarations */ +#include "utils/rel.h" /* Relation */ +#include "storage/bufmgr.h" +#include "catalog/catalog.h" /* relpath */ +/* } */ + +PG_MODULE_MAGIC; + +void _PG_init(void); + +float4 oscache(Relation, ForkNumber); + +/* + * Module load callback + */ +void +_PG_init(void) +{ + /* Install hook. */ + OSCache_hook = &oscache; +} + +/* + * oscache process the os cache inspection for the relation. + * It returns the percentage of blocks in OS cache. + */ +float4 +oscache(Relation relation, ForkNumber forkNum) +{ + int segment = 0; + char *relationpath; + char filename[MAXPGPATH]; + int fd; + int64 total_block_disk = 0; + int64 total_block_mem = 0; + + /* OS things */ + int64 pageSize = sysconf(_SC_PAGESIZE); /* Page size */ + register int64 pageIndex; + + relationpath = relpathperm(relation->rd_node, forkNum); + + /* + * For each segment of the relation + */ + snprintf(filename, MAXPGPATH, "%s", relationpath); + while ((fd = open(filename, O_RDONLY)) != -1) + { + // for stat file + struct stat st; + // for mmap file + void *pa = (char *)0; + // for calloc file + unsigned char *vec = (unsigned char *)0; + int64 block_disk = 0; + int64 block_mem = 0; + + if (fstat(fd, &st) == -1) + { + close(fd); + elog(ERROR, "Can not stat object file : %s", +filename); + return 0; + } + + /* + * if file ok + * then process + */ + if (st.st_size != 0) + { + /* number of block in the current file */ + block_disk = st.st_size/pageSize; + + /* TODO We need to split mmap size to be sure (?) to be able to mmap */ + pa = mmap(NULL, st.st_size, PROT_NONE, MAP_SHARED, fd, 0); + if (pa == MAP_FAILED) + { +close(fd); +elog(ERROR, "Can not mmap object file : %s, errno = %i,%s\nThis error can happen if there is not enought space in memory to do the projection. Please mail ced...@2ndquadrant.fr with '[oscache] ENOMEM' as subject.", + filename, errno, strerror(errno)); +return 0; + } + + /* Prepare our vector containing all blocks information */ + vec = calloc(1, (st.st_size+pageSize-1)/pageSize); + if ((void *)0 == vec) + { +munmap(pa, st.st_size); +close(fd); +elog(ERROR
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 10:13 PM, Jaime Casanova wrote: > On Sun, May 15, 2011 at 9:01 PM, Robert Haas wrote: >> On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova >> wrote: >>> still, we have a problem... because we are happily ignoring correctely >>> created casts... >>> at least, we should document that casts on domains are ignored and >>> that we should use the base types instead, maybe even a warning or a >>> notice when issuing the CREATE CAST command using domains... >>> >>> make the user think everything is fine when it's not is not a good idea >> >> +1. >> > > ok, i will make a patch for this... btw, why is that we allow to > create those casts at all? or we can deny them? Before you write the patch... we should probably try to agree on which of the various options you mention makes most sense. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 9:01 PM, Robert Haas wrote: > On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova wrote: >> still, we have a problem... because we are happily ignoring correctely >> created casts... >> at least, we should document that casts on domains are ignored and >> that we should use the base types instead, maybe even a warning or a >> notice when issuing the CREATE CAST command using domains... >> >> make the user think everything is fine when it's not is not a good idea > > +1. > ok, i will make a patch for this... btw, why is that we allow to create those casts at all? or we can deny them? -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Extension Packaging
On Thu, May 12, 2011 at 3:29 AM, Dimitri Fontaine wrote: > Robert Haas writes: >>> Okay, how we add a "revision" key to the control file and extrevision to >>> the pg_extension catalog. Its type can be "TEXT" and is optional for use >>> by extensions. >> >> How would pg_extension.extrevision be kept up to date? AFAICS, the >> whole point is that you might swap out the shared libraries without >> doing anything at the SQL level. > > Well, ALTER EXTENSION UPDATE could be taught to check for control file > changes only when there's no upgrade script. I think it already did > that for some fields, like require and comment, but it's no longer the > case. > > Still, I would think that it should be possible to update some metadata > of the extension without running an SQL upgrade script. The point is that something like RPM is not going to run *any* SQL command. It's just going to replace the files in the filesystem. If the view can pull that info from some file on the fly, then it can be pretty much guaranteed to be up-to-date and accurate. Anything else seems hit or miss. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 7:43 PM, Jaime Casanova wrote: > still, we have a problem... because we are happily ignoring correctely > created casts... > at least, we should document that casts on domains are ignored and > that we should use the base types instead, maybe even a warning or a > notice when issuing the CREATE CAST command using domains... > > make the user think everything is fine when it's not is not a good idea +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 1:53 PM, Robert Haas wrote: > > So let's think about some harder scenarios. > > Given two types T1 and T2, and two domains D1 over T1 and D2 over T2, > and a cast from a value of type D1 to type D2, then: > ok. a few fair questions, thanks > (1) If there is an implicit cast from D1 to D2 and an implicit cast > from T1 to T2, then presumably we should use the cast from D1 to D2, > since it's more specific. Or if none of the available casts involve > domains, but there is a cast of some sort from T1 to T2, then it seems > clear to use that one. But what if we instead have a cast from D1 to > T2 and a cast from T1 to D2? Which one should we prefer? Why? > ok, this is the specific problem i said not to touch in my patch... but, IMHO, we should use T1->D2 on the base that that is the final type the user wants... > (2) What happens if there are casts T1 -> T2 and D1 -> D2, but the D1 > -> D2 cast is explicit, while the T1 -> T2 cast is on-assignment? > Should we use the D1->D2 cast when the context is explicit and the > T1->T2 when the context is on-assignment? That seems confusing. > Alternatively, we could decide that the on-assignment cast always > beats the explicit cast, even in an explicit-cast context. But that > amounts to ignoring the D1->D2 cast altogether - is that what we want? > confusing yes. still, imho, we should use the casts based on context as we always do... > (3) What happens if one or both of T1 or T2 are themselves domains > over some other types T3 and T4, respectively? Now there are nine > possible pairings of types (any of D1, T1, T3 with any of D2, T2, T4) > and in each case the available cast could have one of three contexts. > How do we decide which cast wins? Also, we might now need 9 probes > into pg_cast to find all the relevant casts, whereas the current code > needs just one - even in the previous scenario, we might need 4 probes > instead of one. That'll be slower - is it worth it? What's the > performance penalty in an artificially constructed worst case? > > (4) What happens if T1 is a domain over T2? Then we have another > option - cast D1 to D2 by smashing it to its base type twice (first to > T1, then to T2), and the re-applying any domain constraint on D2. Is > that preferable to applying a cast from D1 to T1 and then casting to > D2? Is it preferable to applying a cast from D1 to T2 and then > re-applying the domain constraint on D2? Is it preferable to a cast > directly from D1 to D2? And in each case, does the context of the > cast matter? > these two are very good questions and i can't see a "right" answer for them > I'm not throwing these questions out to be flip or to make life > difficult - I'm just saying they need to be thought about, and the > answers don't seem obvious (to me). > still, we have a problem... because we are happily ignoring correctely created casts... at least, we should document that casts on domains are ignored and that we should use the base types instead, maybe even a warning or a notice when issuing the CREATE CAST command using domains... make the user think everything is fine when it's not is not a good idea -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] adding a new column in IDENTIFY_SYSTEM
On Thu, May 5, 2011 at 10:59 AM, Tom Lane wrote: > Magnus Hagander writes: >>> So even if people don't believe in the rationale behind the patch, >>> would allowing it harm anything at this point? > >> Adding it for the sake of upgrades seems very far fetched. > >> Adding it for the sake of giving a better error message seems like a >> very good idea. But in that case, the client side code to actually >> give a better error message should be included from the start, IMHO. > > What's not apparent to me is how we'll even get to this check; if > there's a mismatch, won't the database system identifier comparison > fail first in most scenarios? > that's why i didn't propose that to begin with... but thinking on that, we can use it to add a message in pg_basebackup, maybe just a warning if we are taking a basebackup from an incompatible system... but for that i will need to add xlog_internal.h and postgres.h to pg_basebackup and use the "#define FRONTEND 1" hack we have in pg_resetxlog > I'm also wondering why send WAL version number and not, say, catalog > version number, if there's some idea that we need more tests than the > system identifier comparison. > well... catversion is not that informative, we change it for a lot of reasons, not only catalog estructure changes... so we can't swear that xlog records will be incompatible just because catversion changes... but yes, we need to know if catalog estructure has changed, maybe we can change XLOG_PAGE_MAGIC when that happens? > Given reasonable answers to these questions, I'd not object to putting > in additional error testing. I concur with Magnus that the patch should > actually provide those tests, and not just put in an unused field. > actually, now is when we can play with that API at will when/if we can make online upgrades work then we will be stuck with whatever we have made. before that we know it won't affect anybody -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] cache estimates, cache access cost
Hello cache estimation and cache access cost are currently not accounted explicitly: they have a cost associated with but no constants (other than effective_cache_size but it has a very limited usage). Every IO cost is build with a derivation of the seq_page_cost, random_page_cost and the number of pages. Formulas are used in some places to make the cost more or less, to take into account caching and data alignment. There are: * estimation of page we will find in the postgresql buffer cache * estimation of page we will find in the operating system cache buffer cache and they can be compute for : * first access * several access We currently don't make distinction between both cache areas (there is more cache areas but we don't care here) and we 'prefer' estimate several access instead of the first one. There is also a point related to cost estimation, they are strong: for example once a sort goes over work_mem, its cost jumped because page access are accounted. The current cost estimations are already very good, most of our queries run well without those famous 'HINT' and the planner provide the best plan in most cases. But I believe that now we need more tools to improve even more the cost estimation. I would like to propose some ideas, not my ideas in all cases, the topic is in the air since a long time and probably that everything has already being said (at least around a beer or a pepsi) Adding a new GUC "cache_page_cost": - allows to cost the page access when it is estimated in cache - allows to cost a sort exceeding work_mem but which should not hit disk - allows to use random_page_cost for what it should be. (I was tempted by a GUC "write_page_cost" but I am unsure for this one at this stage) Adding 2 columns to pg_class "oscache_percent" and "pgcache_percent" (or similar names): they allow to store stats about the percentage of a relation in each cache. - Usage should be to estimate cost of first access to pages then use the Mackert and Lohman formula on next access. The later only provide a way to estimate cost of re-reading. It is hard to advocate here with real expected performance gain other than: we will have more options for more precise planner decision and we may reduce the number of report for bad planning. (it is also in the todolist to improve cache estimation) -- I've already hack a bit the core for that and added the 2 new columns with hooks to update them. ANALYZE OSCACHE update one of them and a plugin can be used to provide the estimate (so how it's filled is not important, most OSes have solutions to estimate it accurately if someone wonder) It is as-is for POC, probably not clean enough to go to commit festand not expected to go there before some consensus are done. http://git.postgresql.org/gitweb?p=users/c2main/postgres.git;a=shortlog;h=refs/heads/analyze_cache -- Hacking costsize is ... dangerous, I would say. Breaking something which works already so well is easy. Changing only one cost function is not enough to keep a good balance Performance farm should help here ... and the full cycle for 9.2 too. Comments ? -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Urgent!
On Sun, May 15, 2011 at 4:57 PM, Gelman wrote: > I need to be Unsubscribed! ign...@verizon.net There's a link for that here: http://archives.postgresql.org/pgsql-hackers/ -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Urgent!
I need to be Unsubscribed! ign...@verizon.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
2011/5/15 Robert Haas : > On Fri, May 6, 2011 at 5:31 PM, Greg Smith wrote: >> I think that all the complexity with CRCs etc. is unlikely to lead anywhere >> too, and those two issues are not completely unrelated. The simplest, >> safest thing here is the right way to approach this, not the most >> complicated one, and a simpler format might add some flexibility here to >> reload more cache state too. The bottleneck on reloading the cache state is >> reading everything from disk. Trying to micro-optimize any other part of >> that is moving in the wrong direction to me. I doubt you'll ever measure a >> useful benefit that overcomes the expense of maintaining the code. And you >> seem to be moving to where someone can't restore cache state when they >> change shared_buffers. A simpler implementation might still work in that >> situation; reload until you run out of buffers if shared_buffers shrinks, >> reload until you're done with the original size. > > I don't think there's any need for this to get data into > shared_buffers at all. Getting it into the OS cache oughta be plenty > sufficient, no? > > ISTM that a very simple approach here would be to save the contents of > each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those > buffers on startup. +1 It is just an evolution of the current process if I understood the explantions of the latest patch correctly. >We could worry about additional complexity, like > using fincore to probe the OS cache, in a follow-on patch. While > reloading only 8GB of maybe 30GB of cached data on restart would not > be as good as reloading all of it, it would be a lot better than > reloading none of it, and the gymnastics required seems substantially > less. > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 3:26 AM, Jaime Casanova wrote: > Obviously it should run the cast from timestamp to int, why it will > run a cast from a domain? So let's think about some harder scenarios. Given two types T1 and T2, and two domains D1 over T1 and D2 over T2, and a cast from a value of type D1 to type D2, then: (1) If there is an implicit cast from D1 to D2 and an implicit cast from T1 to T2, then presumably we should use the cast from D1 to D2, since it's more specific. Or if none of the available casts involve domains, but there is a cast of some sort from T1 to T2, then it seems clear to use that one. But what if we instead have a cast from D1 to T2 and a cast from T1 to D2? Which one should we prefer? Why? (2) What happens if there are casts T1 -> T2 and D1 -> D2, but the D1 -> D2 cast is explicit, while the T1 -> T2 cast is on-assignment? Should we use the D1->D2 cast when the context is explicit and the T1->T2 when the context is on-assignment? That seems confusing. Alternatively, we could decide that the on-assignment cast always beats the explicit cast, even in an explicit-cast context. But that amounts to ignoring the D1->D2 cast altogether - is that what we want? (3) What happens if one or both of T1 or T2 are themselves domains over some other types T3 and T4, respectively? Now there are nine possible pairings of types (any of D1, T1, T3 with any of D2, T2, T4) and in each case the available cast could have one of three contexts. How do we decide which cast wins? Also, we might now need 9 probes into pg_cast to find all the relevant casts, whereas the current code needs just one - even in the previous scenario, we might need 4 probes instead of one. That'll be slower - is it worth it? What's the performance penalty in an artificially constructed worst case? (4) What happens if T1 is a domain over T2? Then we have another option - cast D1 to D2 by smashing it to its base type twice (first to T1, then to T2), and the re-applying any domain constraint on D2. Is that preferable to applying a cast from D1 to T1 and then casting to D2? Is it preferable to applying a cast from D1 to T2 and then re-applying the domain constraint on D2? Is it preferable to a cast directly from D1 to D2? And in each case, does the context of the cast matter? I'm not throwing these questions out to be flip or to make life difficult - I'm just saying they need to be thought about, and the answers don't seem obvious (to me). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] performance-test farm
On 5/12/11 7:19 PM, Lou Picciano wrote: > Josh My Man! How are you?!! > > > Is this the one?: http://planetdrizzle.org/ Since that's their blog feed, here's some durable links: Testing tool: http://docs.drizzle.org/testing/dbqp.html Random query generator: https://launchpad.net/randgen However, looking at those now I'm not seeing response time as part of the test, which is of course critical for us. Also, their test results are diff-based, which is (as we know all too well) fragile. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] patch for new feature: Buffer Cache Hibernation
On Fri, May 6, 2011 at 5:31 PM, Greg Smith wrote: > I think that all the complexity with CRCs etc. is unlikely to lead anywhere > too, and those two issues are not completely unrelated. The simplest, > safest thing here is the right way to approach this, not the most > complicated one, and a simpler format might add some flexibility here to > reload more cache state too. The bottleneck on reloading the cache state is > reading everything from disk. Trying to micro-optimize any other part of > that is moving in the wrong direction to me. I doubt you'll ever measure a > useful benefit that overcomes the expense of maintaining the code. And you > seem to be moving to where someone can't restore cache state when they > change shared_buffers. A simpler implementation might still work in that > situation; reload until you run out of buffers if shared_buffers shrinks, > reload until you're done with the original size. I don't think there's any need for this to get data into shared_buffers at all. Getting it into the OS cache oughta be plenty sufficient, no? ISTM that a very simple approach here would be to save the contents of each shared buffer on clean shutdown, and to POSIX_FADV_WILLNEED those buffers on startup. We could worry about additional complexity, like using fincore to probe the OS cache, in a follow-on patch. While reloading only 8GB of maybe 30GB of cached data on restart would not be as good as reloading all of it, it would be a lot better than reloading none of it, and the gymnastics required seems substantially less. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing overhead of frequent table locks
On Sat, May 14, 2011 at 1:33 PM, Jeff Janes wrote: > Would that risk be substantially worse than it currently is? If a > backend goes into the tank while holding access shared locks, it will > still block access exclusive locks until it recovers. And those > queued access exclusive locks will block new access shared locks from > other backends. How much is risk magnified by the new approach, > going from "any backend holding the lock is tanked" to "any process at > all is tanked"? I think that's a pretty substantial increase in risk. Consider that there may be 100 backends out there, one of which holds a relevant lock. Needing to wait for all of them to do something instead of just one is quite different. Also, quite apart from the possibility of hanging altogether, the latency would probably be increased quite a bit, and not in a very predictable fashion. I have the impression that most of the problem comes from fighting over CPU cache lines. If that's correct, it may not be important to avoid shared memory access per se; it may be good enough to arrange things so that the shared memory which is accessed is *typically* not being accessed by other backends. > What I'd considered playing with in the past is having > LockMethodLocalHash hang on to an Access Shared lock even after > locallock->nLocks == 0, so that re-granting the lock would be a purely > local operation. Anyone wanting an Access Exclusive lock and not > immediately getting it would have to send out a plea (via SINVA?) for > other processes to release their locallock->nLocks == 0 locks. But > this would suffer from the same problem of tanked processes. Yeah. I have thought about this, too, but as with Noah's suggestion, I think this would make the risk of things hanging up substantially worse than it is now. A backend that, under the present code, wouldn't be holding an AccessShareLock at all, would now be holding one that you'd have to convince it to release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
On Sun, May 15, 2011 at 2:13 AM, Darren Duncan wrote: > > ('1800-01-01 00:00:00'::timestamp)::int > > Now, since all values of a DOMAIN are also values of the base type the > DOMAIN is defined as being a subset of, then the sub-expression within the > parenthesis denotes a value that is both a timestamp and a datetime at the > same time. > > So, if a generic "CAST(timestamp as int)" is already defined, and you define > a "CAST(datetime as int)", then what should the above code (correct for > misspelling) do, or should it fail? > Obviously it should run the cast from timestamp to int, why it will run a cast from a domain? the other way should be allowed, though... a cast from datetime to int should first look for cast function using the domain and if it don't find it then with base type -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte y capacitación de PostgreSQL -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] DOMAINs and CASTs
Jaime Casanova wrote: On Sat, May 14, 2011 at 8:42 PM, Darren Duncan wrote: First of all, what if "cast(timestamp as int)" was already defined? Which cast then would you expect to be invoked here? '1800-01-01 00:00:00'::int i will expect an error in that case... what you're doing there is casting an "unknown" to integer, for that to be valid you need an intermediate cast to timestamp or in my case to datetime Sorry, my bad; I meant to say (might be slightly misspelled): ('1800-01-01 00:00:00'::timestamp)::int Now, since all values of a DOMAIN are also values of the base type the DOMAIN is defined as being a subset of, then the sub-expression within the parenthesis denotes a value that is both a timestamp and a datetime at the same time. So, if a generic "CAST(timestamp as int)" is already defined, and you define a "CAST(datetime as int)", then what should the above code (correct for misspelling) do, or should it fail? -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers