Re: [HACKERS] reviewing the Reduce sinval synchronization overhead patch / b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4
This is really late, but ...

On 08/21/12 11:20 PM, Robert Haas wrote:
> Our sinval synchronization mechanism has a somewhat weird design that
> makes this OK. ...

I don't want to miss the chance to thank you, Robert, for the detailed
explanation. I have backported b4fbe392f8ff6ff1a66b488eb7197eef9e1770a4 to
9.1.3 and it has been working well in production for ~2 weeks, but I must
admit that I had put in an unnecessary read barrier in SIGetDataEntries
just to be on the safe side. I will take it out for the next builds.

Thanks,
Nils

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting plpython 2+3 builds better
On Sun, 2012-09-09 at 03:35 -0400, Tom Lane wrote:
>>> Another problem is that Makefile.shlib isn't designed to build more
>>> than one shared library per directory,
>> That's the main problem, but fixing it would be very useful in other
>> places as well.  I had it on my radar to do something about that.
> This would be a good thing.  Got any ideas how to do it?

Here is a very rough patch.  It obviously will need a lot of
fine-tuning, but it shows the idea.

diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 4da2f10..790fbf4 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -77,6 +77,11 @@ COMPILER = $(CC) $(CFLAGS)
 
 LINK.static = $(AR) $(AROPT)
 
+# legacy interface
+ifeq ($(words $(NAMES)),0)
+NAMES = $(NAME)
+$(NAME)_OBJS = $(OBJS)
+endif
 
 ifdef SO_MAJOR_VERSION
@@ -89,10 +94,20 @@ shlib_bare = lib$(NAME)$(DLSUFFIX)
 soname = $(shlib_major)
 else
 # Naming convention for dynamically loadable modules
-shlib = $(NAME)$(DLSUFFIX)
+shlib_pattern = %$(DLSUFFIX)
 endif
 stlib = lib$(NAME).a
 
+define shlib_template
+_fullname = $$(patsubst %,$$(shlib_pattern),$(1))
+shlibs := $(shlibs) $$(_fullname)
+$$(_fullname)_OBJS = $$($(1)_OBJS)
+ALL_OBJS += $$($(1)_OBJS)
+endef
+
+$(foreach name,$(NAMES),$(eval $(call shlib_template,$(name))))
+
 ifndef soname
 # additional flags for backend modules
 SHLIB_LINK += $(BE_DLLLIBS)
@@ -309,7 +324,7 @@ endif
 
 all-static-lib: $(stlib)
 
-all-shared-lib: $(shlib)
+all-shared-lib: $(shlibs)
 
 ifndef haslibarule
 $(stlib): $(OBJS) | $(SHLIB_PREREQS)
@@ -321,9 +336,10 @@ endif #haslibarule
 
 ifeq (,$(filter cygwin win32,$(PORTNAME)))
 ifneq ($(PORTNAME), aix)
 
+.SECONDEXPANSION:
 # Normal case
-$(shlib): $(OBJS) | $(SHLIB_PREREQS)
-	$(LINK.shared) -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
+$(shlibs): %: $$(%_OBJS) | $(SHLIB_PREREQS)
+	$(LINK.shared) -o $@ $^ $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK)
 ifdef shlib_major
 # If we're using major and minor versions, then make a symlink to
 # major-version-only.
 ifneq ($(shlib), $(shlib_major))
@@ -495,7 +511,7 @@ endif # no soname
 
 .PHONY: clean-lib
 clean-lib:
-	rm -f $(shlib) $(shlib_bare) $(shlib_major) $(stlib) $(exports_file)
+	rm -f $(shlibs) $(shlib_bare) $(shlib_major) $(stlib) $(exports_file)
 
 ifneq (,$(SHLIB_EXPORTS))
 maintainer-clean-lib:
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index e3fef84..182cd8e 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -11,13 +11,14 @@ top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
 # Shared library parameters
-NAME= plpgsql
+NAMES= plpgsql foobar
 
 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)
 SHLIB_LINK = $(filter -lintl, $(LIBS))
 rpath =
 
-OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o
+plpgsql_OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o
+foobar_OBJS = foo.o bar.o
 
 DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql
@@ -74,7 +75,7 @@ distprep: pl_gram.h pl_gram.c plerrcodes.h
 # pl_gram.c, pl_gram.h and plerrcodes.h are in the distribution tarball,
 # so they are not cleaned here.
 clean distclean: clean-lib
-	rm -f $(OBJS)
+	rm -f $(ALL_OBJS)
 
 maintainer-clean: clean
	rm -f pl_gram.c pl_gram.h plerrcodes.h
Re: [HACKERS] Supporting plpython 2+3 builds better
Excerpts from Peter Eisentraut's message of lun sep 10 09:50:42 -0300 2012:
> On Sun, 2012-09-09 at 03:35 -0400, Tom Lane wrote:
>>>> Another problem is that Makefile.shlib isn't designed to build more
>>>> than one shared library per directory,
>>> That's the main problem, but fixing it would be very useful in other
>>> places as well.  I had it on my radar to do something about that.
>> This would be a good thing.  Got any ideas how to do it?
> Here is a very rough patch.  It obviously will need a lot of
> fine-tuning, but it shows the idea.

I remember trying to do this for the mb/conversion_procs subdir years
ago, to make them build in parallel to save some time. It didn't go
anywhere, but the basic idea seems similar in spirit. Maybe we can use
this there too to make it fast.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Enum binary access
Hi there,

we tried to create the libpqtypes enum binary send but it doesn't work:

// register types
PGregisterType user_def[] = { {"seqtype", enum_put, enum_get} };
PQregisterTypes(connector->getConn(), PQT_USERDEFINED, user_def, 1, 0);

// enum_put throws format error
int enum_put(PGtypeArgs *args)
{
    char *val = va_arg(args->ap, char *);
    char *out = NULL;
    int vallen = 0, len = 0, oid = 0;
    float sortorder = 0.0;

    if (!args || !val)
        return 0;

    /* expand buffer enough */
    vallen = strlen(val);
    len = sizeof(int) + sizeof(float) + (vallen * sizeof(char));
    if (args->put.expandBuffer(args, len) == -1)
        return -1;

    /* put header (oid, sortorder) and value */
    out = args->put.out;
    memcpy(out, &oid, sizeof(int));
    out += sizeof(int);
    memcpy(out, &sortorder, sizeof(float));
    out += sizeof(float);
    memcpy(out, val, vallen);
    return len;
}

// enum_get (FYI, get works OK)
int enum_get(PGtypeArgs *args)
{
    char *val = PQgetvalue(args->get.result, args->get.tup_num,
                           args->get.field_num);
    int len = PQgetlength(args->get.result, args->get.tup_num,
                          args->get.field_num);
    char **result = va_arg(args->ap, char **);

    *result = (char *) PQresultAlloc((PGresult *) args->get.result,
                                     len * sizeof(char));
    memcpy(*result, val, len * sizeof(char));
    return 0;
}

Postgres doesn't accept an enum sent like this and throws a format error.
This should be usable as a prototype for derived types; there is no real
enum named type. libpqtypes doesn't seem to provide simplified binary
manipulation for enum types. What should we do, please? Can you fix it?
I think there are more people who need to access enum types in binary
mode.

Cheers,
Petr and Vojtech

P.S. We have created the 9.1 cube extension's send/receive functionality
as in https://gitorious.org/vtapi/vtapi/trees/master/postgres/cube
Re: [HACKERS] Enum binary access
On Mon, Sep 10, 2012 at 8:42 AM, Petr Chmelar <chmel...@fit.vutbr.cz> wrote:
> Hi there,
>
> we tried to create the libpqtypes enum binary send but it doesn't work:
> [...]
> Postgres doesn't accept an enum sent like this and throws a format
> error. What should we do, please? Can you fix it? I think there are
> more people who need to access enum types in binary mode.

I was able to get it to work.  What I did:

*) Your 'get' routine should probably be allocating a terminating byte.
In binary situations, PQgetlength does not return the length of the null
terminator.

*) The backend binary format for enums is just the label text string, on
both the get and put side.  So putting the oid and sort order was
breaking the put side.  I removed that and everything worked.

*) See (very messy, quickly written) code below:

#include "libpq-fe.h"
#include "libpqtypes.h"
#include <stdio.h>
#include <string.h>

int enum_put(PGtypeArgs *args)
{
    char *val = va_arg(args->ap, char *);
    char *out = NULL;
    int vallen = 0, len = 0;

    if (!args || !val)
        return 0;

    /* expand buffer enough */
    vallen = strlen(val);
    len = vallen;
    if (args->put.expandBuffer(args, len) == -1)
        return -1;

    out = args->put.out;
    memcpy(out, val, vallen);
    return len;
}

int enum_get(PGtypeArgs *args)
{
    char *val = PQgetvalue(args->get.result, args->get.tup_num,
                           args->get.field_num);
    int len = PQgetlength(args->get.result, args->get.tup_num,
                          args->get.field_num);
    char **result = va_arg(args->ap, char **);

    /* allocate room for the terminator PQgetlength doesn't count */
    *result = (char *) PQresultAlloc((PGresult *) args->get.result, len + 1);
    memcpy(*result, val, len);
    (*result)[len] = '\0';
    return 0;
}

int main()
{
    PGtext t = "b";
    PGconn *conn = PQconnectdb("port=5492 host=localhost");
    PQinitTypes(conn);
    PGregisterType user_def[] = { {"e", enum_put, enum_get} };

    if (!PQregisterTypes(conn, PQT_USERDEFINED, user_def, 1, 0))
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    PGresult *res = PQexecf(conn, "select %e", t);
    if (!res)
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    if (!PQgetf(res, 0, "%e", 0, &t))
        fprintf(stderr, "*ERROR: %s\n", PQgeterror());

    printf("%s\n", t);
    PQclear(res);
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Sunday, September 09, 2012 1:37 PM Amit Kapila wrote:
> On Friday, September 07, 2012 11:19 PM Tom Lane wrote:
>> Heikki Linnakangas <hlinn...@iki.fi> writes:
>>> Would socketpair(2) be simpler?
>> I've not done anything yet about the potential security issues
>> associated with untrusted libpq connection strings.  I think this is
>> still at the proof-of-concept stage; in particular, it's probably time
>> to see if we can make it work on Windows before we worry more about
>> that.
>
> I have started working on this patch to make it work on Windows.  The 3
> main things needed to make it work are:
>
> 1. A Windows equivalent for socketpair - As suggested previously in
>    this thread, the earlier pgpipe code can serve this need.  In fact,
>    I have checked on the net as well, and most implementations are
>    similar to the pgpipe implementation, so I preferred to reuse the
>    existing code that was removed.
>
> 2. A Windows equivalent for fork/execv - This can be done with
>    CreateProcess, similar to internal_forkexec, except that
>    internal_forkexec uses shared memory to pass parameters, whereas I
>    am trying to pass parameters directly to CreateProcess.  Passing
>    parameters directly doesn't suffice for all of them: for the socket
>    we need to duplicate it using WSADuplicateSocket(), which returns a
>    fairly big structure that is better passed via shared memory.
>
> 3. A Windows equivalent for waitpid - There are two ways to accomplish
>    this:
>    a. Use WaitForSingleObject with the process handle; in some places
>       it is mentioned that this might not work on all Windows versions.
>       Can someone please confirm?  I shall also try to test this on my
>       PC.
>    b. Use the existing waitpid infrastructure; however, it is not meant
>       for a single process and might need some changes to work for one,
>       or maybe we can use it directly.  It currently lives in
>       postmaster.c, so it would need to be moved so that we can access
>       it from fe-connect.c in libpq as well.
>    c. Suggestions for other ways to handle it, or for which of the
>       above two would be better, are welcome.

I have used method (a) (WaitForSingleObject) and it worked fine.

With the above implementation, it is working on Windows.  The work left
is as follows:
1. Refactoring of the code
2. Error handling in all paths
3. Check if anything is missing and implement the same
4. Test the patch on Windows

Any comments/suggestions?

With Regards,
Amit Kapila
Re: [HACKERS] Draft release notes complete
On Sun, Sep 9, 2012 at 08:52:37PM +0200, Stefan Kaltenbrunner wrote:
> On 09/06/2012 12:13 AM, Peter Eisentraut wrote:
>> On 8/29/12 11:52 PM, Andrew Dunstan wrote:
>>>> Why does this need to be tied into the build farm?  Someone can
>>>> surely set up a script that just runs the docs build at every
>>>> check-in, like it used to work.  What's being proposed now just
>>>> sounds like a lot of complication for little or no actual gain --
>>>> net loss in fact.
>>> It doesn't just build the docs.  It makes the dist snapshots too.
>> Thus making the turnaround time on a docs build even slower ... ?
>>> And the old script often broke badly, IIRC.
>> The script broke on occasion, but the main problem was that it wasn't
>> monitored.  Which is something that could have been fixed.
>>> The current setup doesn't install anything if the build fails, which
>>> is a distinct improvement.
>> You mean it doesn't build the docs if the code build fails?  Would
>> that really be an improvement?
> why would we want to publish docs for something that fails to build
> and/or fails to pass regression testing - to me code and the docs for
> it are a combined thing and there is no point in pushing docs for
> something that fails even basic testing...

Most of the cases I care about are doc-only commits.  Frankly, there is
a 99.9% chance that if it was committed, it compiles.  We are only
displaying the docs, so why not just test for the docs?

It is this kind of run-around that caused me to generate my own doc
build in the past; maybe I need to return to doing my own doc build.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] Draft release notes complete
Excerpts from Bruce Momjian's message of lun sep 10 11:55:58 -0300 2012:
> On Sun, Sep 9, 2012 at 08:52:37PM +0200, Stefan Kaltenbrunner wrote:
>> why would we want to publish docs for something that fails to build
>> and/or fails to pass regression testing - to me code and the docs for
>> it are a combined thing and there is no point in pushing docs for
>> something that fails even basic testing...
> Most of the cases I care about are doc-only commits.  Frankly, there
> is a 99.9% chance that if it was committed, it compiles.  We are only
> displaying the docs, so why not just test for the docs?

I see no reason for a code failure to cause the docs not to be
refreshed, if they still build.  Many buildfarm failures are platform
dependencies that the original developer did not notice.  That doesn't
mean the code is so utterly broken that the docs are worthless and
should not be published at all lest we risk eternal embarrassment.  Such
failures tend to be short-lived anyway, and it's useful to be able to
check that the docs are fine regardless of them.

> It is this kind of run-around that caused me to generate my own doc
> build in the past; maybe I need to return to doing my own doc build.

You keep threatening with that.  You are free, of course, to do anything
you want, and no one will break a sweat about it.  I already said I will
work on getting this up and running, but I can't give you a deadline for
when it'll be working.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Sun, Sep 2, 2012 at 8:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Notably, while the lack of any background processes is just what you
> want for pg_upgrade and disaster recovery, an ordinary application is
> probably going to want to rely on autovacuum; and we need bgwriter and
> other background processes for best performance.  So I'm speculating
> about having a postmaster process that isn't listening on any ports,
> but is managing background processes in addition to a single child
> backend.  That's for another day though.

Since we are forking a child process anyway, and potentially other
auxiliary processes too, would it make sense to allow multiple backends
too (allow multiple local applications to connect to this instance)?

I believe (I may be wrong) that embedded databases (SQLite et al.) use a
library interface, in that the application makes a library call and
waits for that API call to finish (unless, of course, the library
supports async operations or the application uses threads).  The
implementation you are proposing uses socket communication, which lends
itself very easily to a client-server model, and if possible, it should
be leveraged to allow multiple applications to talk to one local DB.

I have this use case in mind: an application is running using this
interface, and an admin now wishes to do some maintenance or inspect
something, so they can launch a local pgAdmin using the same connection
string as used by the original application.  This will allow an admin to
perform tuning, etc. without having to first shut down the application.

Here's how this might impact the design (I may very well be missing many
other things, and I have no idea whether this is implementable or not):

.) The database starts when the first such application is launched.
.) The database shuts down when the last such application disconnects.
.) Postgres behaves much like a regular Postgres installation, except
   that it does not accept connections over TCP/IP or Unix Domain
   Sockets.
.) The above implies that we use the regular postmaster machinery, and
   not the --single machinery.
.) Second and subsequent applications use the postmaster.pid (or
   something similar) to find an already running instance, and connect
   to it.
.) There's a race condition where the second application is starting up,
   hoping to connect to an already running instance, but the first
   application disconnects (and hence shuts down the DB) before the
   second one can successfully connect.

I haven't thought much about the security implications of this yet.
Maybe the socket permissions would restrict an unauthorized user from
connecting to this instance.

-- 
Gurjeet Singh
http://gurjeet.singh.im/
Re: [HACKERS] Draft release notes complete
On Mon, Sep 10, 2012 at 12:06:18PM -0300, Alvaro Herrera wrote:
>> It is this kind of run-around that caused me to generate my own doc
>> build in the past; maybe I need to return to doing my own doc build.
> You keep threatening with that.  You are free, of course, to do
> anything you want, and no one will break a sweat about it.  I already
> said I will work on getting this up and running, but I can't give you
> a deadline for when it'll be working.

My point is that this frequent doc build feature was removed with no
discussion, and adding it back seems to be some herculean job that
requires red tape only a government worker would love.  I have already
started working on updating my script for git --- it should be done
shortly, so you can remove my request.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Mon, Sep 10, 2012 at 11:12 AM, Gurjeet Singh <singh.gurj...@gmail.com> wrote:
> On Sun, Sep 2, 2012 at 8:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Notably, while the lack of any background processes is just what you
>> want for pg_upgrade and disaster recovery, an ordinary application is
>> probably going to want to rely on autovacuum; and we need bgwriter
>> and other background processes for best performance.  So I'm
>> speculating about having a postmaster process that isn't listening on
>> any ports, but is managing background processes in addition to a
>> single child backend.  That's for another day though.
>
> Since we are forking a child process anyway, and potentially other
> auxiliary processes too, would it make sense to allow multiple
> backends too (allow multiple local applications to connect to this
> instance)?
> [...]
> I haven't thought much about the security implications of this yet.
> Maybe the socket permissions would restrict an unauthorized user from
> connecting to this instance.

That's kind of the reason why I suggested upthread trying to decouple
the *starting* of the backend from the options to PQconnect...  A helper
function in libpq could easily start the backend, and possibly return a
conninfo string to give to PQconnectdb...

But if they are decoupled, I could easily envision an app that pauses
its use of the backend to allow some other libpq access to it for a
period.  You'd have to trust whatever else you let talk on the FD to the
backend, but it might be useful...

-- 
Aidan Van Dyk                    Create like a god,
ai...@highrise.ca                command like a king,
http://www.highrise.ca/          work like a slave.
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On 10.09.2012 18:12, Gurjeet Singh wrote:
> On Sun, Sep 2, 2012 at 8:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> [...]
>
> Since we are forking a child process anyway, and potentially other
> auxiliary processes too, would it make sense to allow multiple
> backends too (allow multiple local applications to connect to this
> instance)?
> [...]
> The implementation you are proposing uses socket communication, which
> lends itself very easily to a client-server model, and if possible, it
> should be leveraged to allow multiple applications to talk to one
> local DB.

[scratches head] How's that different from the normal postmaster mode?

- Heikki
Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts
Jeff Davis <pg...@j-davis.com> wrote:
> This question comes about after reading the VLDB paper "Serializable
> Snapshot Isolation in PostgreSQL."

... and I know Jeff read that quite closely, because he raised a
question off-list about an error he found in it which managed to survive
the many editing and review passes that paper went through.  :-)

> We release predicate locks after a transaction abort, but not after a
> subtransaction abort.  The paper says that the reason is:
>
>   "We do not drop SIREAD locks acquired during a subtransaction if the
>   subtransaction is aborted (i.e. all SIREAD locks belong to the
>   top-level transaction).  This is because data read during the
>   subtransaction may have been reported to the user or otherwise
>   externalized." (section 7.3)
>
> But that doesn't make sense to me, because that reasoning would also
> apply to top-level transactions that are aborted, but we release the
> SIREAD locks for those.
>
> In other words, this introduces an inconsistency between:
>
>   BEGIN ISOLATION LEVEL SERIALIZABLE;
>   SAVEPOINT s1;
>   ...
>   ROLLBACK TO s1;
>   COMMIT;
>
> and:
>
>   BEGIN ISOLATION LEVEL SERIALIZABLE;
>   ...
>   ROLLBACK;
>
> I'm not suggesting this is a correctness problem: holding SIREAD locks
> for longer never causes incorrect results.  But it does seem a little
> inconsistent.
>
> For top-level transactions, I don't think it's possible to preserve
> SIREAD locks after an abort, because we rely on aborts to alleviate
> conflicts (and when using 2PC, we may need to abort a read-only
> transaction to correct the situation).  So it seems like users must
> not rely on any answers they get from a transaction (or
> subtransaction) unless it commits.  Does that make sense?

I think the behavior is correct, because a function's control flow might
be directed by what it reads in a subtransaction, even if it rolls back
-- and the transaction as a whole might leave the database in a
different state based on that than if it had read different data (from a
later snapshot).

For example, if a plpgsql function has a BEGIN/EXCEPTION/END block, it
might read something from the database and use what it reads to attempt
some write.  If that write fails and the EXCEPTION code writes
something, then the database could be put into a state which is
dependent on the data read in the subtransaction, even though that
subtransaction is rolled back without the client ever directly seeing
what was read.

This strikes me as significantly different from returning some rows to a
client application and then throwing an error for the transaction as a
whole, because the client will certainly have an opportunity to see the
failure (or at worst, see a broken connection before being notified of a
successful commit).

> If so, I think we need a documentation update.  The serializable
> isolation level docs don't quite make it clear that serializability
> only applies to transactions that commit.  It might not be obvious to
> a user that there's a difference between commit and abort for a RO
> transaction.  I think that, in S2PL, serializability applies even to
> aborted transactions (though I haven't spent much time thinking about
> it), so users accustomed to other truly-serializable implementations
> might be surprised.

That's a fair point.  Do you have any suggested wording, or suggestions
for exactly where in the documentation you think it would be most
helpful?  The subsection on serializable transactions seems like the
most obvious location:

http://www.postgresql.org/docs/current/interactive/transaction-iso.html#XACT-SERIALIZABLE

Does any other section seem like it needs work?

-Kevin
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Mon, Sep 10, 2012 at 11:43 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 10.09.2012 18:12, Gurjeet Singh wrote:
>> [...]
>
> [scratches head] How's that different from the normal postmaster mode?

As I described in the later paragraphs, it'd behave like an embedded
database, like SQLite etc., so the database will start up and shut down
with the application, and provide the other advantages we're currently
trying to provide, like zero maintenance.  But it will not mandate that
only one application talk to it at a time, and will allow as many
applications as it would in postmaster mode.  So the database would be
online as long as any application is connected to it, and it will shut
down when the last application disconnects.

As being implemented right now, there's very little difference between
--single and --child modes.  I guess I am asking for a --child mode
implementation that is closer to a postmaster than to --single.

Best regards,
-- 
Gurjeet Singh
http://gurjeet.singh.im/
Re: [HACKERS] pg_dump transaction's read-only mode
On Fri, Sep 7, 2012 at 6:06 PM, Kevin Grittner
<kevin.gritt...@wicourts.gov> wrote:
> That makes sense to me.  The reason I didn't make that change when I
> added the serializable special case to pg_dump was that it seemed like
> a separate question; I didn't want to complicate an already big patch
> with unnecessary changes to non-serializable transactions.

If we agree, should we change that now?

Thanks,
Pavan
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Gurjeet Singh singh.gurj...@gmail.com writes: On Mon, Sep 10, 2012 at 11:43 AM, Heikki Linnakangas hlinn...@iki.fi wrote: [scratches head] How's that different from the normal postmaster mode? As I described in later paragraphs, it'd behave like an embedded database, like SQLite etc., so the database will startup and shutdown with the application, and provide other advantages we're currently trying to provide, like zero-maintenance. But it will not mandate that only one application talk to it at a time, and allow as many applications as it would in postmaster mode. So the database would be online as long as any application is connected to it, and it will shutdown when the last application disconnects. I am having a hard time getting excited about that. To me it sounds like it's a regular postmaster, except with a response-time problem for connections that occur when there had been no active client before. The point of the proposal that I am making is to have a simple, low-maintenance solution for people who need a single-application database. A compromise somewhere in the middle isn't likely to be an improvement for anybody. For instance, if you want to have additional connections, you open up a whole collection of communication and authentication issues, which potential users of a single-application database don't want to cope with. As being implemented right now, there's very little difference between --single and --child modes. I guess I am asking for a --child mode implementation that is closer to a postmaster than --single. There are good reasons for wanting something that is closer to --single: pg_upgrade being one, and having a friendlier user interface for disaster recovery in --single mode being another. In these cases, you not only don't need the capability for additional applications to connect, it is actually important that it's impossible for them to do so. 
regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
The point of the proposal that I am making is to have a simple, low-maintenance solution for people who need a single-application database. A compromise somewhere in the middle isn't likely to be an improvement for anybody. For instance, if you want to have additional connections, you open up a whole collection of communication and authentication issues, which potential users of a single-application database don't want to cope with. Yes, exactly. In fact, most of the folks who would want an embedded PostgreSQL either want no authentication at all, or only a single password. So supporting authentication options other than trust or md5 is not required, or desired AFAIK. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting plpython 2+3 builds better
On 9/10/12 9:26 AM, Alvaro Herrera wrote: I remember trying to do this for the mb/conversion_procs subdir years ago, to make them build in parallel to save some time. It didn't go anywhere but the basic idea seems similar in spirit. Maybe we can use this there too to make it fast. Parallel builds across subdirectories should work now, so that shouldn't be necessary anymore. Removing some directory depth might be nice, but there is a general hesitation about renaming files. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
Josh Berkus j...@agliodbs.com wrote: In fact, most of the folks who would want an embedded PostgreSQL either want no authentication at all, or only a single password. So supporting authentication options other than trust or md5 is not required, or desired AFAIK. I don't know whether it's worth the trouble of doing so, but serializable transactions could skip all the SSI predicate locking and conflict checking when in single-connection mode. With only one connection the transactions could never overlap, so there would be no chance of serialization anomalies when running snapshot isolation. The reason I wonder whether it is worth the trouble is that it would only really matter if someone had code they wanted to run under both normal and single-connection modes. For single-connection only, they could just choose REPEATABLE READ to get exactly the same semantics. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Extend argument of OAT_POST_CREATE
The attached patch adds an argument to the OAT_POST_CREATE hook to inform extensions of the context of the object creation. It allows extensions to know whether the new object is being created indirectly, apart from the user's operations, or not. I found out this flag is necessary to add a feature to support SELinux checks on ALTER statements (with reasonably simple code) during my investigation. A table has various kinds of properties; some of them are inlined in pg_class, but others are stored in extra catalogs such as pg_trigger, pg_constraint and so on. It might take an extra discussion whether a trigger or constraint is an independent object or an attribute of the table. But, anyway, the default permission check tests the table's ownership or ACLs when we create or drop them. I don't think sepgsql should establish its own object model here. So, I want sepgsql to check the table's setattr permission when a user creates, drops or alters these objects. In the case of index creation, there are two cases: a) the user's operation intends to create an index, so we check permission on the table being indexed; b) the index is indirectly created as a result of another operation, such as a change of a column's data type. For the same reason we don't check permissions for cleanup of temporary objects, I don't want to apply checks in the latter case. Right now, sepgsql determines the current context using the command tag saved at ProcessUtility_hook, to avoid permission checks on table creation due to a CLUSTER command, for example. But it is not easy to apply this approach to the case of index creation, because an index can be defined as a part of ALTER TABLE, which may have multiple sub-commands. So, I want the OAT_POST_CREATE hook to inform the extension of the current context of the object creation: whether it is an internal / indirect creation, or not. This patch includes the hook enhancement and setattr permission checks on index creation / deletion. 
Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp sepgsql-v9.3-extend-post-create-hook.v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts
On Mon, 2012-09-10 at 11:15 -0500, Kevin Grittner wrote: ... and I know Jeff read that quite closely because he raised a question off-list about an error he found in it which managed to survive the many editing and review passes that paper went through. :-) Well, I need to keep up with the discussion on the interaction of temporal and SSI :-) I think the behavior is correct because a function's control flow might be directed by what it reads in a subtransaction, even if it rolls back -- and the transaction as a whole might leave the database in a different state based on that than if it had read different data (from a later snapshot). For example, if a plpgsql function has a BEGIN/EXCEPTION/END block, it might read something from the database and use what it reads to attempt some write. If that write fails and the EXCEPTION code writes something, then the database could be put into a state which is dependent on the data read in the subtransaction, even though that subtransaction is rolled back without the client ever directly seeing what was read. On reflection, I agree with that. Trying to puzzle through your transactions (and application logic) to see if you are depending on any information read in an aborted subtransaction is exactly the kind of thing SSI was meant to avoid. This strikes me as significantly different from returning some rows to a client application and then throwing an error for the transaction as a whole, because the client will certainly have an opportunity to see the failure (or at worst, see a broken connection before being notified of a successful commit). Oh, I see the distinction you're making: in PL/pgSQL, the exception mechanism involves *implicit* subtransaction rollbacks. That's more of a language issue, but a valid point. I'm still not sure I see a theoretical difference, but it does seem wise to keep predicate locks for aborted subtransactions. If so, I think we need a documentation update. 
The serializable isolation level docs don't quite make it clear that serializability only applies to transactions that commit. It might not be obvious to a user that there's a difference between commit and abort for a RO transaction. I think that, in S2PL, serializability applies even to aborted transactions (though I haven't spent much time thinking about it), so users accustomed to other truly-serializable implementations might be surprised. That's a fair point. Do you have any suggested wording... I'll write something up. Can I document that you may depend on the results read in aborted subtransactions, or should I leave that undefined for now? Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts
Jeff Davis pg...@j-davis.com wrote: Oh, I see the distinction you're making: in PL/pgSQL, the exception mechanism involves *implicit* subtransaction rollbacks. That's more of a language issue, but a valid point. I think it holds for the general case of functions -- there's no reason to believe that you are aware of all subtransactions within a function or will know what was read by an aborted subtransaction within any function. It's pretty easy to describe in plpgsql, but I doubt the issue is specific to that language. I'm still not sure I see a theoretical difference, but it does seem wise to keep predicate locks for aborted subtransactions. I think that if it was guaranteed that application software was aware of all subtransactions and their completion states, there would still be a subtle issue as long as what was read in the subtransaction could in any way influence the behavior of subsequent steps in the enclosing transaction (or subtransaction). In essence, you have no reasonable way of knowing what the outer transaction would have done had it been able to see the work of a concurrent transaction, so you can't know whether the behavior of a set of transactions is the same as it would have been had they run one-at-a-time. A really stringent analysis of the logic of the code might be able to answer that for some cases (maybe even all cases?) but not at a reasonable cost. SSI admits that it might cause rollbacks in some cases where correctness doesn't require it, but it ensures that it will roll back enough transactions to ensure correctness and tries to do so at a reasonable cost. I'll write something up. Can I document that you may depend on the results read in aborted subtransactions, or should I leave that undefined for now? Hmm. 
They will be read with the correct snapshot, and since we're holding predicate locks they can't show any anomalies if the final transaction completes, so I sure can't see any reason it is a problem to depend on data viewed in an aborted subtransaction. If you think that is a property that could be useful to users, I guess it should be documented. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] build farm machine using make -j 8 mixed results
I wrote: And the answer is ... it's a gmake bug. Apparently introduced in 3.82. http://savannah.gnu.org/bugs/?30653 https://bugzilla.redhat.com/show_bug.cgi?id=835424 So I think .NOTPARALLEL is just masking the true problem, but nonetheless it's a problem. And given that the bug report on savannah has been ignored for two years, we should not hold our breath for a fix to appear upstream (much less propagate to everyone using 3.82). So no sooner do I complain about that, than the upstream maintainers wake up and commit it: http://lists.gnu.org/archive/html/bug-make/2012-09/msg00016.html No idea when a fixed release might appear, but at least somebody who knows the gmake code has signed off on the fix now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] build farm machine using make -j 8 mixed results
On 09/10/2012 02:44 PM, Tom Lane wrote: I wrote: And the answer is ... it's a gmake bug. Apparently introduced in 3.82. http://savannah.gnu.org/bugs/?30653 https://bugzilla.redhat.com/show_bug.cgi?id=835424 So I think .NOTPARALLEL is just masking the true problem, but nonetheless it's a problem. And given that the bug report on savannah has been ignored for two years, we should not hold our breath for a fix to appear upstream (much less propagate to everyone using 3.82). So no sooner do I complain about that, than the upstream maintainers wake up and commit it: http://lists.gnu.org/archive/html/bug-make/2012-09/msg00016.html No idea when a fixed release might appear, but at least somebody who knows the gmake code has signed off on the fix now. When it does appear in a release I guess we could make the .NOTPARALLEL conditional on make version. If not, we'll have to wait a long time before removing it. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
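A version-conditional guard along the lines Andrew suggests could look something like the following sketch; the exact version test is an assumption, since it would need to track which make releases actually contain the fix:

```make
# Hypothetical guard: apply the serialization workaround only under the
# known-buggy GNU make release (3.82); later fixed releases skip it.
ifeq ($(MAKE_VERSION),3.82)
.NOTPARALLEL:
endif
```

MAKE_VERSION is set by GNU make itself, so no configure-time detection would be needed for a simple equality test like this.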
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Mon, Sep 10, 2012 at 9:59 AM, Josh Berkus j...@agliodbs.com wrote: The point of the proposal that I am making is to have a simple, low-maintenance solution for people who need a single-application database. A compromise somewhere in the middle isn't likely to be an improvement for anybody. For instance, if you want to have additional connections, you open up a whole collection of communication and authentication issues, which potential users of a single-application database don't want to cope with. Yes, exactly. In fact, most of the folks who would want an embedded PostgreSQL either want no authentication at all, or only a single password. So supporting authentication options other than trust or md5 is not required, or desired AFAIK. I agree people who want embedded postgres probably want no authentication. For the sake of test/local development use cases, I still question the usefulness of a quasi-embeddable mode that does not support multiple executors as SQLite does -- it's pretty murderous for the testing use case for a number of people if they intend to have parity with their production environment. I could see this being useful for, say, iOS (although there is a definite chance that one will want to use multiple threads/simultaneous xacts in applications), or a network router, except that pg_xlog and the catalog are rather enormous for those use cases. So while the embedded use case is really appealing in general, I'm still intuitively quite skeptical of the omission of multi-executor support, especially considering that people have gotten used to being able to do that with the incredibly popular SQLite. Could EXEC_BACKEND be repurposed -- even on *nix -- to make this work someday? -- fdr -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] ossp-uuid Contrib Patch
Hackers, I ran into an issue building 9.2 with the OSSP UUID module today. A bit of Googling and I found that the MacPorts guys ran into the same issue a few weeks ago. Their discussion: https://trac.macports.org/ticket/35153 And the fix: https://trac.macports.org/browser/trunk/dports/databases/postgresql91/files/postgresql-uuid-ossp.patch?rev=96142 So should we do this in core, too? Oh, I see Tom already commented on it here: http://archives.postgresql.org/pgsql-general/2012-07/msg00656.php I had installed 9.2rc1 a few weeks ago, but since then I upgraded to Mountain Lion. I suspect that is what mucked up the unistd.h ordering issue. The patch:

diff --git a/contrib/uuid-ossp/uuid-ossp.c b/contrib/uuid-ossp/uuid-ossp.c
index d4fc62b..62b28ca 100644
--- a/contrib/uuid-ossp/uuid-ossp.c
+++ b/contrib/uuid-ossp/uuid-ossp.c
@@ -9,6 +9,7 @@
  *-
  */
+#define _XOPEN_SOURCE
 #include "postgres.h"
 #include "fmgr.h"
 #include "utils/builtins.h"

Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ossp-uuid Contrib Patch
David E. Wheeler da...@justatheory.com writes: I ran into an issue building 9.2 with the OSSP UUID module today. A bit of Googling and I found that the MacPorts guys ran into the same issue a few weeks ago. Their discussion: The long and the short of it is that the OSSP guys need to fix their code. I'm not excited about kluges like +#define _XOPEN_SOURCE which might band-aid around their mistake, but at what price? We have no idea what side-effects that will have. It would not be unlikely for that to result in an ossp-uuid.so that is subtly incompatible with the core backend. (We've seen such effects in the past, though I'm too lazy to trawl the archives for examples right now.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ossp-uuid Contrib Patch
On Sep 10, 2012, at 4:16 PM, Tom Lane t...@sss.pgh.pa.us wrote: The long and the short of it is that the OSSP guys need to fix their code. I'm not excited about kluges like +#define _XOPEN_SOURCE which might band-aid around their mistake, but at what price? We have no idea what side-effects that will have. It would not be unlikely for that to result in an ossp-uuid.so that is subtly incompatible with the core backend. (We've seen such effects in the past, though I'm too lazy to trawl the archives for examples right now.) Well, given that OSSP seems to be abandonware (no activity since July 2008), it might be time to dump it in favor of something else. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ossp-uuid Contrib Patch
David E. Wheeler da...@justatheory.com writes: Well given that OSSP seems to be abandon ware (no activity since July 2008), it might be time to dump it in favor of something else. Yeah, maybe. It doesn't even seem to be the standard implementation on Linux or Mac. A bit of research says that Theodore Ts'o's libuuid is what comes native with the OS on those platforms. No idea whether the functionality is equivalent, though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ossp-uuid Contrib Patch
On Mon, 2012-09-10 at 16:23 -0700, David E. Wheeler wrote: Well given that OSSP seems to be abandon ware (no activity since July 2008), it might be time to dump it in favor of something else. Are there any outstanding issues that would require an update? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ossp-uuid Contrib Patch
On Mon, 2012-09-10 at 20:15 -0400, Tom Lane wrote: David E. Wheeler da...@justatheory.com writes: Well given that OSSP seems to be abandon ware (no activity since July 2008), it might be time to dump it in favor of something else. Yeah, maybe. It doesn't even seem to be the standard implementation on Linux or Mac. A bit of research says that Theodore Ts'o's libuuid is what comes native with the OS on those platforms. No idea whether the functionality is equivalent, though. They have different interfaces that would also affect the exposed SQL interfaces. We could provide two different extensions, wrapping each library. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts
On Sat, Sep 08, 2012 at 11:34:56AM -0700, Jeff Davis wrote: If so, I think we need a documentation update. The serializable isolation level docs don't quite make it clear that serializability only applies to transactions that commit. It might not be obvious to a user that there's a difference between commit and abort for a RO transaction. I think that, in S2PL, serializability applies even to aborted transactions (though I haven't spent much time thinking about it), so users accustomed to other truly-serializable implementations might be surprised. Yes, I agree that this is probably worth mentioning in the documentation. It might be worth noting that serializable mode will not cause read-only transactions to fail to commit (as might be possible in some optimistic concurrency control systems). However, it might require other transactions to be aborted to ensure serializability. If the user aborts the read-only transaction, that won't necessarily happen. Figure 2 of the aforementioned paper is actually a nice example of this. The read-only transaction T1 is allowed to commit, but as a result T2 has to be aborted. If T1 had ABORTed instead of COMMIT, T2 would be allowed to proceed. Dan -- Dan R. K. PortsUW CSEhttp://drkp.net/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Draft release notes complete
On Mon, Sep 10, 2012 at 11:19:00AM -0400, Bruce Momjian wrote: On Mon, Sep 10, 2012 at 12:06:18PM -0300, Alvaro Herrera wrote: It is this kind of run-around that caused me to generate my own doc build in the past; maybe I need to return to doing my own doc build. You keep threatening with that. You are free, of course, to do anything you want, and no one will break sweat about it. I already said I will work on getting this up and running, but I can't give you a deadline for when it'll be working. My point is that this frequent doc build feature was removed with no discussion, and adding it seems to be some herculean job that requires red tape only a government worker would love. I have already started working on updating my script for git --- should be done shortly, so you can remove my request. Here is my documentation build: http://momjian.postgresql.org/pgsql_docs/ It is updated every five minutes. (It checks git every 4 minutes, and the build takes 41 seconds.) -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 64-bit API for large object
Ok, here is the patch to implement a 64-bit API for large objects, allowing large objects of up to 4TB (or 16TB if BLCKSZ is changed to 32KB). The patch is based on Jeremy Drake's patch posted on September 23, 2005 (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php) and reasonably updated/edited to fit PostgreSQL 9.3 by Nozomi Anzai for the backend part and Yugo Nagata for the rest (including the documentation patch). Here are the changes made in the patch:

1) Frontend lo_* libpq functions (fe-lobj.c) (Yugo Nagata)

lo_initialize() gathers the OIDs of the backend's 64-bit large object handling functions, namely lo_lseek64, lo_tell64, and lo_truncate64. If a client calls the lo_*64 functions and the backend does not support them, the lo_*64 functions return an error to the caller. There might be an argument here, since calls to the lo_*64 functions could automatically be redirected to the older 32-bit API; I don't know whether this is worth the trouble though. Currently lo_initialize() throws an error if one of the OIDs is not available. I doubt we should do the same for the 64-bit functions, since this would make 9.3 libpq unable to access large objects stored in pre-9.2 PostgreSQL servers.

To pass a 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr is a pointer to the 64-bit integer, and the actual data is placed somewhere else. There might be another way: add a new member to union u to store the 64-bit integer:

typedef struct
{
	int			len;
	int			isint;
	union
	{
		int	   *ptr;		/* can't use void (dec compiler barfs) */
		int			integer;
		int64		bigint;		/* 64-bit integer */
	} u;
} PQArgBlock;

I'm a little bit worried about this way because PQArgBlock is a public interface. Also, we add a new type pg_int64:

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

in postgres_ext.h, per a suggestion from Tom Lane: http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php

2) Backend lo_* functions (be-fsstubs.c) (Nozomi Anzai)

Add lo_lseek64, lo_tell64, and lo_truncate64 so that they can handle 64-bit seek positions and data lengths. 
loread64 and lowrite64 are not added because if a program tries to read/write more than 2GB at once, that would be a sign that the program needs to be redesigned anyway.

3) Backend inv_api.c functions (Nozomi Anzai)

No need to add new functions; just extend them to handle 64-bit data.

BTW, what will happen if an older 32-bit libpq accesses large objects over 2GB?

lo_read and lo_write: they can read or write lobjs using the 32-bit API as long as the requested read/write data length is smaller than 2GB. So I think we can safely allow them to access lobjs over 2GB.

lo_lseek: again, as long as the requested offset is smaller than 2GB, there would be no problem.

lo_tell: if the current seek position is beyond 2GB, it returns an error.

4) src/test/examples/testlo64.c added as a 64-bit API example (Yugo Nagata)

Comments and suggestions are welcome. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp lobj64.patch.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Question about SSI, subxacts, and aborted read-only xacts
On Mon, 2012-09-10 at 21:59 -0400, Dan Ports wrote: It might be worth noting that serializable mode will not cause read-only transactions to fail to commit For the archives, and for those not following the paper in detail, there is one situation in which SSI will abort a read-only transaction. When there are three transactions forming a dangerous pattern where T1 (read-only) has a conflict out to T2, and T2 has a conflict out to T3; and T3 is committed and T2 is prepared (for two-phase commit). In that situation, SSI can't roll back the committed or prepared transactions, so it must roll back the read-only transaction (T1). Even in that case, SSI will ordinarily prevent T2 from preparing. It's only if T1 takes its snapshot after T2 prepares and before T2 commits that the situation can happen (I think). Fortunately, for two-phase commit, that's not a big problem because the window between PREPARE TRANSACTION and COMMIT PREPARED is supposed to be narrow (and if it's not, you have bigger problems anyway). As long as the window is narrow, then it's reasonable to retry the transaction T1, and expect it to succeed after a short interval. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] prefetching and asynchronous io
On 08/18/2012 10:11 AM, John Lumby wrote: I've recently tried extending the postgresql prefetch mechanism on linux to use the posix (i.e. librt) aio_read and friends where possible. In other words, in PrefetchBuffer(), try getting a buffer and issuing aio_read before falling back to posix_fadvise(). It gives me about 8% improvement in throughput relative to the posix_fadvise variety, for a workload of 16 highly-disk-read-intensive applications running on 16 backends. For my test each application runs a query chosen to have plenty of bitmap heap scans. I can provide more details on my changes if interested. On whether this technique might improve sort performance: First, the disk access pattern for sorting is mostly sequential (although I think the sort module does some tricky work with reuse of pages in its logtape files, which maybe is random-like), and there are several claims on the net that linux buffered file handling already does a pretty good job of read-ahead for a sequential access pattern without any need for the application to help it. I can half-confirm that, in that I tried adding calls to PrefetchBuffer in the regular heap scan and did not see much improvement. But I am still pursuing that area. But second, it would be easy enough to add some posix_fadvise calls to sort and see whether that helps. (Can't make use of PrefetchBuffer, since sort does not use the regular relation buffer pool.) I have also added prefetching calls to regular index scans (non-bitmap, non-index-only, for btree only) and see a 25% reduction in total elapsed time for a heavy index-scan workload. That 25% is with just the basic posix_fadvise, and then extending that with async io gives a small extra improvement again. John -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers