Re: [HACKERS] jsonb and nested hstore
On 02/05/2014 06:48 PM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: On 02/05/2014 11:40 AM, Tom Lane wrote: switching to binary is the same as text may well be the most prudent path here. If we do that we're going to have to live with that forever, aren't we? Yeah, but the other side of that coin is that we'll have to live forever with whatever binary format we pick, too. If it turns out to be badly designed, that could be much worse than eating some parsing costs during dump/restore. The fastest and lowest parsing cost format for JSON is tnetstrings http://tnetstrings.org/ why not use it as the binary wire format ? It would be as binary as it gets and still be generally parse-able by lots of different platforms, at leas by all of these we care about. Cheers Hannu -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Hi, On 2014-02-06 18:47:31 -0500, Andrew Dunstan wrote: * switching to using text representation in jsonb send/recv +/* + * jsonb type recv function + * + * the type is sent as text in binary mode, so this is almost the same + * as the input function. + */ +Datum +jsonb_recv(PG_FUNCTION_ARGS) +{ + StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + text *result = cstring_to_text_with_len(buf-data, buf-len); + + return deserialize_json_text(result); +} +/* + * jsonb type send function + * + * Just send jsonb as a string of text + */ +Datum +jsonb_send(PG_FUNCTION_ARGS) +{ + Jsonb *jb = PG_GETARG_JSONB(0); + StringInfoData buf; + char *out; + + out = JsonbToCString(NULL, (JB_ISEMPTY(jb)) ? NULL : VARDATA(jb), VARSIZE(jb)); + + pq_begintypsend(buf); + pq_sendtext(buf, out, strlen(out)); + PG_RETURN_BYTEA_P(pq_endtypsend(buf)); +} I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/10/2014 11:05 AM, Andres Freund wrote: Hi, On 2014-02-06 18:47:31 -0500, Andrew Dunstan wrote: * switching to using text representation in jsonb send/recv +/* + * jsonb type recv function + * + * the type is sent as text in binary mode, so this is almost the same + * as the input function. + */ +Datum +jsonb_recv(PG_FUNCTION_ARGS) +{ +StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); +text *result = cstring_to_text_with_len(buf-data, buf-len); + +return deserialize_json_text(result); +} +/* + * jsonb type send function + * + * Just send jsonb as a string of text + */ +Datum +jsonb_send(PG_FUNCTION_ARGS) +{ +Jsonb *jb = PG_GETARG_JSONB(0); +StringInfoData buf; +char *out; + +out = JsonbToCString(NULL, (JB_ISEMPTY(jb)) ? NULL : VARDATA(jb), VARSIZE(jb)); + +pq_begintypsend(buf); +pq_sendtext(buf, out, strlen(out)); +PG_RETURN_BYTEA_P(pq_endtypsend(buf)); +} I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. +10 Especially as this is one type where we may want add type-specific compression options at some point Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PoC: Partial sort
On Sun, Feb 9, 2014 at 7:37 PM, Alexander Korotkov aekorot...@gmail.com wrote: This is not only place that worry me about planning overhead. See get_cheapest_fractional_path_for_pathkeys. I had to estimate number of groups for each sorting column in order to get right fractional path. AFAICT this only happens once per plan and the overhead is O(n) to the number of pathkeys? I can't get worried about that, but I guess it's better to test anyway. PS: You didn't answer my questions about splitting the patch. I guess I'll have to do that anyway to run the tests. Regards, Marti -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Breaking compile-time dependency cycles of Postgres subdirs?
On Sun, Feb 9, 2014 at 8:06 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 7, 2014 at 7:39 AM, Christian Convey christian.con...@gmail.com wrote: This question is mostly just curiosity... As someone very new to this code base, I think these cycles make it a little harder to figure out the runtime and compile-time dependencies between the subsystems these directories seem to represent. I wonder if that's a problem others face as well? There are probably some cases that could be improved, but I have my doubts about whether eliminating cycles is a reasonable goal. Sometimes, two modules really do depend on each other. And, you're talking about this not just on the level of individual files but entire subtrees. There are 90,000 lines of code in src/backend/access (whose headers are in src/include/access) and more than 38,000 in src/backend/storage (whose headers are in src/include/storage); expecting all dependencies between those modules to go in one direction doesn't feel terribly reasonable. If it could be done at all, you'd probably end up separating code into lots of little tiny directories, splitting apart modules with logically related functionality into chunks living in entirely different parts of the source tree - and I don't think that would be an improvement. Thanks Robert. IMHO, whether or not it would be beneficial depends on which files (or definitions within files) had to be broken out into additional subdirectories in order to break the cycles. If it could be accomplished with at most a few additional subdirectories that were also intuitively meaningful groupings of files/definitions, it could be a win. But if not, I agree it would be a step backwards. Still, I'm thinking this might be a problem we need to partially solve if we're going to support a pluggable storage manager, particularly if we allow a pluggable storage manager to use the system's buffer system and/or block I/O system. I guess it depends on exactly what we want from a pluggable storage manager. - Christian
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On 02/07/2014 01:27 PM, Peter Geoghegan wrote: On Thu, Jan 16, 2014 at 12:35 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: I think you should consider breaking off the relcache parts of my patch and committing them, because they're independently useful. Makes sense. Can you extract that into a separate patch, please? Perhaps you can take a look at this again, when you get a chance. The relcache parts? I don't think a separate patch ever appeared that could be reviewed. Looking again at the last emails in this whole thread, I don't have anything to add. At this point, I think it's pretty clear this won't make it into 9.4, so I'm going to mark this as returned with feedback. If someone else thinks this still has a chance and is willing to review this and beat it into shape, please resurrect it quickly. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] inherit support for foreign tables
(2014/02/07 21:31), Etsuro Fujita wrote: So, I've modified the patch so that we continue to disallow SET STORAGE on a foreign table *in the same manner as before*, but, as your patch does, allow it on an inheritance hierarchy that contains foreign tables, with the semantics that we quietly ignore the foreign tables and apply the operation to the plain tables, by modifying the ALTER TABLE simple recursion mechanism. Attached is the updated version of the patch. I'll continue the patch review a bit more, though the patch looks good generally to me except for the abobe issue and the way that the ANALYZE command works. While reviwing the patch, I've found some issues on interactions with other commands, other than the SET STORAGE command. * ADD table_constraint NOT VALID: the patch allows ADD table_constraint *NOT VALID* to be set on a foreign table as well as an inheritance hierarchy that contains foreign tables. But I think it would be better to disallow ADD table_constraint *NOT VALID* on a foreign table, but allow it on an inheritance hierarchy that contains foreign tables, with the semantics that we apply the operation to the plain tables and apply the transformed operation *ADD table_constraint* to the foreign tables. * VALIDATE CONSTRAINT constraint_name: though the patch disallow the direct operation on foreign tables, it allows the operation on an inheritance hierarchy that contains foreign tables, and the operation fails if there is at least one foreign table that has at least one NOT VALID constraint. I think it would be better to modify the patch so that we allow the operation on an inheritance hierarchy that contains foreign tables, with the semantics that we quietly ignore the foreign tables and apply the operation on the plain tables just like the semantics of SET STORAGE. (Note that the foreign tables wouldn't have NOT VALID constraints by diallowing ADD table_constraint *NOT VALID* on foreign tables as mentioned above.) * SET WITH OIDS: though the patch disallow the direct operation on foreign tables, it allows the operation on an inheritance hierarchy that contains foreign tables, and that operation is successfully done on foreign tables. I think it would be better to modify the patch so that we allow it on an inheritance hierarchy that contains foreign tables, with the semantics that we quietly ignore the foreign tables and apply the operation to the plain tables just like the semantics of SET STORAGE. Comments are wellcome! Thanks, Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/10/2014 05:05 AM, Andres Freund wrote: Hi, On 2014-02-06 18:47:31 -0500, Andrew Dunstan wrote: * switching to using text representation in jsonb send/recv +/* + * jsonb type recv function + * + * the type is sent as text in binary mode, so this is almost the same + * as the input function. + */ +Datum +jsonb_recv(PG_FUNCTION_ARGS) +{ + StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + text *result = cstring_to_text_with_len(buf-data, buf-len); + + return deserialize_json_text(result); +} +/* + * jsonb type send function + * + * Just send jsonb as a string of text + */ +Datum +jsonb_send(PG_FUNCTION_ARGS) +{ + Jsonb *jb = PG_GETARG_JSONB(0); + StringInfoData buf; + char *out; + + out = JsonbToCString(NULL, (JB_ISEMPTY(jb)) ? NULL : VARDATA(jb), VARSIZE(jb)); + + pq_begintypsend(buf); + pq_sendtext(buf, out, strlen(out)); + PG_RETURN_BYTEA_P(pq_endtypsend(buf)); +} I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the c type int8, not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than to do it manually inside the string. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/10/2014 07:39 AM, Andres Freund wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the c type int8, not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than to do it manually inside the string. OK, works for me. I'm tied up for a couple of days, will do this when I'm back on deck. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Wait free LW_SHARED acquisition - v0.2
On 01/31/2014 11:54 AM, Andres Freund wrote: Hi, On 2014-01-28 21:27:29 -0800, Peter Geoghegan wrote: On Fri, Nov 15, 2013 at 11:47 AM, Andres Freund and...@2ndquadrant.com wrote: 1) I've added an abstracted atomic ops implementation. Needs a fair amount of work, also submitted as a separate CF entry. (Patch 1 2) Commit 220b34331f77effdb46798ddd7cca0cffc1b2858 caused bitrot when applying 0002-Very-basic-atomic-ops-implementation.patch. Please rebase. I've pushed a rebased version of the patchset to http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git branch rwlock contention. 220b34331f77effdb46798ddd7cca0cffc1b2858 actually was the small problem, ea9df812d8502fff74e7bc37d61bdc7d66d77a7f was the major PITA. I plan to split the atomics patch into smaller chunks before reposting. Imo the Convert the PGPROC-lwWaitLink list into a dlist instead of open coding it. is worth being applied independently from the rest of the series, it simplies code and it fixes a bug... I committed a fix for the WakeupWaiters() bug now, without the rest of the open coding patch. Converting lwWaitLInk into a dlist is probably a good idea, but seems better to fix the bug separately, for the sake of git history if nothing else. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/09 8:06), Andrew Dunstan wrote: On 02/08/2014 05:34 PM, Tom Lane wrote: Hiroshi Inoue in...@tpf.co.jp writes: Though I'm not a MINGW expert at all, I know dllwrap is a deprecated tool and dlltool is almost a deprecated tool. Cygwin port is removing the use of dllwrap and dlltool now. Isn't it better for MINGW port to follow it? Only way to make that happen is to prepare and test a patch ... Yeah. Incidentally, we didn't quite get rid of dlltool for Cygwin. We did get rid of dllwrap. But I agree this is worth trying for Mingw. I tried MINGW port with the attached change and successfully built src and contrib and all pararell regression tests were OK. regards, Hiroshi Inoue diff --git a/src/Makefile.shlib b/src/Makefile.shlib index a95e4d6..367f585 100644 --- a/src/Makefile.shlib +++ b/src/Makefile.shlib @@ -371,7 +371,20 @@ $(stlib): $(OBJS) | $(SHLIB_PREREQS) $(RANLIB) $@ +else #cygwin +ifeq ($(PORTNAME), win32) +ifeq (,$(SHLIB_EXPORTS)) +$(shlib) $(stlib): $(OBJS) | $(SHLIB_PREREQS) + $(CC) $(CFLAGS) -shared -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) -Wl,--out-implib=$(stlib) -Wl,--export-all-symbols + else +DLL_DEFFILE = lib$(NAME)dll.def +$(shlib) $(stlib): $(OBJS) $(DLL_DEFFILE) | $(SHLIB_PREREQS) + $(CC) $(CFLAGS) -shared -o $@ $(OBJS) $(LDFLAGS) $(LDFLAGS_SL) $(SHLIB_LINK) $(LIBS) $(DLL_DEFFILE) -Wl,--out-implib=$(stlib) +endif + + +else #win32 ifeq (,$(SHLIB_EXPORTS)) DLL_DEFFILE = lib$(NAME)dll.def exports_file = $(DLL_DEFFILE) @@ -388,6 +401,7 @@ $(shlib): $(OBJS) $(DLL_DEFFILE) | $(SHLIB_PREREQS) $(stlib): $(shlib) $(DLL_DEFFILE) | $(SHLIB_PREREQS) $(DLLTOOL) --dllname $(shlib) $(DLLTOOL_LIBFLAGS) --def $(DLL_DEFFILE) --output-lib $@ +endif # PORTNAME == win32 endif # PORTNAME == cgywin endif # PORTNAME == cygwin || PORTNAME == win32 diff --git a/src/backend/Makefile b/src/backend/Makefile index 356890d..41a4e41 100644 --- a/src/backend/Makefile +++ b/src/backend/Makefile @@ -80,18 +80,8 @@ endif # cygwin ifeq ($(PORTNAME), win32) LIBS += -lsecur32 -postgres: $(OBJS) postgres.def libpostgres.a $(WIN32RES) - $(DLLTOOL) --dllname $@$(X) --output-exp $@.exp --def postgres.def - $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) -o $@$(X) -Wl,--base-file,$@.base $@.exp $(call expand_subsys,$(OBJS)) $(WIN32RES) $(LIBS) - $(DLLTOOL) --dllname $@$(X) --base-file $@.base --output-exp $@.exp --def postgres.def - $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) -Wl,--stack=$(WIN32_STACK_RLIMIT) -o $@$(X) $@.exp $(call expand_subsys,$(OBJS)) $(WIN32RES) $(LIBS) - rm -f $@.exp $@.base - -postgres.def: $(OBJS) - $(DLLTOOL) --export-all --output-def $@ $(call expand_subsys,$^) - -libpostgres.a: postgres.def - $(DLLTOOL) --dllname postgres.exe --def postgres.def --output-lib $@ +postgres: $(OBJS) $(WIN32RES) + $(CC) $(CFLAGS) $(LDFLAGS) $(LDFLAGS_EX) -Wl,--stack=$(WIN32_STACK_RLIMIT) -o $@$(X) -Wl,--out-implib=libpostgres.a -Wl,--export-all-symbols $(call expand_subsys,$(OBJS)) $(WIN32RES) $(LIBS) endif # win32 diff --git a/src/makefiles/Makefile.win32 b/src/makefiles/Makefile.win32 index 1aae9e9..b18621b 100644 --- a/src/makefiles/Makefile.win32 +++ b/src/makefiles/Makefile.win32 @@ -72,6 +72,4 @@ win32ver.o: win32ver.rc # Rule for building a shared library from a single .o file %.dll: %.o - $(DLLTOOL) --export-all --output-def $*.def $ - $(DLLWRAP) -o $@ --def $*.def $ $(LDFLAGS) $(LDFLAGS_SL) $(BE_DLLLIBS) - rm -f $*.def + $(CC) $(CFLAGS) -shared -o $@ $ -Wl,--export-all-symbols $(LDFLAGS) $(LDFLAGS_SL) $(BE_DLLLIBS) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Hi, During the lwlock scalability work I noticed a longstanding issue with the lwlock code. LWLockRelease() and the other mentioned locations do the following to wake up any waiters, without holding the lock's spinlock: /* * Awaken any waiters I removed from the queue. */ while (head != NULL) { LOG_LWDEBUG(LWLockRelease, T_NAME(l), T_ID(l), release waiter); proc = head; head = proc-lwWaitLink; proc-lwWaitLink = NULL; proc-lwWaiting = false; PGSemaphoreUnlock(proc-sem); } which means they manipulate the lwWaitLink queue without protection. That's done intentionally. The code tries to protect against corruption of the list to do a woken up backend acquiring a lock (this or an independent one) by only continuing when the lwWaiting flag is set to false. Unfortunately there's absolutely no guarantee that a) the assignment to lwWaitLink and lwWaiting are done in that order b) that the stores are done in-order from the POV of other backends. So what we need to do is to acquire a write barrier between the assignments to lwWaitLink and lwWaiting, i.e. proc-lwWaitLink = NULL; pg_write_barrier(); proc-lwWaiting = false; the reader side already uses an implicit barrier by using spinlocks. I've fixed this as part 1 of the lwlock scalability work in [1], but Heikki rightfully suggested that a) this should be backpatched b) done in a separate commit. There is the question what to do about the branches without barriers? I guess a SpinLockAcquire()/Release() would do? Or do we decide it's not important enough to matter, since it's not an issue on x86? [1] http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=commitdiff;h=2de11eb11bb3e3777f6d384de0af9c2f77960637 Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] GiST, getting the record in table that the leaf entry points to
Hello, What I would like to do is to get the record in the table that a leaf GISTENTRY points to, if that is possible. I notice that GISTENTRY contains these members: Relation rel, Page page, and OffsetNumber offset, but are these referring to the table or the index? Thank you, Marios Vodas
Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation
On Thu, Feb 6, 2014 at 5:57 PM, Amit Kapila amit.kapil...@gmail.com wrote: On Thu, Feb 6, 2014 at 9:13 AM, Amit Kapila amit.kapil...@gmail.com wrote: Considering above change as correct, I have tried to see the worst case overhead for this patch by having new tuple such that after 25% or so of suffix/prefix match, there is a small change in tuple and kept rest of tuple same as old tuple and it shows overhead for this patch as well. Updated test script is attached. Unpatched testname | wal_generated | duration --+---+-- ten long fields, 8 bytes changed | 348843824 | 5.56866788864136 ten long fields, 8 bytes changed | 348844800 | 5.84434294700623 ten long fields, 8 bytes changed | 35050 | 5.92329406738281 (3 rows) wal-update-prefix-suffix-encode-1.patch testname | wal_generated | duration --+---+-- ten long fields, 8 bytes changed | 348845624 | 6.92243480682373 ten long fields, 8 bytes changed | 348847000 | 8.35828399658203 ten long fields, 8 bytes changed | 350204752 | 7.61826491355896 (3 rows) One minor point, can we avoid having prefix tag if prefixlen is 0. + /* output prefix as a tag */ + pgrb_out_tag(ctrlp, ctrlb, ctrl, bp, prefixlen, hlen); I think generating out tag for suffix/prefix has one bug i.e it doesn't consider the max length of 273 bytes (PGLZ_MAX_MATCH ) which is mandatory for LZ format. One more point about this patch is that in function pgrb_delta_encode(), is it mandatory to return at end in below check: if (result_size result_max) return false; I mean to say as before starting for copying literal bytes we have already ensured that the compressed data is greater than 25%, so may be we can avoid this check. I have tried to take the data by removing this check and found that it reduces overhead and improves WAL reduction as well. The data is as below (compare this with data in above mail for unpatched version data): testname | wal_generated | duration --+---+-- ten long fields, 8 bytes changed | 300705552 | 6.51416897773743 ten long fields, 8 bytes changed | 300703816 | 6.85267090797424 ten long fields, 8 bytes changed | 300701840 | 7.15832996368408 (3 rows) If we want to go with this approach, then I think apart from above points there is no major change required (may be some comments, function names etc. can be improved). But if we really just want to do prefix/suffix compression, this is a crappy and expensive way to do it. We needn't force everything through the pglz tag format just because we elide a common prefix or suffix. Here are you bothered about below code where the patch is doing byte-by-byte copy after prefix/suffix match? /* output bytes between prefix and suffix as literals */ dp = source[prefixlen]; dend = source[slen - suffixlen]; while (dp dend) { pgrb_out_literal(ctrlp, ctrlb, ctrl, bp, *dp); dp++; /* Do not do this ++ in the line above! */ } I think if we want to change LZ format, it will be bit more work and verification for decoding has to be done much more strenuously. Note - During performance test, I have focussed mainly on worst case, because we already know that this idea is good for best and average cases. However if we decide that this is better and good to proceed, I can take the data for other cases as well. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Craig Ringer cr...@2ndquadrant.com writes: On 02/06/2014 01:48 AM, Tom Lane wrote: switching to binary is the same as text may well be the most prudent path here. Can't we just reject attempts to transfer these via binary copy, allowing only a text format? So rather than sending text when the binary is requested, we just require clients to use text for this type. That used to be the case, back when we didn't have send/recv functions for all built-in types; and client-code authors complained bitterly about it. It's pretty much unworkable if the text/binary choice is being made by a code level that doesn't have complete understanding of the queries it's transmitting. Consider SELECT * FROM ...; how are you going to know which columns to request in binary and which in text? Even if you're willing to do trial and error (ie, it's okay to cause transaction rollbacks), the backend isn't very helpful about telling you exactly which column(s) would need to be requested as text. I think the downthread solution of prepending a type-specific format ID byte is a better answer for giving us flexibility down the road. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Breaking compile-time dependency cycles of Postgres subdirs?
Robert Haas robertmh...@gmail.com writes: On Fri, Feb 7, 2014 at 7:39 AM, Christian Convey christian.con...@gmail.com wrote: As someone very new to this code base, I think these cycles make it a little harder to figure out the runtime and compile-time dependencies between the subsystems these directories seem to represent. I wonder if that's a problem others face as well? There are probably some cases that could be improved, but I have my doubts about whether eliminating cycles is a reasonable goal. Aside from Robert's points, I have a couple of thoughts: I think if it had been a clear, enforced goal all along, it might've been possible to build the system with such a restriction (for the most part at least). At this point though, the amount of work and code churn involved seems like it'd far exceed the benefits. It's also fair to question how much improvement in comprehensibility we'd really get. It's not like code's been dropped into completely random places where it doesn't belong. In the end, Postgres is a pretty big system and it's necessarily going to take time for newbies to learn their way around it. I believe there are some cases where circularity is just about unavoidable. As an example, the error reporting code in elog.c depends on memory management in mcxt.c, which itself uses elog.c's reporting facilities. There's another mutual dependency between error reporting and GUC (server configuration control). And on and on. I think the coding rule you're suggesting would require that each such dependency loop be confined to one major backend subsystem, which seems rather arbitrary. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Breaking compile-time dependency cycles of Postgres subdirs?
On Mon, Feb 10, 2014 at 10:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I think if it had been a clear, enforced goal all along, it might've been possible to build the system with such a restriction (for the most part at least). At this point though, the amount of work and code churn involved seems like it'd far exceed the benefits. That makes sense to me. I certainly didn't think it was a slam-dunk that what I was proposing would be an improvement. It just seemed like a question worth asking. Thanks for your thoughts. - Christian
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Andres Freund and...@2ndquadrant.com writes: So what we need to do is to acquire a write barrier between the assignments to lwWaitLink and lwWaiting, i.e. proc-lwWaitLink = NULL; pg_write_barrier(); proc-lwWaiting = false; You didn't really explain why you think that ordering is necessary? Each proc being awoken will surely see both fields updated, and other procs won't be examining these fields at all, since we already delinked all these procs from the LWLock's queue. There is the question what to do about the branches without barriers? I guess a SpinLockAcquire()/Release() would do? Or do we decide it's not important enough to matter, since it's not an issue on x86? Given the lack of trouble reports that could be traced to this, I don't feel a need to worry about it in branches that don't have any barrier support. But in any case, I'm not convinced there's a bug here at all. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-02-10 11:11:28 -0500, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: So what we need to do is to acquire a write barrier between the assignments to lwWaitLink and lwWaiting, i.e. proc-lwWaitLink = NULL; pg_write_barrier(); proc-lwWaiting = false; You didn't really explain why you think that ordering is necessary? Each proc being awoken will surely see both fields updated, and other procs won't be examining these fields at all, since we already delinked all these procs from the LWLock's queue. The problem is that one the released backends could wake up concurrently because of a unrelated, or previous PGSemaphoreUnlock(). It could see lwWaiting = false, and thus wakeup and acquire the lock, even if the store for lwWaitLink hasn't arrived (or performed, there's no guaranteed ordering here) yet. Now, it may well be that there's no practical consequence of that, but I am not prepared to bet on it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-02-10 11:20:30 -0500, Tom Lane wrote: I wrote: You didn't really explain why you think that ordering is necessary? Actually, after grepping to check my memory of what those fields are being used for, I have a bigger question: WTF is xlog.c doing being so friendly with the innards of LWLocks? Surely this needs to get refactored so that most of WakeupWaiters() and friends is in lwlock.c. Or has all notion of modularity gone out the window while I wasn't looking? Well, it's not actually using any lwlock.c code, it's a special case locking logic, just reusing the datastructures. That said, I am not particularly happy about the amount of code it's duplicating from lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of WALInsertSlotAcquireOne() is a copied. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal, patch: allow multiple plpgsql plugins
Hello Marko 2014-01-16 23:54 GMT+01:00 Marko Tiikkaja ma...@joh.to: Hi Pavel, First of all, thanks for working on this! On 1/12/14, 8:58 PM, Pavel Stehule wrote: I still not happy with plugin_info - it is only per plugin now and should be per plugin and per function. I'm not sure I understand the point of plugin_info in the first place, but what would having a separate info per (plugin, function) achieve that can't be done otherwise? As for the current patch, I'd like to see improvements on a few things: 1) It doesn't currently compile because of extra semicolons in the PLpgSQL_plugin struct. fixed 2) The previous comment above the same struct still talk about the rendezvous variable which is now gone. The comment should be updated to reflect the new API. removed 3) The same comment talks about how important it is to unregister a plugin if its _PG_fini() is ever called, but the current API does not support unregistering. That should probably be added? I'm not sure when _PG_fini() would be called. removed These plugins should not be removed - there is no any mechanism how to remove active plugin without close session Regards Pavel 4) The comment /* reserved for use by optional plugin */ seems a bit weird in its new context. Regards, Marko Tiikkaja commit bf0820786f08b08812bba3d86233cbac30922054 Author: Pavel Stehule pavel.steh...@gooddata.com Date: Mon Feb 10 17:50:31 2014 +0100 initial diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile index 852b0c7..37d17a8 100644 --- a/src/pl/plpgsql/src/Makefile +++ b/src/pl/plpgsql/src/Makefile @@ -19,7 +19,7 @@ rpath = OBJS = pl_gram.o pl_handler.o pl_comp.o pl_exec.o pl_funcs.o pl_scanner.o -DATA = plpgsql.control plpgsql--1.0.sql plpgsql--unpackaged--1.0.sql +DATA = plpgsql.control plpgsql--1.1.sql plpgsql--1.0--1.1.sql plpgsql--unpackaged--1.0.sql all: all-lib diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index 3749fac..e6510f1 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -86,6 +86,22 @@ typedef struct SimpleEcontextStackEntry static EState *shared_simple_eval_estate = NULL; static SimpleEcontextStackEntry *simple_econtext_stack = NULL; +/* + * List of pointers and info of registered plugins. + */ +typedef struct PluginPtrEntry +{ + PLpgSQL_plugin *plugin_ptr; + struct PluginPtrEntry *next; +} PluginPtrEntry; + +/* + * Allocated in TopMemoryContext + */ +static PluginPtrEntry *plugins = NULL; +static int nplugins = 0; +static int used_plugin_hook_types = 0; + / * Local function forward declarations / @@ -235,6 +251,18 @@ static char *format_expr_params(PLpgSQL_execstate *estate, const PLpgSQL_expr *expr); static char *format_preparedparamsdata(PLpgSQL_execstate *estate, const PreparedParamsData *ppd); +static void **get_plugin_info(PLpgSQL_execstate *estate, int plugin_number); + + +/* Bits for used plugin callback types */ +#define PLUGIN_HOOK_TYPE_FUNC_SETUP (1 0) +#define PLUGIN_HOOK_TYPE_FUNC_BEG (1 2) +#define PLUGIN_HOOK_TYPE_FUNC_END (1 3) +#define PLUGIN_HOOK_TYPE_STMT_BEG (1 4) +#define PLUGIN_HOOK_TYPE_STMT_END (1 5) + +#define EXEC_PLUGIN_HOOK_TYPE(type) \ + ((used_plugin_hook_types (PLUGIN_HOOK_TYPE_ ## type)) == (PLUGIN_HOOK_TYPE_ ## type)) /* -- @@ -331,10 +359,28 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo, exec_set_found(estate, false); /* - * Let the instrumentation plugin peek at this function + * Let the instrumentation plugins peek at this function */ - if (*plugin_ptr (*plugin_ptr)-func_beg) - ((*plugin_ptr)-func_beg) (estate, func); + if (EXEC_PLUGIN_HOOK_TYPE(FUNC_BEG)) + { + PluginPtrEntry *pe; + int i; + + Assert(plugins != NULL); + + for (i = 0, pe = plugins; pe != NULL; pe = pe-next, i++) + { + PLpgSQL_plugin *pl_ptr = pe-plugin_ptr; + + if (pl_ptr-func_beg) + { +void **plugin_info; + +plugin_info = get_plugin_info(estate, i); +(pl_ptr-func_beg) (estate, func, plugin_info); + } + } + } /* * Now call the toplevel block of statements @@ -479,10 +525,28 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo, estate.err_text = gettext_noop(during function exit); /* - * Let the instrumentation plugin peek at this function + * Let the instrumentation plugins peek at this function */ - if (*plugin_ptr (*plugin_ptr)-func_end) - ((*plugin_ptr)-func_end) (estate, func); + if (EXEC_PLUGIN_HOOK_TYPE(FUNC_END)) + { + PluginPtrEntry *pe; + int i; + + Assert(plugins != NULL); + + for (i = 0, pe = plugins; pe != NULL; i++, pe = pe-next) + { + PLpgSQL_plugin *pl_ptr = pe-plugin_ptr; + + if (pl_ptr-func_end) + { +void **plugin_info; + +
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
I wrote: You didn't really explain why you think that ordering is necessary? Actually, after grepping to check my memory of what those fields are being used for, I have a bigger question: WTF is xlog.c doing being so friendly with the innards of LWLocks? Surely this needs to get refactored so that most of WakeupWaiters() and friends is in lwlock.c. Or has all notion of modularity gone out the window while I wasn't looking? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink
On Sun, Feb 9, 2014 at 4:56 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Feb 9, 2014 at 6:33 PM, Jeff Janes jeff.ja...@gmail.com wrote: Since this commit (17676c785a95b2598c573), pgbench no longer uses .pgpass to obtain passwords, but instead prompts for a password This problem is in 9.3 and 9.4dev According to strace, it is reading the .pgpass file, it just seem like it is not using it. Hmm. I tried pgbench with the following .pgpass file and it worked OK. Removing the file caused pgbench to prompt for a password. *:*:*:*:foo OK, that works for me. I had it completely specified. Playing with variations on this, I see that the key is pgport. Set to * it works, set to 5432 it prompts for the password. (If I specify -p 5432 to pgbench, that would work with the original file) Presumably whatever behavior difference you are seeing is somehow related to the use of PQconnectdbParams() rather than PQsetdbLogin(), but the fine manual does not appear to document a different between those functions as regards password-prompting behavior or .pgpass usage. It looks like PQsetdbLogin() has either NULL or empty string passed to it match 5432 in pgpass, while PQconnectdbParams() only has NULL match 5432 and empty string does not. pgbench uses empty string if no -p is specified. This make pgbench behave the way I think is correct, but it hardly seems like the right way to fix it. *** a/contrib/pgbench/pgbench.c --- b/contrib/pgbench/pgbench.c *** doConnect(void) *** 528,533 --- 528,535 new_pass = false; + if (values[1][0] == 0) values[1]=NULL; + conn = PQconnectdbParams(keywords, values, true); if (!conn) Cheers, Jeff
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/10/2014 06:41 PM, Andres Freund wrote: On 2014-02-10 11:20:30 -0500, Tom Lane wrote: I wrote: You didn't really explain why you think that ordering is necessary? Actually, after grepping to check my memory of what those fields are being used for, I have a bigger question: WTF is xlog.c doing being so friendly with the innards of LWLocks? Surely this needs to get refactored so that most of WakeupWaiters() and friends is in lwlock.c. Or has all notion of modularity gone out the window while I wasn't looking? Well, it's not actually using any lwlock.c code, it's a special case locking logic, just reusing the datastructures. That said, I am not particularly happy about the amount of code it's duplicating from lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of WALInsertSlotAcquireOne() is a copied. I'm not too happy with the amount of copy-paste myself, but there was enough difference to regular lwlocks that I didn't want to bother all lwlocks with the xlog-specific stuff either. The WAL insert slots do share the LWLock-related PGPROC fields though, and semaphore. I'm all ears if you have ideas on that.. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the c type int8, not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send version and it's not clear why this one is special. We've changed the wire format versions before and it's upon the client to deal with those changes. The server version *is* the version basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
Heikki Linnakangas hlinnakan...@vmware.com writes: On 02/10/2014 06:41 PM, Andres Freund wrote: Well, it's not actually using any lwlock.c code, it's a special case locking logic, just reusing the datastructures. That said, I am not particularly happy about the amount of code it's duplicating from lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of WALInsertSlotAcquireOne() is a copied. I'm not too happy with the amount of copy-paste myself, but there was enough difference to regular lwlocks that I didn't want to bother all lwlocks with the xlog-specific stuff either. The WAL insert slots do share the LWLock-related PGPROC fields though, and semaphore. I'm all ears if you have ideas on that.. I agree that if the behavior is considerably different, we don't really want to try to make LWLockAcquire/Release cater to both this and their standard behavior. But why not add some additional functions in lwlock.c that do what xlog wants? If we're going to have mostly-copy-pasted logic, it'd at least be better if it was in the same file, and not somewhere that's not even in the same major subtree. Also, the reason that LWLock isn't an abstract struct is because we wanted to be able to embed it in other structs; *not* as license for other modules to fool with its contents. If we were working in C++ we'd certainly have made all its fields private. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Mon, Feb 10, 2014 at 11:57 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: The relcache parts? I don't think a separate patch ever appeared that could be reviewed. I posted the patch on January 18th: http://www.postgresql.org/message-id/cam3swzth4vkesot7dcrwbprn7zzhnz-wa6zmvo1ff7gbnoj...@mail.gmail.com I was under the impression that you agreed that this was independently valuable, regardless of the outcome here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/10/2014 03:46 PM, Andres Freund wrote: Hi, During the lwlock scalability work I noticed a longstanding issue with the lwlock code. LWLockRelease() and the other mentioned locations do the following to wake up any waiters, without holding the lock's spinlock: /* * Awaken any waiters I removed from the queue. */ while (head != NULL) { LOG_LWDEBUG(LWLockRelease, T_NAME(l), T_ID(l), release waiter); proc = head; head = proc-lwWaitLink; proc-lwWaitLink = NULL; proc-lwWaiting = false; PGSemaphoreUnlock(proc-sem); } which means they manipulate the lwWaitLink queue without protection. That's done intentionally. The code tries to protect against corruption of the list to do a woken up backend acquiring a lock (this or an independent one) by only continuing when the lwWaiting flag is set to false. Unfortunately there's absolutely no guarantee that a) the assignment to lwWaitLink and lwWaiting are done in that order b) that the stores are done in-order from the POV of other backends. So what we need to do is to acquire a write barrier between the assignments to lwWaitLink and lwWaiting, i.e. proc-lwWaitLink = NULL; pg_write_barrier(); proc-lwWaiting = false; the reader side already uses an implicit barrier by using spinlocks. I've fixed this as part 1 of the lwlock scalability work in [1], but Heikki rightfully suggested that a) this should be backpatched b) done in a separate commit. There is the question what to do about the branches without barriers? I guess a SpinLockAcquire()/Release() would do? The other alternative we discussed on IM is to unlink the waiters from the linked list, while still holding the spinlock. We can't do the PGSemaphoreUnlock() call to actually wake up the waiters while holding the spinlock, but all the shared memory manipulations we could. It would move all the modifications of the shared structures under the spinlock, which feels comforting. It would require using some-sort of a backend-private data structure to hold the list of processes to wake up. We don't want to palloc() in LWLockRelease(), but we could malloc() a large-enough array once at process initialization, and use that on all LWLockRelease() calls. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. -1. Currently no other wire format types send version and it's not clear why this one is special. We've changed the wire format versions before and it's upon the client to deal with those changes. Really? How would you expect to do that, exactly? In particular, how would you propose that a binary pg_dump file be reloadable if we redefine the binary format down the road without having made provision like this? Versioning one type only IMNSHO is a complete hack. I don't feel a need for versioning int, or float8, or most other types; and that includes the ones for which we've previously defined binary format as equivalent to text (enums). In this case we know that we're not totally satisfied with the binary format we're defining today, so I think a type-specific escape hatch is a reasonable solution. Moreover, I don't especially buy tying it to server version, even if we had an information pathway that would provide that reliably in all contexts. Granting the presumption that more than one data type would want such versioning, it's still possible that different data types would have different ideas about what they needed to do and where the cutover points were. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On 02/09/2014 02:17 PM, Magnus Hagander wrote: If an error occurs in the foreground (backup) process of pg_basebackup, and we exit in a controlled way, the background process (streaming xlog process) would stay around and keep streaming. This can happen for example if disk space runs out and there is very low activity on the server. (If there is activity on the server, the background streamer will also run out of disk space and exit) Attached patch kills it off in disconnect_and_exit(), which seems like the right thing to do to me. Any objections to applying and backpatching that for the upcoming minor releases? Do you get a different error message with this patch than before? Is the new one better than the old one? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 02/10/2014 08:03 PM, Tom Lane wrote: Heikki Linnakangas hlinnakan...@vmware.com writes: On 02/10/2014 06:41 PM, Andres Freund wrote: Well, it's not actually using any lwlock.c code, it's a special case locking logic, just reusing the datastructures. That said, I am not particularly happy about the amount of code it's duplicating from lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of WALInsertSlotAcquireOne() is a copied. I'm not too happy with the amount of copy-paste myself, but there was enough difference to regular lwlocks that I didn't want to bother all lwlocks with the xlog-specific stuff either. The WAL insert slots do share the LWLock-related PGPROC fields though, and semaphore. I'm all ears if you have ideas on that.. I agree that if the behavior is considerably different, we don't really want to try to make LWLockAcquire/Release cater to both this and their standard behavior. But why not add some additional functions in lwlock.c that do what xlog wants? If we're going to have mostly-copy-pasted logic, it'd at least be better if it was in the same file, and not somewhere that's not even in the same major subtree. Ok, I'll try to refactor it that way, so that we can see if it looks better. Also, the reason that LWLock isn't an abstract struct is because we wanted to be able to embed it in other structs; *not* as license for other modules to fool with its contents. If we were working in C++ we'd certainly have made all its fields private. Um, xlog.c is doing no such thing. The insertion slots use a struct of their own, which is also copy-pasted from LWLock (with one additional field). - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
On Sun, Jan 19, 2014 at 2:17 AM, Peter Geoghegan p...@heroku.com wrote: I'm just throwing an error when locking the tuple returns HeapTupleInvisible, and the xmin of the tuple is our xid. I would like some feedback on this point. We need to consider how exactly to avoid updating the same tuple inserted by our command. Updating a tuple we inserted cannot be allowed to happen, not least because to do so causes livelock. A related consideration that I raised in mid to late January that hasn't been commented on is avoiding updating the same tuple twice, and where we come down on that with respect to where our responsibility to the user starts and ends. For example, SQL MERGE officially forbids this, but MySQL's INSERT...ON DUPLICATE KEY UPDATE seems not to, probably due to implementation considerations. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Terminating pg_basebackup background streamer
On Mon, Feb 10, 2014 at 7:29 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/09/2014 02:17 PM, Magnus Hagander wrote: If an error occurs in the foreground (backup) process of pg_basebackup, and we exit in a controlled way, the background process (streaming xlog process) would stay around and keep streaming. This can happen for example if disk space runs out and there is very low activity on the server. (If there is activity on the server, the background streamer will also run out of disk space and exit) Attached patch kills it off in disconnect_and_exit(), which seems like the right thing to do to me. Any objections to applying and backpatching that for the upcoming minor releases? Do you get a different error message with this patch than before? Is the new one better than the old one? Previously you got double error messages - one from the foreground, and a second one from the background sometime in the future (whenever it eventually failed, and for whatever reason - so if it was out of disk space, it would complain about that once it got enough xlog for it to happen). With the patch you just get the error message from the first process. The background process doesn't give an error on SIGTERM, it just exists. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] PoC: Partial sort
On Mon, Feb 10, 2014 at 2:33 PM, Marti Raudsepp ma...@juffo.org wrote: On Sun, Feb 9, 2014 at 7:37 PM, Alexander Korotkov aekorot...@gmail.com wrote: This is not only place that worry me about planning overhead. See get_cheapest_fractional_path_for_pathkeys. I had to estimate number of groups for each sorting column in order to get right fractional path. AFAICT this only happens once per plan and the overhead is O(n) to the number of pathkeys? I can't get worried about that, but I guess it's better to test anyway. PS: You didn't answer my questions about splitting the patch. I guess I'll have to do that anyway to run the tests. Done. Patch is splitted. -- With best regards, Alexander Korotkov. partial-sort-basic-1.patch.gz Description: GNU Zip compressed data partial-sort-merge-1.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. -1. Currently no other wire format types send version and it's not clear why this one is special. We've changed the wire format versions before and it's upon the client to deal with those changes. Really? How would you expect to do that, exactly? In particular, how would you propose that a binary pg_dump file be reloadable if we redefine the binary format down the road without having made provision like this? Versioning one type only IMNSHO is a complete hack. I don't feel a need for versioning int, or float8, or most other types; and that includes the ones for which we've previously defined binary format as equivalent to text (enums). In this case we know that we're not totally satisfied with the binary format we're defining today, so I think a type-specific escape hatch is a reasonable solution. Moreover, I don't especially buy tying it to server version, even if we had an information pathway that would provide that reliably in all contexts. Why not? Furthermore what are we doing now? If we need a binary format contract that needs to be separated from this discussion. I've written (along with Andrew C) the only serious attempt to deal with client side binary format handling (http://libpqtypes.esilo.com/) and in all interesting cases it depends on the server version to define binary parsing behaviors. I agree WRT float8, etc but other types have changed in a couple of cases and it's always been with the version. I find it highly unlikely that any compatibility behaviors are going to be defined *for each and every returned datum* now, or ever...so even if it's a few bytes lost, why do it? Intra-version compatibility issues should they ever have to be handled would be more likely handled at connection- or query- time. Point being, if an escape hatch is needed, I'm near 100% certain this is not the right place to do it. Binary wire format compatibility is a complex topic and proposed solution ISTM is not at all fleshed out. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2014 - mentors, students and admins
Hi! On Tue, Jan 28, 2014 at 9:34 PM, Thom Brown t...@linux.com wrote: And I'd be fine with being admin again this year, unless there's anyone else who would like to take up the mantle? Thanks for your work. I would like to see you as admin this year again. Who would be up for mentoring this year? And are there any project ideas folk would like to suggest? I would like to be mentor. -- With best regards, Alexander Korotkov.
Re: [HACKERS] Memory ordering issue in LWLockRelease, WakeupWaiters, WALInsertSlotRelease
On 2014-02-10 19:48:47 +0200, Heikki Linnakangas wrote: On 02/10/2014 06:41 PM, Andres Freund wrote: On 2014-02-10 11:20:30 -0500, Tom Lane wrote: I wrote: You didn't really explain why you think that ordering is necessary? Actually, after grepping to check my memory of what those fields are being used for, I have a bigger question: WTF is xlog.c doing being so friendly with the innards of LWLocks? Surely this needs to get refactored so that most of WakeupWaiters() and friends is in lwlock.c. Or has all notion of modularity gone out the window while I wasn't looking? Well, it's not actually using any lwlock.c code, it's a special case locking logic, just reusing the datastructures. That said, I am not particularly happy about the amount of code it's duplicating from lwlock.c. Pretty much all of WALInsertSlotReleaseOne and most of WALInsertSlotAcquireOne() is a copied. I'm not too happy with the amount of copy-paste myself, but there was enough difference to regular lwlocks that I didn't want to bother all lwlocks with the xlog-specific stuff either. The WAL insert slots do share the LWLock-related PGPROC fields though, and semaphore. I'm all ears if you have ideas on that.. The lwlock scalability stuff has separated out the enqueue/wakeup code, that probably should work here as well? And that's a fair portion of the code. I think it should be doable to make that generic enough that the actual difference of the struct doesn't matter. It'd also reduce duplication of LWLockAcquire, ConditionalAcquire, OrWait. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the c type int8, not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send version and it's not clear why this one is special. We've changed the wire format versions before and it's upon the client to deal with those changes. The server version *is* the version basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. I don't find that very convincing. The entire reason jsonb exists is because the parsing overhead of text json is significant, so it stands to reason that soon somebody will try to work on a better wire protocol, even if the current code cannot be made ready for 9.4. And I don't think past instability of binary type's formats is a good reason for *needlessly* breaking stuff like binary COPYs. And it's not like one prefixed byte has any real-world relevant cost. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:02 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary at some later time. Maybe just send a int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the c type int8, not the 64bit type. It seems cleaner to do a pg_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send version and it's not clear why this one is special. We've changed the wire format versions before and it's upon the client to deal with those changes. The server version *is* the version basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. I don't find that very convincing. The entire reason jsonb exists is because the parsing overhead of text json is significant, so it stands to reason that soon somebody will try to work on a better wire protocol, even if the current code cannot be made ready for 9.4. And I don't think past instability of binary type's formats is a good reason for *needlessly* breaking stuff like binary COPYs. And it's not like one prefixed byte has any real-world relevant cost. The point is, why does this one type get a version id? Imagine a hypothetical program that sent/received the binary format for jsonb. All you have to to is manage the version flag appropriately, right? Wrong. You still need to have code that checks the server version and see if it's supported (particularly for sending) and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. I guess you could argue that having a version flag could, say, allow libpq clients to gracefully error out if, say, a old non-exotic-format speaking libpq happens to connect to a newer sever -- assuming the client actually bothered to check the flag. That's zero help to the client though -- regardless the compatibility isn't established and that's zero help to other binary formats that we have=, and probably will continue to-, change. What about them? Are we now, at the upteenth hour of the final commit fest, suddenly deciding that binary wire formats going to be compatible across versions? The kinda low effort way to deal with binary format compatibility is to simply document the existing formats and document format changes in some convenient place. The 'real' long term path to doing it IMO is to abstract out a shared/client server type library with some protocol negotiation features. Then, at connection time, the client/server agree on what's the optimal way to send things -- perhaps the client can signal things like 'want compression for long datums'. The only case for a version flag at the data point level is if the server is sending version X at this tuple and version Y at that tuple. I don't think that's a makable case. Some might say, what about a compression bit based on compressibility/length? and to that I'd answer: why is that handling specific to the json type...are text/bytea/arrays not worth that feature too? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 17:35:12 -0600, Merlin Moncure wrote: Wrong. You still need to have code that checks the server version and see if it's supported (particularly for sending) and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. think of binary COPY outputting data in 9.4 and then trying to import that data into 9.5. That's the interesting case here. What about them? Are we now, at the upteenth hour of the final commit fest, suddenly deciding that binary wire formats going to be compatible across versions? It has been a concern before. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:38 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 17:35:12 -0600, Merlin Moncure wrote: Wrong. You still need to have code that checks the server version and see if it's supported (particularly for sending) and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. think of binary COPY outputting data in 9.4 and then trying to import that data into 9.5. That's the interesting case here. right, json could be made work, but any other format change introduced to any other already existing type will break. That's not a real solution unless we decree henceforth that no formats will change from here on in, in which case I withdraw my objection. I think COPY binary has exactly the same set of considerations as the client side. If you want to operate cleanly between versions (which has never been promised in the past), you have to encode in a header the kinds of things the server would need to parse it properly. Starting with, but not necessarily limited to, the encoding server's version. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 17:48:32 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 5:38 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 17:35:12 -0600, Merlin Moncure wrote: Wrong. You still need to have code that checks the server version and see if it's supported (particularly for sending) and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. think of binary COPY outputting data in 9.4 and then trying to import that data into 9.5. That's the interesting case here. right, json could be made work, but any other format change introduced to any other already existing type will break. That's not a real solution unless we decree henceforth that no formats will change from here on in, in which case I withdraw my objection. Sure, it's not a full solution. But it's better than nothing, and it's likely that we'll see breakage soonish. I don't think there's been much recent mucking around with incompatible binary formats? I think COPY binary has exactly the same set of considerations as the client side. If you want to operate cleanly between versions (which has never been promised in the past), you have to encode in a header the kinds of things the server would need to parse it properly. Starting with, but not necessarily limited to, the encoding server's version. It works in enough cases atm that it's worthwile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? Wasn't the whole point of this is that jsonb send/recv be more spiritually closer to json? If we want to introduce alternative type formats in the 9.5 cycle, why can't we attach version based encoding handling to *that* problem? The more angles I look at this from the more it looks messy and rushed. Notwithstanding all the above, I figure here enough smart people disagree (once again, heh) to call it consensus. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] dynamic shared memory and locks
2014-02-08 4:52 GMT+09:00 Robert Haas robertmh...@gmail.com: On Tue, Jan 21, 2014 at 11:37 AM, Robert Haas robertmh...@gmail.com wrote: One idea I just had is to improve the dsm_toc module so that it can optionally set up a tranche of lwlocks for you, and provide some analogues of RequestAddinLWLocks and LWLockAssign for that case. That would probably make this quite a bit simpler to use, at least for people using it with dynamic shared memory. But I think that's a separate patch. I played with this a bit today and it doesn't actually seem to simplify things very much. The backend that creates the DSM needs to do this: lwlocks = shm_toc_allocate(toc, sizeof(LWLockPadded) * nlwlocks); for (i = 0; i nlwlocks; ++i) LWLockInitialize(lwlocks[i].lock, tranche_id); Since that's all of three lines, encapsulating it doesn't look all that helpful. Then each backend needs to do something like this: static LWLockTranche mytranche; mytranche.name = some descriptive module name; mytranche.array_base = lwlocks; mytranche.array_stride = sizeof(LWLockPadded); LWLockRegisterTranche(tranche_id, mytranche); That's not a lot of code either, and there's no obvious way to reduce it much by hooking into shm_toc. I thought maybe we needed some cleanup when the dynamic shared memory segment is unmapped, but looks like we really don't. One can do something like LWLockRegisterTranche(tranche_id, NULL) for debugging clarity, but it isn't strictly needed; one can release all lwlocks from the tranche or assert that none are held, but that should really only be a problem if the user does something like LWLockAcquire(lock_in_the_segment); dsm_detach(seg), because error recovery starts by releasing *all* lwlocks. And if the user does it explicitly, I'm kinda OK with that just seg faulting. After all, the user could equally well have done dsm_detach(seg); LWLockAcquire(lock_in_the_segment) and there's no way at all to cross-check for that sort of mistake. Does it make another problem if dsm_detach() also releases lwlocks being allocated on the dsm segment to be released? Lwlocks being held is tracked in the held_lwlocks[] array; its length is usually 100. In case when dsm_detach() is called towards the segment between addr ~ (addr + length), it seems to me an easy job to walk on this array to find out lwlocks on the range. I do see one thing about the status quo that does look reasonably annoying: the code expects that tranche IDs are going to stay relatively small. For example, consider a module foobar that uses DSM + LWLocks. It won't do at all for each backend, on first use of the foobar module, to do LWLockNewTrancheId() and then reuse that tranche_id repeatedly for each new DSM - because over a system lifetime of months, tranche IDs could grow into the millions, causing LWLockTrancheArray to get really big (and eventually repalloc will fail). Rather, the programmer needs to ensure that LWLockNewTrancheId() gets called *once per postmaster lifetime*, perhaps by allocating a chunk of permanent shared memory and using that to store the tranche_id that should be used each time an individual backend fires up a DSM. Considering that one of the goals of dynamic shared memory is to allow modules to be loaded after postmaster startup and still be able to do useful stuff, that's kind of obnoxious. I don't have a good idea what to do about it, though; making LWLockTrancheArray anything other than a flat array looks likely to slow down --enable-dtrace builds unacceptably. Just my rough idea. Doesn't it make sense to have an offset from the head of DSM segment that contains the lwlock, instead of the identifier form, to indicate a cstring datum? It does not prevent to allocate it later; after the postmaster starting up, and here is no problem if number of dynamic lwlocks grows millions or more. If lwlock has a tranche_offset, not tranche_id, all it needs to do is: 1. find out either of DSM segment or regular SHM segment that contains the supplied lwlock. 2. Calculate mapped_address + tranche_offset; that is the local location where cstring form is put. Probably, it needs a common utility routine on dsm.c to find out a particular DSM segment that contains the supplied address. (Inverse function towards dsm_segment_address) How about my ideas? Thanks, -- KaiGai Kohei kai...@kaigai.gr.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 18:16:15 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. That ship has sailed. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. But we don't have any alternatives for such scenarios, so that just amounts to screw you. If there are good reason for just breaking binary protocol compatibility, I can live with that, but that's really not the case here. The additional amount of code is *miniscule*, even after adding a real binary protocol format since all the code has to be there for the plain send/recv functions anyway. The amount of interesting and acceptable binary protocol changes has gotten lower in step with the acceptance of on-disk compatibility changes, which isn't particularly surprising. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? Wasn't the whole point of this is that jsonb send/recv be more spiritually closer to json? If we want to introduce alternative type formats in the 9.5 cycle, why can't we attach version based encoding handling to *that* problem? That doesn't make any sense to me. jsonb is a separate type because it behaves differently than json. So I don't see how that plays a role here. And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Merlin Moncure mmonc...@gmail.com writes: right, json could be made work, but any other format change introduced to any other already existing type will break. That's not a real solution unless we decree henceforth that no formats will change from here on in, in which case I withdraw my objection. Well, I don't recall that we've made a practice of changing binary formats a lot. Doing so would break existing dumps, which is something we strenuously avoid. Even granting that sometime in the future we invent infrastructure to do the kind of protocol negotiation you're talking about, one byte per JSON value seems like a cheap and worthwhile cross-check that both ends came to the same conclusion about what to send. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file which would be follow exactly the same kinds of decisions binary format consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file which would be follow exactly the same kinds of decisions binary format consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file which would be follow exactly the same kinds of decisions binary format consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* as in, 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 19:01:48 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file which would be follow exactly the same kinds of decisions binary format consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* as in, 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. Comeon. Your way requires building HEAPS of new and generic infrastructure in 9.5 and would only work for binary copy. The proposed way requires about two lines of code. Without the generic infrastructure we'd end up relying on some intracacies like the meaning of high bit in the first byte or such. Anyway, that's it on this subthread from me, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 10 February 2014 20:11, Hannu Krosing ha...@krosing.net wrote: The fastest and lowest parsing cost format for JSON is tnetstrings http://tnetstrings.org/ why not use it as the binary wire format ? It would be as binary as it gets and still be generally parse-able by lots of different platforms, at leas by all of these we care about. If we do go down the binary encoding path in a future release, can I please suggest *not* using something like tnetstrings, which suffers the same problem that a few binary transport formats suffer, particularly when they're developed by people whose native language doesn't distinguish between byte arrays and strings - all strings are considered byte arrays and it's up to an application to decide on character encoding and which things are data vs strings in the application. This makes writing a parser in a language which does treat byte arrays and strings differently very difficult, see e.g. the java tnetstrings API [1] which is forced into treating strings as byte arrays until the programmer then asks it to parse the thing again, but please treat everything as a string this time. The msgpack people after much wrangling have ended up issuing a new version of the protocol which avoids this issue and which they are strongly encouraging users to switch to, see [2] for the gory details. While we may not ever store types in our jsonb format other than the standard json data types (I can foresee people wanting to do it, though), I would strongly recommend picking a format which at least is clear that a value is a string (text, whatever), and preferably makes it clear what the character encoding is. Or maybe it should just follow whatever the client encoding is at the time - as long as that is completely unambiguous to a client. Cheers Tom [1] https://github.com/asinger/tnetstringsj [2] https://github.com/msgpack/msgpack/issues/128 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Monday, February 10, 2014, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 19:01:48 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.usjavascript:; wrote: Merlin Moncure mmonc...@gmail.com javascript:; writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com javascript:; wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file which would be follow exactly the same kinds of decisions binary format consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* as in, 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. Comeon. Your way requires building HEAPS of new and generic infrastructure in 9.5 and would only work for binary copy. The proposed way requires about two lines of code. Without the generic infrastructure we'd end up relying on some intracacies like the meaning of high bit in the first byte or such. Anyway, that's it on this subthread from me Fair enough. I'll concede the point. merlin
Re: [HACKERS] jsonb and nested hstore
Hi, Is it just me or is jsonapi.h not very well documented? On 2014-02-06 18:47:31 -0500, Andrew Dunstan wrote: +/* + * for jsonb we always want the de-escaped value - that's what's in token + */ +static void +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype) +{ + JsonbInState *_state = (JsonbInState *) state; + JsonbValue v; + + v.size = sizeof(JEntry); + + switch (tokentype) + { + + case JSON_TOKEN_STRING: + v.type = jbvString; + v.string.len = token ? checkStringLen(strlen(token)) : 0; + v.string.val = token ? pnstrdup(token, v.string.len) : NULL; + v.size += v.string.len; + break; + case JSON_TOKEN_NUMBER: + v.type = jbvNumeric; + v.numeric = DatumGetNumeric(DirectFunctionCall3(numeric_in, CStringGetDatum(token), 0, -1)); + + v.size += VARSIZE_ANY(v.numeric) +sizeof(JEntry) /* alignment */ ; missing space. Why does + sizeof(JEntry) change anything about alignment? If it was aligned before, adding a statically sized value doesn't give any new guarantees about alignment? +/* + * jsonb type recv function + * + * the type is sent as text in binary mode, so this is almost the same + * as the input function. + */ +Datum +jsonb_recv(PG_FUNCTION_ARGS) +{ + StringInfo buf = (StringInfo) PG_GETARG_POINTER(0); + text *result = cstring_to_text_with_len(buf-data, buf-len); + + return deserialize_json_text(result); +} This is a bit absurd, we're receiving a string in a StringInfo buffer, just to copy it into text, and then in makeJsonLexContext() access the raw chars again. +static void +putEscapedValue(StringInfo out, JsonbValue *v) +{ + switch (v-type) + { + case jbvNull: + appendBinaryStringInfo(out, null, 4); + break; + case jbvString: + escape_json(out, pnstrdup(v-string.val, v-string.len)); + break; + case jbvBool: + if (v-boolean) + appendBinaryStringInfo(out, true, 4); + else + appendBinaryStringInfo(out, false, 5); + break; + case jbvNumeric: + appendStringInfoString(out, DatumGetCString(DirectFunctionCall1(numeric_out, PointerGetDatum(v-numeric; + break; + default: + elog(ERROR, unknown jsonb scalar type); + } +} Hm, will the jbvNumeric always result in correct correct quoting? datum_to_json() does extra hangups for that case, any reason we don't need that here? +char * +JsonbToCString(StringInfo out, char *in, int estimated_len) +{ ... + while (redo_switch || ((type = JsonbIteratorGet(it, v, false)) != 0)) + { + redo_switch = false; Not sure if I see the advantage over the goto here. A comment explaining what the reason for the goto is wouldhave sufficed. + case WJB_KEY: + if (first == false) + appendBinaryStringInfo(out, , , 2); + first = true; + + putEscapedValue(out, v); + appendBinaryStringInfo(out, : , 2); putEscapedValue doesn't gurantee only strings are output, but datum_to_json does extra hangups for that case. + type = JsonbIteratorGet(it, v, false); + if (type == WJB_VALUE) + { + first = false; + putEscapedValue(out, v); + } + else + { + Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY); + /* + * We need to rerun current switch() due to put + * in current place object which we just got + * from iterator. + */ due to put? +/* + * jsonb type send function + * + * Just send jsonb as a string of text + */ +Datum +jsonb_send(PG_FUNCTION_ARGS) +{ + Jsonb *jb = PG_GETARG_JSONB(0); + StringInfoData buf; + char *out; + + out = JsonbToCString(NULL, (JB_ISEMPTY(jb)) ? NULL : VARDATA(jb), VARSIZE(jb)); + + pq_begintypsend(buf); + pq_sendtext(buf, out, strlen(out)); + PG_RETURN_BYTEA_P(pq_endtypsend(buf)); +} Why aren't you using using the stringbuf passing JsonbToCString convention here to avoid the
Re: [HACKERS] newlines at end of generated SQL
committed -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/10/2014 08:50 PM, Tom Dunstan wrote: On 10 February 2014 20:11, Hannu Krosing ha...@krosing.net wrote: The fastest and lowest parsing cost format for JSON is tnetstrings http://tnetstrings.org/ why not use it as the binary wire format ? It would be as binary as it gets and still be generally parse-able by lots of different platforms, at leas by all of these we care about. If we do go down the binary encoding path in a future release, can I please suggest *not* using something like tnetstrings, which suffers the same problem that a few binary transport formats suffer, particularly when they're developed by people whose native language doesn't distinguish between byte arrays and strings - all strings are considered byte arrays and it's up to an application to decide on character encoding and which things are data vs strings in the application. This makes writing a parser in a language which does treat byte arrays and strings differently very difficult, see e.g. the java tnetstrings API [1] which is forced into treating strings as byte arrays until the programmer then asks it to parse the thing again, but please treat everything as a string this time. The msgpack people after much wrangling have ended up issuing a new version of the protocol which avoids this issue and which they are strongly encouraging users to switch to, see [2] for the gory details. While we may not ever store types in our jsonb format other than the standard json data types (I can foresee people wanting to do it, though), I would strongly recommend picking a format which at least is clear that a value is a string (text, whatever), and preferably makes it clear what the character encoding is. Or maybe it should just follow whatever the client encoding is at the time - as long as that is completely unambiguous to a client. Its treatment of numbers is also broken from my POV (numbers are not just integers or floats), so no, we're not going to use tnetstrings. Plus, the whole idea of us moving to text for send/recv was to save code, not to have to write new code, so to suggest using it now is to ignore the discussion that went on before. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 02/10/2014 09:11 PM, Andres Freund wrote: diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index e1d8aae..50ddf50 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c there's lots of whitespace/tab damage in this file. Check git log/diff --check or such. I don't know exactly what you're looking at. Here's what I get: [andrew@emma pg_jsonb]$ git diff --check master contrib/hstore/hstore--1.3.sql:465: trailing whitespace. + WITHOUT FUNCTION AS IMPLICIT; contrib/hstore/hstore--1.3.sql:468: trailing whitespace. + WITHOUT FUNCTION AS IMPLICIT; [andrew@emma pg_jsonb]$ I'll have a look at some of your other complaints when I get back home in a two or three of days, weather permitting. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On 2014-02-10 22:15:21 -0500, Andrew Dunstan wrote: On 02/10/2014 09:11 PM, Andres Freund wrote: diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index e1d8aae..50ddf50 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c there's lots of whitespace/tab damage in this file. Check git log/diff --check or such. I don't know exactly what you're looking at. Here's what I get: Sorry, forget that bit. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
(2014/02/10 22:42), Hiroshi Inoue wrote: (2014/02/09 8:06), Andrew Dunstan wrote: On 02/08/2014 05:34 PM, Tom Lane wrote: Hiroshi Inoue in...@tpf.co.jp writes: Though I'm not a MINGW expert at all, I know dllwrap is a deprecated tool and dlltool is almost a deprecated tool. Cygwin port is removing the use of dllwrap and dlltool now. Isn't it better for MINGW port to follow it? Only way to make that happen is to prepare and test a patch ... Yeah. Incidentally, we didn't quite get rid of dlltool for Cygwin. We did get rid of dllwrap. But I agree this is worth trying for Mingw. I tried MINGW port with the attached change and successfully built src and contrib and all pararell regression tests were OK. I forgot to mention the environment. I tried the change in 2 machines and both worked. 1. 32bit Windows7 GCC 4.6.1 2. 32bit Windows Vista GCC 3.4.5 I didn't test 64bit platform. regards, Hiroshi Inoue -- I am using the free version of SPAMfighter. SPAMfighter has removed 3567 of my spam emails to date. Get the free SPAMfighter here: http://www.spamfighter.com/len Do you have a slow PC? Try a Free scan http://www.spamfighter.com/SLOW-PCfighter?cid=sigen -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
Inoue, Hiroshi in...@tpf.co.jp writes: (2014/02/10 22:42), Hiroshi Inoue wrote: I tried MINGW port with the attached change and successfully built src and contrib and all pararell regression tests were OK. I forgot to mention the environment. I tried the change in 2 machines and both worked. If there are no objections, I'll push this patch into HEAD tomorrow, along with the upthread patches from Craig Ringer and Marco Atzeri. We might as well see if this stuff is going to work ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] narwhal and PGDLLIMPORT
On 02/11/2014 01:28 PM, Tom Lane wrote: If there are no objections, I'll push this patch into HEAD tomorrow, along with the upthread patches from Craig Ringer and Marco Atzeri. We might as well see if this stuff is going to work ... I'd love to test my patch properly before pushing it, but my dev machine is going to need a total teardown and rebuild, and I'm currently focused on polishing off urgent open work. So let's see what the buildfarm makes of it, I guess. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers