Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian br...@momjian.us wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
> > Hi there, Looks like consensus is done. I and Teodor are not happy with it, but what can we do :) One thing I want to do is to reserve our contribution to the flagship feature (jsonb), particularly binary storage for nested structures and indexing. Their work was sponsored by Engine Yard.
>
> OK, if we are going with an unchanged hstore in contrib and a new jsonb, there is no reason to whack around jsonb to be binary compatible with the old hstore format.

What sacrifices did we need to make to have jsonb be binary compatible with hstore, can those sacrifices be removed, and can that be done in time for 9.4?

Also:
*) what hstore2 features (if any) that are not already reflected in the jsonb type are going to be moved to jsonb for 9.4?
*) if the answer above is anything but 'nothing', what hstore-isms are going to be adjusted in the process of doing so? Presumably there would be some function name changes to put them in the jsonb style, but the hstore syntax ('=') is also going to be embedded in some of the search operators and possibly other things. Is that going to change?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 11:28 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> So here's my opinion on what we should do:
> 1. Forget about hstore2.
> 2. Add GIN and GiST operator classes to jsonb, if they're ready for commit pretty darn soon. If not, punt them to the next release.

For #2, would we maintain the hstore syntax for the searching operators? For example:

SELECT count(*) FROM jsonb_schema
WHERE tabledata @ 'columns = {{column_name=total_time}}';

Note the hstore-ish '=' in the searching operator.

merlin
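[For concreteness, the two syntax styles under discussion look roughly like this. This is a sketch only -- the jsonb operator names and right-hand-side syntax had not been settled at this point in the thread; `jsonb_schema` and `tabledata` are from the example above.]

```sql
-- hstore-style search: the containment argument uses hstore's syntax
SELECT count(*)
FROM jsonb_schema
WHERE tabledata @ 'columns = {{column_name=total_time}}';

-- a json-native alternative: the containment argument is itself JSON
SELECT count(*)
FROM jsonb_schema
WHERE tabledata @> '{"columns": [{"column_name": "total_time"}]}';
```

As jsonb eventually shipped in 9.4, the second style won out: the `@>` containment operator takes a jsonb right operand rather than hstore-syntax text.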
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 8:39 AM, Bruce Momjian br...@momjian.us wrote:
> So, I am going to ask a back-track question and ask why we can't move hstore into core.

This is exactly the opposite of what should be happening. Now, jsonb might make it into core because of the json precedent, but the entire purpose of the extension system is to stop dumping everything in the public namespace. Stuff 'in core' becomes locked in stone, forever, because of backwards compatibility concerns, which are IMNSHO a bigger set of issues than even the pg_upgrade-related issues. Have we gone through all the new hstore functions and made sure they don't break existing applications? Putting things in core welds your only escape hatch shut. *All* non-SQL-standard types ought to be in extensions in an ideal world.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 9:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Andrew Dunstan and...@dunslane.net writes:
> > On 03/05/2014 09:39 AM, Bruce Momjian wrote:
> > > So, I am going to ask a back-track question and ask why we can't move hstore into core. Is this a problem with the oids of the hstore data type and functions? Is this a pg_upgrade-only problem? Can this be fixed?
> >
> > Yes, pg_upgrade is the problem, and no, I can't see how it can be fixed. Builtin types have Oids in a certain range. Non-builtin types have Oids outside that range. If you have a clever way to get over that I'd be all ears, but it seems to me insurmountable right now.
>
> More to the point:
> 1. Built-in types have predetermined, fixed OIDs. Types made by extensions do not, and almost certainly will have different OIDs in different existing databases.
> 2. There's no easy way to change the OID of an existing type during pg_upgrade, because it may be on-disk in (at least) array headers.
> We could possibly get around #2, if we could think of a secure way for array_out and sibling functions to identify the array type without use of the embedded OID value. I don't know how we could do that though, particularly in polymorphic-function contexts. Also, there might be other cases besides arrays where we've embedded type OIDs in on-disk data; anyone remember?

composite types.

merlin
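[The OID problem Tom describes can be seen directly in the catalogs. The extension OID shown below is illustrative only -- it varies per database, which is precisely the problem.]

```sql
-- Built-in types carry fixed, predetermined OIDs (assigned below
-- FirstNormalObjectId, 16384), identical in every cluster:
SELECT oid, typname FROM pg_type WHERE typname = 'json';
--  114 | json

-- Extension types get whatever OID is free at CREATE EXTENSION time,
-- so the value differs from database to database:
CREATE EXTENSION hstore;
SELECT oid, typname FROM pg_type WHERE typname = 'hstore';
--  16405 | hstore    (for example; any value >= 16384 is possible)
```

Because on-disk values such as array headers embed the element type's OID, pg_upgrade cannot simply renumber an existing type in the new cluster.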
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 9:52 AM, Bruce Momjian br...@momjian.us wrote:
> On Wed, Mar 5, 2014 at 09:19:33AM -0600, Merlin Moncure wrote:
> > On Wed, Mar 5, 2014 at 8:39 AM, Bruce Momjian br...@momjian.us wrote:
> > > So, I am going to ask a back-track question and ask why we can't move hstore into core.
> >
> > This is exactly the opposite of what should be happening. [...] Putting things in core welds your only escape hatch shut. *All* non-SQL-standard types ought to be in extensions in an ideal world.
>
> I have seen your opinion on this but there have been enough counter-arguments that I am not ready to head in that direction.

The only counter-argument given is that this will prevent people from being able to use extensions because they A: can't or won't install contrib packages, or B: are too stupid or lazy to type 'create extension json'. Note I'm discussing 'in-core extension' vs. 'in-core built-in'. 'Out-of-core extension' loosely translates to 'can't be used by the vast majority of systems'. Most corporate IT departments (including mine) have a policy of only installing packages through the operating system packaging to simplify management of deploying updates. Really strict companies might not even allow anything but packages supplied by a vendor like Red Hat (which in practice keeps you some versions back from the latest). Now, if some crappy hosting company blocks contrib, I don't believe at all that this should drive our project management decisions. PostgreSQL is both a database and, increasingly, a development language platform. Most good stacks have a system (cpan, npm, etc.) to manage the scope of the installed runtime, and it's a routine and expected exercise to leverage that system.

merlin
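[For reference, the step being argued about -- enabling an in-core (contrib) extension once the OS packages are installed -- is a single statement:]

```sql
-- The shared library and SQL scripts ship with the contrib packages;
-- this just registers the extension's objects in the current database:
CREATE EXTENSION hstore;

-- What is installed, and what else the packages make available:
SELECT extname, extversion FROM pg_extension;
SELECT name, default_version, installed_version
FROM pg_available_extensions;
```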
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote:
> There's the absolutely significant issue that you cannot reasonably write extensions that interact on a C level. You can't call from extension to extension directly, but you can from an extension to pg-core-provided ones.

Certainly. Note I never said that the internal .so that both extensions interface with (and perhaps wrap) can't be in core. It would be nice to have an intra-extension call system worked out, but that in no way plays to the bigger issues at stake. This is all about management of the public API; take a good skeptical look at the history of types like xml, json, geo, money and others.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Merlin Moncure mmonc...@gmail.com writes:
> > *All* non-SQL-standard types ought to be in extensions in an ideal world.
>
> While there's certainly much to be said for the idea that jsonb should be an extension, I don't think we have the technology to package it as a *separate* extension; it'd have to be included in the hstore extension.

I disagree with that. The shared C bits can live inside the core system, and the SQL-level hooks and extension-specific behaviors could live in an extension. AFAICT moving jsonb to an extension is mostly a function of migrating the hard-coded SQL hooks out to an extension (I'm probably oversimplifying though).

> Just out of curiosity, exactly what features are missing from jsonb today that are available with hstore? How long would it take to copy-and-paste all that code, if someone were to decide to do the work instead of arguing about it?

Basically opclasses, operators (particularly search operators), and functions/operators to manipulate the hstore in place. Personally I'm not inclined to copy/paste the code. I'd also like to see this stuff committed, and don't want to hold up the patch for that unless it's determined for other reasons (and by other people) that this is the only reasonable path for 9.4.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:43 AM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > I disagree with that. The shared C bits can live inside the core system, and the SQL-level hooks and extension-specific behaviors could live in an extension. AFAICT moving jsonb to an extension is mostly a function of migrating the hard-coded SQL hooks out to an extension (I'm probably oversimplifying though).
>
> Yeah, from what I gather you're suggesting, that's more-or-less "move it all to core", except that all of the actual interface bits end up in an extension that has to be installed to use what would have to already be there. I don't see that as any kind of improvement.

If you don't, then you simply have not been paying attention to the endless backwards compatibility problems we've faced, which are highly ameliorated in an extension-heavy world. Also, you're ignoring the fact that having an endlessly accreting set of symbols in the public namespace is not free. Internal C libraries don't have to be supported and don't have any significant user-facing costs by simply being there.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Merlin Moncure mmonc...@gmail.com writes:
> > On Wed, Mar 5, 2014 at 10:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> > > While there's certainly much to be said for the idea that jsonb should be an extension, I don't think we have the technology to package it as a *separate* extension; it'd have to be included in the hstore extension.
> >
> > I disagree with that. The shared C bits can live inside the core system and the SQL-level hooks and extension-specific behaviors could live in an extension.
>
> That approach abandons every bit of value in an extension, no? You certainly don't get to fix bugs outside a core-system release cycle.

That's the core vs. non-core debate. Just about everyone (including me) wants json and hstore to live in core -- meaning packaged, shipped, supported, and documented with the PostgreSQL source code releases. Only an elite set of broadly useful and popular extensions get that honor, of which json is most certainly one. Moreover, you give up nothing except the debate/approval issues to get your code in core. If you want to release off-cycle, you can certainly do that, and enterprising users can simply install the extension manually (or perhaps via pgxn) instead of via contrib. BTW, this is yet another thing that becomes impossible if you don't use extensions (on top of legacy/backwards compatibility issues and general bloat, which is IMNSHO already a pretty severe situation).

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 11:44 AM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > If you don't, then you simply have not been paying attention to the endless backwards compatibility problems we've faced, which are highly ameliorated in an extension-heavy world.
>
> We have backwards compatibility problems because we don't want to *break* things for people. Moving things into extensions doesn't magically fix that -- if you break something in a backwards-incompatible way then you're going to cause a lot of grief for people.

It doesn't magically fix it, but it at least provides a way forward. If the function you want to modify is in an extension 'foo', you get to put your new stuff in a 'foo2' extension. That way your users do not have to adjust all the code you would have broken. Perhaps for in-core extensions you offer the old one in contrib for a while until a reasonable amount of time passes, then move it out to pgxn. This is a vastly better system than the choices we have now, which are A. break code or B. do nothing.

> > Also, you're ignoring the fact that having an endlessly accreting set of symbols in the public namespace is not free. Internal C libraries don't have to be supported and don't have any significant user-facing costs by simply being there.
>
> I *really* hate how extensions end up getting dumped into the public schema, and I'm not a big fan of huge search_paths either.

At least with extensions you have control over this.

> As mentioned earlier -- I'm also not advocating that everything be put into core. I don't follow what you mean by "Internal C libraries don't have to be supported".

Because, I mean, we are free to change them or delete them. They do not come with the legacy that user-facing APIs come with. They also do not bloat the public namespace.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 2:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Merlin Moncure wrote:
> > It doesn't magically fix it, but it at least provides a way forward. If the function you want to modify is in an extension 'foo', you get to put your new stuff in a 'foo2' extension. That way your users do not have to adjust all the code you would have broken. Perhaps for in-core extensions you offer the old one in contrib for a while until a reasonable amount of time passes, then move it out to pgxn.
>
> Uhm. Would it work to define a new version of foo, say 2.0, but keep the old 1.2 version the default? That way, if you want to keep the old foo you do nothing (after both fresh install and pg_upgrade), and if you want to upgrade to the new code, it's just an ALTER EXTENSION UPDATE away.

Certainly. The important point though is that neither option is available if the old stuff is locked into the public namespace. Consider various warts like the array API ('array_upper' et al.) or the geo types. We're stuck with them. Even with jsonb: it may be the hot new thing *today*, but five years down the line there's a json2 that does all kinds of wonderful things we haven't thought about -- what if it displaces current usages? The very same people who are arguing that jsonb should not be in an extension are the ones arguing json is legacy and will be superseded. These two points of view IMO are directly in conflict: if json had been an extension, the path to deprecation would be clear. Now the json functions are in the public namespace. Forever (or at least for a very long time).

On Wed, Mar 5, 2014 at 2:46 PM, Stephen Frost sfr...@snowman.net wrote:
> I don't see why we can't do exactly what you're suggesting in core.

Because you can't (if you're defining core to mean 'not an extension'). Functions can't be removed or changed because of legacy application support. In an extension world, they can -- albeit not 'magically', but at least it can be done.

merlin
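[Alvaro's versioning suggestion, sketched. 'foo' is the thread's own placeholder extension name, and the version numbers are his examples:]

```sql
-- With 1.2 left as default_version in foo.control, existing behavior
-- is preserved on fresh install and after pg_upgrade:
CREATE EXTENSION foo;                 -- installs version 1.2

-- Users who want the new, incompatible code opt in explicitly:
ALTER EXTENSION foo UPDATE TO '2.0';

-- Check which version is active:
SELECT extversion FROM pg_extension WHERE extname = 'foo';
```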
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 4:24 PM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > Because you can't (if you're defining core to mean 'not an extension'). Functions can't be removed or changed because of legacy application support. In an extension world, they can -- albeit not 'magically', but at least it can be done.
>
> That simply isn't accurate on either level -- if there is concern about application support, that can apply equally to core and contrib, and we certainly *can* remove and/or redefine functions in core with sufficient cause. It's just not something we do lightly for things living in either core or contrib. For an example, consider the FDW API, particularly what we did between 9.1 and 9.2.

Well, we'll have to agree to disagree, I suppose. Getting back on topic, the question is 'what about jsonb/hstore2'? At this point my interests are practical. I promised (heh) to bone up the docs. I'm on vacation this weekend, so it's looking like sometime late next week for that. In particular, it'd be helpful to get some kind of read on the final disposition of hstore2.

merlin
Re: [HACKERS] jsonb and nested hstore
On Tue, Mar 4, 2014 at 6:48 AM, Teodor Sigaev teo...@sigaev.ru wrote:
> > On Tue, Mar 4, 2014 at 2:44 AM, Teodor Sigaev teo...@sigaev.ru wrote:
> > > Do we have a function to trim trailing zeros in a numeric?
>
> Fixed, pushed to github (https://github.com/feodor/postgres/tree/jsonb_and_hstore). It now uses hash_numeric to index numeric values. As I can see, it provides the needed trimming and doesn't depend on locale. A possible mismatch (the same hash value for different numeric values) will be rechecked anyway -- the interested operations set the recheck flag.

huh. What is the standard for equivalence? I guess we'd be following javascript ===, right? (http://dorey.github.io/JavaScript-Equality-Table/)

merlin
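[The trailing-zeros issue being fixed, for context: numeric comparison treats '1.10' and '1.1' as equal, but their stored forms differ, so a hash built naively over the binary representation would disagree with the equality operator.]

```sql
-- Equal under the numeric equality operator...
SELECT '1.10'::numeric = '1.1'::numeric;              -- t

-- ...but the display scale (and stored digits) differ:
SELECT '1.10'::numeric::text, '1.1'::numeric::text;   -- 1.10 | 1.1
```

Hashing with hash_numeric (which normalizes away trailing zeros) plus a recheck of candidate matches sidesteps this, as Teodor describes.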
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-02-28 14:45:29 -0500, Andrew Dunstan wrote:
> > Well, the jsonb portion of this is arguably the most ready; certainly it's had a lot more on-list review.
>
> Having cross-read both patches I tend to agree with this. I don't think it's unrealistic to get jsonb committable, but the hstore bits are another story.

hm, do you have any specific concerns/objections about hstore?

merlin
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 8:57 AM, Stephen Frost sfr...@snowman.net wrote:
> The specific issues mentioned on this thread look more like bugs to be addressed, or additional operators which need to be implemented for jsonb (imv, that should really be done for 9.4, but we have this deadline looming...), along with perhaps dropping the implicit cast between json and hstore (is there really a need for it..?).

Bugs/bad behaviors should be addressed (which AFAICT are mostly if not all due to the implicit casts). Missing operators OTOH should not hold up the patch, particularly when you have the option of an explicit cast to hstore if you really want them. Notwithstanding some of the commentary above, some of jsonb's features (in particular, the operators) are quite useful and should find regular usage (json has them also, but jsonb removes the performance penalty). The upshot is that with the current patch you have to do a lot of casting to get 100% feature coverage, and future improvements to jsonb will remove the necessity of that. Also, the hstore type will be required to do anything approximating the nosql pattern. I don't think the extension issue is a deal breaker either way. While I have a preference for extensions generally, this is nothing personal to jsonb. And if we can't come to a consensus on that point, the patch should be accepted on precedent (json being in core).

merlin
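[A sketch of the explicit-cast workaround described above. The table and column names ('documents', 'doc') are hypothetical, and the exact hstore2 operator and literal spellings were still in flux in the patch:]

```sql
-- jsonb stays the stored type; hstore2's richer search operators are
-- reached via an explicit cast where jsonb itself has no equivalent yet:
SELECT doc
FROM documents
WHERE doc::hstore @> 'status => active';
```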
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 1:45 PM, Andrew Dunstan and...@dunslane.net wrote:
> Well, the jsonb portion of this is arguably the most ready; certainly it's had a lot more on-list review.

That is definitely true. Also, the jsonb type does not introduce any new patterns that are not already covered by json -- it just does some things better/faster (and, in a couple of cases, a bit differently), so there's a safe harbor. The implicit casts snuck in after the review started -- that was a mistake obviously (but mostly with hstore). The side argument of 'to extension or not' is just that. Make a decision and commit this thing.

merlin
Re: [HACKERS] jsonb and nested hstore
On Thu, Feb 27, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
> However, we had this discussion already in November-December, which resulted in the current patch. Now you and Robert want to change the rules on Andrew, which means Andrew is ready to quit, and we go another year without JSON indexing.

How we got here is not the point. All that matters is what's going to happen from here. Here are the facts as I see them:

1) We've worked ourselves into a situation where we're simultaneously developing two APIs that do essentially exactly the same thing (hstore and jsonb). Text json is not the problem and is irrelevant to the discussion.

2) The decision to do that was made a long time ago. I complained as loudly as my mousy no-programming-only-griping voice would allow here: http://postgresql.1045698.n5.nabble.com/JSON-Function-Bike-Shedding-tp5744932p5746152.html. The decision was made (and Robert cast one of the deciding votes in support of that decision) to bifurcate hstore/json. I firmly believe that was a mistake, but there's no point in revisiting it. Done is done.

3) In its current state jsonb is not very useful and we have to recognize that; it optimizes text json but OTOH covers maybe 30-40% of what hstore offers. In particular, it's missing manipulation and GiST/GIN support. The stuff it does offer, however, is how Andrew, Josh and others perceive the API will be used, and I defer to them, with the special exception of deserialization (the mirror of to_json), which is currently broken or near-useless in all three types. Andrew recognized that and has suggested a fix; even then, to me it only matters to the extent that the API is clean and forward-compatible.

Here are the options on the table:

a) Push everything to 9.5 and introduce out-of-core hstore2/jsonb extensions to meet market demand. Speaking practically, 'out of core' translates to "can't be used" for most industrial IT shops. I hate this option but recognize it's the only choice if the code isn't ready in time.

b) Accept hstore2 but push jsonb (to 9.5), on the premise that they should be married in some way or that jsonb simply isn't ready. I'm not a fan of this option either unless Andrew specifically thinks it's a good idea. The stuff that is there seems to work pretty well (again, except deserialization, which I haven't tested recently), and the jsonb patterns that are in place have some precedent in the text json type.

c) Accept hstore2 and jsonb as in-core extensions (assuming code worthiness). Since extensions can't call into each other (this really ought to be solved at some point), this means a lot of code copy/paste. The main advantage here is that it reduces the penalty of failure and avoids pollution of the public schema. I did not find the rationale upthread that there was a stigma to in-core extensions in any way convincing. In fact I'd go further and suggest that we really ought to have a project policy to have all non-SQL-standard functions, operators and types as extensions from here on out. Each in-core type introduction after having introduced the extension system has left me scratching my head.

d) The status quo. This essentially means we'll have to liberally document how things are (to avoid confusing our hapless users) and take Andrew at his word that a separate extension will materialize making jsonb more broadly useful. The main concern here is that the market will vote with their feet and adopt the hstore API style broadly, sticking us with a bunch of marginally used functions in the public namespace to support forever.

My personal preference is c), but I am perfectly ok with d), particularly if there were more visibility into the long-term planning. Good documentation will help either way, and that's why I signed up for it.

merlin
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 10:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 02/26/2014 06:21 AM, Merlin Moncure wrote:
> > On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
> > > On 02/25/2014 12:12 PM, Robert Haas wrote:
> > > > I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other.
> > >
> > > Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want.
> >
> > Sure.
>
> Please also highlight that any change will require a full table rewrite with an exclusive lock, so data type choices on larger tables may be hard to change later.

Yeah. Good idea. Also gonna make a table of what happens when you cast from A to B (via text, json, jsonb, hstore).

merlin
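[Craig's point, illustrated -- the table and column names here are hypothetical. Changing a column's type takes an ACCESS EXCLUSIVE lock and rewrites the whole table, so the json-vs-jsonb choice is expensive to revisit on a large table:]

```sql
-- Rewrites every row of events while holding an exclusive lock:
ALTER TABLE events
  ALTER COLUMN payload TYPE jsonb
  USING payload::jsonb;
```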
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
> On 02/25/2014 08:54 PM, Josh Berkus wrote:
> > That's called a straw man argument, Robert. Me: "We should recommend that people use jsonb unless they have a specific reason for using json."
>
> We could also make the opposite argument -- people should use json unless they have a specific reason for using jsonb. btw, there is one more thing about JSON which I recently learned: a lot of JavaScript people actually expect the JSON binary form to retain field order. It is not in any specs, but nevertheless all major implementations do it and some code depends on it. IIRC, this behaviour is currently also met only by json and not by jsonb.

Yes: this was the agreement that was struck, and it is the main reason why there are two json types, not one. JSON does not guarantee field ordering as I read the spec, and for the binary form ordering is not maintained as a concession to using the hstore implementation. You can always use the standard text json type for storage and cast inside the index for searching; what you give up there is some performance and the ability to manipulate the json via the hstore API. I think that will have to do for now, and field ordering for hstore/jsonb can be reserved as a research item.

merlin
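[The difference Hannu describes is easy to demonstrate; this is the behavior as jsonb eventually shipped. json keeps the input text verbatim, while jsonb stores a parsed form that preserves neither key order nor duplicate keys:]

```sql
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
--  {"b": 1, "a": 2, "a": 3}        -- input text preserved exactly

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
--  {"a": 3, "b": 1}                -- keys reordered, duplicates collapsed
```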
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 26, 2014 at 11:41 AM, Josh Berkus j...@agliodbs.com wrote:
> On 02/26/2014 07:02 AM, Merlin Moncure wrote:
> > Yes: this was the agreement that was struck, and it is the main reason why there are two json types, not one. JSON does not guarantee field ordering as I read the spec, and for the binary form ordering is not maintained as a concession to using the hstore implementation.
>
> Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field ordering. So users who are familiar with JSONish data *storage* should be aware that field ordering is not preserved.

Right (although I'm not sure what wasn't true there). I think the status quo is fine; if you have to have the document precisely preserved for whatever reason, you can do that -- you just have to be prepared to give up some things. As noted in the other thread, serialization is more interesting, but that also works fine. The breakdown in terms of usage between json/jsonb to me is very clear (json will handle serialization/deserialization-heavy patterns and a few edge cases for storage). The split between json and jsonb in hindsight made a lot of sense. What is not going to be so clear for users (particularly without good supporting documentation) is how things break down in terms of usage between hstore and jsonb.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 26, 2014 at 12:05 PM, Josh Berkus j...@agliodbs.com wrote:
> On 02/26/2014 09:57 AM, Merlin Moncure wrote:
> > What is not going to be so clear for users (particularly without good supporting documentation) is how things break down in terms of usage between hstore and jsonb.
>
> Realistically? Once we get done with mapping the indexes and operators, users who are used to Hstore1 use Hstore2, and everyone else uses jsonb. jsonb is nothing other than a standardized syntax interface to hstore2, and most users will choose the syntax similar to what they already know over learning new stuff.

The problem is that as of today they are not done, and AFAICT will not be for 9.4. Developers wanting to utilize the nosql pattern are going to have to lean heavily on the hstore API, and that's a simple fact... People reading about all the great new features of postgres are going to want to learn how to do things, and it's reasonable to anticipate the things they want to do and explain how to use them. I would like to extend that case coverage to include the json type as well, as its documentation is pretty lousy for that (I should know: I wrote most of it).

merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/24/2014 11:06 AM, Merlin Moncure wrote: On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. [sorry for noise on this]. Here's the use case coverage as I see it today:

CASE:                 json     jsonb    hstore
Static document:      yes      poor     poor
Precise document:     yes      no       no
Serialization:        yes      no       no
Deserialization:      poor***  yes*     no
Repeated Access:      poor     yes      yes
Manipulation:         no       no**     yes
GiST/GIN searching:   no       no**     yes

notes:
* jsonb gets 'yes' for deserialization assuming Andrew's 'two level' deserialization fix goes in (otherwise 'poor').
** jsonb can't do this today, but presumably will be able to soon.
*** 'poor' unless the json type also gets the deserialization fix, then 'yes'. hstore can deserialize hstore format, but will rely on json/jsonb for deserializing json.

'Static document' represents edge cases where the json is opaque to the database but performance matters -- for example large map polygons. 'Precise document' represents cases where whitespace or key order is important. Peter asked upthread how to access the various features.
Well, today, it basically means a bit of nimble casting to different structures depending on which particular features are important to you, which IMNSHO is not bad at all as long as we understand that most people who rely on jsonb will also need hstore for its searching and operators. Down the line when hstore and jsonb are more fleshed out it's going to come down to an API style choice. Frankly, a lot of the above doesn't make much sense to me. WTF is 'Manipulation'? Unless I see much more actual info on the tests being conducted it's just about impossible to comment. The performance assessment at this stage is simply anecdotal as far as I'm concerned. Er, I wasn't making performance assessments (except in cases where it was obvious, like poor support for arbitrary access with json), but describing API coverage of use cases. Manipulation I thought obvious: the ability to manipulate the document through the API (say, change some value to something else): the nosql pattern. Neither json nor jsonb can do that at present...only hstore can. jsonb can't; it only covers some of what the json type currently covers (but some of the things it does cover are much faster). On Mon, Feb 24, 2014 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote: Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. Aside from that, I want our docs to make a strong endorsement of using jsonb over json for most users. jsonb will continue to be developed and improved in the future; it is very unlikely that json will. Maybe that's what I should say rather than anything about efficiency. I would hope that endorsement doesn't extend to misinforming users. Moreover, the json type is handling all serialization at present and will continue to do so for some years.
In fact, in this release we got a bunch of new, very necessary enhancements (json_build) to serialization! You're trying to deprecate and enhance the type at the same time! The disconnect here is that your statements would be correct if the only usage for the json type were storing data in json. However, people (including myself) are doing lots of wonderful things storing data in the traditional way and moving into and out of json in queries, and that, besides working better in the json type, is only possible in json. That might change in the future by figuring out a way to cover json serialization cases through jsonb, but that's not how things work today, end of story. Look, I definitely feel the frustration and weariness here in terms of my critiquing the proposed API along with the other arguments I've made. Please understand that nobody wants this to go out the door more than me. If the objective is to lock in the API 'as is', then let's be polite to our users and try to document the various use cases and what's good at what. merlin
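The serialization pattern described above -- relational storage, with json produced only at the query edge -- looks roughly like this (table and column names are made up for illustration):

```sql
-- Data lives in ordinary columns; json is built only on the way out.
-- row_to_json/json_agg are the 9.3 serialization functions; the 9.4
-- 'json_build' additions (json_build_object etc.) extend the same pattern.
SELECT json_agg(row_to_json(o))
FROM (SELECT order_id, customer, total
      FROM orders
      WHERE total > 100) o;
```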
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 12:12 PM, Robert Haas wrote: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want. Sure. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 12:20 AM, Josh Berkus j...@agliodbs.com wrote: All, Here's a draft cleanup on the JSON section of the Datatype docs. Since there's been a bunch of incremental patches on this, I just did a diff against HEAD. I looked over json-functions a bit, but am not clear on what needs to change there; the docs are pretty similar to other sections of Functions, and if they're complex it's because of the sheer number of JSON-related functions. Anyway, this version of datatypes introduces a comparison table, which I think should make things a bit clearer for users. I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. [sorry for noise on this]. Here's the use case coverage as I see it today:

CASE:                 json     jsonb    hstore
Static document:      yes      poor     poor
Precise document:     yes      no       no
Serialization:        yes      no       no
Deserialization:      poor***  yes*     no
Repeated Access:      poor     yes      yes
Manipulation:         no       no**     yes
GiST/GIN searching:   no       no**     yes

notes:
* jsonb gets 'yes' for deserialization assuming Andrew's 'two level' deserialization fix goes in (otherwise 'poor').
** jsonb can't do this today, but presumably will be able to soon.
*** 'poor' unless the json type also gets the deserialization fix, then 'yes'. hstore can deserialize hstore format, but will rely on json/jsonb for deserializing json.

'Static document' represents edge cases where the json is opaque to the database but performance matters -- for example large map polygons. 'Precise document' represents cases where whitespace or key order is important. Peter asked upthread how to access the various features. Well, today, it basically means a bit of nimble casting to different structures depending on which particular features are important to you, which IMNSHO is not bad at all as long as we understand that most people who rely on jsonb will also need hstore for its searching and operators. Down the line when hstore and jsonb are more fleshed out it's going to come down to an API style choice. merlin
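The 'Repeated Access' row in the table above corresponds to queries like the following (hypothetical documents table); with text json every access reparses the whole document, while jsonb's binary form avoids the parse:

```sql
-- -> returns a subdocument, ->> returns text, #>> follows a path.
SELECT doc -> 'address' ->> 'city',
       doc #>> '{items,0,sku}'
FROM documents
WHERE doc ->> 'status' = 'open';
```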
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 11, 2014 at 3:35 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On 02/11/2014 01:16 AM, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwhile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? JSON is supposed to be a *standard* way of encoding data in strings. If that ever changes, it will not be the JSON type anymore. My point was that as we reserved the right to change jsonb's binary format we'd probably want to reserve the right to change json's as well. This was in support of the theme of 'why is jsonb a special case?'. However, I think it's pretty much settled that any potential concerns I raised in terms of providing a version flag are outweighed by its potential usefulness. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary format at some later time. Maybe just send an int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the C type int8, not the 64-bit type. It seems cleaner to do a pq_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. The server version *is* the version, basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. Really? How would you expect to do that, exactly? In particular, how would you propose that a binary pg_dump file be reloadable if we redefine the binary format down the road without having made provision like this? Versioning one type only IMNSHO is a complete hack. I don't feel a need for versioning int, or float8, or most other types; and that includes the ones for which we've previously defined binary format as equivalent to text (enums). In this case we know that we're not totally satisfied with the binary format we're defining today, so I think a type-specific escape hatch is a reasonable solution. Moreover, I don't especially buy tying it to server version, even if we had an information pathway that would provide that reliably in all contexts. Why not? Furthermore, what are we doing now? If we need a binary format contract, that needs to be separated from this discussion. I've written (along with Andrew C) the only serious attempt to deal with client side binary format handling (http://libpqtypes.esilo.com/) and in all interesting cases it depends on the server version to define binary parsing behaviors. I agree WRT float8, etc., but other types have changed in a couple of cases and it's always been with the version.
I find it highly unlikely that any compatibility behaviors are going to be defined *for each and every returned datum* now, or ever...so even if it's only a few bytes lost, why do it? Intra-version compatibility issues, should they ever have to be handled, would be more likely handled at connection- or query- time. Point being, if an escape hatch is needed, I'm near 100% certain this is not the right place to do it. Binary wire format compatibility is a complex topic and the proposed solution ISTM is not at all fleshed out. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:02 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary format at some later time. Maybe just send an int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the C type int8, not the 64-bit type. It seems cleaner to do a pq_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. The server version *is* the version, basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. I don't find that very convincing. The entire reason jsonb exists is because the parsing overhead of text json is significant, so it stands to reason that soon somebody will try to work on a better wire protocol, even if the current code cannot be made ready for 9.4. And I don't think past instability of binary type formats is a good reason for *needlessly* breaking stuff like binary COPYs. And it's not like one prefixed byte has any real-world relevant cost. The point is, why does this one type get a version id? Imagine a hypothetical program that sent/received the binary format for jsonb. All you have to do is manage the version flag appropriately, right? Wrong.
You still need to have code that checks the server version and sees if it's supported (particularly for sending), and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. I guess you could argue that having a version flag could, say, allow libpq clients to gracefully error out if, say, an old non-exotic-format-speaking libpq happens to connect to a newer server -- assuming the client actually bothered to check the flag. That's zero help to the client though -- regardless, the compatibility isn't established, and that's zero help to the other binary formats that we have -- and probably will continue to -- change. What about them? Are we now, at the umpteenth hour of the final commitfest, suddenly deciding that binary wire formats are going to be compatible across versions? The kinda low effort way to deal with binary format compatibility is to simply document the existing formats and document format changes in some convenient place. The 'real' long term path to doing it IMO is to abstract out a shared client/server type library with some protocol negotiation features. Then, at connection time, the client/server agree on what's the optimal way to send things -- perhaps the client can signal things like 'want compression for long datums'. The only case for a version flag at the data point level is if the server is sending version X at this tuple and version Y at that tuple. I don't think that's a makeable case. Some might say, what about a compression bit based on compressibility/length? and to that I'd answer: why is that handling specific to the json type...are text/bytea/arrays not worth that feature too?
merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:38 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 17:35:12 -0600, Merlin Moncure wrote: Wrong. You still need to have code that checks the server version and sees if it's supported (particularly for sending), and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. think of binary COPY outputting data in 9.4 and then trying to import that data into 9.5. That's the interesting case here. right, json could be made to work, but any other format change introduced to any other already existing type will break. That's not a real solution unless we decree henceforth that no formats will change from here on in, in which case I withdraw my objection. I think COPY binary has exactly the same set of considerations as the client side. If you want to operate cleanly between versions (which has never been promised in the past), you have to encode in a header the kinds of things the server would need to parse it properly. Starting with, but not necessarily limited to, the encoding server's version. merlin
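The binary COPY scenario under discussion, sketched (assuming a bigdata table that exists on both servers):

```sql
-- On the 9.4 server: dump in binary format. The file carries a fixed
-- header, but each datum is emitted by its type's send function with no
-- per-type version information -- the compatibility gap at issue here.
COPY bigdata TO '/tmp/bigdata.bin' WITH (FORMAT binary);

-- On the receiving server: reload. This only works if every type's recv
-- function accepts what the sending version's send function produced.
COPY bigdata FROM '/tmp/bigdata.bin' WITH (FORMAT binary);
```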
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwhile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? Wasn't the whole point of this that jsonb send/recv be spiritually closer to json? If we want to introduce alternative type formats in the 9.5 cycle, why can't we attach version based encoding handling to *that* problem? The more angles I look at this from, the more it looks messy and rushed. Notwithstanding all the above, I figure enough smart people here disagree (once again, heh) to call it consensus. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* -- as in 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. merlin
Re: [HACKERS] jsonb and nested hstore
On Monday, February 10, 2014, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 19:01:48 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* -- as in 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. Come on.
Your way requires building HEAPS of new and generic infrastructure in 9.5 and would only work for binary copy. The proposed way requires about two lines of code. Without the generic infrastructure we'd end up relying on some intricacies like the meaning of the high bit in the first byte or such. Anyway, that's it on this subthread from me. Fair enough. I'll concede the point. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 12:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: send/recv functions are also needed for binary-format COPY. IMHO jsonb must have send/recv functions. All other built-in types have them, except for types like 'smgr', 'aclitem' and 'any*' that no-one should be using as column types. Yes -- completely agree. I also consider the hstore functionality (in particular, searching and access operators) to be essential. I'm actually surprised we have an alternate binary wire format for jsonb at all; json is explicitly text and I'm not sure what the use case of sending the internal structure is. Meaning, maybe jsonb send/recv should be a thin wrapper around sending the json string. The hstore send/recv I think properly covers the case where client side binary wire format actors would want to manage performance critical cases that want to avoid parsing. On Wed, Feb 5, 2014 at 1:21 AM, Oleg Bartunov obartu...@gmail.com wrote: Andrew provided us more information and we'll work on recv. What do people think about testing this stuff? btw, we don't have any regression test on this. I'm intensely interested in this work; I consider it to be transformative. I've *lightly* tested the jsonb/hstore functionality and so far everything is working. I still have concerns about the API. Aside from the stuff I mentioned upthread, I find the API split between jsonb and hstore to be a little odd; a lot of useful bits (for example, the @ operator) come via the hstore type only. So these types are joined at the hip for real work, which makes the diverging incomplete behaviors in functions like populate_record() disconcerting. Another point I'm struggling with is what jsonb brings to the table that isn't covered by either hstore or json; working through a couple of cases I find myself not using the jsonb functionality except as an 'hstore json formatter', which the json type covers.
I'm probably being obtuse, but we have to be cautious before plonking a couple of dozen extra functions in the public schema. merlin
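For reference, the hstore searching and operator support that jsonb users would be leaning on looks like this (hypothetical table; hstore2's nested structures follow the same shape):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE docs (id serial PRIMARY KEY, attrs hstore);
-- A GIN (or GiST) index makes @> containment searches indexable.
CREATE INDEX docs_attrs_idx ON docs USING gin (attrs);

SELECT id FROM docs WHERE attrs @> 'color=>red, size=>large';
```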
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 10:22 AM, Andrew Dunstan and...@dunslane.net wrote: I'm actually surprised we have an alternate binary wire format for jsonb at all; json is explicitly text and I'm not sure what the use case of sending the internal structure is. Meaning, maybe jsonb send/recv should be a thin wrapper around sending the json string. The hstore send/recv I think properly covers the case where client side binary wire format actors would want to manage performance critical cases that want to avoid parsing. The whole reason we have jsonb is to avoid reparsing where possible. Sure; but on the server side. The wire format is for handling client concerns. For example, the case you're arguing for would be for a libpq client to extract jsonb as binary, manipulate it on a binary level, then send it back as binary. I find this case to be something of a stretch. That being said, for binary dump/restore perhaps there's a performance case to be made. In fact, I'd rather have the send and recv functions in the jsonb code and have hstore's functions call them, so we don't duplicate code. yeah. Agree that there needs to be two sets of routines, not three. I think a case could be made that the jsonb type could take either json's or hstore's, depending on the above. FWIW, either way is fine. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we had infinite time/manpower, this wouldn't really be an issue. We don't, though, and so I suggest that this may be one of the better things to toss overboard. The hstore send/recv functions have basically the same (copy/pasted/name-adjusted) implementation. Since hstore will presumably remain (as the current hstore is) 'deep binary' and all of Andres's gripes apply to hstore as well, this change buys us precisely zap from a time perspective; it comes down to which is intrinsically the better choice. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/05/2014 07:48 AM, Merlin Moncure wrote: Another point I'm struggling with is what jsonb brings to the table that isn't covered by either hstore or json; working through a couple of cases I find myself not using the jsonb functionality except as a 'hstore json formatter', which the json type covers. I'm probably being obtuse, but we have to be cautious before plonking a couple of dozen extra functions in the public schema. There's three reasons why it's worthwhile: 1) user-friendliness: telling users they need to do ::JSON and ::HSTORE2 all the time is sufficiently annoying -- and prone to causing errors -- to be a blocker to adoption by a certain, very numerous, class of user. That's a legitimate point of concern. But in and of itself I'm not sure it warrants exposing a separate API. 2) performance: to the extent that we can operate entirely in JSONB and not transform back and forth to JSON and HSTORE, function calls (and index lookups) will be much faster. And given the competition, speed is important. Not following this. I do not see how the presence of jsonb helps at all. Client-to-server communication will be text->binary (and vice versa) and handling within the server itself will be in binary. This is the crux of my point. 3) growth: 9.4's JSONB functions are a prerequisite to developing richer JSON querying capabilities in 9.5 and later, which will go beyond JSON formatting for HSTORE. I kind of get this point. But in lieu of a practical use case today, what's the rush to implement? I fully anticipate I'm out in left field on this one (I have a cot and mini fridge there). The question on the table is: what use cases (performance included) does jsonb solve that can't be solved without it? With the possible limited exception of Andrew's yet-to-be-delivered enhanced deserialization routines, I can't think of any. If presented with reasonable evidence I'll shut my yap, pronto. 
Frankly, if it were entirely up to me HSTORE2 would be part of core and its only interface would be JSONB. But it's not. So this is a compromise. I don't. To be pedantic: hstore is in core, but packaged as an extension. That's a very important distinction. In fact, I'll go further and say it seems wise for all SQL standard type work to happen in extensions. As long as it's an in-core contrib extension, I see no stigma to that whatsoever. It's not clear at all to me why the json type was put in the public schema, and now we're about to double down with jsonb. Having things extension-packaged greatly eases concerns about future API changes, because if problems emerge it's not impossible to imagine compatibility extensions appearing to bridge the gap if certain critical functions change. That's exactly the sort of thing that we may want to happen here, I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 2:37 PM, Andrew Dunstan and...@dunslane.net wrote: The time for this discussion was months ago. I would not have spent many many hours of my time if I thought it was going to be thrown away. I find this attitude puzzling, to say the least. You were a major part of the discussion when we said OK, we'll leave json as it is (text based) and add jsonb. That's exactly what we're doing. certainly. I'll shut my yap; I understand your puzzlement. At the time though, I had assumed the API was going to incorporate more of the hstore feature set than it did. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 3:03 PM, Josh Berkus j...@agliodbs.com wrote: That was the original goal. However, Oleg and Teodor's late delivery of Hstore2 limited what Andrew could do for JSONB before CF4 started. yeah. anyways, I'm good on this point. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote:

+  <para id="functions-json-table">
+   <xref linkend="functions-json-creation-table"> shows the functions that are
+   available for creating <type>json</type> values.
+   (see <xref linkend="datatype-json">)
+  </para>

-  <table id="functions-json-table">
-   <title>JSON Support Functions</title>

+  <indexterm>
+   <primary>array_to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>row_to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_build_array</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_build_object</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_object</primary>
+  </indexterm>

Hm, why are you collecting the indexterms at the top, in contrast to the previous way of collecting them at the point of documentation?

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 1ae9fa0..fd93d9b 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
-	rangetypes_typanalyze.o rangetypes_selfuncs.o
+	rangetypes_typanalyze.o rangetypes_selfuncs.o \
+	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here.

+/*
+ * for jsonb we always want the de-escaped value - that's what's in token
+ */
+

strange newline.

+static void
+jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)
+{
+	JsonbInState *_state = (JsonbInState *) state;
+	JsonbValue	v;
+
+	v.size = sizeof(JEntry);
+
+	switch (tokentype)
+	{
+	...
+		default:		/* nothing else should be here in fact */
+			break;

Shouldn't this at least Assert(false) or something? 
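The defensive pattern Andres is asking for -- a "can't happen" `default:` arm that fails loudly instead of silently doing nothing -- can be sketched standalone like this (names are invented for illustration; actual PostgreSQL code would use `Assert()` or `elog(ERROR, ...)` rather than `<assert.h>`). The point is that a token type added later cannot slip through the switch unnoticed:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token-type enum, standing in for JsonTokenType. */
typedef enum { TOK_STRING, TOK_NUMBER, TOK_TRUE, TOK_FALSE, TOK_NULL } ScalarTokenType;

/* Classify a scalar token; every legal value has an explicit case. */
static const char *
scalar_kind(ScalarTokenType t)
{
    switch (t)
    {
        case TOK_STRING: return "string";
        case TOK_NUMBER: return "numeric";
        case TOK_TRUE:
        case TOK_FALSE:  return "bool";
        case TOK_NULL:   return "null";
        default:
            /* nothing else should be here in fact -- fail loudly in
             * debug builds instead of falling through silently */
            assert(!"unexpected scalar token type");
            return "?";
    }
}
```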
+static void
+recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c)
+{
+	uint32		hentry = c & JENTRY_TYPEMASK;
+
+	if (hentry == JENTRY_ISNULL)
+	{
+		v->type = jbvNull;
+		v->size = sizeof(JEntry);
+	}
+	else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR)
+	{
+		recvJsonb(buf, v, level + 1, (uint32) c);
+	}
+	else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE)
+	{
+		v->type = jbvBool;
+		v->size = sizeof(JEntry);
+		v->boolean = (hentry == JENTRY_ISFALSE) ? false : true;
+	}
+	else if (hentry == JENTRY_ISNUMERIC)
+	{
+		v->type = jbvNumeric;
+		v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf),
+										 Int32GetDatum(0), Int32GetDatum(-1)));
+
+		v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric);

What's the *2 here?

+static void
+recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header)
+{
+	uint32		hentry;
+	uint32		i;

This function and recvJsonbValue call each other recursively, afaics without any limit, shouldn't they check for the stack depth?

+	hentry = header & JENTRY_TYPEMASK;
+
+	v->size = 3 * sizeof(JEntry);

*3?

+	if (hentry == JENTRY_ISOBJECT)
+	{
+		v->type = jbvHash;
+		v->hash.npairs = header & JB_COUNT_MASK;
+		if (v->hash.npairs > 0)
+		{
+			v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs);
+

Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks about the actual length of the data? So with a few bytes the server can be coaxed to allocate a gigabyte of data? Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking. 
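The two sanity checks suggested here might look something like this standalone sketch (the names and the minimum per-pair wire size are assumptions made up for illustration; server code would call check_stack_depth() for the recursion bound and report failures via ereport() rather than returning false):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed minimum number of bytes one key/value pair can occupy on the
 * wire (a JEntry-sized header per datum at the very least); the exact
 * value is a placeholder for this sketch. */
#define MIN_PAIR_WIRE_BYTES 8

/* Assumed cap on nesting depth for the mutual recursion. */
#define MAX_NESTING_DEPTH   64

/* Check 1: before allocating npairs * per-pair bytes, verify the declared
 * count could even fit in the bytes actually remaining in the receive
 * buffer. A malicious header claiming a huge count (up to JB_COUNT_MASK)
 * is then rejected up front instead of driving a gigabyte allocation. */
static bool
pair_count_is_sane(uint32_t npairs, size_t bytes_remaining)
{
    return (size_t) npairs <= bytes_remaining / MIN_PAIR_WIRE_BYTES;
}

/* Check 2: bound the recursion between the value and container receive
 * routines with an explicit level counter, so deeply nested input cannot
 * blow the stack. */
static bool
nesting_depth_ok(uint32_t level)
{
    return level <= MAX_NESTING_DEPTH;
}
```

Dividing the remaining buffer length rather than multiplying the count avoids any overflow in the check itself, which is part of why this shape is common in input validation code.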
+	for (i = 0; i < v->hash.npairs; i++)
+	{
+		recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4));
+		if (v->hash.pairs[i].key.type != jbvString)
+			elog(ERROR, "jsonb's key could be only a string");

Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places.

+char *
+JsonbToCString(StringInfo out, char *in, int estimated_len)
+{
+	bool		first = true;
+	JsonbIterator *it;
+	int			type;
+	JsonbValue
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of hstore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from an internally well-formed structure, is by far the more serious one I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 8:45 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 08:57 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of hstore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from an internally well-formed structure, is by far the more serious one I think. Umm, I think at least one of us is seriously confused. I am going to look at dealing with these issues in a way that can be used by both - at least the populate_record case. As far as populate_record goes, there is a bit of an impedance mismatch, since json/hstore records are heterogeneous and one-dimensional, whereas sql arrays are homogeneous and multidimensional. Right now I am thinking I will deal with arrays up to two dimensions, because I can do that relatively simply, and after that throw in the towel. That will surely deal with 99.9% of use cases. Of course this would be documented. Anyway, let me see what I can do. If Andrew Gierth wants to have a look at fixing the hstore() side that might help speed things up. (ah, you beat me to it.) Disregard my statements above. It works. 
postgres=# select jsonb_populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb); jsonb_populate_record - (1,{(1,\\{(1,{1,2})}\\)}) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 9:26 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 09:53 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 8:45 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 08:57 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of htsore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from and internally well formed structure, is by far the more serious one I think. Umm, I think at least one of us is seriously confused. I am going to look at dealing with these issues in a way that can be used by both - at least the populate_record case. As far as populate_record goes, there is a bit of an impedance mismatch, since json/hstore records are heterogenous and one-dimensional, whereas sql arrays are homogeneous and multidimensional. Right now I am thinking I will deal with arrays up to two dimensions, because I can do that relatively simply, and after that throw in the towel. That will surely deal with 99.9% of use cases. Of course this would be documented. Anyway, Let me see what I can do. If Andrew Gierth wants to have a look at fixing the hstore() side that might help speed things up. (ah, you beat me to it.) Disregard my statements above. It works. 
postgres=# select jsonb_populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb); jsonb_populate_record - (1,{(1,\\{(1,{1,2})}\\)}) Actually, there is a workaround to the limitations of hstore(record): yeah I'm ok with hstore() function as it is. That also eliminates backwards compatibility concerns so things worked out. The only 'must fix' 9.4 facing issue I see on the table is to make sure jsonb populate function is forward compatible with future expectations of behavior which to me means zeroing in on the necessity of the as_text argument (but if you can expand coverage without jeopardizing 9.4 inclusion than great...). For my part I'm going to continue functionally testing the rest of the API (so far, a cursory look hasn't turned up anything else). I'm also signing up for some documentation refinements which will be done after you nail down these little bits but before the end of the 'fest. IMNSHO, formal code review needs to begin ASAP (salahaldin is the reviewer per the fest wiki) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 9:50 AM, Andrew Dunstan and...@dunslane.net wrote: Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions except for backwards compatibility on the json side (not jsonb). For non-complex structures it does best effort casting anyways so the flag is moot. Well, I could certainly look at making the populate_record{set} and to_record{set} logic handle types that are arrays or composites inside the record. It might not be terribly hard to do - not sure. A quick analysis suggests that this is fixable with fairly minimal disturbance in the jsonb case. In the json case it would probably involve reparsing the inner json. That's probably doable, because the routines are all reentrant, but not likely to be terribly efficient. It will also be a deal more work. Right. Also the text json functions are already in the wild anyways -- that's not in the scope of this patch so if they need to be fixed that could be done later. ISTM then the right course of action is to point jsonb 'populate' variants at hstore implementation, not the text json one and remove the 'as text' argument. Being able to ditch that argument is the main reason why I think this should be handled now (not forcing hstore dependency to handle complex json is gravy). People handling json as text would then invoke a ::jsonb cast trading off performance for flexibility which is perfectly fine. If you agree, perhaps we can HINT the error in certain places that return ERROR: cannot call json_populate_record on a nested object that the jsonb variant can be used as a workaround. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 1:07 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2014 04:56 PM, Andrew Dunstan wrote: On 01/29/2014 01:03 PM, Andrew Dunstan wrote: On 01/27/2014 10:43 PM, Andrew Dunstan wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes:

* there is now documentation for jsonb
* most uses of elog() in json_funcs.c are replaced by ereport().
* indentation fixes and other tidying.

No changes in functionality. Further update of jsonb portion. Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. Updated to apply cleanly after recent commits. Updated patches for both pieces. Included is some tidying done by Teodor, and fixes for remaining whitespace issues. This now passes git diff --check master cleanly for me. 
Something seems off:

postgres=# create type z as (a int, b int[]);
CREATE TYPE
postgres=# create type y as (a int, b z[]);
CREATE TYPE
postgres=# create type x as (a int, b y[]);
CREATE TYPE

-- test a complicated construction
postgres=# select row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x;
            row
-----------------------------
 (1,{(1,\\{(1,{1,2})}\\)})

postgres=# select hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x);
                     hstore
------------------------------------------------
 a=>1, b=>{\(1,\\\{\\(1,\\{1,2}\\)\\}\\\)\}

here, the output escaping has leaked into the internal array structures. istm we should have a json expressing the internal structure. It does (weirdly) map back however:

postgres=# select populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x));
       populate_record
-----------------------------
 (1,{(1,\\{(1,{1,2})}\\)})

OTOH, if I go via the json route:

postgres=# select row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x);
                  row_to_json
-----------------------------------------------
 {"a":1,"b":[{"a":1,"b":[{"a":1,"b":[1,2]}]}]}

so far, so good. let's push to hstore:

postgres=# select row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore;
                row_to_json
-------------------------------------------
 a=>1, b=>[{a=>1, b=>[{a=>1, b=>[1, 2]}]}]

this ISTM is the 'right' behavior. but what if we bring it back to a record object?

postgres=# select populate_record(null::x, row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore);
ERROR: malformed array literal: {{a=>1, b=>{{a=>1, b=>{1, 2}

yikes. The situation as I read it is that (notwithstanding my comments upthread) there is no clean way to slide rowtypes to/from hstore and jsonb while preserving structure. IMO, the above query should work and the populate function record above should return the internally structured row object, not the text-escaped version. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 4:52 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/30/2014 07:21 PM, Merlin Moncure wrote: postgres=# select hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x); hstore -- a=1, b={\(1,\\\{\\(1,\\{1,2}\\)\\}\\\)\} here, the output escaping has leaked into the internal array structures. istm we should have a json expressing the internal structure. What has this to do with json at all? It's clearly a failure in the hstore() function. yeah -- meant to say 'hstore' there. Also I'm not sure that it's 'wrong'; it's just doing what it always did. That brings up another point: are there any interesting cases of compatibility breakage? I'm inclined not to care about this particular case though... array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore); ERROR: malformed array literal: {{a=1, b={{a=1, b={1, 2} yikes. The situation as I read it is that (notwithstanding my comments upthread) there is no clean way to slide rowtypes to/from hstore and jsonb while preserving structure. IMO, the above query should work and the populate function record above should return the internally structured row object, not the text escaped version. And this is a failure in populate_record(). I think we possibly need to say that handling of nested composites and arrays is an area that needs further work. OTOH, the refusal of json_populate_record() and json_populate_recordset() to handle these in 9.3 has not generated a flood of complaints, so I don't think it's a tragedy, just a limitation, which should be documented if it's not already. (And of course hstore hasn't handled nested anything before now.) Meanwhile, maybe Teodor can fix the two hstore bugs shown here. While not a flood, there certainly have been complaints. 
See http://postgresql.1045698.n5.nabble.com/Best-way-to-populate-nested-composite-type-from-JSON-td5770566.html http://osdir.com/ml/postgresql-pgsql-general/2014-01/msg00205.html But, if we had to drop this in the interests of time I'd rather see the behavior cauterized off so that it errored out 'not supported' (as json_populate does) than attempt to implement the wrong behavior. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Tue, Jan 28, 2014 at 5:58 AM, Christian Kruse christ...@2ndquadrant.com wrote: Hi, On 28/01/14 13:51, Heikki Linnakangas wrote: Oh darn, I remembered we had already committed this, but clearly not. I'd love to still get this into 9.4. The latest patch (hugepages-v5.patch) was pretty much ready for commit, except for documentation. I'm working on it. I ported it to HEAD and currently doing some benchmarks. Next will be documentation. you mentioned benchmarks -- do you happen to have the results handy? (curious) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb generation functions
On Wed, Jan 29, 2014 at 9:08 AM, Andrew Dunstan and...@dunslane.net wrote: In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category, json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. my 0.02$: it can wait -- possibly forever. Assuming the casts work I see absolutely no issue whatsoever asking users to do: select xx_to_json(something complex)::jsonb; If you examine the use cases, json and jsonb, while they certainly have some overlap, are going to be used in different patterns. In hindsight the type bifurcation was a good thing ISTM. I don't think there should be any expectation of a 100% match of the API, especially since you can slide things around with casts. In fact, for heavy serialization, at the end of the day it might be better to defer the jsonb creation to the final stage of serialization anyways (avoiding iterative insertion) even if we did match the API. (can't hurt to manage scope either, considering the timelines for 9.4) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 12:03 PM, Andrew Dunstan and...@dunslane.net wrote: Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. I think the opening paragraphs contrasting json/jsonb need refinement. json is going to be slightly faster than jsonb for input *and* output. For example, in one application I store fairly large json objects containing pre-compiled static polygon data that is simply flipped up to google maps. This case will likely be pessimal for jsonb. For the next paragraph, I'd like to expand it a bit on 'specialized needs' and boil it down to specific use cases. Basically, json will likely be more compact in most cases and slightly faster for input/output; jsonb would be preferred in any context where processing, or searching, or extensive server-side parsing is employed. If you agree, I'd be happy to do that... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 3:56 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2014 01:03 PM, Andrew Dunstan wrote: On 01/27/2014 10:43 PM, Andrew Dunstan wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes: * there is now documentation for jsonb * most uses of elog() in json_funcs.c are replaced by ereport(). * indentation fixes and other tidying. No changes in functionality. Further update of jsonb portion. Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. Updated to apply cleanly after recent commits. ok, great. This is really fabulous. So far most everything feels natural and good. I see something odd in terms of the jsonb use case coverage. One of the major headaches with json deserialization presently is that there's no easy way to move a complex (record- or array-containing) json structure into a row object. 
For example, create table bar(a int, b int[]);

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, false);
ERROR: cannot populate with a nested object unless use_json_as_text is true

I find the use_json_as_text argument here to be pretty useless (unlike in the json_build to_record variants, where it at least provides some hope for an escape hatch) for handling this, since it will just continue to fail:

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, true);
ERROR: missing "]" in array dimensions

OTOH, the nested hstore handles this no questions asked:

postgres=# select * from populate_record(null::bar, 'a=>1, b=>{1,2}'::hstore);
 a |   b
---+-------
 1 | {1,2}

So, if you need to convert a complex json to a row type, the only effective way to do that is like this:

postgres=# select * from populate_record(null::bar, '{"a": 1, "b": [1,2]}'::json::hstore);
 a |   b
---+-------
 1 | {1,2}

Not a big deal really. But it makes me wonder (now that we have the internal capability of properly mapping to a record) why *both* the json/jsonb populate record variants shouldn't point to what the nested hstore behavior is when the 'as_text' flag is false. That would demolish the error and remove the dependency on hstore in order to do effective rowtype mapping. In an ideal world the json_build 'to_record' variants would behave similarly, I think, although there's no existing hstore analog so I'm assuming it's a non-trivial amount of work. Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions except for backwards compatibility on the json side (not jsonb). For non-complex structures it does best-effort casting anyways so the flag is moot. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Jan 27, 2014 at 9:43 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes: * there is now documentation for jsonb * most uses of elog() in json_funcs.c are replaced by ereport(). * indentation fixes and other tidying. No changes in functionality. Don't have time to fire it up this morning, but a quick scan of the patch turned up a few minor things:

* see a comment typo, line 290 'jsonn'
* line 332: 'bogus input' -- is this up to error reporting standards? How about: value 'x' must be one of array, object, numeric, string, bool?
* line 357: jsonb's key could be only a string -- prefer non-possessive: jsonb keys must be a string
* line 374, 389: ditto 332
* line 513: is panic appropriate here?
* line 599: ditto
* line 730: odd phrasing in comment; also, commenting on this function is a little light
* line 807: slightly prefer 'with respect to'
* line 888: another PANIC: these may be correct, but it seems odd to halt the server on a corrupted datum
* line 1150: hm, is the jsonb internal hash structure documented? Aside: why didn't we use a standard hash table (performance maybe)?
* line 1805-6: poor phrasing. How about: it will order and make unique the hash keys. Otherwise we believe that pushed keys are ordered and unique. (Don't like verbed 'unique'.) 
* line 1860: no break here: merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Looks like this review is against jsonb-5, not jsonb-6. oh yep -- shoot, sorry for the noise. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Jan 28, 2014 at 10:46 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/28/2014 11:29 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: The problem is not the indexterm element, it's the space that might exist outside it. Are we using block level elements like para inside entry elements anywhere else? Probably not, and I wonder why you're trying to. Whole paras inside a table entry (this is a table no?) don't sound like they are going to lead to nice-looking results. See http://developer.postgresql.org/~adunstan/functions-json.html yeah. note: I think the json documentation needs a *major* overhaul. too much is going on inside the function listings where there really should be a big breakout discussing the big picture of json/jsonb with examples of various use cases. I want to give it a shot but unfortunately cannot commit to doing that by the end of the 'fest. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Jan 28, 2014 at 12:09 PM, Josh Berkus j...@agliodbs.com wrote: On 01/28/2014 09:58 AM, Merlin Moncure wrote: yeah. note: I think the json documentation needs a *major* overhaul. too much is going on inside the function listings where there really should be a big breakout discussing the big picture of json/jsonb with examples of various use cases. I want to give it a shot but unfortunately cannot commit to doing that by the end of the 'fest. FWIW, I've promised Andrew that I'll overhaul this by the end of beta. Given that, we have all of beta for doc refinements. In addition to this, the JSON vs JSONB datatype page really needs expansion and clarification. right: exactly. I'd be happy to help (such as I can) ...I wanted to see jsonb make it in on this 'fest (doc issues notwithstanding); it hasn't been formally reviewed yet AFAICT. So my thinking here is to get the docs to minimum acceptable standards in the short term and focus on the structural code issues for the 'fest (if jsonb slips then it's moot obviously). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back into a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was. This project has no deprecation policy, and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and the win32 api. There are certain limited exceptions to this rule -- for example security problems or gross violations of the standard (bringing row-wise comparison to spec comes to mind as an example of that). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Mon, Jan 13, 2014 at 3:45 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 13, 2014 at 10:40:57AM -0600, Merlin Moncure wrote: This project has no deprecation policy, I believe it actually does, although it's not a formal, written policy. Would you like to help draft one up? Lack of a 'formal, written policy' is equivalent to 'no policy'. Regardless, the way things were done in the 7.x/8.x series may no longer apply today; the project has grown up and we need to be more serious about things, at least, IMNSHO. and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and the win32 api. Could you please help remind us what that was? Let's take gets() for example. C11 finally ditched it 12 years (!) after it was formally deprecated in C99 and informally deprecated in endless man pages (don't use this!) for decades before that. And even then most compilers, at least the decent ones, should allow requesting previous standards for some time beyond that. The win32 API is also remarkably stable; ancient code written for it beyond the dim horizon of time will still compile and execute today. These are probably strong contenders for the most popular APIs ever made -- see the connection? Now, comparing C APIs to an SQL implementation for deprecation purposes isn't quite apples to apples, but I'll stand by the analogy. or gross violations of the standard We're definitely there on lower bounds of arrays. The standard, for a wonder, is clear and unambiguous about them. Whether we should go there on the rest of our array implementation is a question for another thread. The SQL standard requests that standard syntax gives standard behavior. Alternate bounds is non-standard syntax giving non-standard behavior and is thus excepted.
Naturally, non-standard syntax is dangerous because the standard may later implement it in which case you then have a real problem (that may be the case here: I don't know). Our array implementation is a real mess on multiple levels but at least it's an internally consistent mess. Maybe it really should be 'fixed', but not before the super un-fun discussion of how to ease the path for our hapless users happens first. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 2:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 1:20 AM, Merlin Moncure wrote: I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. Yes, I agree. A length function that returned 0 for empty arrays would be far from useless. Can you point me to some examples? The example I see all the time is code like if array_length(nodes, 1) > 5 then ... do something ... then you realise (or not as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) > 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. right -- exactly. or, 'ORDER BY cardinality(nodes)', etc etc. Furthermore, we already have pretty good support for iteration with arrays via unnest(). What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. 'unnest_dims' is non-trivial to code in user land while 'array_length' is an extremely trivial wrapper around array_upper(). cardinality() (which is a much better name for the function IMNSHO), giving a*b*c values for a 3d array, also does something non-trivial, *particularly in the case of offset arrays*. On Fri, Jan 10, 2014 at 3:36 AM, Marko Tiikkaja ma...@joh.to wrote: I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of 1. Your function would be the only one in the array API that implemented special behaviors like that. That suggests to me that the less generic function belongs in user land, not in the core array API.
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
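The semantics being argued over above can be sketched outside SQL. Here is a hypothetical Python model of the two behaviors; the function names mirror the SQL ones but the code is only an illustration of the discussed semantics, not PostgreSQL's implementation:

```python
from math import prod

def array_length(dims, dim):
    """Length of one dimension; None (like SQL NULL) for an empty
    array or a missing dimension, mirroring array_length()."""
    if not dims or dim > len(dims):
        return None
    return dims[dim - 1]

def cardinality(dims):
    """Total number of elements across all dimensions; 0 when empty."""
    if not dims:
        return 0
    return prod(dims)

assert array_length([], 1) is None   # why the coalesce() guard is needed
assert cardinality([]) == 0          # safe to compare or sort on directly
assert cardinality([2, 3, 4]) == 24  # a*b*c for a 3-D array
```

The thread in miniature: array_length() yields NULL for an empty array, so `array_length(nodes, 1) > 5` silently misbehaves, while a cardinality() that returns 0 can be used directly in comparisons and ORDER BY.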
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get: [1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4 like unnest() it would fully decompose the array to individual elements. what you have above slices the array, which is useful, but probably shouldn't live under the 'unnest' name -- perhaps 'slice'. Pavel added it to pl/pgsql under the FOREACH syntax (FYI). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
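A rough Python model of the proposed unnest_dims() behavior described above -- the name and the (index-vector, element) output come from the thread; the implementation is just a sketch for rectangular arrays with 1-based lower bounds:

```python
from itertools import product

def unnest_dims(a):
    """Fully decompose a nested rectangular array, yielding
    (index-vector, element) pairs -- the proposed unnest_dims()."""
    # Discover the dimensions by walking down the first elements.
    dims = []
    probe = a
    while isinstance(probe, list):
        dims.append(len(probe))
        probe = probe[0]
    # Walk every coordinate in row-major order, like unnest() would.
    for idx in product(*(range(n) for n in dims)):
        elem = a
        for i in idx:
            elem = elem[i]
        yield [i + 1 for i in idx], elem  # SQL arrays are 1-based

assert list(unnest_dims([[1, 2], [3, 4]])) == [
    ([1, 1], 1), ([1, 2], 2), ([2, 1], 3), ([2, 2], 4),
]
```

This reproduces exactly the `[1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4` output given in the email, and makes clear why the index vector (unlike WITH ORDINALITY's flat counter) is enough to slice by any dimension.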
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug f...@phlo.org wrote: On Jan 10, 2014, at 11:00, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get: [1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4 Now that we have WITH ORDINALITY, it'd be sufficient to have a variant of array_dims() that returns int[][] instead of text, say array_dimsarray(). Your unnest_dims could then be written as unnest(array_dimsarray(array)) with ordinality hm, not quite following that. maybe an example? my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it doesn't give you the dimension coordinate of each datum so you can't really use it to slice. with unnest_dims(), you can slice, say via: select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) group by dims[1]; or select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) where dims[1] = 2; not super elegant, but good enough for most uses I think. anyways, getting back on topic, the question on the table is cardinality() vs array_length, right? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Fri, Jan 10, 2014 at 4:10 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this? Perl does, though they regret it bitterly. What does it matter? Our arrays have had the capability for years and years and because it's cleaner is simply not justification to break people's applications. Why are we even considering this? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On Thu, Jan 9, 2014 at 11:08 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/9/14 5:44 PM, Florian Pflug wrote: On Jan9, 2014, at 14:57 , Dean Rasheed dean.a.rash...@gmail.com wrote: On 19 December 2013 08:05, Pavel Stehule pavel.steh...@gmail.com wrote: length should be irrelevant to fact so array starts from 1, 0 or anything else Yes, this should just return the number of elements, and 0 for an empty array. +1. Anything that complains about arrays whose lower bound isn't 1 really needs a *way* less generic name than array_length(). Problem is, if you're operating on an array which could have a lower bound that isn't 1, why would you look at the length in the first place? You can't access any elements by index, you'd need to look at array_lower(). You can't iterate over the array by index, you'd need to do array_lower() .. array_lower() + array_length(), which doesn't make sense. And then there's the myriad of stuff you can do with unnest() without actually having to look at the length. Same goes for multi-dimensional arrays: you have even less things you can do there with only a length. I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. If you want your code to blow up with non 1 based array, that should be checked in userland I think (perhaps with a constraint); the server API function should implement as many reasonable behaviors as possible. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer fwei...@redhat.com wrote: On 01/03/2014 04:20 PM, Tom Lane wrote: I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? The data I encounter has to be distributed across multiple tables. Switching between the COPY TO commands would again need client-side buffering and heuristics for sizing these buffers. Lengths of runs vary a lot in my case. I also want to use binary mode as far as possible to avoid the integer conversion overhead, but some columns use custom enum types and are better transferred in text mode. Some INSERTs happen via stored procedures, to implement de-duplication. These issues could be addressed by using temporary staging tables. However, when I did that in the past, this caused pg_shdepend bloat. Carefully reusing them when possible might avoid that. Again, due to the variance in lengths of runs, the staging tables are not always beneficial. I understand that pipelining introduces complexity. But solving the issues described above is no picnic, either. Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)? It transfers most everything in binary (enums notably are handled as strings). A typical usage of libpqtypes would be to arrange multiple records into an array on the client then hand them off to a stored procedure on the server side (perhaps over an asynchronous call while you assemble the next batch).
libpqtypes was written for C applications with very high performance requirements (for non performance critical cases we might use json instead). In my experience it's not too difficult to arrange an assembly/push loop that amortizes the round trip overhead to zero; it's not as efficient as COPY but much more flexible and will blow away any scheme that sends data row per query. I agree with Tom that major changes to the libpq network stack is probably not a good idea. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
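The claim above that an assembly/push loop "amortizes the round trip overhead to zero" is simple arithmetic; here is a hypothetical Python cost model (the millisecond figures are illustrative assumptions, not measurements):

```python
import math

def total_latency_ms(nrows, batch_size, round_trip_ms):
    """Latency spent on round trips when nrows rows are pushed to the
    server batch_size rows at a time: the fixed round-trip cost is
    shared across the batch, so the per-row share shrinks."""
    trips = math.ceil(nrows / batch_size)
    return trips * round_trip_ms

# Hypothetical 0.5 ms round trip, 100k rows:
# row-per-query pays 100,000 trips; 1000-row array batches pay 100.
assert total_latency_ms(100_000, 1, 0.5) == 50_000.0
assert total_latency_ms(100_000, 1000, 0.5) == 50.0
```

With reasonable batch sizes the per-row round-trip cost becomes negligible, which is why the array-per-call pattern can approach COPY-like throughput while keeping the flexibility of a stored procedure.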
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire klaussfre...@gmail.com wrote: On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer fwei...@redhat.com wrote: Loading data into the database isn't such an uncommon task. Not everything is OLTP. Truly, but a sustained insert stream of 10 Mbps is certainly way beyond common non-OLTP loads. This is far more specific than non-OLTP. I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? You're forgetting ORM workloads. ORMs can usually plan the inserts to be in a sequence that both doesn't require feedback (except the knowledge that they were successful) and does not violate constraints. Flushing a whole object hierarchy, for instance, can be done without feedback. Not even serial columns need feedback, since many ORMs (SQLAlchemy, Hibernate) support allocation of ID sequences in batches (by issuing a proper select nextval). I agree that, with the proposed API, it's too error prone to be useful. But I also think, if the API is simple and fool-proof enough, it could be 'build it and they will come'. I know I'll be happy to implement support for SQLAlchemy (since it will benefit me), if the API resembles the proposition below (at least in simplicity). Per-query expectations could be such a thing.
And it can even work with PQexec: PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); ... 9 times... PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); do { // do something useful } while (PQflush()); Here, the PQASYNC flag would temporarily switch to non-blocking I/O, and buffer what cannot be sent. PQASYNC_CORK would only buffer (only send if the buffer is full). After any ASYNC call, PQflush would be necessary (to flush the send queue and to consume the expected responses), but I can imagine any synchronous call (PQexec, PQsendQuery or whatever) could detect a non-empty buffer and just blockingly flush right there. This can benefit many useful patterns. ORM flush is one, if there can be preallocation of IDs (which I know at least SQLAlchemy and Hibernate both support). Execute-many of prepared statements is another one, quite common. I'm not sure what would happen if one of the queries returned an error. If in a transaction, all the following queries would error out, I'd imagine. If not, they would simply be executed blindly.. am I correct? Long term, I'd rather see an optimized 'ORM flush' assemble the data into a structured data set (perhaps a JSON document) and pass it to some receiving routine that decomposed it into records. This is a better way to do things on so many levels.
Maybe I'm an old cranky guy yelling at pigeons, but I don't think the current approach that many ORMs take is going to withstand the test of time. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
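The "optimized ORM flush" idea above -- assemble the pending rows into one structured document and let a server-side routine decompose it -- can be sketched in Python. Everything here (function names, document shape) is hypothetical illustration, not any ORM's actual API:

```python
import json

def flush_to_document(pending):
    """Assemble dirty ORM rows into one JSON document grouped by
    table: a single round trip instead of one INSERT per row."""
    doc = {}
    for table, row in pending:
        doc.setdefault(table, []).append(row)
    return json.dumps(doc)

def decompose(document):
    """Sketch of the server-side receiving routine: split the
    document back into per-table record lists for insertion."""
    return {table: rows for table, rows in json.loads(document).items()}

# IDs preallocated client-side (batch nextval), so no feedback needed.
pending = [("a", {"id": 1}), ("b", {"id": 1, "a_id": 1}), ("a", {"id": 2})]
tables = decompose(flush_to_document(pending))
assert tables["a"] == [{"id": 1}, {"id": 2}]
assert tables["b"] == [{"id": 1, "a_id": 1}]
```

The design point being made: because the ORM already knows the dependency order and the preallocated keys, the whole hierarchy collapses into one structured payload, sidestepping the per-statement pipelining problem entirely.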
Re: [HACKERS] proposal: multiple read-write masters in a cluster with wal-streaming synchronization
On Tue, Dec 31, 2013 at 3:51 PM, Mark Dilger markdil...@yahoo.com wrote: The BDR documentation http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf says, Physical replication forces us to use just one node: multi-master required for write scalability Physical replication provides best read scalability I am inclined to agree with the second statement, but I think my proposal invalidates the first statement, at least for a particular rigorous partitioning over which server owns which data. In my own workflow, I load lots of data from different sources. The partition the data loads into depends on which source it came from, and it is never mixed or cross referenced in any operation that writes the data. It is only mixed in the sense that applications query data from multiple sources. So for me, multi-master with physical replication seems possible, and would presumably provide the best read scalability. I doubt that I am the only database user who has this kind of workflow. The alternatives are ugly. I can load data from separate sources into separate database servers *without* replication between them, but then the application layer has to emulate queries across the data. (Yuck.) Or I can use logical replication such as BDR, but then the servers are spending more effort than with physical replication, so I get less bang for the buck when I purchase more servers to add to the cluster. Or I can use FDW to access data from other servers, but that means the same data may be pulled across the wire arbitrarily many times, with corresponding impact on the bandwidth. Am I missing something here? Does BDR really provide an equivalent solution? I think BDR is better: while it does only support schema-equivalent replication, that is the typical case for distributed write systems like this. Also, there are a lot fewer assumptions about the network architecture in the actual data itself (for example, what happens when you want to change owner, or merge/split data?).
IMNSHO, it's better that each node is managing WAL for itself, not the other way around, except in the very special case that you want an exact replica of the database on each node at all times as with the current HS/SR. A **huge** amount of work has been and is being put into wal based logical replication support (LLSR in the BDR docs) that should mostly combine the flexibility of trigger based logical replication with the robustness of wal based replication that we have in core now. LLSR is a low level data transmission framework that can be wrapped by higher level user facing stuff like BDR. LLSR, by the way, does not come attached with the assumption that all databases have the same schema. If I were you, I'd be studying up on LLSR and seeing how it could be molded into the use cases you're talking about. From a development point of view, the replication train hasn't just left the station, it's a space shuttle that just broke out of earth's orbit. By my reckoning a new 'from the ground up' implementation of replication requiring in-core changes has exactly zero percent chance of being adopted. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stuck spinlock
On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote: On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: And while we're on the subject ... isn't bgworker_die() utterly and completely broken? That unconditional elog(FATAL) means that no process using that handler can do anything remotely interesting, like say touch shared memory. Yeah, but for the record (since I see I got cc'd here), that's not my fault. I moved it into bgworker.c, but it's been like that since Alvaro's original commit of the bgworker facility (da07a1e856511dca59cbb1357616e26baa64428e). Is this an edge case or something that will hit a lot of users? Arbitrary server panics seems pretty serious... Is your question about the bgworker part you're quoting or about the stuck spinlock stuff? I don't think the bgworker bug is too bad in practice but the one in handle_sig_alarm() stuff certainly is. I think while it looks possible to hit problems without statement/lock timeout, it's relatively unlikely that those are hit in practice. Well, both -- I was just wondering out loud what the severity level of this issue was. In particular, is it advisable for the general public avoid this release? My read on this is 'probably'. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stuck spinlock
On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: And while we're on the subject ... isn't bgworker_die() utterly and completely broken? That unconditional elog(FATAL) means that no process using that handler can do anything remotely interesting, like say touch shared memory. Yeah, but for the record (since I see I got cc'd here), that's not my fault. I moved it into bgworker.c, but it's been like that since Alvaro's original commit of the bgworker facility (da07a1e856511dca59cbb1357616e26baa64428e). Is this an edge case or something that will hit a lot of users? Arbitrary server panics seems pretty serious... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Thu, Dec 12, 2013 at 4:02 AM, knizhnik knizh...@garret.ru wrote: On 12/12/2013 11:42 AM, Pavel Stehule wrote: it is an interesting idea. For me, the significant information from the comparison is that we are doing something significantly wrong. A memory engine should naturally be faster, but I don't think it can be 1000x. Sorry, but I didn't fabricate these results: below is just a snapshot from my computer: postgres=# select DbItem_load(); dbitem_load - 998 (1 row) postgres=# \timing Timing is on. postgres=# select cs_used_memory(); cs_used_memory 4441894912 (1 row) postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from (select (cs_project_agg(ss1.*)).* from (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q, cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1, cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2; agg_val | cs_cut --+ 1.50028393511844 | (John Coltrane,New York Corporates,New York) Time: 506.125 ms postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); ... Time: 449328.645 ms postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); ... Time: 441530.689 ms Please notice that the time of the second execution is almost the same as the first, although all the data can fit in cache! Certainly it was interesting to me to understand the reason for such bad performance. And I found out two things: 1. select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); and select sum(score*volenquired)/sum(volenquired) from DbItem group by trader,desk,office; are not the same queries (it is hard to understand for a C programmer :) And the first one is executed significantly slower. 2. It is not enough to increase the shared_buffers parameter in postgresql.conf. work_mem is also very important. When I increased it to 1Gb from the default 1Mb, the time of query execution was reduced to 7107.146 ms. So the real difference is ten times, not 1000 times. Yeah.
It's not fair to compare vs an implementation that is constrained to use only 1mb. For analytics work a huge work_mem is a pretty typical setting. A 10x improvement is believable considering you've removed all MVCC overhead, locking, buffer management, etc. and have a simplified data structure. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Thu, Dec 12, 2013 at 12:18 PM, knizhnik knizh...@garret.ru wrote: IMHO it is strange to see such small default values in postgresql configuration. This (low default work mem) is because of three things: 1) Most queries do not really need a lot of work mem 2) Work mem stacks with each query using it -- so with your 1mb setting vs 1000 connections, you get a gigabyte. So, some conservatism is justified although this setting tended to be much more dangerous in the old days when we measured memory in megabytes. 3) Postgres does not query available physical memory for default settings due to portability issues. So we tend to tune to common denominator. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
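Point 2 above is worth making concrete. A minimal Python sketch of the worst-case arithmetic, assuming one work_mem-sized allocation per sort/hash node per connection (the scenario names are illustrative, not a formula from the docs):

```python
def worst_case_work_mem_mb(work_mem_mb, connections, sorts_per_query=1):
    """work_mem is a per-sort/per-hash limit, per backend -- not a
    global cap -- so the worst case multiplies across connections."""
    return work_mem_mb * sorts_per_query * connections

# The post's example: a 1 MB setting across 1000 connections -> ~1 GB.
assert worst_case_work_mem_mb(1, 1000) == 1000
# A 1 GB analytics-style setting with 100 busy connections -> ~100 GB.
assert worst_case_work_mem_mb(1024, 100) == 102400
```

This multiplication is why the shipped default stays conservative even though any single analytics query would happily use far more.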
Re: [HACKERS] In-Memory Columnar Store
On Mon, Dec 9, 2013 at 1:40 PM, knizhnik knizh...@garret.ru wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Now servers with a Tb or more of RAM are not something exotic, especially in the financial world. But there is a limitation in Linux with standard 4kb pages on the maximal size of a mapped memory segment: 256Gb. It is possible to overcome this limitation either by creating multiple segments - but that requires too many changes in the PostgreSQL memory manager - or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system). I found several messages related to the MAP_HUGETLB flag, the most recent one from 21 of November: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org I wonder what is the current status of this patch? I looked over your extension. I think it's a pretty amazing example of the postgres extension and type systems -- up there with postgis. Very well done. How long did this take you to write? The MAP_HUGETLB patch was marked 'returned with feedback': https://commitfest.postgresql.org/action/patch_view?id=1308. It seems likely to be revived, perhaps in time for 9.4. Honestly, I think your efforts here provide more argument for adding hugetlb support. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Wed, Dec 11, 2013 at 10:08 AM, knizhnik knizh...@garret.ru wrote: 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my computer. Just defining an insertion per-row trigger with an empty procedure doubles the time to insert 6 million records - from 7 to 15 seconds. If the trigger procedure is not empty, then time increases proportionally to the number of calls performed. In my case inserting data and propagating it to the columnar store using a trigger takes about 80 seconds. But if I first load the data without triggers into a PostgreSQL table and then insert it into the columnar store using a load function (implemented in C), then the time is 7+9=16 seconds. Yeah. For this problem, we unfortunately either have to try to use standard sql functions in such a way that supports inlining (this is a black art mostly, and fragile), or move logic out of the function and into the query via things like window functions, or just deal with the performance hit. postgres flavored SQL is pretty much the most productive language on the planet AFAIC, but the challenge is always performance, performance. Down the line, I am optimistic per-call function overhead can be optimized, probably by expanding what can be inlined somehow. The problem is that this requires cooperation from the language executors; this is not currently possible through the SPI interface, so I really don't know. Certainly I realize that plpgsql is an interpreted language. But, for example, interpreted Python is able to do 100 times more calls per second. Unfortunately the profiler doesn't show any single bottleneck - it looks like the long call time is caused by the overhead of initializing and resetting the memory context and copying argument data. 2. Inefficient implementation of expanding composite type columns using the (foo()).* clause. In this case function foo() will be invoked as many times as there are fields in the returned composite type. 
Even when the call is placed in the FROM list (thanks to lateral joins in 9.3), PostgreSQL still sometimes performs redundant calls, which can be avoided using the hack of adding an OFFSET 1 clause. Yeah, this is a long-standing headache. LATERAL mostly deals with this, but most cases (even pre-9.3) can be worked around one way or another. 3. The 256GB limit on shared memory segment size on Linux. I figure this will be solved fairly soon. It's a nice problem to have. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
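The multiple-evaluation problem and its workarounds look like this (table and function names are hypothetical):

```sql
-- foo() is re-evaluated once per expanded output field here: with a
-- three-field composite return type, it runs three times per row.
SELECT (foo(t.x)).* FROM t;

-- 9.3+: place the call in the FROM list via LATERAL so it is
-- evaluated once per input row.
SELECT f.* FROM t, LATERAL foo(t.x) f;

-- pre-9.3 hack: hide the call behind OFFSET 0 (or OFFSET 1 variants)
-- in a subquery so the planner cannot flatten and re-expand it.
SELECT (f).* FROM (SELECT foo(t.x) AS f FROM t OFFSET 0) s;
```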
Re: [HACKERS] Compression of tables
On Tue, Dec 10, 2013 at 1:18 AM, Thomas Munro mu...@ip9.org wrote: Hi I have been wondering what the minimum useful heap table compression system would be for Postgres, in order to reduce disk footprint of large mostly static datasets. Do you think an approach similar to the static row-level compression of DB2 could make sense? I imagine something like this: 1. You have a table which already has data in it. 2. You run a COMPRESS operation, which builds a static dictionary, and rewrites the whole table with compressed frozen tuples. Frozen tuples have CTIDs just like regular tuples, and can be pointed to by indexes. They are decompressed on the fly when needed. Clearly things get tricky once you need to update rows. Assume for simplicity that future UPDATEs and INSERTs produce normal, non-compressed tuples that would only be compressed by a subsequent COMPRESS operation. The question is how to deal with the existing compressed rows when UPDATEd or DELETEd. Some approaches: 1. Just don't allow updates of compressed rows (!). 2. Exclusively lock the whole page when trying to update any compressed row, while you explode it into regular uncompressed tuples on new pages which you can work on (!). 3. Pull the minimum header fields out of the compressed tuples so that the MVCC machinery can work, to support updates of compressed tuples. Perhaps just the t_xmax, t_ctid values (t_xmin == frozen is implied), so that a writer can update them. This means an overhead of at least 10 bytes per tuple over the compressed size (plus the item offsets in the page header). 4. Something far cleverer. Well, these are straw-man suggestions really and I probably don't understand enough about PG internals (MVCC and implications for VACUUM) to be making them. But I'm curious to know if anyone has researched something like this. For example, I have a system that occupies a couple of TB on disk, but only a few to a few hundred MB per day change, mostly adding data to an active partition. 
I periodically run CLUSTER on any partition that has pg_stat.correlation 0.9 (this effectively just re-CLUSTERs the active one). At the same time I would COMPRESS, and the DB could potentially fit on much smaller SSDs. Most commercial database systems I encounter these days use compression of some sort (more sophisticated than the above, with dynamic dictionaries, and sometimes column-based storage etc). postgres compresses TOASTED data: one strategy could be to arrange your data somehow to utilize TOAST. I doubt you'll ever see generally compressed heap data in the way you're thinking: postgres has a strong informal policy of not implementing features which are dubious and/or excessively complicated with limited benefit, particularly if there are ways to handle this outside the database; there are various operating system level tricks that can make a compressed file or even an entire tablespace (o/s folder) masquerade as regular structures. So maybe you are asking for a feature we already have: CREATE TABLESPACE. For example take a look here: https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F (out of curiosity, if this strategy fits the bill for you I wouldn't mind seeing a follow up on how this handles your static data use case). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
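A minimal sketch of the tablespace-on-compressed-filesystem idea (device, paths, and object names are illustrative; the mount options are covered by the btrfs wiki link above):

```shell
# mount a btrfs volume with transparent compression (run as root)
mount -o compress=zlib /dev/sdb1 /mnt/compressed
mkdir -p /mnt/compressed/pgdata
chown postgres /mnt/compressed/pgdata

# then point a tablespace at it and move the static partitions over
psql -c "CREATE TABLESPACE compressed LOCATION '/mnt/compressed/pgdata'"
psql -c "ALTER TABLE big_static_partition SET TABLESPACE compressed"
```

The database stays oblivious: the filesystem compresses and decompresses pages under the heap files, at the cost of giving up any database-level control over the dictionary or compression granularity.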
Re: [HACKERS] JSON decoding plugin
On Mon, Dec 9, 2013 at 10:53 AM, Euler Taveira eu...@timbira.com.br wrote: On 09-12-2013 13:12, Merlin Moncure wrote: This is pretty neat. Couple minor questions: *) Aren't you *en*coding data into json, not the other way around (decoding?) Yes. The 'decoding' came from the functionality (logical decoding) and because the POC plugin is named 'test_decoding'. I also think that 'json_decoding' doesn't say much about the module purpose. I confess that I don't like the name but can't come up with a good name. Maybe 'wal2json' or 'logrep2json'? Could you suggest something? I'm partial to wal2json actually. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON decoding plugin
On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira eu...@timbira.com.br wrote: Hi, A few months ago, it was proposed [1] that it would be interesting to have a json output plugin for logical decoding. Here it is. Each transaction is a JSON object that can contain xid (optional), timestamp (optional), and a change array. Each change element is a command that was decoded, and it can contain: kind (I/U/D), schema (optional), table, columnnames, columntypes (optional), columnvalues, and oldkeys (only for U/D). columnnames, columntypes and columnvalues are arrays. oldkeys is an object that contains the following arrays: keynames, keytypes (optional), and keyvalues. The JSON objects are serialized if you are decoding a series of transactions. Here is an output example: { "xid": 702, "change": [ { "kind": "insert", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 2, "test"] } ,{ "kind": "update", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 2, "test2"], "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 2] } } ] } { "xid": 703, "change": [ { "kind": "update", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 3, "test2"], "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 2] } } ] } { "xid": 704, "change": [ { "kind": "delete", "schema": "public", "table": "foo", "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 3] } } ] } Some data types were adapted to conform with the JSON spec. NaN and Infinity are not valid JSON symbols so their representation is NULL (as in some JSON implementations). Due to the JSON datatype's simplicity, I represent the vast majority of Postgres datatypes as strings (however, I admit that we could mimic the json datatype conversion rules). The oldkeys treatment follows what was defined by the commit [2]. 
It uses: (i) primary key (default behavior); (ii) unique index (if REPLICA IDENTITY USING INDEX is defined for the table); (iii) full tuple (if REPLICA IDENTITY FULL is defined for the table); (iv) nothing, which means an error (if REPLICA IDENTITY NOTHING is defined for the table). TOAST columns have special treatment for UPDATEs. If a tuple that contains a TOAST field is updated, the TOAST field is included iff it is changed too. This means that unchanged TOAST fields are omitted from the columns* arrays, which means less overhead while transmitting, processing and applying changes. By design, (i) the output plugin doesn't know about aborted transactions and (ii) subtransactions are reordered into a toplevel transaction and only the committed pieces are passed to the plugin. You can test it by firing the regression tests (e.g. 'make test') or using the following steps: postgresql.conf: wal_level = logical max_wal_senders = 2 max_logical_slots = 2 start collecting WAL records: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --plugin=json_decoding_plugin --init [execute some transactions] start printing decoded transactions: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start stop collecting WAL records: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop Comments? This is pretty neat. Couple minor questions: *) Aren't you *en*coding data into json, not the other way around (decoding?) *) Consider generating a long bytea instead of explicitly writing a 32kb sql into the patch. *) You've built your own json serializer here. Maybe some code can be shared with the json type? *) Consider removing 'plugin ' from the name of the plugin. --plugin=json_decoding etc. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Status of FDW pushdowns
On Mon, Dec 2, 2013 at 10:26 PM, David Fetter da...@fetter.org wrote: On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: On 11/28/2013 03:24 AM, David Fetter wrote: WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. So, you're thinking of something like: WITH FOREIGN somecte AS $$... foreign query ...$$ SELECT ... FROM somecte; I was picturing something a little more like an SRF which would take one opaque string, the remote command, some descriptor, perhaps an enum, of what if anything might come back. Long ago, I implemented a similar thing in DBI-Link. It was called remote_exec_dbh(data_source_id integer, query text, returns_rows bool) Couple thoughts: *) Any 'pass through' API should support parameterization (the FDW may not support that, but many will and the API should allow for it). Lack of parameterization is a major downside of dblink. The function could be set up to be variadic for the parameters. *) For connectivity APIs of this style, the dblink-ish mechanic of separating command execution from data-returning commands is likely the right way to go. Also, probably better to stick with SRF mechanics if we go the 'function route'. So basically we are making dblink for FDW, adding parameterization and some concept of utilizing the foreign server. All this is assuming we are adding a special remote execution function ('fdwlink'). While that would be great, it's a significant deviation from the standard into postgresql-specific SRF syntax. If some of the qual pushdown deparsing functionality could be put inside the internal FDW API, then you'd get the best of both worlds. Maybe you'd still want a dblink style extension anyways, but it wouldn't be as critical. 
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
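A hypothetical signature for the 'fdwlink' function floated above (the name, shape, and implementation are assumptions for illustration, not an existing proposal):

```sql
-- Execute an arbitrary parameterized query against an FDW server;
-- $1, $2, ... in the query text bind to the VARIADIC arguments, so the
-- FDW driver (not string interpolation) handles quoting.
CREATE FUNCTION fdwlink(server name, query text, VARIADIC args "any")
RETURNS SETOF record
AS 'MODULE_PATHNAME' LANGUAGE c;  -- would be implemented in C against the FDW API

-- usage would mirror dblink, with the record type defined at the call site:
SELECT * FROM fdwlink('sqlserver_svr',
                      'select id, name from t where id = $1', 42)
    AS t(id int, name text);
```

The per-call-site record definition is the headache noted elsewhere in the thread, but it is the price dblink already pays today.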
Re: [HACKERS] Status of FDW pushdowns
On Wed, Dec 4, 2013 at 1:39 PM, David Fetter da...@fetter.org wrote: On Wed, Dec 04, 2013 at 12:43:44PM -0600, Merlin Moncure wrote: On Mon, Dec 2, 2013 at 10:26 PM, David Fetter da...@fetter.org wrote: On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: On 11/28/2013 03:24 AM, David Fetter wrote: WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. So, you're thinking of something like: WITH FOREIGN somecte AS $$... foreign query ...$$ SELECT ... FROM somecte; I was picturing something a little more like an SRF which would take one opaque string, the remote command, some descriptor, perhaps an enum, of what if anything might come back. Long ago, I implemented a similar thing in DBI-Link. It was called remote_exec_dbh(data_source_id integer, query text, returns_rows bool) Couple thoughts: *) Any 'pass through' API should support parameterization (the FDW may not support that, but many will and the API should allow for it). Lack of parameterization is a major downside of dblink. The function could be set up to be variadic for the parameters. I don't know for sure that that needs to be in version 1 of this. It definitely shouldn't block implementing the non-parameterized one. I'm not making the case it should be version anything. But, if you went dblink style, you'd want to go variadic. It's not really any extra work and you can always embed the string if the FDW driver doesn't support parameterization. What the standard has is literally insane. Not sure I agree. The guiding principle of the standard implementation AIUI is that it wants to do connectivity management via syntax and keep the DML abstractions clean (minus some un-implementable things like RI triggers). In other words, you write exactly the same queries for native and foreign tables. 
This makes things much easier for people who just want to write SQL the classical way and not get into funky vendor-specific APIs. The downside of SQL/MED, particularly the way postgres implemented the driver API, is that each driver is responsible for all optimization efforts, and I think this is bad. So I'm openly wondering if the FDW API should expose optional query rewriting hooks. The odbc-fdw and jdbc-fdw drivers for example could then benefit from those hooks so that qual pushdown could be implemented with far less code duplication and effort and a *much* broader set of problems could be addressed by FDW. For non- or exotic-SQL implementations those hooks could be implemented locally by the driver or disabled if it doesn't make sense to use them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, Dec 4, 2013 at 2:31 PM, Jonathan Corbet cor...@lwn.net wrote: For those interested in the details... (1) It's not quite 50/50, that's one bound for how the balance is allowed to go. (2) Anybody trying to add tunables to the kernel tends to run into resistance. Exposing thousands of knobs tends to lead to a situation where you *have* to be an expert on all those knobs to get decent behavior out of your system. So there is a big emphasis on having the kernel tune itself whenever possible. Here is a situation where that is not always happening, but a fix (which introduces no knob) is in the works. I think there are interesting parallels here with the 'query plan hints' debate. In both cases I think the conservative voices are correct: better not to go crazy adding knobs. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Wed, Nov 27, 2013 at 4:20 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: Hi Merlin, 2013/11/22 Merlin Moncure mmonc...@gmail.com: On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. [ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that. Pass-through mode mentioned in the SQL/MED standard might be what he wants. happen to have a link handy? The SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as the backend of the dblink interface, like this: postgres= SELECT dblink_connect('con1', 'server name of an FDW'); postgres= SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...); This provides a way to execute a query without defining a foreign table. yeah. (thanks for indulging -- this is barely on topic I guess). if it were possible to create a supporting function (say, fdw_link) that could somehow interface with a previously established server, it could probably be worked out. Then all FDWs could leverage parameterization without having to copy and paste the pgsql-fdw qual push code. But that would be a fairly large break from the rest of the FDW syntax, and having to define the record at each call site is admittedly a bit of a headache. 
Hm, another way to think about this would be to somehow abstract the qual push into a library so that it could be accessed by other FDWs if they opted in. This would address my chief complaint that only the pgsql-fdw (the only database for which we already have an in-core high quality connection api) driver could tap the excellent work you've done. If this were even possible, it would probably result in more fdw API changes. If my: SELECT * FROM big_sql_server_foreign_table WHERE id = x; was fast, that'd be pretty nice. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql shows line number
On Tue, Nov 26, 2013 at 9:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: Marko Tiikkaja ma...@joh.to writes: On 11/26/13 2:24 PM, Pavel Stehule wrote: yes, but we can define a new statement like \sq+ (Show Query) There's already \p; might not be too difficult to add a \p+ which would also show the line numbers. I don't actually see the point of this. If you're working with queries that are long enough that counting lines is even slightly difficult, surely you're going to use \e to correct the problem? So it seems to me that the existing \e nnn facility pretty much does what's needed, and there's no great need to clutter your regular display with line numbers. I agree in the sense that line numbers don't help. But \e is not really a solution to the problem. A pretty common thing for me to see (as an artifact of my styling) in the log: ERROR: syntax error at or near from LINE 223: from ^ Not very helpful. Also good luck if your SQL is dynamically generated. What I think *would* help would be the ability to optionally add some lines of context: then you'd at least have a shot at eyeballing the error etc. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Mon, Nov 25, 2013 at 3:33 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Pavel Stehule pavel.steh...@gmail.com writes: there is another issue - a simple parser will be really user-unfriendly, and a user-friendly parser will not be simple :( If simple things are hard to implement, get yourself better tools. Each time we get on the topic of improving scripting abilities for our interactive tool, it's always the same problem: having to invent a scripting language with a whole parser is just too much work. Maybe it's time we step back a little and consider real scripting solutions to embed into psql, and pgbench too: I'm thinking (did I miss something?) that Pavel was commenting merely on the parsing of setting unicode border characters, not the wider scripting of psql. (psql scripting is a fun topic to discuss though :-)). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Put json type into alphabetical order in manual table
On Mon, Nov 25, 2013 at 3:07 PM, Andreas Karlsson andr...@proxel.se wrote: Hi, When looking at table 8-1 at http://www.postgresql.org/docs/9.3/static/datatype.html i noticed that all types except for json was in alphabetical order. I have attached a patch which fixes this. By the way should character and character varying be swapped in that table too? I would. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. [ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that. Pass-through mode mentioned in SQL/MED standard might be what he wants. happen to have a link handy? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Fri, Nov 22, 2013 at 2:23 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello 2013/11/21 Merlin Moncure mmonc...@gmail.com On Thu, Nov 21, 2013 at 1:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello I wrote new styles for psql table borders. http://postgres.cz/wiki/Pretty_borders_in_psql This patch is simple and I think some styles can be interesting for final presentation. great. hm, maybe we could integrate color? (see: http://merlinmoncure.blogspot.com/2012/09/psql-now-with-splash-of-color.html). that is a possible next enhancement - I would like to go forward in small steps, please :) minimally (and independent of the proposed patch) we could introduce something like final regexp filtering, which could be used for this or other purposes. Yeah. A per-field regexp would do the trick. As you have it, I like Peter's idea best. Being able to specify the various character codes makes a lot of sense. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Fri, Nov 22, 2013 at 8:45 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavel Stehule escribió: 2013/11/21 Peter Eisentraut pete...@gmx.net Maybe make the border setting a string containing the various characters by index. Then everyone can create their own crazy borders. I seriously thought about it, but I'm not sure if it is a good way. How about having a single unicode line style, and then have a different \pset setting to determine exactly what chars to print? This wouldn't allow for programmability, but it seems better UI to me. This proliferation of unicode line style names seems odd. That makes sense to me, especially if you could pass escapes. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On Fri, Nov 22, 2013 at 4:34 PM, Mike Blackwell mike.blackw...@rrd.com wrote: I believe the section you are reading refers to the BEGIN keyword in the procedural language plpgsql, not the SQL 'BEGIN' command. The issue stems from confusing two distinct languages, both of which, along with several more procedural languages, are documented in the same manual. This is an inherited constraint from Oracle pl/sql, which pl/pgsql is, uh, inspired by. In pl/sql, all block-opening constructs (THEN, LOOP, BEGIN) do not get semi-colons. BEGIN is a weird case because it's (quite unfortunately) also the same thing that explicitly opens a transaction in vanilla SQL; you use a semi-colon there as with any standard SQL statement. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
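The distinction side by side (table and function names are hypothetical):

```sql
-- SQL: BEGIN is a statement that opens a transaction, so it takes a
-- semicolon like any other statement.
BEGIN;
UPDATE t SET x = 1;
COMMIT;

-- PL/pgSQL: BEGIN opens a block, like THEN and LOOP, and takes none.
CREATE FUNCTION f() RETURNS void AS $$
BEGIN             -- no semicolon: block opener, not a statement
    IF true THEN  -- no semicolon after THEN either
        NULL;
    END IF;
END;              -- END closes the block; the semicolon terminates it
$$ LANGUAGE plpgsql;
```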
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote: Hmm. All callers of RecoveryInProgress() must be prepared to handle the case that RecoveryInProgress() returns true, but the system is no longer in recovery. No matter what locking we do in RecoveryInProgress(), the startup process might finish recovery just after RecoveryInProgress() has returned. True. What about the attached? It reads the shared variable without a lock or barrier. If it returns 'true', but the system in fact just exited recovery, that's OK. As explained above, all the callers must tolerate that anyway. But if it returns 'false', then it performs a full memory barrier, which should ensure that it sees any other shared variables as it is after the startup process cleared SharedRecoveryInProgress (notably, XLogCtl-ThisTimeLineID). I'd argue that we should also remove the spinlock in StartupXLOG and replace it with a write barrier. Obviously not for performance reasons, but because somebody might add more code to run under that spinlock. Looks good otherwise, although a read memory barrier ought to suffice. This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 9:09 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 09:08:05 -0600, Merlin Moncure wrote: This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. A read barrier is just a compiler barrier on x86. That's good enough for me then. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 21.11.2013 17:08, Merlin Moncure wrote: On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote: Hmm. All callers of RecoveryInProgress() must be prepared to handle the case that RecoveryInProgress() returns true, but the system is no longer in recovery. No matter what locking we do in RecoveryInProgress(), the startup process might finish recovery just after RecoveryInProgress() has returned. True. What about the attached? It reads the shared variable without a lock or barrier. If it returns 'true', but the system in fact just exited recovery, that's OK. As explained above, all the callers must tolerate that anyway. But if it returns 'false', then it performs a full memory barrier, which should ensure that it sees any other shared variables as it is after the startup process cleared SharedRecoveryInProgress (notably, XLogCtl-ThisTimeLineID). I'd argue that we should also remove the spinlock in StartupXLOG and replace it with a write barrier. Obviously not for performance reasons, but because somebody might add more code to run under that spinlock. Looks good otherwise, although a read memory barrier ought to suffice. This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. In my patch, I put the barrier inside the if (!LocalRecoveryInProgress) block. That codepath can only execute once in a backend, so performance is not an issue there. Does that look sane to you? oh right -- certainly! merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: Where are we on the remaining possible pushdowns for foreign data wrappers, particularly the Postgres one? I know we do WHERE restriction pushdowns in 9.3, but what about join and aggregate pushdowns? Is anyone working on those? I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. But in lieu of that, I'll take as much push down power as possible :-D. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Thu, Nov 21, 2013 at 1:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello I wrote new styles for psql table borders. http://postgres.cz/wiki/Pretty_borders_in_psql This patch is simply and I am think so some styles can be interesting for final presentation. great. hm, maybe we could integrate color? (see: http://merlinmoncure.blogspot.com/2012/09/psql-now-with-splash-of-color.html). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Tue, Nov 19, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote: On 11/19/2013 08:14 AM, Robert Haas wrote: On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote: I am sure you could also devise a json encoding scheme where white space is significant ;) I don't even have to think hard. If you want your JSON to be human-readable, it's entirely possible that you want it stored using the same whitespace that was present on input. There is a valid use case for normalizing whitespace, too, of course. Given that JSON is a data interchange format, I suspect that there is an extremely large combination of factors which would result in an unimplementably large number of parser settings. For example, I personally would have use for a type which allowed the storage of JSON *fragments*. Therefore I am interested only in supporting two: a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's I'm uncomfortable with the word 'legacy'. This suggests the new type will essentially deprecate the old type. jsonb will likely be pessimal for large serializations. If you're not manipulating and searching the documents, why would you use it? It's going to take more space on disk and memory and should provide little benefit for present *as well as future code*. (note, it will provide extreme benefits for nosql type uses which is of huge strategic importance for the project). json and jsonb APIs should work together cleanly, and the documentation should suggest which type is better for which cases. merlin
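The distinction being argued is essentially whether the stored value is the input text or a parsed structure. A short stdlib illustration of what parse-and-reserialize does to whitespace and duplicate keys — this mimics jsonb-style normalization in Python, and is not any PostgreSQL code:

```python
import json

# Input as a client might send it: extra whitespace and a duplicated key.
doc = '{ "a" : 1,\n  "a" : 2,  "b" : 3 }'

# A text-preserving type (like 9.2/9.3 json) stores this string verbatim.
stored_as_text = doc

# A parsed/binary type (like the proposed jsonb) keeps only the structure:
# whitespace is gone and the last duplicate key wins.
normalized = json.dumps(json.loads(doc), sort_keys=True)
print(normalized)  # {"a": 2, "b": 3}
```

Whether that normalization is a feature or a compatibility break is exactly the disagreement in this subthread.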
Re: [HACKERS] additional json functionality
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
Re: [HACKERS] additional json functionality
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus j...@agliodbs.com wrote: Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. Hm, but it would matter wouldn't it...the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... You'd need explicit jsonb creating functions: record_to_jsonb, array_to_jsonb etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type if you don't count casts.
It's tempting to *not* make those functions as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then get unsimplified json always. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now but you'd run mutation and searching through jsonb... merlin
Re: [HACKERS] additional json functionality
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 11/14/2013 08:17 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler da...@justatheory.com wrote: On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? It doesn't, but the row_to_json function has a very clear mechanism of action. And, 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors. the whole hash*() function family was changed based on a 'not documented' premise, so we do have a precedent. In my experience, no JSON parser guarantees key order or duplication. I found one in about two seconds. http://docs.python.org/2/library/json.html object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do. I am sure you could also devise a json encoding scheme where white space is significant ;) The question is, how much of it should the json *type* support. As discussed in another thread, most of your requirements would be met by having json/row/row set-to-text serializer functions which output json-formatted text. No, that would not work, putting aside the fact it would require rewriting heaps of code. What I do now inside the json wrapping routines is create things like { x: [ {dynamic object}, {dynamic object}, ... ], y: ..., ... } The only way to do it is to build 'dynamic object' into json in advance of the outer xxx_to_json call. The 'dynamic object' is created out of a json builder that takes a paired array -- basically a variant of Andrew's 'json_build' upthread. If the 'json serializer' outputted text, the 'outer' to_json call would then re-escape the object. I can't use hstore for that purpose precisely because of the transformations it does on the object. Stepping back, I'm using json serialization as a kind of 'supercharged crosstab'. To any client that can parse json, json serialization completely displaces crosstabbing -- it's superior in every way. I am, if you may, kind of leading research efforts in the area and I can tell you with absolute certainty that breaking this behavior is a mistake. Forcing hstore-ish output mechanisms removes the ability to handle certain important edge cases that work just fine today. If that ability was taken away, it would be a very bitter pill for me to swallow and would have certain ramifications for me professionally; I went out on a pretty big limb and pushed pg/json aggressively (over strenuous objection) in an analytics product which is now in the final stages of beta testing. I would hate to see the conclusion of the case study be "Ultimately we had to migrate the code back to Hibernate due to compatibility issues."
Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. merlin
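The Python behavior cited above — a parser that does expose key order and duplicates — is easy to verify with the stdlib alone. This is just an illustration of the quoted object_pairs_hook documentation, nothing more:

```python
import json

doc = '{"a": 1, "a": 2}'

# Default decoding: duplicates collapse, last key wins (dict semantics).
print(json.loads(doc))  # {'a': 2}

# With object_pairs_hook, the decoder hands over the ordered pair list,
# duplicates intact -- a parser that *does* see order and duplication.
pairs = json.loads(doc, object_pairs_hook=lambda kv: kv)
print(pairs)  # [('a', 1), ('a', 2)]
```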
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Fri, Nov 15, 2013 at 6:06 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: But: I very, very much agree with the other concerns around this. This should be a patch to fix single user mode, not one to make postgres into a single process database. It's not, and trying to make it by using single user mode for it will start to hinder development of normal postgres because we suddenly need to be concerned about performance and features in situations where we previously weren't. +1 Maybe what needs to happen to this patch is a way to restrict its usage to --single. I'm thinking that postgres --single maybe could be made to fork the server process underneath the psql controller client process transparently. I personally would prefer not to do that. My main non-administrative interest in this mode is doing things like benchmarking protocol overhead. I'm OK with not supporting (and optimizing) for single user code paths but I don't like the idea of building walls that serve no purpose other than to make it difficult for other people to mess around. Just document strenuously that this mode is not intended for application bundling and move on... merlin
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote: Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there. Otherwise, a deduplicated, reordered text representation is generated *) When literal text is cast to json, the binary structure is built up and kept alongside binary mode.
So, if you went: 'select '{a: 1, a: 2}'::json', you'd get the same thing back. (This is how it works now.) But, if you went: 'insert into foo select '{a: 1, a: 2}'::json returning *', you'd get {a: 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. The text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially. *) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... merlin
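As a way to reason about this proposal, the lifecycle of the side-by-side representations can be mocked up in plain Python. Everything here is hypothetical illustration (the class and method names are invented, and nothing below is PostgreSQL code): the raw text rides along until an operation invalidates it, after which only the normalized form can ever be rendered.

```python
import json

class DualJson:
    """Mock of the proposed json type: raw text plus a parsed 'binary' form."""

    def __init__(self, text):
        self.text = text                 # original serialization, verbatim
        self.binary = json.loads(text)   # parsed structure; last dup key wins

    def render(self):
        # Prefer the text mode while it survives; otherwise regenerate a
        # deduplicated, reordered serialization from the parsed form.
        if self.text is not None:
            return self.text
        return json.dumps(self.binary, sort_keys=True)

    def simplify(self):
        # json_simplify(): discard the text mode; it can never be rebuilt.
        self.text = None
        return self

v = DualJson('{"a": 1, "a": 2}')
print(v.render())             # {"a": 1, "a": 2}  (text mode intact)
print(v.simplify().render())  # {"a": 2}          (binary mode only)
```

Storing to a table, equality comparison, or any mutation would call simplify() in this model — which is precisely the state-dependent behavior objected to downthread.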
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan and...@dunslane.net wrote: On 11/15/2013 03:25 PM, Merlin Moncure wrote: On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote: Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there.
Otherwise, a deduplicated, reordered text representation is generated *) When literal text is cast to json, the binary structure is built up and kept alongside binary mode. So, if you went: 'select '{a: 1, a: 2}'::json', you'd get the same thing back. (This is how it works now.) But, if you went: 'insert into foo select '{a: 1, a: 2}'::json returning *', you'd get {a: 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. The text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially. *) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs.
I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. Well, maybe so. My main gripe with the 'two types' solutions is that: 1) current type is already in core (that is, not an extension). In hindsight, I think this was a huge mistake. 2) current type has grabbed the 'json' type name and the 'json_xxx' API. 3) current type is getting used all over the place 'Two types' means that (AIUI) you can't mess around with the existing API too much. And the new type (due out in 2016?) will be something of a second-class citizen. The ramifications of dealing with the bifurcation are what make *my* head hurt. Every day the json stuff is getting more and more widely adopted. 9.4 isn't going to drop until 2014 best case and it won't be widely deployed in the enterprise until 2015 and beyond. So you're going to have a huge code base
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 2:54 PM, Josh Berkus j...@agliodbs.com wrote: On 11/15/2013 12:25 PM, Merlin Moncure wrote: Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior [snip long detailed explanation of behavior-dependent type] That's not at all workable. Users would be completely unable to predict or understand the JSON type and how it acts. That's not just violating POLS; that's bashing POLS' head in with a shovel. All right: make a new type then, and leave the current one alone please. merlin