Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 10:33 PM, Bruce Momjian br...@momjian.us wrote:
> On Thu, Mar 6, 2014 at 09:50:56PM +0400, Oleg Bartunov wrote:
> > Hi there, Looks like consensus is done. I and Teodor are not happy with it, but what can we do :) One thing I want to do is to reserve our contribution to the flagship feature (jsonb), particularly binary storage for nested structures and indexing. Their work was sponsored by Engine Yard.
>
> OK, if we are going with an unchanged hstore in contrib and a new jsonb, there is no reason to whack around jsonb to be binary compatible with the old hstore format.

What sacrifices did we need to make to have jsonb be binary compatible with hstore, can those sacrifices be removed, and can that be done in time for 9.4?

Also:
*) what hstore2 features (if any) that are not already reflected in the jsonb type are going to be moved to jsonb for 9.4?
*) if the answer above is anything but 'nothing', what hstore-isms are going to be adjusted in the process of doing so? Presumably there would be some function name changes to put them in the jsonb style, but the hstore syntax ('=') is also going to be embedded in some of the search operators and possibly other things. Is that going to change?

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Mar 6, 2014 at 11:28 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> So here's my opinion on what we should do:
> 1. Forget about hstore2.
> 2. Add GIN and GiST operator classes to jsonb, if they're ready for commit pretty darn soon. If not, punt them to the next release.

For #2, would we maintain the hstore syntax for the searching operators? For example:

SELECT count(*) FROM jsonb_schema
WHERE tabledata @ 'columns = {{column_name=total_time}}';

Note the hstore-ish '=' in the searching operator.

merlin
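[For concreteness, the two syntax styles under discussion look roughly like this. This is a sketch only -- the jsonb operator names and right-hand-side syntax had not been settled at this point in the thread; `jsonb_schema` and `tabledata` are from the example above.]

```sql
-- hstore-style search: the containment argument uses hstore's syntax
SELECT count(*)
FROM jsonb_schema
WHERE tabledata @ 'columns = {{column_name=total_time}}';

-- a json-native alternative: the containment argument is itself JSON
SELECT count(*)
FROM jsonb_schema
WHERE tabledata @> '{"columns": [{"column_name": "total_time"}]}';
```

As jsonb eventually shipped in 9.4, the second style won out: the `@>` containment operator takes a jsonb right operand rather than hstore-syntax text.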
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 8:39 AM, Bruce Momjian br...@momjian.us wrote:
> So, I am going to ask a back-track question and ask why we can't move hstore into core.

This is exactly the opposite of what should be happening. Now, jsonb might make it into core because of the json precedent, but the entire purpose of the extension system is to stop dumping everything in the public namespace. Stuff 'in core' becomes locked in stone, forever, because of backwards compatibility concerns, which are IMNSHO a bigger set of issues than even the pg_upgrade-related issues. Have we gone through all the new hstore functions and made sure they don't break existing applications? Putting things in core welds your only escape hatch shut. *All* non-SQL-standard types ought to be in extensions in an ideal world.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 9:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Andrew Dunstan and...@dunslane.net writes:
> > On 03/05/2014 09:39 AM, Bruce Momjian wrote:
> > > So, I am going to ask a back-track question and ask why we can't move hstore into core. Is this a problem with the oids of the hstore data type and functions? Is this a pg_upgrade-only problem? Can this be fixed?
> >
> > Yes, pg_upgrade is the problem, and no, I can't see how it can be fixed. Builtin types have Oids in a certain range. Non-builtin types have Oids outside that range. If you have a clever way to get over that I'd be all ears, but it seems to me insurmountable right now.
>
> More to the point:
> 1. Built-in types have predetermined, fixed OIDs. Types made by extensions do not, and almost certainly will have different OIDs in different existing databases.
> 2. There's no easy way to change the OID of an existing type during pg_upgrade, because it may be on-disk in (at least) array headers.
> We could possibly get around #2, if we could think of a secure way for array_out and sibling functions to identify the array type without use of the embedded OID value. I don't know how we could do that though, particularly in polymorphic-function contexts. Also, there might be other cases besides arrays where we've embedded type OIDs in on-disk data; anyone remember?

composite types.

merlin
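[The OID problem Tom describes can be seen directly in the catalogs. The extension OID shown below is illustrative only -- it varies per database, which is precisely the problem.]

```sql
-- Built-in types carry fixed, predetermined OIDs (assigned below
-- FirstNormalObjectId, 16384), identical in every cluster:
SELECT oid, typname FROM pg_type WHERE typname = 'json';
--  114 | json

-- Extension types get whatever OID is free at CREATE EXTENSION time,
-- so the value differs from database to database:
CREATE EXTENSION hstore;
SELECT oid, typname FROM pg_type WHERE typname = 'hstore';
--  16405 | hstore    (for example; any value >= 16384 is possible)
```

Because on-disk values such as array headers embed the element type's OID, pg_upgrade cannot simply renumber an existing type in the new cluster.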
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 9:52 AM, Bruce Momjian br...@momjian.us wrote:
> On Wed, Mar 5, 2014 at 09:19:33AM -0600, Merlin Moncure wrote:
> > On Wed, Mar 5, 2014 at 8:39 AM, Bruce Momjian br...@momjian.us wrote:
> > > So, I am going to ask a back-track question and ask why we can't move hstore into core.
> >
> > This is exactly the opposite of what should be happening. [...] Putting things in core welds your only escape hatch shut. *All* non-SQL-standard types ought to be in extensions in an ideal world.
>
> I have seen your opinion on this but there have been enough counter-arguments that I am not ready to head in that direction.

The only counter-argument given is that this will prevent people from being able to use extensions because they A: can't or won't install contrib packages, or B: are too stupid or lazy to type 'create extension json'. Note I'm discussing 'in-core extension' vs. 'in-core built-in'. 'Out-of-core extension' loosely translates to 'can't be used by the vast majority of systems'. Most corporate IT departments (including mine) have a policy of only installing packages through the operating system packaging to simplify management of deploying updates. Really strict companies might not even allow anything but packages supplied by a vendor like Red Hat (which in practice keeps you some versions back from the latest). Now, if some crappy hosting company blocks contrib, I don't believe at all that this should drive our project management decisions. PostgreSQL is both a database and, increasingly, a development language platform. Most good stacks have a system (cpan, npm, etc.) to manage the scope of the installed runtime, and it's a routine and expected exercise to leverage that system.

merlin
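[For reference, the step being argued about -- enabling an in-core (contrib) extension once the OS packages are installed -- is a single statement:]

```sql
-- The shared library and SQL scripts ship with the contrib packages;
-- this just registers the extension's objects in the current database:
CREATE EXTENSION hstore;

-- What is installed, and what else the packages make available:
SELECT extname, extversion FROM pg_extension;
SELECT name, default_version, installed_version
FROM pg_available_extensions;
```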
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote:
> There's the absolutely significant issue that you cannot reasonably write extensions that interact on a C level. You can't call from extension to extension directly, but you can from an extension to pg-core-provided ones.

Certainly. Note I never said that the internal .so that both extensions interface with (and perhaps wrap) can't be in core. It would be nice to have an intra-extension call system worked out, but that in no way plays to the bigger issues at stake. This is all about management of the public API; take a good skeptical look at the history of types like xml, json, geo, money and others.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Merlin Moncure mmonc...@gmail.com writes:
> > *All* non-SQL-standard types ought to be in extensions in an ideal world.
>
> While there's certainly much to be said for the idea that jsonb should be an extension, I don't think we have the technology to package it as a *separate* extension; it'd have to be included in the hstore extension.

I disagree with that. The shared C bits can live inside the core system, and the SQL-level hooks and extension-specific behaviors could live in an extension. AFAICT moving jsonb to an extension is mostly a function of migrating the hard-coded SQL hooks out to an extension (I'm probably oversimplifying though).

> Just out of curiosity, exactly what features are missing from jsonb today that are available with hstore? How long would it take to copy-and-paste all that code, if someone were to decide to do the work instead of arguing about it?

Basically opclasses, operators (particularly search operators), and functions/operators to manipulate the hstore in place. Personally I'm not inclined to copy/paste the code. I'd also like to see this stuff committed, and don't want to hold up the patch for that unless it's determined for other reasons (and by other people) that this is the only reasonable path for 9.4.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 10:43 AM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > I disagree with that. The shared C bits can live inside the core system, and the SQL-level hooks and extension-specific behaviors could live in an extension. AFAICT moving jsonb to an extension is mostly a function of migrating the hard-coded SQL hooks out to an extension (I'm probably oversimplifying though).
>
> Yeah, from what I gather you're suggesting, that's more-or-less "move it all to core", except that all of the actual interface bits end up in an extension that has to be installed to use what would have to already be there. I don't see that as any kind of improvement.

If you don't, then you simply have not been paying attention to the endless backwards compatibility problems we've faced, which are highly ameliorated in an extension-heavy world. Also, you're ignoring the fact that having an endlessly accreting set of symbols in the public namespace is not free. Internal C libraries don't have to be supported and don't have any significant user-facing costs by simply being there.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 11:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Merlin Moncure mmonc...@gmail.com writes:
> > On Wed, Mar 5, 2014 at 10:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> > > While there's certainly much to be said for the idea that jsonb should be an extension, I don't think we have the technology to package it as a *separate* extension; it'd have to be included in the hstore extension.
> >
> > I disagree with that. The shared C bits can live inside the core system and the SQL-level hooks and extension-specific behaviors could live in an extension.
>
> That approach abandons every bit of value in an extension, no? You certainly don't get to fix bugs outside a core-system release cycle.

That's the core vs. non-core debate. Just about everyone (including me) wants json and hstore to live in core -- meaning packaged, shipped, supported, and documented with the PostgreSQL source code releases. Only an elite set of broadly useful and popular extensions get that honor, of which json is most certainly one. Moreover, you give up nothing except the debate/approval issues to get your code in core. If you want to release off-cycle, you can certainly do that, and enterprising users can simply install the extension manually (or perhaps via pgxn) instead of via contrib. BTW, this is yet another thing that becomes impossible if you don't use extensions (on top of legacy/backwards compatibility issues and general bloat, which is IMNSHO already a pretty severe situation).

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 11:44 AM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > If you don't, then you simply have not been paying attention to the endless backwards compatibility problems we've faced, which are highly ameliorated in an extension-heavy world.
>
> We have backwards compatibility problems because we don't want to *break* things for people. Moving things into extensions doesn't magically fix that -- if you break something in a backwards-incompatible way then you're going to cause a lot of grief for people.

It doesn't magically fix it, but it at least provides a way forward. If the function you want to modify is in an extension 'foo', you get to put your new stuff in a 'foo2' extension. That way your users do not have to adjust all the code you would have broken. Perhaps for in-core extensions you offer the old one in contrib for a while until a reasonable amount of time passes, then move it out to pgxn. This is a vastly better system than the choices we have now, which are A. break code or B. do nothing.

> > Also, you're ignoring the fact that having an endlessly accreting set of symbols in the public namespace is not free. Internal C libraries don't have to be supported and don't have any significant user-facing costs by simply being there.
>
> I *really* hate how extensions end up getting dumped into the public schema, and I'm not a big fan of huge search_paths either.

At least with extensions you have control over this.

> As mentioned earlier -- I'm also not advocating that everything be put into core. I don't follow what you mean by "Internal C libraries don't have to be supported".

Because, I mean, we are free to change them or delete them. They do not come with the legacy that user-facing APIs come with. They also do not bloat the public namespace.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 2:45 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Merlin Moncure wrote:
> > It doesn't magically fix it, but it at least provides a way forward. If the function you want to modify is in an extension 'foo', you get to put your new stuff in a 'foo2' extension. That way your users do not have to adjust all the code you would have broken. Perhaps for in-core extensions you offer the old one in contrib for a while until a reasonable amount of time passes, then move it out to pgxn.
>
> Uhm. Would it work to define a new version of foo, say 2.0, but keep the old 1.2 version the default? That way, if you want to keep the old foo you do nothing (after both fresh install and pg_upgrade), and if you want to upgrade to the new code, it's just an ALTER EXTENSION UPDATE away.

Certainly. The important point though is that neither option is available if the old stuff is locked into the public namespace. Consider various warts like the array API ('array_upper' et al.) or the geo types. We're stuck with them. Even with jsonb: it may be the hot new thing *today*, but five years down the line there's a json2 that does all kinds of wonderful things we haven't thought about -- what if it displaces current usages? The very same people who are arguing that jsonb should not be in an extension are the ones arguing json is legacy and will be superseded. These two points of view IMO are directly in conflict: if json had been an extension, the path to deprecation would be clear. Now the json functions are in the public namespace. Forever (or at least for a very long time).

On Wed, Mar 5, 2014 at 2:46 PM, Stephen Frost sfr...@snowman.net wrote:
> I don't see why we can't do exactly what you're suggesting in core.

Because you can't (if you're defining core to mean 'not an extension'). Functions can't be removed or changed because of legacy application support. In an extension world, they can -- albeit not 'magically', but at least it can be done.

merlin
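[Alvaro's versioning suggestion, sketched. 'foo' is the thread's own placeholder extension name, and the version numbers are his examples:]

```sql
-- With 1.2 left as default_version in foo.control, existing behavior
-- is preserved on fresh install and after pg_upgrade:
CREATE EXTENSION foo;                 -- installs version 1.2

-- Users who want the new, incompatible code opt in explicitly:
ALTER EXTENSION foo UPDATE TO '2.0';

-- Check which version is active:
SELECT extversion FROM pg_extension WHERE extname = 'foo';
```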
Re: [HACKERS] jsonb and nested hstore
On Wed, Mar 5, 2014 at 4:24 PM, Stephen Frost sfr...@snowman.net wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > Because you can't (if you're defining core to mean 'not an extension'). Functions can't be removed or changed because of legacy application support. In an extension world, they can -- albeit not 'magically', but at least it can be done.
>
> That simply isn't accurate on either level -- if there is concern about application support, that can apply equally to core and contrib, and we certainly *can* remove and/or redefine functions in core with sufficient cause. It's just not something we do lightly for things living in either core or contrib. For an example, consider the FDW API, particularly what we did between 9.1 and 9.2.

Well, we'll have to agree to disagree, I suppose. Getting back on topic, the question is 'what about jsonb/hstore2'? At this point my interests are practical. I promised (heh) to bone up the docs. I'm on vacation this weekend, so it's looking like sometime late next week for that. In particular, it'd be helpful to get some kind of read on the final disposition of hstore2.

merlin
Re: [HACKERS] jsonb and nested hstore
On Tue, Mar 4, 2014 at 6:48 AM, Teodor Sigaev teo...@sigaev.ru wrote:
> > On Tue, Mar 4, 2014 at 2:44 AM, Teodor Sigaev teo...@sigaev.ru wrote:
> > > Do we have a function to trim trailing zeros in a numeric?
>
> Fixed, pushed to github (https://github.com/feodor/postgres/tree/jsonb_and_hstore). It now uses hash_numeric to index numeric values. As I can see, it provides the needed trimming and doesn't depend on locale. A possible mismatch (the same hash value for different numeric values) will be rechecked anyway -- the interested operations set the recheck flag.

huh. What is the standard for equivalence? I guess we'd be following javascript ===, right? (http://dorey.github.io/JavaScript-Equality-Table/)

merlin
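[The trailing-zeros issue being fixed, for context: numeric comparison treats '1.10' and '1.1' as equal, but their stored forms differ, so a hash built naively over the binary representation would disagree with the equality operator.]

```sql
-- Equal under the numeric equality operator...
SELECT '1.10'::numeric = '1.1'::numeric;              -- t

-- ...but the display scale (and stored digits) differ:
SELECT '1.10'::numeric::text, '1.1'::numeric::text;   -- 1.10 | 1.1
```

Hashing with hash_numeric (which normalizes away trailing zeros) plus a recheck of candidate matches sidesteps this, as Teodor describes.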
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 2:01 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-02-28 14:45:29 -0500, Andrew Dunstan wrote:
> > Well, the jsonb portion of this is arguably the most ready; certainly it's had a lot more on-list review.
>
> Having cross-read both patches I tend to agree with this. I don't think it's unrealistic to get jsonb committable, but the hstore bits are another story.

hm, do you have any specific concerns/objections about hstore?

merlin
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 8:57 AM, Stephen Frost sfr...@snowman.net wrote:
> The specific issues mentioned on this thread look more like bugs to be addressed, or additional operators which need to be implemented for jsonb (imv, that should really be done for 9.4, but we have this deadline looming...), along with perhaps dropping the implicit cast between json and hstore (is there really a need for it..?).

Bugs/bad behaviors should be addressed (which AFAICT are mostly if not all due to the implicit casts). Missing operators OTOH should not hold up the patch, particularly when you have the option of an explicit cast to hstore if you really want them. Notwithstanding some of the commentary above, some of jsonb's features (in particular, the operators) are quite useful and should find regular usage (json has them also, but jsonb removes the performance penalty). The upshot is that with the current patch you have to do a lot of casting to get 100% feature coverage, and future improvements to jsonb will remove the necessity of that. Also, the hstore type will be required to do anything approximating the nosql pattern. I don't think the extension issue is a deal breaker either way. While I have a preference for extensions generally, this is nothing personal to jsonb. And if we can't come to a consensus on that point, the patch should be accepted on precedent (json being in core).

merlin
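[A sketch of the explicit-cast workaround described above. The table and column names ('documents', 'doc') are hypothetical, and the exact hstore2 operator and literal spellings were still in flux in the patch:]

```sql
-- jsonb stays the stored type; hstore2's richer search operators are
-- reached via an explicit cast where jsonb itself has no equivalent yet:
SELECT doc
FROM documents
WHERE doc::hstore @> 'status => active';
```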
Re: [HACKERS] jsonb and nested hstore
On Fri, Feb 28, 2014 at 1:45 PM, Andrew Dunstan and...@dunslane.net wrote:
> Well, the jsonb portion of this is arguably the most ready; certainly it's had a lot more on-list review.

That is definitely true. Also, the jsonb type does not introduce any new patterns that are not already covered by json -- it just does some things better/faster (and, in a couple of cases, a bit differently), so there's a safe harbor. The implicit casts snuck in after the review started -- that was a mistake obviously (but mostly with hstore). The side argument of 'to extension or not' is just that. Make a decision and commit this thing.

merlin
Re: [HACKERS] jsonb and nested hstore
On Thu, Feb 27, 2014 at 1:11 PM, Josh Berkus j...@agliodbs.com wrote:
> However, we had this discussion already in November-December, which resulted in the current patch. Now you and Robert want to change the rules on Andrew, which means Andrew is ready to quit, and we go another year without JSON indexing.

How we got here is not the point. All that matters is what's going to happen from here. Here are the facts as I see them:

1) We've worked ourselves into a situation where we're simultaneously developing two APIs that do essentially exactly the same thing (hstore and jsonb). Text json is not the problem and is irrelevant to the discussion.

2) The decision to do that was made a long time ago. I complained as loudly as my mousy no-programming-only-griping voice would allow here: http://postgresql.1045698.n5.nabble.com/JSON-Function-Bike-Shedding-tp5744932p5746152.html. The decision was made (and Robert cast one of the deciding votes in support of that decision) to bifurcate hstore/json. I firmly believe that was a mistake, but there's no point in revisiting it. Done is done.

3) In its current state jsonb is not very useful and we have to recognize that; it optimizes text json but OTOH covers maybe 30-40% of what hstore offers. In particular, it's missing manipulation and GiST/GIN support. The stuff it does offer, however, is how Andrew, Josh and others perceive the API will be used, and I defer to them, with the special exception of deserialization (the mirror of to_json), which is currently broken or near-useless in all three types. Andrew recognized that and has suggested a fix; even then, to me it only matters to the extent that the API is clean and forward-compatible.

Here are the options on the table:

a) Push everything to 9.5 and introduce out-of-core hstore2/jsonb extensions to meet market demand. Speaking practically, 'out of core' translates to "can't be used" for most industrial IT shops. I hate this option but recognize it's the only choice if the code isn't ready in time.

b) Accept hstore2 but push jsonb (to 9.5), on the premise that they should be married in some way or that jsonb simply isn't ready. I'm not a fan of this option either unless Andrew specifically thinks it's a good idea. The stuff that is there seems to work pretty well (again, except deserialization, which I haven't tested recently), and the jsonb patterns that are in place have some precedent in the text json type.

c) Accept hstore2 and jsonb as in-core extensions (assuming code worthiness). Since extensions can't call into each other (this really ought to be solved at some point), this means a lot of code copy/paste. The main advantage here is that it reduces the penalty of failure and avoids pollution of the public schema. I did not find the rationale upthread that there was a stigma to in-core extensions in any way convincing. In fact I'd go further and suggest that we really ought to have a project policy to have all non-SQL-standard functions, operators and types as extensions from here on out. Each in-core type introduction after having introduced the extension system has left me scratching my head.

d) The status quo. This essentially means we'll have to liberally document how things are (to avoid confusing our hapless users) and take Andrew at his word that a separate extension will materialize making jsonb more broadly useful. The main concern here is that the market will vote with their feet and adopt the hstore API style broadly, sticking us with a bunch of marginally used functions in the public namespace to support forever.

My personal preference is c), but I am perfectly ok with d), particularly if there were more visibility into the long-term planning. Good documentation will help either way, and that's why I signed up for it.

merlin
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 10:07 PM, Craig Ringer cr...@2ndquadrant.com wrote:
> On 02/26/2014 06:21 AM, Merlin Moncure wrote:
> > On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote:
> > > On 02/25/2014 12:12 PM, Robert Haas wrote:
> > > > I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other.
> > >
> > > Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want.
> >
> > Sure.
>
> Please also highlight that any change will require a full table rewrite with an exclusive lock, so data type choices on larger tables may be hard to change later.

Yeah. Good idea. Also gonna make a table of what happens when you cast from A to B (via text, json, jsonb, hstore).

merlin
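[Craig's point, illustrated -- the table and column names here are hypothetical. Changing a column's type takes an ACCESS EXCLUSIVE lock and rewrites the whole table, so the json-vs-jsonb choice is expensive to revisit on a large table:]

```sql
-- Rewrites every row of events while holding an exclusive lock:
ALTER TABLE events
  ALTER COLUMN payload TYPE jsonb
  USING payload::jsonb;
```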
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 3:57 PM, Hannu Krosing ha...@2ndquadrant.com wrote:
> On 02/25/2014 08:54 PM, Josh Berkus wrote:
> > That's called a straw man argument, Robert. Me: "We should recommend that people use jsonb unless they have a specific reason for using json."
>
> We could also make the opposite argument -- people should use json unless they have a specific reason for using jsonb. btw, there is one more thing about JSON which I recently learned: a lot of JavaScript people actually expect the JSON binary form to retain field order. It is not in any specs, but nevertheless all major implementations do it and some code depends on it. IIRC, this behaviour is currently also met only by json and not by jsonb.

Yes: this was the agreement that was struck, and it is the main reason why there are two json types, not one. JSON does not guarantee field ordering as I read the spec, and for the binary form ordering is not maintained as a concession to using the hstore implementation. You can always use the standard text json type for storage and cast inside the index for searching; what you give up there is some performance and the ability to manipulate the json via the hstore API. I think that will have to do for now, and field ordering for hstore/jsonb can be reserved as a research item.

merlin
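[The difference Hannu describes is easy to demonstrate; this is the behavior as jsonb eventually shipped. json keeps the input text verbatim, while jsonb stores a parsed form that preserves neither key order nor duplicate keys:]

```sql
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
--  {"b": 1, "a": 2, "a": 3}        -- input text preserved exactly

SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
--  {"a": 3, "b": 1}                -- keys reordered, duplicates collapsed
```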
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 26, 2014 at 11:41 AM, Josh Berkus j...@agliodbs.com wrote:
> On 02/26/2014 07:02 AM, Merlin Moncure wrote:
> > Yes: this was the agreement that was struck, and it is the main reason why there are two json types, not one. JSON does not guarantee field ordering as I read the spec, and for the binary form ordering is not maintained as a concession to using the hstore implementation.
>
> Actually, that's not true; neither Mongo/BSON nor CouchDB preserve field ordering. So users who are familiar with JSONish data *storage* should be aware that field ordering is not preserved.

Right (although I'm not sure what wasn't true there). I think the status quo is fine; if you have to have the document precisely preserved for whatever reason, you can do that -- you just have to be prepared to give up some things. As noted in the other thread, serialization is more interesting, but that also works fine. The breakdown in terms of usage between json/jsonb to me is very clear (json will handle serialization/deserialization-heavy patterns and a few edge cases for storage). The split between json and jsonb in hindsight made a lot of sense. What is not going to be so clear for users (particularly without good supporting documentation) is how things break down in terms of usage between hstore and jsonb.

merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 26, 2014 at 12:05 PM, Josh Berkus j...@agliodbs.com wrote:
> On 02/26/2014 09:57 AM, Merlin Moncure wrote:
> > What is not going to be so clear for users (particularly without good supporting documentation) is how things break down in terms of usage between hstore and jsonb.
>
> Realistically? Once we get done with mapping the indexes and operators, users who are used to Hstore1 use Hstore2, and everyone else uses jsonb. jsonb is nothing other than a standardized syntax interface to hstore2, and most users will choose the syntax similar to what they already know over learning new stuff.

The problem is that as of today they are not done, and AFAICT will not be for 9.4. Developers wanting to utilize the nosql pattern are going to have to lean heavily on the hstore API, and that's a simple fact... People reading about all the great new features of postgres are going to want to learn how to do things, and it's reasonable to anticipate the things they want to do and explain how to use them. I would like to extend that case coverage to include the json type as well, as its documentation is pretty lousy for that (I should know: I wrote most of it).

merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 1:15 PM, Andrew Dunstan and...@dunslane.net wrote: On 02/24/2014 11:06 AM, Merlin Moncure wrote: On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. [sorry for noise on this]. Here's the use case coverage as I see it today:

CASE:                 json     jsonb    hstore
Static document:      yes      poor     poor
Precise document:     yes      no       no
Serialization:        yes      no       no
Deserialization:      poor***  yes*     no
Repeated Access:      poor     yes      yes
Manipulation:         no       no**     yes
GiST/GIN searching:   no       no**     yes

notes:
* jsonb gets 'yes' for deserialization assuming Andrew's 'two level' deserialization fix goes in (otherwise 'poor').
** jsonb can't do this today, but presumably will be able to soon.
*** 'poor' unless the json type also gets the deserialization fix, then 'yes'. hstore can deserialize hstore format, but will rely on json/jsonb for deserializing json.

'Static document' represents edge cases where the json is opaque to the database but performance matters -- for example large map polygons. 'Precise document' represents cases where whitespace or key order is important. Peter asked upthread how to access the various features.
Well, today, it basically means a bit of nimble casting to different structures depending on which particular features are important to you, which IMNSHO is not bad at all as long as we understand that most people who rely on jsonb will also need hstore for its searching and operators. Down the line when hstore and jsonb are more fleshed out it's going to come down to an API style choice. Frankly, a lot of the above doesn't make much sense to me. WTF is 'Manipulation'? Unless I see much more actual info on the tests being conducted it's just about impossible to comment. The performance assessment at this stage is simply anecdotal as far as I'm concerned. Er, I wasn't making performance assessments (except in cases where it was obvious, like poor support for arbitrary access with json), but describing API coverage of use cases. Manipulation I thought obvious: the ability to manipulate the document through the API (say, change some value to something else): the nosql pattern. Neither json nor jsonb can do that at present...only hstore can. jsonb can't; it only covers some of what the json type currently covers (but some of the things it does cover are much faster). On Mon, Feb 24, 2014 at 11:31 AM, Josh Berkus j...@agliodbs.com wrote: Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. Aside from that, I want our docs to make a strong endorsement of using jsonb over json for most users. jsonb will continue to be developed and improved in the future; it is very unlikely that json will. Maybe that's what I should say rather than anything about efficiency. I would hope that endorsement doesn't extend to misinforming users. Moreover, the json type is handling all serialization at present and will continue to do so for some years.
In fact, in this release we got a bunch of new, very necessary enhancements (json_build) to serialization! You're trying to deprecate and enhance the type at the same time! The disconnect here is that your statements would be correct if the only usage for the json type were storing data in json. However, people (including myself) are doing lots of wonderful things storing data in the traditional way and moving into and out of json in queries, and that, besides working better in the json type, is only possible in json. That might change in the future by figuring out a way to cover json serialization cases through jsonb, but that's not how things work today, end of story. Look, I definitely feel the frustration and weariness here in terms of my critiquing the proposed API along with the other arguments I've made. Please understand that nobody wants this to go out the door more than me. If the objective is to lock in the API 'as is', then let's be polite to our users and try to document the various use cases and what's good at what. merlin
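The serialization pattern described above -- relational storage, with json produced only at the query edge -- looks roughly like this (table and column names are made up for illustration):

```sql
-- Data lives in ordinary columns; json is built only on the way out.
-- row_to_json/json_agg are the 9.3 serialization functions; the 9.4
-- 'json_build' additions (json_build_object etc.) extend the same pattern.
SELECT json_agg(row_to_json(o))
FROM (SELECT order_id, customer, total
      FROM orders
      WHERE total > 100) o;
```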
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 25, 2014 at 4:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/25/2014 12:12 PM, Robert Haas wrote: I don't agree that jsonb should be preferred in all but a handful of situations. Nor do I agree that partisanship belongs in our documentation. Therefore, -1 for your proposal to recommend that, and +1 for Merlin's proposal to present a comparison which fairly illustrates the situations in which each will outperform the other. Awaiting doc patch from Merlin, then. It will need to be clear enough that an ordinary user can distinguish which type they want. Sure. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 12:20 AM, Josh Berkus j...@agliodbs.com wrote: All, Here's a draft cleanup on the JSON section of the Datatype docs. Since there's been a bunch of incremental patches on this, I just did a diff against HEAD. I looked over json-functions a bit, but am not clear on what needs to change there; the docs are pretty similar to other sections of Functions, and if they're complex it's because of the sheer number of JSON-related functions. Anyway, this version of datatypes introduces a comparison table, which I think should make things a bit clearer for users. I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 24, 2014 at 9:08 AM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Feb 24, 2014 at 8:46 AM, Merlin Moncure mmonc...@gmail.com wrote: I still find the phrasing 'jsonb is more efficient for most purposes' to be a bit off. Basically, the text json type is faster for the serialization/deserialization pattern (not just document preservation) and jsonb is preferred when storing json and doing repeated subdocument accesses. Hm, I'm going to withdraw that. I had done some testing of simple deserialization (cast to text and the like) and noted that jsonb was as much as 5x slower. However, I just did some checking on json[b]_populate_recordset and it's pretty much a wash. [sorry for noise on this]. Here's the use case coverage as I see it today:

CASE:                 json     jsonb    hstore
Static document:      yes      poor     poor
Precise document:     yes      no       no
Serialization:        yes      no       no
Deserialization:      poor***  yes*     no
Repeated Access:      poor     yes      yes
Manipulation:         no       no**     yes
GiST/GIN searching:   no       no**     yes

notes:
* jsonb gets 'yes' for deserialization assuming Andrew's 'two level' deserialization fix goes in (otherwise 'poor').
** jsonb can't do this today, but presumably will be able to soon.
*** 'poor' unless the json type also gets the deserialization fix, then 'yes'. hstore can deserialize hstore format, but will rely on json/jsonb for deserializing json.

'Static document' represents edge cases where the json is opaque to the database but performance matters -- for example large map polygons. 'Precise document' represents cases where whitespace or key order is important. Peter asked upthread how to access the various features. Well, today, it basically means a bit of nimble casting to different structures depending on which particular features are important to you, which IMNSHO is not bad at all as long as we understand that most people who rely on jsonb will also need hstore for its searching and operators. Down the line when hstore and jsonb are more fleshed out it's going to come down to an API style choice. merlin
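The 'Repeated Access' row in the table above corresponds to queries like the following (hypothetical documents table); with text json every access reparses the whole document, while jsonb's binary form avoids the parse:

```sql
-- -> returns a subdocument, ->> returns text, #>> follows a path.
SELECT doc -> 'address' ->> 'city',
       doc #>> '{items,0,sku}'
FROM documents
WHERE doc ->> 'status' = 'open';
```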
Re: [HACKERS] jsonb and nested hstore
On Tue, Feb 11, 2014 at 3:35 AM, Hannu Krosing ha...@2ndquadrant.com wrote: On 02/11/2014 01:16 AM, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwhile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? JSON is supposed to be a *standard* way of encoding data in strings. If that ever changes, it will not be the JSON type anymore. My point was that as we reserved the right to change jsonb's binary format we'd probably want to reserve the right to change json's as well. This was in support of the theme of 'why is jsonb a special case?'. However, I think it's pretty much settled that any potential concerns I raised in terms of providing a version flag are outweighed by its potential usefulness. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary format at some later time. Maybe just send an int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the C type int8, not the 64-bit type. It seems cleaner to do a pq_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. The server version *is* the version, basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 12:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. Really? How would you expect to do that, exactly? In particular, how would you propose that a binary pg_dump file be reloadable if we redefine the binary format down the road without having made provision like this? Versioning one type only IMNSHO is a complete hack. I don't feel a need for versioning int, or float8, or most other types; and that includes the ones for which we've previously defined binary format as equivalent to text (enums). In this case we know that we're not totally satisfied with the binary format we're defining today, so I think a type-specific escape hatch is a reasonable solution. Moreover, I don't especially buy tying it to server version, even if we had an information pathway that would provide that reliably in all contexts. Why not? Furthermore, what are we doing now? If we need a binary format contract, that needs to be separated from this discussion. I've written (along with Andrew C) the only serious attempt to deal with client side binary format handling (http://libpqtypes.esilo.com/) and in all interesting cases it depends on the server version to define binary parsing behaviors. I agree WRT float8, etc., but other types have changed in a couple of cases and it's always been with the version.
I find it highly unlikely that any compatibility behaviors are going to be defined *for each and every returned datum* now, or ever...so even if it's only a few bytes lost, why do it? Intra-version compatibility issues, should they ever have to be handled, would be more likely handled at connection- or query- time. Point being, if an escape hatch is needed, I'm near 100% certain this is not the right place to do it. Binary wire format compatibility is a complex topic and the proposed solution ISTM is not at all fleshed out. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:02 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 11:59:53 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 AM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 07:27:59 -0500, Andrew Dunstan wrote: On 02/10/2014 05:05 AM, Andres Freund wrote: I'd suggest making the format discernible from possible different future formats, to allow introducing a proper binary format at some later time. Maybe just send an int8 first, containing the format. Teodor privately suggested something similar. I was thinking of just sending a version byte, which for now would be '\x01'. An int8 seems like more future-proofing provision than we really need. Hm. Isn't that just about the same? I was thinking of the C type int8, not the 64-bit type. It seems cleaner to do a pq_sendint(..., 1, 1) than to do it manually inside the string. -1. Currently no other wire format types send a version and it's not clear why this one is special. We've changed wire format versions before and it's upon the client to deal with those changes. The server version *is* the version, basically. If a broader solution exists I think it should be addressed broadly. Versioning one type only IMNSHO is a complete hack. I don't find that very convincing. The entire reason jsonb exists is because the parsing overhead of text json is significant, so it stands to reason that soon somebody will try to work on a better wire protocol, even if the current code cannot be made ready for 9.4. And I don't think past instability of binary type formats is a good reason for *needlessly* breaking stuff like binary COPYs. And it's not like one prefixed byte has any real-world relevant cost. The point is, why does this one type get a version id? Imagine a hypothetical program that sent/received the binary format for jsonb. All you have to do is manage the version flag appropriately, right? Wrong.
You still need to have code that checks the server version and sees if it's supported (particularly for sending), and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. I guess you could argue that having a version flag could, say, allow libpq clients to gracefully error out if, say, an old non-exotic-format-speaking libpq happens to connect to a newer server -- assuming the client actually bothered to check the flag. That's zero help to the client though -- regardless, the compatibility isn't established, and that's zero help to the other binary formats that we have -- and probably will continue to -- change. What about them? Are we now, at the umpteenth hour of the final commitfest, suddenly deciding that binary wire formats are going to be compatible across versions? The kinda low effort way to deal with binary format compatibility is to simply document the existing formats and document format changes in some convenient place. The 'real' long term path to doing it IMO is to abstract out a shared client/server type library with some protocol negotiation features. Then, at connection time, the client/server agree on what's the optimal way to send things -- perhaps the client can signal things like 'want compression for long datums'. The only case for a version flag at the data point level is if the server is sending version X at this tuple and version Y at that tuple. I don't think that's a makeable case. Some might say, what about a compression bit based on compressibility/length? and to that I'd answer: why is that handling specific to the json type...are text/bytea/arrays not worth that feature too?
merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:38 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 17:35:12 -0600, Merlin Moncure wrote: Wrong. You still need to have code that checks the server version and sees if it's supported (particularly for sending), and as there is *no protocol negotiation of the formats at present it's all going to boil down to if version = X do Y*. How does the server know which 'versions' are ok to send? It doesn't. Follow along with me here: Suppose we don't introduce a version flag today and change the format to some more exotic structure for 9.5. How has the version flag made things easier for the client? It hasn't. The client goes if version = X do Y. think of binary COPY outputting data in 9.4 and then trying to import that data into 9.5. That's the interesting case here. right, json could be made to work, but any other format change introduced to any other already existing type will break. That's not a real solution unless we decree henceforth that no formats will change from here on in, in which case I withdraw my objection. I think COPY binary has exactly the same set of considerations as the client side. If you want to operate cleanly between versions (which has never been promised in the past), you have to encode in a header the kinds of things the server would need to parse it properly. Starting with, but not necessarily limited to, the encoding server's version. merlin
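The binary COPY scenario under discussion, sketched (assuming a bigdata table that exists on both servers):

```sql
-- On the 9.4 server: dump in binary format. The file carries a fixed
-- header, but each datum is emitted by its type's send function with no
-- per-type version information -- the compatibility gap at issue here.
COPY bigdata TO '/tmp/bigdata.bin' WITH (FORMAT binary);

-- On the receiving server: reload. This only works if every type's recv
-- function accepts what the sending version's send function produced.
COPY bigdata FROM '/tmp/bigdata.bin' WITH (FORMAT binary);
```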
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com wrote: It works in enough cases atm that it's worthwhile trying to keep it working. Sure, it could be better, but it's what we have right now. Atm it's e.g. the only realistic way to copy larger amounts of bytea between servers without copying the entire cluster. That's the thing -- it might work today, but what about tomorrow? We'd be sending the wrong signals. People start building processes around all of this and now we've painted ourselves into a box. Better in my mind to simply educate users that this practice is dangerous and unsupported, as we used to do. I guess until now. It seems completely odd to me that we're attaching a case to the jsonb type, in the wrong way -- something that we've never attached to any other type before. For example, why didn't we attach a version code to the json type send function? Wasn't the whole point of this that jsonb send/recv be spiritually closer to json? If we want to introduce alternative type formats in the 9.5 cycle, why can't we attach version based encoding handling to *that* problem? The more angles I look at this from, the more it looks messy and rushed. Notwithstanding all the above, I figure enough smart people here disagree (once again, heh) to call it consensus. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. merlin
Re: [HACKERS] jsonb and nested hstore
On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* -- as in 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. merlin
Re: [HACKERS] jsonb and nested hstore
On Monday, February 10, 2014, Andres Freund and...@2ndquadrant.com wrote: On 2014-02-10 19:01:48 -0600, Merlin Moncure wrote: On Mon, Feb 10, 2014 at 6:39 PM, Tom Lane t...@sss.pgh.pa.us wrote: Merlin Moncure mmonc...@gmail.com writes: On Mon, Feb 10, 2014 at 6:24 PM, Andres Freund and...@2ndquadrant.com wrote: And if we add a new format version in 9.5 we need to make it discernible from the 9.4 format. Without space for a format indicator we'd have to resort to ugly tricks like defining the high bit in the first byte set indicates the new version. I don't see the improvement here. Point being: a 9.5 binary format reading server could look for a magic token in the beginning of the file which would indicate the presence of a header. The server could then make intelligent decisions about reading data inside the file, which would follow exactly the same kinds of decisions binary-format-consuming client code would make. Perhaps it would be a simple check on version, or something more complex that would involve a negotiation. The 'format' indicator, should version not be precise enough, needs to be in the header, not passed with every instance of the data type, and certainly not for one type in the absence of others. Basically, you want to move the goalposts to somewhere that's not only out of reach today, but probably a few counties away from the stadium. I don't see this happening at all frankly, because nobody has been interested enough to work on something like it up to now. And I definitely don't see it as appropriate to block improvement of jsonb until this happens. That's completely unfair. I'm arguing *not* to attach version dependency expectations to the jsonb type, at all, not the other way around. If you want to do that, fine, but do it *later* -- as in 9.5, or beyond. I just gave an example of how binary format changes could be worked in later. Come on.
Your way requires building HEAPS of new and generic infrastructure in 9.5 and would only work for binary copy. The proposed way requires about two lines of code. Without the generic infrastructure we'd end up relying on some intricacies like the meaning of the high bit in the first byte or such. Anyway, that's it on this subthread from me. Fair enough. I'll concede the point. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 12:44 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: send/recv functions are also needed for binary-format COPY. IMHO jsonb must have send/recv functions. All other built-in types have them, except for types like 'smgr', 'aclitem' and 'any*' that no-one should be using as column types. Yes -- completely agree. I also consider the hstore functionality (in particular, searching and access operators) to be essential. I'm actually surprised we have an alternate binary wire format for jsonb at all; json is explicitly text and I'm not sure what the use case of sending the internal structure is. Meaning, maybe jsonb send/recv should be a thin wrapper around sending the json string. The hstore send/recv I think properly covers the case where client side binary wire format actors would want to manage performance critical cases that want to avoid parsing. On Wed, Feb 5, 2014 at 1:21 AM, Oleg Bartunov obartu...@gmail.com wrote: Andrew provided us more information and we'll work on recv. What do people think about testing this stuff? btw, we don't have any regression test on this. I'm intensely interested in this work; I consider it to be transformative. I've *lightly* tested the jsonb/hstore functionality and so far everything is working. I still have concerns about the API. Aside from the stuff I mentioned upthread, I find the API split between jsonb and hstore to be a little odd; a lot of useful bits (for example, the @ operator) come via the hstore type only. So these types are joined at the hip for real work, which makes the diverging incomplete behaviors in functions like populate_record() disconcerting. Another point I'm struggling with is what jsonb brings to the table that isn't covered by either hstore or json; working through a couple of cases I find myself not using the jsonb functionality except as an 'hstore json formatter', which the json type covers.
I'm probably being obtuse, but we have to be cautious before plonking a couple of dozen extra functions in the public schema. merlin
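For reference, the hstore searching and operator support that jsonb users would be leaning on looks like this (hypothetical table; hstore2's nested structures follow the same shape):

```sql
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE docs (id serial PRIMARY KEY, attrs hstore);
-- A GIN (or GiST) index makes @> containment searches indexable.
CREATE INDEX docs_attrs_idx ON docs USING gin (attrs);

SELECT id FROM docs WHERE attrs @> 'color=>red, size=>large';
```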
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 10:22 AM, Andrew Dunstan and...@dunslane.net wrote: I'm actually surprised we have an alternate binary wire format for jsonb at all; json is explicitly text and I'm not sure what the use case of sending the internal structure is. Meaning, maybe jsonb send/recv should be a thin wrapper around sending the json string. The hstore send/recv I think properly covers the case where client side binary wire format actors would want to manage performance critical cases that want to avoid parsing. The whole reason we have jsonb is to avoid reparsing where possible. Sure; but on the server side. The wire format is for handling client concerns. For example, the case you're arguing for would be for a libpq client to extract jsonb as binary, manipulate it on a binary level, then send it back as binary. I find this case to be something of a stretch. That being said, for binary dump/restore perhaps there's a performance case to be made. In fact, I'd rather have the send and recv functions in the jsonb code and have hstore's functions call them, so we don't duplicate code. yeah. Agree that there needs to be two sets of routines, not three. I think a case could be made that the jsonb type could take either json's or hstore's, depending on the above. FWIW, either way is fine. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 11:48 AM, Tom Lane t...@sss.pgh.pa.us wrote: If we had infinite time/manpower, this wouldn't really be an issue. We don't, though, and so I suggest that this may be one of the better things to toss overboard. The hstore send/recv functions have basically the same (copy/pasted/name-adjusted) implementation. Since hstore will presumably remain (as the current hstore is) 'deep binary' and all of Andres's gripes apply to hstore as well, this change buys us precisely zap from a time perspective; it comes down to which is intrinsically the better choice. merlin
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 1:03 PM, Josh Berkus j...@agliodbs.com wrote: On 02/05/2014 07:48 AM, Merlin Moncure wrote: Another point I'm struggling with is what jsonb brings to the table that isn't covered by either hstore or json; working through a couple of cases I find myself not using the jsonb functionality except as a 'hstore json formatter', which the json type covers. I'm probably being obtuse, but we have to be cautious before plonking a couple of dozen extra functions in the public schema. There's three reasons why it's worthwhile: 1) user-friendliness: telling users they need to do ::JSON and ::HSTORE2 all the time is sufficiently annoying -- and prone to causing errors -- to be a blocker to adoption by a certain, very numerous, class of user. That's a legitimate point of concern. But in and of itself I'm not sure it warrants exposing a separate API. 2) performance: to the extent that we can operate entirely in JSONB and not transform back and forth to JSON and HSTORE, function calls (and index lookups) will be much faster. And given the competition, speed is important. Not following this. I do not see how the presence of jsonb helps at all. Client-to-server communication will be text->binary (and vice versa) and handling within the server itself will be in binary. This is the crux of my point. 3) growth: 9.4's JSONB functions are a prerequisite to developing richer JSON querying capabilities in 9.5 and later, which will go beyond JSON formatting for HSTORE. I kind of get this point. But in lieu of a practical use case today, what's the rush to implement? I fully anticipate I'm out in left field on this one (I have a cot and mini fridge there). The question on the table is: what use cases (performance included) does jsonb solve that can't be solved without it? With the possible limited exception of Andrew's yet-to-be-delivered enhanced deserialization routines, I can't think of any. If presented with reasonable evidence I'll shut my yap, pronto. 
Frankly, if it were entirely up to me HSTORE2 would be part of core and its only interface would be JSONB. But it's not. So this is a compromise. I don't. To be pedantic: hstore is in core, but packaged as an extension. That's a very important distinction. In fact, I'll go further and say it seems wise for all SQL standard type work to happen in extensions. As long as it's an in-core contrib extension, I see no stigma to that whatsoever. It's not clear at all to me why the json type was put in the public schema, and now we're about to double down with jsonb. Having things extension-packaged greatly eases concerns about future API changes, because if problems emerge it's not impossible to imagine compatibility extensions appearing to bridge the gap if certain critical functions change. That's exactly the sort of thing that we may want to happen here, I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 2:37 PM, Andrew Dunstan and...@dunslane.net wrote: The time for this discussion was months ago. I would not have spent many many hours of my time if I thought it was going to be thrown away. I find this attitude puzzling, to say the least. You were a major part of the discussion when we said OK, we'll leave json as it is (text based) and add jsonb. That's exactly what we're doing. certainly. I'll shut my yap; I understand your puzzlement. At the time though, I had assumed the API was going to incorporate more of the hstore feature set than it did. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Feb 5, 2014 at 3:03 PM, Josh Berkus j...@agliodbs.com wrote: That was the original goal. However, Oleg and Teodor's late delivery of Hstore2 limited what Andrew could do for JSONB before CF4 started. yeah. anyways, I'm good on this point. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund and...@2ndquadrant.com wrote: On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote:

+  <para id="functions-json-table">
+   <xref linkend="functions-json-creation-table"> shows the functions that are
+   available for creating <type>json</type> values.
+   (see <xref linkend="datatype-json">)
+  </para>

-  <table id="functions-json-table">
-   <title>JSON Support Functions</title>

+  <indexterm>
+   <primary>array_to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>row_to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>to_json</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_build_array</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_build_object</primary>
+  </indexterm>
+  <indexterm>
+   <primary>json_object</primary>
+  </indexterm>

Hm, why are you collecting the indexterms at the top, in contrast to the previous way of collecting them at the point of documentation?

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 1ae9fa0..fd93d9b 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
 	tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
 	tsvector.o tsvector_op.o tsvector_parser.o \
 	txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
-	rangetypes_typanalyze.o rangetypes_selfuncs.o
+	rangetypes_typanalyze.o rangetypes_selfuncs.o \
+	jsonb.o jsonb_support.o

Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here.

+/*
+ * for jsonb we always want the de-escaped value - that's what's in token
+ */
+

strange newline.

+static void
+jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)
+{
+	JsonbInState *_state = (JsonbInState *) state;
+	JsonbValue	v;
+
+	v.size = sizeof(JEntry);
+
+	switch (tokentype)
+	{
+	...
+		default:		/* nothing else should be here in fact */
+			break;

Shouldn't this at least Assert(false) or something? 
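The defensive pattern Andres is asking for -- a "can't happen" `default:` arm that fails loudly instead of silently doing nothing -- can be sketched standalone like this (names are invented for illustration; actual PostgreSQL code would use `Assert()` or `elog(ERROR, ...)` rather than `<assert.h>`). The point is that a token type added later cannot slip through the switch unnoticed:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token-type enum, standing in for JsonTokenType. */
typedef enum { TOK_STRING, TOK_NUMBER, TOK_TRUE, TOK_FALSE, TOK_NULL } ScalarTokenType;

/* Classify a scalar token; every legal value has an explicit case. */
static const char *
scalar_kind(ScalarTokenType t)
{
    switch (t)
    {
        case TOK_STRING: return "string";
        case TOK_NUMBER: return "numeric";
        case TOK_TRUE:
        case TOK_FALSE:  return "bool";
        case TOK_NULL:   return "null";
        default:
            /* nothing else should be here in fact -- fail loudly in
             * debug builds instead of falling through silently */
            assert(!"unexpected scalar token type");
            return "?";
    }
}
```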
+static void
+recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c)
+{
+	uint32		hentry = c & JENTRY_TYPEMASK;
+
+	if (hentry == JENTRY_ISNULL)
+	{
+		v->type = jbvNull;
+		v->size = sizeof(JEntry);
+	}
+	else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR)
+	{
+		recvJsonb(buf, v, level + 1, (uint32) c);
+	}
+	else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE)
+	{
+		v->type = jbvBool;
+		v->size = sizeof(JEntry);
+		v->boolean = (hentry == JENTRY_ISFALSE) ? false : true;
+	}
+	else if (hentry == JENTRY_ISNUMERIC)
+	{
+		v->type = jbvNumeric;
+		v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf),
+										 Int32GetDatum(0), Int32GetDatum(-1)));
+
+		v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric);

What's the *2 here?

+static void
+recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header)
+{
+	uint32		hentry;
+	uint32		i;

This function and recvJsonbValue call each other recursively, afaics without any limit, shouldn't they check for the stack depth?

+	hentry = header & JENTRY_TYPEMASK;
+
+	v->size = 3 * sizeof(JEntry);

*3?

+	if (hentry == JENTRY_ISOBJECT)
+	{
+		v->type = jbvHash;
+		v->hash.npairs = header & JB_COUNT_MASK;
+		if (v->hash.npairs > 0)
+		{
+			v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs);
+

Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks about the actual length of the data? So with a few bytes the server can be coaxed to allocate a gigabyte of data? Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking. 
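The two sanity checks suggested here might look something like this standalone sketch (the names and the minimum per-pair wire size are assumptions made up for illustration; server code would call check_stack_depth() for the recursion bound and report failures via ereport() rather than returning false):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed minimum number of bytes one key/value pair can occupy on the
 * wire (a JEntry-sized header per datum at the very least); the exact
 * value is a placeholder for this sketch. */
#define MIN_PAIR_WIRE_BYTES 8

/* Assumed cap on nesting depth for the mutual recursion. */
#define MAX_NESTING_DEPTH   64

/* Check 1: before allocating npairs * per-pair bytes, verify the declared
 * count could even fit in the bytes actually remaining in the receive
 * buffer. A malicious header claiming a huge count (up to JB_COUNT_MASK)
 * is then rejected up front instead of driving a gigabyte allocation. */
static bool
pair_count_is_sane(uint32_t npairs, size_t bytes_remaining)
{
    return (size_t) npairs <= bytes_remaining / MIN_PAIR_WIRE_BYTES;
}

/* Check 2: bound the recursion between the value and container receive
 * routines with an explicit level counter, so deeply nested input cannot
 * blow the stack. */
static bool
nesting_depth_ok(uint32_t level)
{
    return level <= MAX_NESTING_DEPTH;
}
```

Dividing the remaining buffer length rather than multiplying the count avoids any overflow in the check itself, which is part of why this shape is common in input validation code.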
+	for (i = 0; i < v->hash.npairs; i++)
+	{
+		recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4));
+		if (v->hash.pairs[i].key.type != jbvString)
+			elog(ERROR, "jsonb's key could be only a string");

Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places.

+char *
+JsonbToCString(StringInfo out, char *in, int estimated_len)
+{
+	bool		first = true;
+	JsonbIterator *it;
+	int			type;
+	JsonbValue
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of hstore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from an internally well-formed structure, is by far the more serious one I think. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 8:45 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 08:57 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of hstore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from an internally well-formed structure, is by far the more serious one I think. Umm, I think at least one of us is seriously confused. I am going to look at dealing with these issues in a way that can be used by both - at least the populate_record case. As far as populate_record goes, there is a bit of an impedance mismatch, since json/hstore records are heterogeneous and one-dimensional, whereas sql arrays are homogeneous and multidimensional. Right now I am thinking I will deal with arrays up to two dimensions, because I can do that relatively simply, and after that throw in the towel. That will surely deal with 99.9% of use cases. Of course this would be documented. Anyway, let me see what I can do. If Andrew Gierth wants to have a look at fixing the hstore() side that might help speed things up. (ah, you beat me to it.) Disregard my statements above. It works. 
postgres=# select jsonb_populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb); jsonb_populate_record - (1,{(1,\\{(1,{1,2})}\\)}) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Fri, Jan 31, 2014 at 9:26 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 09:53 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 8:45 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/31/2014 08:57 AM, Merlin Moncure wrote: On Fri, Jan 31, 2014 at 4:03 AM, Oleg Bartunov obartu...@gmail.com wrote: Hmm, neither me, nor Teodor have experience and knowledge with populate_record() and moreover hstore here is virgin and we don't know the right behaviour, so I think we better take it from jsonb, once Andrew realize it. Andrew ? Andrew Gierth wrote the current implementation of htsore populate_record IIRC. Unfortunately the plan for jsonb was to borrow hstore's (I don't think hstore can use the jsonb implementation because you'd be taking away the ability to handle internally nested structures it currently has). Of my two complaints upthread, the second one, not being able to populate from and internally well formed structure, is by far the more serious one I think. Umm, I think at least one of us is seriously confused. I am going to look at dealing with these issues in a way that can be used by both - at least the populate_record case. As far as populate_record goes, there is a bit of an impedance mismatch, since json/hstore records are heterogenous and one-dimensional, whereas sql arrays are homogeneous and multidimensional. Right now I am thinking I will deal with arrays up to two dimensions, because I can do that relatively simply, and after that throw in the towel. That will surely deal with 99.9% of use cases. Of course this would be documented. Anyway, Let me see what I can do. If Andrew Gierth wants to have a look at fixing the hstore() side that might help speed things up. (ah, you beat me to it.) Disregard my statements above. It works. 
postgres=# select jsonb_populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb); jsonb_populate_record - (1,{(1,\\{(1,{1,2})}\\)}) Actually, there is a workaround to the limitations of hstore(record): yeah I'm ok with hstore() function as it is. That also eliminates backwards compatibility concerns so things worked out. The only 'must fix' 9.4 facing issue I see on the table is to make sure jsonb populate function is forward compatible with future expectations of behavior which to me means zeroing in on the necessity of the as_text argument (but if you can expand coverage without jeopardizing 9.4 inclusion than great...). For my part I'm going to continue functionally testing the rest of the API (so far, a cursory look hasn't turned up anything else). I'm also signing up for some documentation refinements which will be done after you nail down these little bits but before the end of the 'fest. IMNSHO, formal code review needs to begin ASAP (salahaldin is the reviewer per the fest wiki) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 9:50 AM, Andrew Dunstan and...@dunslane.net wrote: Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions except for backwards compatibility on the json side (not jsonb). For non-complex structures it does best effort casting anyways so the flag is moot. Well, I could certainly look at making the populate_record{set} and to_record{set} logic handle types that are arrays or composites inside the record. It might not be terribly hard to do - not sure. A quick analysis suggests that this is fixable with fairly minimal disturbance in the jsonb case. In the json case it would probably involve reparsing the inner json. That's probably doable, because the routines are all reentrant, but not likely to be terribly efficient. It will also be a deal more work. Right. Also the text json functions are already in the wild anyways -- that's not in the scope of this patch so if they need to be fixed that could be done later. ISTM then the right course of action is to point jsonb 'populate' variants at hstore implementation, not the text json one and remove the 'as text' argument. Being able to ditch that argument is the main reason why I think this should be handled now (not forcing hstore dependency to handle complex json is gravy). People handling json as text would then invoke a ::jsonb cast trading off performance for flexibility which is perfectly fine. If you agree, perhaps we can HINT the error in certain places that return ERROR: cannot call json_populate_record on a nested object that the jsonb variant can be used as a workaround. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 1:07 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2014 04:56 PM, Andrew Dunstan wrote: On 01/29/2014 01:03 PM, Andrew Dunstan wrote: On 01/27/2014 10:43 PM, Andrew Dunstan wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes:

* there is now documentation for jsonb
* most uses of elog() in json_funcs.c are replaced by ereport().
* indentation fixes and other tidying.

No changes in functionality. Further update of jsonb portion. Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. Updated to apply cleanly after recent commits. Updated patches for both pieces. Included is some tidying done by Teodor, and fixes for remaining whitespace issues. This now passes git diff --check master cleanly for me. 
Something seems off:

postgres=# create type z as (a int, b int[]);
CREATE TYPE
postgres=# create type y as (a int, b z[]);
CREATE TYPE
postgres=# create type x as (a int, b y[]);
CREATE TYPE

-- test a complicated construction
postgres=# select row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x;
            row
-----------------------------
 (1,{(1,\\{(1,{1,2})}\\)})

postgres=# select hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x);
                     hstore
------------------------------------------------
 a=>1, b=>{\(1,\\\{\\(1,\\{1,2}\\)\\}\\\)\}

here, the output escaping has leaked into the internal array structures. istm we should have a json expressing the internal structure. It does (weirdly) map back however:

postgres=# select populate_record(null::x, hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x));
       populate_record
-----------------------------
 (1,{(1,\\{(1,{1,2})}\\)})

OTOH, if I go via the json route:

postgres=# select row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x);
                  row_to_json
-----------------------------------------------
 {"a":1,"b":[{"a":1,"b":[{"a":1,"b":[1,2]}]}]}

so far, so good. let's push to hstore:

postgres=# select row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore;
                row_to_json
-------------------------------------------
 a=>1, b=>[{a=>1, b=>[{a=>1, b=>[1, 2]}]}]

this ISTM is the 'right' behavior. but what if we bring it back to a record object?

postgres=# select populate_record(null::x, row_to_json(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore);
ERROR: malformed array literal: {{a=>1, b=>{{a=>1, b=>{1, 2}

yikes. The situation as I read it is that (notwithstanding my comments upthread) there is no clean way to slide rowtypes to/from hstore and jsonb while preserving structure. IMO, the above query should work and the populate function record above should return the internally structured row object, not the text-escaped version. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Thu, Jan 30, 2014 at 4:52 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/30/2014 07:21 PM, Merlin Moncure wrote: postgres=# select hstore(row(1, array[row(1, array[row(1, array[1,2])::z])::y])::x); hstore -- a=1, b={\(1,\\\{\\(1,\\{1,2}\\)\\}\\\)\} here, the output escaping has leaked into the internal array structures. istm we should have a json expressing the internal structure. What has this to do with json at all? It's clearly a failure in the hstore() function. yeah -- meant to say 'hstore' there. Also I'm not sure that it's 'wrong'; it's just doing what it always did. That brings up another point: are there any interesting cases of compatibility breakage? I'm inclined not to care about this particular case though... array[row(1, array[row(1, array[1,2])::z])::y])::x)::jsonb::hstore); ERROR: malformed array literal: {{a=1, b={{a=1, b={1, 2} yikes. The situation as I read it is that (notwithstanding my comments upthread) there is no clean way to slide rowtypes to/from hstore and jsonb while preserving structure. IMO, the above query should work and the populate function record above should return the internally structured row object, not the text escaped version. And this is a failure in populate_record(). I think we possibly need to say that handling of nested composites and arrays is an area that needs further work. OTOH, the refusal of json_populate_record() and json_populate_recordset() to handle these in 9.3 has not generated a flood of complaints, so I don't think it's a tragedy, just a limitation, which should be documented if it's not already. (And of course hstore hasn't handled nested anything before now.) Meanwhile, maybe Teodor can fix the two hstore bugs shown here. While not a flood, there certainly have been complaints. 
See http://postgresql.1045698.n5.nabble.com/Best-way-to-populate-nested-composite-type-from-JSON-td5770566.html http://osdir.com/ml/postgresql-pgsql-general/2014-01/msg00205.html But, if we had to drop this in the interests of time I'd rather see the behavior cauterized off so that it errored out 'not supported' (as json_populate does) than attempt to implement the wrong behavior. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Use MAP_HUGETLB where supported (v3)
On Tue, Jan 28, 2014 at 5:58 AM, Christian Kruse christ...@2ndquadrant.com wrote: Hi, On 28/01/14 13:51, Heikki Linnakangas wrote: Oh darn, I remembered we had already committed this, but clearly not. I'd love to still get this into 9.4. The latest patch (hugepages-v5.patch) was pretty much ready for commit, except for documentation. I'm working on it. I ported it to HEAD and currently doing some benchmarks. Next will be documentation. you mentioned benchmarks -- do you happen to have the results handy? (curious) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb generation functions
On Wed, Jan 29, 2014 at 9:08 AM, Andrew Dunstan and...@dunslane.net wrote: In the jsonb patch I have been working on, I have replicated all of what I call the json processing functions, and I will shortly add analogs for the new functions in that category, json_to_record and json_to_recordset. However I have not replicated what I call the json generation functions, array_to_json, row_to_json, to_json, and the new functions json_build_array, json_build_object, and json_object, nor the aggregate functions json_agg and the new json_object_agg. The reason for that is that I have always used those for constructing json given to the client, rather than json stored in the database, and for such a use there would be no point in turning it into jsonb rather than generating the json string directly. However, I could be persuaded that we should have a jsonb analog of every json function. If we decide that, the next question is whether we have to have it now, or if it can wait. my 0.02$: it can wait -- possibly forever. Assuming the casts work I see absolutely no issue whatsoever asking users to do: select xx_to_json(something complex)::jsonb; If you examine the use cases, json and jsonb, while they certainly have some overlap, are going to be used in different patterns. In hindsight the type bifurcation was a good thing ISTM. I don't think there should be any expectation of a 100% match of the API, especially since you can slide things around with casts. In fact, for heavy serialization, at the end of the day it might be better to defer the jsonb creation to the final stage of serialization anyways (avoiding iterative insertion) even if we did match the API. (can't hurt to manage scope either, considering the timelines for 9.4) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 12:03 PM, Andrew Dunstan and...@dunslane.net wrote: Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. I think the opening paragraphs contrasting json/jsonb need refinement. json is going to be slightly faster than jsonb for input *and* output. For example, in one application I store fairly large json objects containing pre-compiled static polygon data that is simply flipped up to google maps. This case will likely be pessimal for jsonb. For the next paragraph, I'd like to expand it a bit on 'specialized needs' and boil it down to specific use cases. Basically, json will likely be more compact in most cases and slightly faster for input/output; jsonb would be preferred in any context where processing, or searching, or extensive server-side parsing is employed. If you agree, I'd be happy to do that... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Wed, Jan 29, 2014 at 3:56 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/29/2014 01:03 PM, Andrew Dunstan wrote: On 01/27/2014 10:43 PM, Andrew Dunstan wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes: * there is now documentation for jsonb * most uses of elog() in json_funcs.c are replaced by ereport(). * indentation fixes and other tidying. No changes in functionality. Further update of jsonb portion. Only change in functionality is the addition of casts between jsonb and json. The other changes are the merge with the new json functions code, and rearrangement of the docs changes to make them less ugly. Essentially I moved the indexterm tags right out of the table as is done in some other parts of the docs. That makes the entry tags much clearer to read. Updated to apply cleanly after recent commits. ok, great. This is really fabulous. So far most everything feels natural and good. I see something odd in terms of the jsonb use case coverage. One of the major headaches with json deserialization presently is that there's no easy way to move a complex (record- or array-containing) json structure into a row object. 
For example, create table bar(a int, b int[]);

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, false);
ERROR: cannot populate with a nested object unless use_json_as_text is true

I find the use_json_as_text argument here to be pretty useless (unlike in the json_build to_record variants, where it at least provides some hope for an escape hatch) for handling this, since it will just continue to fail:

postgres=# select jsonb_populate_record(null::bar, '{"a": 1, "b": [1,2]}'::jsonb, true);
ERROR: missing "]" in array dimensions

OTOH, the nested hstore handles this no questions asked:

postgres=# select * from populate_record(null::bar, 'a=>1, b=>{1,2}'::hstore);
 a |   b
---+-------
 1 | {1,2}

So, if you need to convert a complex json to a row type, the only effective way to do that is like this:

postgres=# select * from populate_record(null::bar, '{"a": 1, "b": [1,2]}'::json::hstore);
 a |   b
---+-------
 1 | {1,2}

Not a big deal really. But it makes me wonder (now that we have the internal capability of properly mapping to a record) why *both* the json/jsonb populate record variants shouldn't point to what the nested hstore behavior is when the 'as_text' flag is false. That would demolish the error and remove the dependency on hstore in order to do effective rowtype mapping. In an ideal world the json_build 'to_record' variants would behave similarly, I think, although there's no existing hstore analog so I'm assuming it's a non-trivial amount of work. Now, if we're agreed on that, I then also wonder if the 'as_text' argument needs to exist at all for the populate functions except for backwards compatibility on the json side (not jsonb). For non-complex structures it does best-effort casting anyways so the flag is moot. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Mon, Jan 27, 2014 at 9:43 PM, Andrew Dunstan and...@dunslane.net wrote: On 01/26/2014 05:42 PM, Andrew Dunstan wrote: Here is the latest set of patches for nested hstore and jsonb. Because it's so large I've broken this into two patches and compressed them. The jsonb patch should work standalone. The nested hstore patch depends on it. All the jsonb functions now use the jsonb API - there is no more turning jsonb into text and reparsing it. At this stage I'm going to be starting cleanup on the jsonb code (indentation, error messages, comments etc.) as well as getting up some jsonb docs. Here is an update of the jsonb part of this. Changes: * there is now documentation for jsonb * most uses of elog() in json_funcs.c are replaced by ereport(). * indentation fixes and other tidying. No changes in functionality. Don't have time to fire it up this morning, but a quick scan of the patch turned up a few minor things:

* see a comment typo, line 290 'jsonn'
* line 332: 'bogus input' -- is this up to error reporting standards? How about: value 'x' must be one of array, object, numeric, string, bool?
* line 357: jsonb's key could be only a string -- prefer non-possessive: jsonb keys must be a string
* line 374, 389: ditto 332
* line 513: is panic appropriate here?
* line 599: ditto
* line 730: odd phrasing in comment; also, commenting on this function is a little light
* line 807: slightly prefer 'with respect to'
* line 888: another PANIC: these may be correct, but it seems odd to halt the server on a corrupted datum
* line 1150: hm, is the jsonb internal hash structure documented? Aside: why didn't we use a standard hash table (performance maybe)?
* line 1805-6: poor phrasing. How about: it will order and make unique the hash keys. Otherwise we believe that pushed keys are ordered and unique. (Don't like verbed 'unique'.) 
* line 1860: no break here: merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
Looks like this review is against jsonb-5, not jsonb-6. oh yep -- shoot, sorry for the noise. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Jan 28, 2014 at 10:46 AM, Andrew Dunstan and...@dunslane.net wrote: On 01/28/2014 11:29 AM, Tom Lane wrote: Andrew Dunstan and...@dunslane.net writes: The problem is not the indexterm element, it's the space that might exist outside it. Are we using block level elements like para inside entry elements anywhere else? Probably not, and I wonder why you're trying to. Whole paras inside a table entry (this is a table no?) don't sound like they are going to lead to nice-looking results. See http://developer.postgresql.org/~adunstan/functions-json.html yeah. note: I think the json documentation needs a *major* overhaul. too much is going on inside the function listings where there really should be a big breakout discussing the big picture of json/jsonb with examples of various use cases. I want to give it a shot but unfortunately cannot commit to doing that by the end of the 'fest. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] jsonb and nested hstore
On Tue, Jan 28, 2014 at 12:09 PM, Josh Berkus j...@agliodbs.com wrote: On 01/28/2014 09:58 AM, Merlin Moncure wrote: yeah. note: I think the json documentation needs a *major* overhaul. too much is going on inside the function listings where there really should be a big breakout discussing the big picture of json/jsonb with examples of various use cases. I want to give it a shot but unfortunately cannot commit to doing that by the end of the 'fest. FWIW, I've promised Andrew that I'll overhaul this by the end of beta. Given that, we have all of beta for doc refinements. In addition to this, the JSON vs JSONB datatype page really needs expansion and clarification. right: exactly. I'd be happy to help (such as I can) ...I wanted to see jsonb make it in on this 'fest (doc issues notwithstanding); it hasn't been formally reviewed yet AFAICT. So my thinking here is to get the docs to minimum acceptable standards in the short term and focus on the structural code issues for the 'fest (if jsonb slips then it's moot obviously). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new json funcs
On Fri, Jan 24, 2014 at 3:26 PM, Josh Berkus j...@agliodbs.com wrote: On 01/24/2014 12:59 PM, Andrew Dunstan wrote: On 01/24/2014 03:40 PM, Laurence Rowe wrote: For consistency with the existing json functions (json_each, json_each_text, etc.) it might be better to add separate json_to_record_text and json_to_recordset_text functions in place of the nested_as_text parameter to json_to_record and json_to_recordset. It wouldn't be consistent with json_populate_record() and json_populate_recordset(), the two closest relatives, however. And yes, I appreciate that we have not been 100% consistent. Community design can be a bit messy that way. FWIW, I prefer the parameter to having differently named functions. +1. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Sun, Jan 12, 2014 at 4:38 AM, Craig Ringer cr...@2ndquadrant.com wrote: Implicit casts to text, anybody? This backward compatibility break orphaned the company I work for on 8.1 until last year and very nearly caused postgres to be summarily extirpated (only rescued at the last minute by my arrival). It cost hundreds of thousands of dollars to qualify a sprawling java code base so that it could be moved back into a supported version. Breaking compatibility sucks -- it hurts your users and costs people money. Hacking type casts may not have been a mistake, but the arbitrary introduction of the breakage certainly was. This project has no deprecation policy, and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and the win32 api. There are certain limited exceptions to this rule -- for example security problems or gross violations of the standard (bringing row-wise comparison to spec comes to mind as an example of that). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Mon, Jan 13, 2014 at 3:45 PM, David Fetter da...@fetter.org wrote: On Mon, Jan 13, 2014 at 10:40:57AM -0600, Merlin Moncure wrote: This project has no deprecation policy, I believe it actually does, although it's not a formal, written policy. Would you like to help draft one up? Lack of a 'formal, written policy' is equivalent to 'no policy'. Regardless, the way things were done in the 7.x/8.x series may no longer apply today; the project has grown up and we need to be more serious about things, at least, IMNSHO. and I'd argue we'd need one before considering breaking changes. For example, maybe we could pull out an occasional release for longer term support to help users that get caught out. But really, the better way to go IMNSHO is to take a hard line on compatibility issues pretty much always -- consider the case of libc and the win32 api. Could you please help remind us what that was? Let's take gets() for example. C11 finally ditched it 12 years (!) after it was formally deprecated in C99 and informally deprecated in endless man pages (don't use this!) for decades before that. And even then most compilers, at least the decent ones, should allow requesting previous standards for some time beyond that. The win32 API is also remarkably stable; ancient code written for it beyond the dim horizon of time will still compile and execute today. These are probably strong contenders for the most popular APIs ever made -- see the connection? Now, comparing C APIs to an SQL implementation for deprecation purposes isn't quite apples to apples, but I'll stand by the analogy. or gross violations of the standard We're definitely there on lower bounds of arrays. The standard, for a wonder, is clear and unambiguous about them. Whether we should go there on the rest of our array implementation is a question for another thread. The SQL standard requests that standard syntax gives standard behavior. Alternate bounds is non-standard syntax giving non-standard behavior and is thus excepted.
Naturally, non-standard syntax is dangerous because the standard may later implement it in which case you then have a real problem (that may be the case here: I don't know). Our array implementation is a real mess on multiple levels but at least it's an internally consistent mess. Maybe it really should be 'fixed', but not before the super un-fun discussion of how to ease the path for our hapless users happens first. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 2:04 AM, Dean Rasheed dean.a.rash...@gmail.com wrote: On 10 January 2014 00:36, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 1:20 AM, Merlin Moncure wrote: I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. Yes, I agree. A length function that returned 0 for empty arrays would be far from useless. Can you point me to some examples? The example I see all the time is code like if array_length(nodes, 1) > 5 then ... do something ... then you realise (or not as the case may be) that this doesn't work for empty arrays, and have to remember to wrap it in a coalesce call. Simply being able to write if cardinality(nodes) > 5 then ... do something ... is not just shorter, easier to type and easier to read, it is far less likely to be the source of subtle bugs. right -- exactly. or, 'ORDER BY cardinality(nodes)', etc etc. Furthermore, we already have pretty good support for iteration with arrays via unnest(). What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. 'unnest_dims' is non-trivial to code in user land while 'array_length' is an extremely trivial wrapper around array_upper(). cardinality() (which is a much better name for the function IMNSHO), giving a*b*c values for a 3d array, also does something non-trivial, *particularly in the case of offset arrays*. On Fri, Jan 10, 2014 at 3:36 AM, Marko Tiikkaja ma...@joh.to wrote: I guess what I truly want is a less generic type that's like an array, but always one-dimensional with a lower bound of 1. Your function would be the only one in the array API that implemented special behaviors like that. That suggests to me that the less generic function belongs in user land, not in the core array API.
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
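The semantics being argued over above can be sketched outside SQL. Here is a hypothetical Python model of the two behaviors; the function names mirror the SQL ones but the code is only an illustration of the discussed semantics, not PostgreSQL's implementation:

```python
from math import prod

def array_length(dims, dim):
    """Length of one dimension; None (like SQL NULL) for an empty
    array or a missing dimension, mirroring array_length()."""
    if not dims or dim > len(dims):
        return None
    return dims[dim - 1]

def cardinality(dims):
    """Total number of elements across all dimensions; 0 when empty."""
    if not dims:
        return 0
    return prod(dims)

assert array_length([], 1) is None   # why the coalesce() guard is needed
assert cardinality([]) == 0          # safe to compare or sort on directly
assert cardinality([2, 3, 4]) == 24  # a*b*c for a 3-D array
```

The thread in miniature: array_length() yields NULL for an empty array, so `array_length(nodes, 1) > 5` silently misbehaves, while a cardinality() that returns 0 can be used directly in comparisons and ORDER BY.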
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get: [1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4 like unnest() it would fully decompose the array to individual elements. what you have above slices the array, which is useful, but probably shouldn't live under the 'unnest' name -- perhaps 'slice'. Pavel added it to pl/pgsql under the FOREACH syntax (FYI). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
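A rough Python model of the proposed unnest_dims() behavior described above -- the name and the (index-vector, element) output come from the thread; the implementation is just a sketch for rectangular arrays with 1-based lower bounds:

```python
from itertools import product

def unnest_dims(a):
    """Fully decompose a nested rectangular array, yielding
    (index-vector, element) pairs -- the proposed unnest_dims()."""
    # Discover the dimensions by walking down the first elements.
    dims = []
    probe = a
    while isinstance(probe, list):
        dims.append(len(probe))
        probe = probe[0]
    # Walk every coordinate in row-major order, like unnest() would.
    for idx in product(*(range(n) for n in dims)):
        elem = a
        for i in idx:
            elem = elem[i]
        yield [i + 1 for i in idx], elem  # SQL arrays are 1-based

assert list(unnest_dims([[1, 2], [3, 4]])) == [
    ([1, 1], 1), ([1, 2], 2), ([2, 1], 3), ([2, 2], 4),
]
```

This reproduces exactly the `[1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4` output given in the email, and makes clear why the index vector (unlike WITH ORDINALITY's flat counter) is enough to slice by any dimension.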
Re: [HACKERS] array_length(anyarray)
On Fri, Jan 10, 2014 at 6:00 AM, Florian Pflug f...@phlo.org wrote: On Jan 10, 2014, at 11:00, Merlin Moncure mmonc...@gmail.com wrote: On Fri, Jan 10, 2014 at 3:52 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/10/14, 10:41 AM, Merlin Moncure wrote: What's needed for better iteration support (IMO) is a function that does what unnest does but returns an array of indexes (one per dimension) -- a generalization of the _pg_expandarray function. Let's say 'unnest_dims'. So unnest_dims('{{1,2},{3,4}}'::int[]) would return VALUES (1, '{1,2}'::int[]), (2, '{3,4}'::int[])? If so, then yes, that's a functionality I've considered us to have been missing for a long time. not quite. it returns int[], anyelement: so, using your example, you'd get: [1,1], 1 [1,2], 2 [2,1], 3 [2,2], 4 Now that we have WITH ORDINALITY, it'd be sufficient to have a variant of array_dims() that returns int[][] instead of text, say array_dimsarray(). Your unnest_dims could then be written as unnest(array_dimsarray(array)) with ordinality hm, not quite following that. maybe an example? my issue with 'WITH ORDINALITY' (while it's pretty neat) is that it doesn't give you the dimension coordinate of each datum so you can't really use it to slice. with unnest_dims(), you can slice, say via: select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) group by dims[1]; or select array_agg(value) from unnest_dims('{{1,2},{3,4}}'::int[]) where dims[1] = 2; not super elegant, but good enough for most uses I think. anyways, getting back on topic, the question on the table is cardinality() vs array_length, right? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Disallow arrays with non-standard lower bounds
On Fri, Jan 10, 2014 at 4:10 PM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Jan 10, 2014 at 1:26 PM, Jim Nasby j...@nasby.net wrote: On 1/9/14, 10:58 PM, Tom Lane wrote: Jim Nasby j...@nasby.net writes: ISTM that allowing users to pick arbitrary lower array bounds was a huge mistake. I've never seen anyone make use of it, can't think of any legitimate use cases for it, and hate the stupendous amount of extra code needed to deal with it. You lack imagination, sir. Considering what you'd normally want to do in SQL, the only example I can think of is to not have the argument over 0 vs 1 based. Actually, I was thinking there might be some computational problems where changing lower bound would be nice, but then again, what other languages actually support this? Perl does, though they regret it bitterly. What does it matter? Our arrays have had the capability for years and years and because it's cleaner is simply not justification to break people's applications. Why are we even considering this? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] array_length(anyarray)
On Thu, Jan 9, 2014 at 11:08 AM, Marko Tiikkaja ma...@joh.to wrote: On 1/9/14 5:44 PM, Florian Pflug wrote: On Jan9, 2014, at 14:57 , Dean Rasheed dean.a.rash...@gmail.com wrote: On 19 December 2013 08:05, Pavel Stehule pavel.steh...@gmail.com wrote: length should be irrelevant to fact so array starts from 1, 0 or anything else Yes, this should just return the number of elements, and 0 for an empty array. +1. Anything that complains about arrays whose lower bound isn't 1 really needs a *way* less generic name than array_length(). Problem is, if you're operating on an array which could have a lower bound that isn't 1, why would you look at the length in the first place? You can't access any elements by index, you'd need to look at array_lower(). You can't iterate over the array by index, you'd need to do array_lower() .. array_lower() + array_length(), which doesn't make sense. And then there's the myriad of stuff you can do with unnest() without actually having to look at the length. Same goes for multi-dimensional arrays: you have even less things you can do there with only a length. I'm piling on: it's not clear at all to me why you've special cased this to lower_bound=1. First of all, there are other reasons to check length than iteration. If you want your code to blow up with non 1 based array, that should be checked in userland I think (perhaps with a constraint); the server API function should implement as many reasonable behaviors as possible. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 9:46 AM, Florian Weimer fwei...@redhat.com wrote: On 01/03/2014 04:20 PM, Tom Lane wrote: I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? The data I encounter has to be distributed across multiple tables. Switching between the COPY TO commands would again need client-side buffering and heuristics for sizing these buffers. Lengths of runs vary a lot in my case. I also want to use binary mode as far as possible to avoid the integer conversion overhead, but some columns use custom enum types and are better transferred in text mode. Some INSERTs happen via stored procedures, to implement de-duplication. These issues could be addressed by using temporary staging tables. However, when I did that in the past, this caused pg_shdepend bloat. Carefully reusing them when possible might avoid that. Again, due to the variance in lengths of runs, the staging tables are not always beneficial. I understand that pipelining introduces complexity. But solving the issues described above is no picnic, either. Maybe consider using libpqtypes (http://libpqtypes.esilo.com/)? It transfers most everything in binary (enums notably are handled as strings). A typical usage of libpqtypes would be to arrange multiple records into an array on the client then hand them off to a stored procedure on the server side (perhaps over an asynchronous call while you assemble the next batch).
libpqtypes was written for C applications with very high performance requirements (for non performance critical cases we might use json instead). In my experience it's not too difficult to arrange an assembly/push loop that amortizes the round trip overhead to zero; it's not as efficient as COPY but much more flexible and will blow away any scheme that sends data row per query. I agree with Tom that major changes to the libpq network stack is probably not a good idea. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
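The claim above that an assembly/push loop "amortizes the round trip overhead to zero" is simple arithmetic; here is a hypothetical Python cost model (the millisecond figures are illustrative assumptions, not measurements):

```python
import math

def total_latency_ms(nrows, batch_size, round_trip_ms):
    """Latency spent on round trips when nrows rows are pushed to the
    server batch_size rows at a time: the fixed round-trip cost is
    shared across the batch, so the per-row share shrinks."""
    trips = math.ceil(nrows / batch_size)
    return trips * round_trip_ms

# Hypothetical 0.5 ms round trip, 100k rows:
# row-per-query pays 100,000 trips; 1000-row array batches pay 100.
assert total_latency_ms(100_000, 1, 0.5) == 50_000.0
assert total_latency_ms(100_000, 1000, 0.5) == 50.0
```

With reasonable batch sizes the per-row round-trip cost becomes negligible, which is why the array-per-call pattern can approach COPY-like throughput while keeping the flexibility of a stored procedure.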
Re: [HACKERS] RFC: Async query processing
On Fri, Jan 3, 2014 at 11:06 AM, Claudio Freire klaussfre...@gmail.com wrote: On Fri, Jan 3, 2014 at 12:20 PM, Tom Lane t...@sss.pgh.pa.us wrote: Claudio Freire klaussfre...@gmail.com writes: On Fri, Jan 3, 2014 at 10:22 AM, Florian Weimer fwei...@redhat.com wrote: Loading data into the database isn't such an uncommon task. Not everything is OLTP. Truly, but a sustained insert stream of 10 Mbps is certainly way beyond common non-OLTP loads. This is far more specific than non-OLTP. I think Florian has a good point there, and the reason is this: what you are talking about will be of exactly zero use to applications that want to see the results of one query before launching the next. Which eliminates a whole lot of apps. I suspect that almost the *only* common use case in which a stream of queries can be launched without feedback is going to be bulk data loading. It's not clear at all that pipelining the PQexec code path is the way to better performance for that --- why not use COPY, instead? You're forgetting ORM workloads. ORMs can usually plan the inserts to be in a sequence that both doesn't require feedback (except the knowledge that they were successful) and does not violate constraints. Flushing a whole object hierarchy, for instance, can be done without feedback. Not even serial columns need feedback, since many ORMs (SQLAlchemy, Hibernate) support allocation of ID sequences in batches (by issuing a proper select nextval). I agree that, with the proposed API, it's too error prone to be useful. But I also think, if the API is simple and fool-proof enough, it could be 'build it and they will come'. I know I'll be happy to implement support for SQLAlchemy (since it will benefit me), if the API resembles the proposition below (at least in simplicity). Per-query expectations could be such a thing.
And it can even work with PQexec: PQexec(con, "SELECT nextval('a_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "SELECT nextval('b_id_seq') FROM generate_series(1,10);"); --read-- PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); ... 9 times... PQexec(con, "INSERT INTO a (...);", PQEXPECT_NO_RESULT | PQASYNC_CORK); PQexec(con, "INSERT INTO b (...);", PQEXPECT_NO_RESULT | PQASYNC); do { // do something useful } while (PQflush()); Here, the PQASYNC flag would temporarily switch to non-blocking I/O, and buffer what cannot be sent. PQASYNC_CORK would only buffer (only send if the buffer is full). After any ASYNC call, PQflush would be necessary (to flush the send queue and to consume the expected responses), but I can imagine any synchronous call (PQexec, PQsendQuery or whatever) could detect a non-empty buffer and just blockingly flush right there. This can benefit many useful patterns. ORM flush is one, if there can be preallocation of IDs (which I know at least SQLAlchemy and Hibernate both support). Execute-many of prepared statements is another one, quite common. I'm not sure what would happen if one of the queries returned an error. If in a transaction, all the following queries would error out, I'd imagine. If not, they would simply be executed blindly.. am I correct? Long term, I'd rather see an optimized 'ORM flush' assemble the data into a structured data set (perhaps a JSON document) and pass it to some receiving routine that decomposed it into records. This is a better way to do things on so many levels.
Maybe I'm an old cranky guy yelling at pigeons, but I don't think the current approach that many ORMs take is going to withstand the test of time. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
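The "optimized ORM flush" idea above -- assemble the pending rows into one structured document and let a server-side routine decompose it -- can be sketched in Python. Everything here (function names, document shape) is hypothetical illustration, not any ORM's actual API:

```python
import json

def flush_to_document(pending):
    """Assemble dirty ORM rows into one JSON document grouped by
    table: a single round trip instead of one INSERT per row."""
    doc = {}
    for table, row in pending:
        doc.setdefault(table, []).append(row)
    return json.dumps(doc)

def decompose(document):
    """Sketch of the server-side receiving routine: split the
    document back into per-table record lists for insertion."""
    return {table: rows for table, rows in json.loads(document).items()}

# IDs preallocated client-side (batch nextval), so no feedback needed.
pending = [("a", {"id": 1}), ("b", {"id": 1, "a_id": 1}), ("a", {"id": 2})]
tables = decompose(flush_to_document(pending))
assert tables["a"] == [{"id": 1}, {"id": 2}]
assert tables["b"] == [{"id": 1, "a_id": 1}]
```

The design point being made: because the ORM already knows the dependency order and the preallocated keys, the whole hierarchy collapses into one structured payload, sidestepping the per-statement pipelining problem entirely.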
Re: [HACKERS] proposal: multiple read-write masters in a cluster with wal-streaming synchronization
On Tue, Dec 31, 2013 at 3:51 PM, Mark Dilger markdil...@yahoo.com wrote: The BDR documentation http://wiki.postgresql.org/images/7/75/BDR_Presentation_PGCon2012.pdf says, Physical replication forces us to use just one node: multi-master required for write scalability Physical replication provides best read scalability I am inclined to agree with the second statement, but I think my proposal invalidates the first statement, at least for a particular rigorous partitioning over which server owns which data. In my own workflow, I load lots of data from different sources. The partition the data loads into depends on which source it came from, and it is never mixed or cross referenced in any operation that writes the data. It is only mixed in the sense that applications query data from multiple sources. So for me, multi-master with physical replication seems possible, and would presumably provide the best read scalability. I doubt that I am the only database user who has this kind of workflow. The alternatives are ugly. I can load data from separate sources into separate database servers *without* replication between them, but then the application layer has to emulate queries across the data. (Yuck.) Or I can use logical replication such as BDR, but then the servers are spending more effort than with physical replication, so I get less bang for the buck when I purchase more servers to add to the cluster. Or I can use FDW to access data from other servers, but that means the same data may be pulled across the wire arbitrarily many times, with corresponding impact on the bandwidth. Am I missing something here? Does BDR really provide an equivalent solution? I think BDR is better: while it does only support schema-equivalent replication, that is the typical case for distributed write systems like this. Also, there are a lot fewer assumptions about the network architecture in the actual data itself (for example, what happens when you want to change owner, or merge/split data?).
IMNSHO, it's better that each node is managing WAL for itself, not the other way around, except in the very special case that you want an exact replica of the database on each node at all times as with the current HS/SR. A **huge** amount of work has been and is being put into wal based logical replication support (LLSR in the BDR docs) that should mostly combine the flexibility of trigger based logical replication with the robustness of wal based replication that we have in core now. LLSR is a low level data transmission framework that can be wrapped by higher level user facing stuff like BDR. LLSR, by the way, does not come attached with the assumption that all databases have the same schema. If I were you, I'd be studying up on LLSR and seeing how it could be molded into the use cases you're talking about. From a development point of view, the replication train hasn't just left the station, it's a space shuttle that just broke out of earth's orbit. By my reckoning a new 'from the ground up' implementation of replication requiring in-core changes has exactly zero percent chance of being adopted. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stuck spinlock
On Sat, Dec 14, 2013 at 6:20 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-12-13 15:49:45 -0600, Merlin Moncure wrote: On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: And while we're on the subject ... isn't bgworker_die() utterly and completely broken? That unconditional elog(FATAL) means that no process using that handler can do anything remotely interesting, like say touch shared memory. Yeah, but for the record (since I see I got cc'd here), that's not my fault. I moved it into bgworker.c, but it's been like that since Alvaro's original commit of the bgworker facility (da07a1e856511dca59cbb1357616e26baa64428e). Is this an edge case or something that will hit a lot of users? Arbitrary server panics seems pretty serious... Is your question about the bgworker part you're quoting or about the stuck spinlock stuff? I don't think the bgworker bug is too bad in practice but the one in handle_sig_alarm() stuff certainly is. I think while it looks possible to hit problems without statement/lock timeout, it's relatively unlikely that those are hit in practice. Well, both -- I was just wondering out loud what the severity level of this issue was. In particular, is it advisable for the general public avoid this release? My read on this is 'probably'. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] stuck spinlock
On Fri, Dec 13, 2013 at 12:32 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Dec 13, 2013 at 11:26 AM, Tom Lane t...@sss.pgh.pa.us wrote: And while we're on the subject ... isn't bgworker_die() utterly and completely broken? That unconditional elog(FATAL) means that no process using that handler can do anything remotely interesting, like say touch shared memory. Yeah, but for the record (since I see I got cc'd here), that's not my fault. I moved it into bgworker.c, but it's been like that since Alvaro's original commit of the bgworker facility (da07a1e856511dca59cbb1357616e26baa64428e). Is this an edge case or something that will hit a lot of users? Arbitrary server panics seems pretty serious... merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Thu, Dec 12, 2013 at 4:02 AM, knizhnik knizh...@garret.ru wrote: On 12/12/2013 11:42 AM, Pavel Stehule wrote: it is an interesting idea. For me, the significant information from the comparison is that we are doing something significantly wrong. A memory engine should naturally be faster, but I don't think it can be 1000x. Sorry, but I didn't fabricate these results: below is just a snapshot from my computer: postgres=# select DbItem_load(); dbitem_load - 998 (1 row) postgres=# \timing Timing is on. postgres=# select cs_used_memory(); cs_used_memory 4441894912 (1 row) postgres=# select agg_val,cs_cut(group_by,'c22c30c10') from (select (cs_project_agg(ss1.*)).* from (select (s1).sum/(s2).sum,(s1).groups from DbItem_get() q, cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1, cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2; agg_val | cs_cut --+ 1.50028393511844 | (John Coltrane,New York Corporates,New York) Time: 506.125 ms postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); ... Time: 449328.645 ms postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); ... Time: 441530.689 ms Please notice that the time of the second execution is almost the same as the first, although all the data can fit in cache! Certainly it was interesting to me to understand the reason for such bad performance. And I found out two things: 1. select sum(score*volenquired)/sum(volenquired) from DbItem group by (trader,desk,office); and select sum(score*volenquired)/sum(volenquired) from DbItem group by trader,desk,office; are not the same queries (it is hard to understand for a C programmer :) And the first one is executed significantly slower. 2. It is not enough to increase the shared_buffers parameter in postgresql.conf. work_mem is also very important. When I increased it to 1Gb from the default 1Mb, the time of query execution was reduced to 7107.146 ms. So the real difference is ten times, not 1000 times. Yeah.
It's not fair to compare vs an implementation that is constrained to use only 1mb. For analytics work a huge work_mem is a pretty typical setting. A 10x improvement is believable considering you've removed all MVCC overhead, locking, buffer management, etc. and have a simplified data structure. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Thu, Dec 12, 2013 at 12:18 PM, knizhnik knizh...@garret.ru wrote: IMHO it is strange to see such small default values in postgresql configuration. This (low default work mem) is because of three things: 1) Most queries do not really need a lot of work mem 2) Work mem stacks with each query using it -- so with your 1mb setting vs 1000 connections, you get a gigabyte. So, some conservatism is justified although this setting tended to be much more dangerous in the old days when we measured memory in megabytes. 3) Postgres does not query available physical memory for default settings due to portability issues. So we tend to tune to common denominator. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
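Point 2 above is worth making concrete. A minimal Python sketch of the worst-case arithmetic, assuming one work_mem-sized allocation per sort/hash node per connection (the scenario names are illustrative, not a formula from the docs):

```python
def worst_case_work_mem_mb(work_mem_mb, connections, sorts_per_query=1):
    """work_mem is a per-sort/per-hash limit, per backend -- not a
    global cap -- so the worst case multiplies across connections."""
    return work_mem_mb * sorts_per_query * connections

# The post's example: a 1 MB setting across 1000 connections -> ~1 GB.
assert worst_case_work_mem_mb(1, 1000) == 1000
# A 1 GB analytics-style setting with 100 busy connections -> ~100 GB.
assert worst_case_work_mem_mb(1024, 100) == 102400
```

This multiplication is why the shipped default stays conservative even though any single analytics query would happily use far more.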
Re: [HACKERS] In-Memory Columnar Store
On Mon, Dec 9, 2013 at 1:40 PM, knizhnik knizh...@garret.ru wrote: Hello! I want to announce my implementation of an In-Memory Columnar Store extension for PostgreSQL: Documentation: http://www.garret.ru/imcs/user_guide.html Sources: http://www.garret.ru/imcs-1.01.tar.gz Any feedback, bug reports and suggestions are welcome. The vertical representation of data is stored in PostgreSQL shared memory. This is why it is important to be able to utilize all available physical memory. Now servers with a Tb or more of RAM are not something exotic, especially in the financial world. But there is a limitation in Linux with standard 4kb pages on the maximal size of a mapped memory segment: 256Gb. It is possible to overcome this limitation either by creating multiple segments - but that requires too many changes in the PostgreSQL memory manager - or by just setting the MAP_HUGETLB flag (assuming that huge pages were allocated in the system). I found several messages related to the MAP_HUGETLB flag, the most recent one from 21 of November: http://www.postgresql.org/message-id/20131125032920.ga23...@toroid.org I wonder what is the current status of this patch? I looked over your extension. I think it's a pretty amazing example of the postgres extension and type systems -- up there with postgis. Very well done. How long did this take you to write? The MAP_HUGETLB patch was marked 'returned with feedback': https://commitfest.postgresql.org/action/patch_view?id=1308. It seems likely to be revived, perhaps in time for 9.4. Honestly, I think your efforts here provide more argument for adding hugetlb support. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] In-Memory Columnar Store
On Wed, Dec 11, 2013 at 10:08 AM, knizhnik knizh...@garret.ru wrote: 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my computer. Just defining an insertion per-row trigger with an empty procedure doubles the time to insert 6 million records - from 7 to 15 seconds. If the trigger procedure is not empty, then time increases proportionally to the number of calls performed. In my case inserting data and propagating it to the columnar store using a trigger takes about 80 seconds. But if I first load the data without triggers into a PostgreSQL table and then insert it into the columnar store using a load function (implemented in C), then the time is 7+9=16 seconds. Yeah. For this problem, we unfortunately either have to try to use standard sql functions in such a way that supports inlining (this is a black art mostly, and fragile), or move logic out of the function and into the query via things like window functions, or just deal with the performance hit. postgres flavored SQL is pretty much the most productive language on the planet AFAIC, but the challenge is always performance, performance. Down the line, I am optimistic per-call function overhead can be optimized, probably by expanding what can be inlined somehow. The problem is that this requires cooperation from the language executors; this is not currently possible through the SPI interface, so I really don't know. Certainly I realize that plpgsql is an interpreted language. But, for example, interpreted Python is able to do 100 times more calls per second. Unfortunately the profiler doesn't show any single bottleneck - it looks like the long call time is caused by the overhead of initializing and resetting the memory context and copying argument data. 2. Inefficient implementation of expanding composite type columns using the (foo()).* clause. In this case function foo() will be invoked as many times as there are fields in the returned composite type. 
Even when the call is placed in the FROM list (thanks to lateral joins in 9.3), PostgreSQL still sometimes performs redundant calls, which can be avoided using the hack of adding an OFFSET 1 clause. Yeah, this is a long-standing headache. LATERAL mostly deals with this, but most cases (even pre-9.3) can be worked around one way or another. 3. The 256GB limit on shared memory segment size on Linux. I figure this will be solved fairly soon. It's a nice problem to have. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
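The multiple-evaluation problem and its workarounds look like this (table and function names are hypothetical):

```sql
-- foo() is re-evaluated once per expanded output field here: with a
-- three-field composite return type, it runs three times per row.
SELECT (foo(t.x)).* FROM t;

-- 9.3+: place the call in the FROM list via LATERAL so it is
-- evaluated once per input row.
SELECT f.* FROM t, LATERAL foo(t.x) f;

-- pre-9.3 hack: hide the call behind OFFSET 0 (or OFFSET 1 variants)
-- in a subquery so the planner cannot flatten and re-expand it.
SELECT (f).* FROM (SELECT foo(t.x) AS f FROM t OFFSET 0) s;
```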
Re: [HACKERS] Compression of tables
On Tue, Dec 10, 2013 at 1:18 AM, Thomas Munro mu...@ip9.org wrote: Hi I have been wondering what the minimum useful heap table compression system would be for Postgres, in order to reduce disk footprint of large mostly static datasets. Do you think an approach similar to the static row-level compression of DB2 could make sense? I imagine something like this: 1. You have a table which already has data in it. 2. You run a COMPRESS operation, which builds a static dictionary, and rewrites the whole table with compressed frozen tuples. Frozen tuples have CTIDs just like regular tuples, and can be pointed to by indexes. They are decompressed on the fly when needed. Clearly things get tricky once you need to update rows. Assume for simplicity that future UPDATEs and INSERTs produce normal, non-compressed tuples that would only be compressed by a subsequent COMPRESS operation. The question is how to deal with the existing compressed rows when UPDATEd or DELETEd. Some approaches: 1. Just don't allow updates of compressed rows (!). 2. Exclusively lock the whole page when trying to update any compressed row, while you explode it into regular uncompressed tuples on new pages which you can work on (!). 3. Pull the minimum header fields out of the compressed tuples so that the MVCC machinery can work, to support updates of compressed tuples. Perhaps just the t_xmax, t_ctid values (t_xmin == frozen is implied), so that a writer can update them. This means an overhead of at least 10 bytes per tuple over the compressed size (plus the item offsets in the page header). 4. Something far cleverer. Well, these are straw-man suggestions really and I probably don't understand enough about PG internals (MVCC and implications for VACUUM) to be making them. But I'm curious to know if anyone has researched something like this. For example, I have a system that occupies a couple of TB on disk, but only a few to a few hundred MB per day change, mostly adding data to an active partition. 
I periodically run CLUSTER on any partition that has pg_stat.correlation 0.9 (this effectively just re-CLUSTERs the active one). At the same time I would COMPRESS, and the DB could potentially fit on much smaller SSDs. Most commercial database systems I encounter these days use compression of some sort (more sophisticated than the above, with dynamic dictionaries, and sometimes column-based storage etc). postgres compresses TOASTED data: one strategy could be to arrange your data somehow to utilize TOAST. I doubt you'll ever see generally compressed heap data in the way you're thinking: postgres has a strong informal policy of not implementing features which are dubious and/or excessively complicated with limited benefit, particularly if there are ways to handle this outside the database; there are various operating system level tricks that can make a compressed file or even an entire tablespace (o/s folder) masquerade as regular structures. So maybe you are asking for a feature we already have: CREATE TABLESPACE. For example take a look here: https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F (out of curiosity, if this strategy fits the bill for you I wouldn't mind seeing a follow up on how this handles your static data use case). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
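A minimal sketch of the tablespace-on-compressed-filesystem idea (device, paths, and object names are illustrative; the mount options are covered by the btrfs wiki link above):

```shell
# mount a btrfs volume with transparent compression (run as root)
mount -o compress=zlib /dev/sdb1 /mnt/compressed
mkdir -p /mnt/compressed/pgdata
chown postgres /mnt/compressed/pgdata

# then point a tablespace at it and move the static partitions over
psql -c "CREATE TABLESPACE compressed LOCATION '/mnt/compressed/pgdata'"
psql -c "ALTER TABLE big_static_partition SET TABLESPACE compressed"
```

The database stays oblivious: the filesystem compresses and decompresses pages under the heap files, at the cost of giving up any database-level control over the dictionary or compression granularity.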
Re: [HACKERS] JSON decoding plugin
On Mon, Dec 9, 2013 at 10:53 AM, Euler Taveira eu...@timbira.com.br wrote: On 09-12-2013 13:12, Merlin Moncure wrote: This is pretty neat. Couple minor questions: *) Aren't you *en*coding data into json, not the other way around (decoding?) Yes. The 'decoding' came from the functionality (logical decoding) and because the POC plugin is named 'test_decoding'. I also think that 'json_decoding' doesn't say much about the module purpose. I confess that I don't like the name but can't come up with a good name. Maybe 'wal2json' or 'logrep2json'? Could you suggest something? I'm partial to wal2json actually. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] JSON decoding plugin
On Mon, Dec 9, 2013 at 7:03 AM, Euler Taveira eu...@timbira.com.br wrote: Hi, A few months ago, it was proposed [1] that it would be interesting to have a json output plugin for logical decoding. Here it is. Each transaction is a JSON object that can contain xid (optional), timestamp (optional), and a change array. Each change element is a command that was decoded, and it can contain: kind (I/U/D), schema (optional), table, columnnames, columntypes (optional), columnvalues, and oldkeys (only for U/D). columnnames, columntypes and columnvalues are arrays. oldkeys is an object that contains the following arrays: keynames, keytypes (optional), and keyvalues. The JSON objects are serialized if you are decoding a series of transactions. Here is an output example: { "xid": 702, "change": [ { "kind": "insert", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 2, "test"] } ,{ "kind": "update", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 2, "test2"], "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 2] } } ] } { "xid": 703, "change": [ { "kind": "update", "schema": "public", "table": "foo", "columnnames": ["a", "b", "c"], "columntypes": ["int4", "int4", "text"], "columnvalues": [1, 3, "test2"], "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 2] } } ] } { "xid": 704, "change": [ { "kind": "delete", "schema": "public", "table": "foo", "oldkeys": { "keynames": ["a", "b"], "keytypes": ["int4", "int4"], "keyvalues": [1, 3] } } ] } Some data types were adapted to conform with the JSON spec. NaN and Infinity are not valid JSON symbols so their representation is NULL (as in some JSON implementations). Due to the JSON datatype's simplicity, I represent the vast majority of Postgres datatypes as strings (however, I admit that we could mimic the json datatype conversion rules). The oldkeys treatment follows what was defined by the commit [2]. 
It uses: (i) primary key (default behavior); (ii) unique index (if REPLICA IDENTITY USING INDEX is defined for the table); (iii) full tuple (if REPLICA IDENTITY FULL is defined for the table); (iv) nothing, which means an error (if REPLICA IDENTITY NOTHING is defined for the table). TOAST columns have special treatment for UPDATEs. If a tuple that contains a TOAST field is updated, the TOAST field is included iff it is changed too. This means that unchanged TOAST fields are omitted from the columns* arrays, which means less overhead while transmitting, processing and applying changes. By design, (i) the output plugin doesn't know about aborted transactions and (ii) subtransactions are reordered into a toplevel transaction and only the committed pieces are passed to the plugin. You can test it by firing the regression tests (e.g. 'make test') or using the following steps: postgresql.conf: wal_level = logical max_wal_senders = 2 max_logical_slots = 2 start collecting WAL records: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --plugin=json_decoding_plugin --init [execute some transactions] start printing decoded transactions: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --start stop collecting WAL records: $ pg_recvlogical --slot=foo -d euler -f /dev/stdout --stop Comments? This is pretty neat. Couple minor questions: *) Aren't you *en*coding data into json, not the other way around (decoding?) *) Consider generating a long bytea instead of explicitly writing a 32kb sql into the patch. *) You've built your own json serializer here. Maybe some code can be shared with the json type? *) Consider removing 'plugin ' from the name of the plugin. --plugin=json_decoding etc. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription:
Re: [HACKERS] Status of FDW pushdowns
On Mon, Dec 2, 2013 at 10:26 PM, David Fetter da...@fetter.org wrote: On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: On 11/28/2013 03:24 AM, David Fetter wrote: WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. So, you're thinking of something like: WITH FOREIGN somecte AS $$... foreign query ...$$ SELECT ... FROM somecte; I was picturing something a little more like an SRF which would take one opaque string, the remote command, some descriptor, perhaps an enum, of what if anything might come back. Long ago, I implemented a similar thing in DBI-Link. It was called remote_exec_dbh(data_source_id integer, query text, returns_rows bool) Couple thoughts: *) Any 'pass through' API should support parameterization (the FDW may not support that, but many will and the API should allow for it). Lack of parameterization is a major downside of dblink. The function could be set up to be variadic for the parameters. *) For connectivity APIs of this style, the dblink-ish mechanic of separating command execution from data-returning commands is likely the right way to go. Also, probably better to stick with SRF mechanics if we go the 'function route'. So basically we are making dblink for FDW, adding parameterization and some concept of utilizing the foreign server. All this is assuming we are adding a special remote execution function ('fdwlink'). While that would be great, it's a significant deviation from the standard into postgresql-specific SRF syntax. If some of the qual pushdown deparsing functionality could be put inside the internal FDW API, then you'd get the best of both worlds. Maybe you'd still want a dblink style extension anyways, but it wouldn't be as critical. 
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
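A hypothetical signature for the 'fdwlink' function floated above (the name, shape, and implementation are assumptions for illustration, not an existing proposal):

```sql
-- Execute an arbitrary parameterized query against an FDW server;
-- $1, $2, ... in the query text bind to the VARIADIC arguments, so the
-- FDW driver (not string interpolation) handles quoting.
CREATE FUNCTION fdwlink(server name, query text, VARIADIC args "any")
RETURNS SETOF record
AS 'MODULE_PATHNAME' LANGUAGE c;  -- would be implemented in C against the FDW API

-- usage would mirror dblink, with the record type defined at the call site:
SELECT * FROM fdwlink('sqlserver_svr',
                      'select id, name from t where id = $1', 42)
    AS t(id int, name text);
```

The per-call-site record definition is the headache noted elsewhere in the thread, but it is the price dblink already pays today.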
Re: [HACKERS] Status of FDW pushdowns
On Wed, Dec 4, 2013 at 1:39 PM, David Fetter da...@fetter.org wrote: On Wed, Dec 04, 2013 at 12:43:44PM -0600, Merlin Moncure wrote: On Mon, Dec 2, 2013 at 10:26 PM, David Fetter da...@fetter.org wrote: On Tue, Dec 03, 2013 at 11:15:36AM +0800, Craig Ringer wrote: On 11/28/2013 03:24 AM, David Fetter wrote: WITH, or SRF, or whatever, the point is that we need to be able to specify what we're sending--probably single opaque strings delimited just as we do other strings--and what we might get back--errors only, rows, [sets of] refcursors are the ones I can think of offhand. So, you're thinking of something like: WITH FOREIGN somecte AS $$... foreign query ...$$ SELECT ... FROM somecte; I was picturing something a little more like an SRF which would take one opaque string, the remote command, some descriptor, perhaps an enum, of what if anything might come back. Long ago, I implemented a similar thing in DBI-Link. It was called remote_exec_dbh(data_source_id integer, query text, returns_rows bool) Couple thoughts: *) Any 'pass through' API should support parameterization (the FDW may not support that, but many will and the API should allow for it). Lack of parameterization is a major downside of dblink. The function could be set up to be variadic for the parameters. I don't know for sure that that needs to be in version 1 of this. It definitely shouldn't block implementing the non-parameterized one. I'm not making the case it should be version anything. But, if you went dblink style, you'd want to go variadic. It's not really any extra work and you can always embed the string if the FDW driver doesn't support parameterization. What the standard has is literally insane. Not sure I agree. The guiding principle of the standard implementation AIUI is that it wants to do connectivity management via syntax and keep the DML abstractions clean (minus some un-implementable things like RI triggers). In other words, you write exactly the same queries for native and foreign tables. 
This makes things much easier for people who just want to write SQL the classical way and not get into funky vendor-specific APIs. The downside of SQL/MED, particularly the way postgres implemented the driver API, is that each driver is responsible for all optimization efforts, and I think this is bad. So I'm openly wondering if the FDW API should expose optional query rewriting hooks. The odbc-fdw and jdbc-fdw drivers for example could then benefit from those hooks so that qual pushdown could be implemented with far less code duplication and effort and a *much* broader set of problems could be addressed by FDW. For non- or exotic-SQL implementations those hooks could be implemented locally by the driver or disabled if it doesn't make sense to use them. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Why we are going to have to go DirectIO
On Wed, Dec 4, 2013 at 2:31 PM, Jonathan Corbet cor...@lwn.net wrote: For those interested in the details... (1) It's not quite 50/50, that's one bound for how the balance is allowed to go. (2) Anybody trying to add tunables to the kernel tends to run into resistance. Exposing thousands of knobs tends to lead to a situation where you *have* to be an expert on all those knobs to get decent behavior out of your system. So there is a big emphasis on having the kernel tune itself whenever possible. Here is a situation where that is not always happening, but a fix (which introduces no knob) is in the works. I think there are interesting parallels here with the 'query plan hints' debate. In both cases I think the conservative voices are correct: better not to go crazy adding knobs. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Wed, Nov 27, 2013 at 4:20 AM, Shigeru Hanada shigeru.han...@gmail.com wrote: Hi Merlin, 2013/11/22 Merlin Moncure mmonc...@gmail.com: On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. [ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that. Pass-through mode mentioned in the SQL/MED standard might be what he wants. happen to have a link handy? The SQL/MED standard doesn't say much about PASS THROUGH mode, especially about interaction with the client. Besides that, I think it would be nice to allow an arbitrary FDW as the backend of the dblink interface, like this: postgres= SELECT dblink_connect('con1', 'server name of an FDW'); postgres= SELECT * FROM dblink('con1', 'some query written in remote syntax') as t(/* record type definition */...); This provides a way to execute a query without defining a foreign table. yeah. (thanks for indulging -- this is barely on topic I guess). if it were possible to create a supporting function (say, fdw_link) that could somehow interface with a previously established server, it could probably be worked out. Then all FDWs could leverage parameterization without having to copy and paste the pgsql-fdw qual push code. But that would be a fairly large break from the rest of the FDW syntax, and having to define the record at each call site is admittedly a bit of a headache. 
Hm, another way to think about this would be to somehow abstract the qual push into a library so that it could be accessed by other FDWs if they opted in. This would address my chief complaint that only the pgsql-fdw (the only database for which we already have an in-core high quality connection api) driver could tap the excellent work you've done. If this were even possible, it would probably result in more fdw API changes. If my: SELECT * FROM big_sql_server_foreign_table WHERE id = x; was fast, that'd be pretty nice. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql shows line number
On Tue, Nov 26, 2013 at 9:22 AM, Tom Lane t...@sss.pgh.pa.us wrote: Marko Tiikkaja ma...@joh.to writes: On 11/26/13 2:24 PM, Pavel Stehule wrote: yes, but we can define a new statement like \sq+ (Show Query) There's already \p; might not be too difficult to add a \p+ which would also show the line numbers. I don't actually see the point of this. If you're working with queries that are long enough that counting lines is even slightly difficult, surely you're going to use \e to correct the problem? So it seems to me that the existing \e nnn facility pretty much does what's needed, and there's no great need to clutter your regular display with line numbers. I agree in the sense that line numbers don't help. But \e is not really a solution to the problem. A pretty common thing for me to see (as an artifact of my styling) in the log: ERROR: syntax error at or near from LINE 223: from ^ Not very helpful. Also good luck if your SQL is dynamically generated. What I think *would* help would be the ability to optionally add some lines of context: then you'd at least have a shot at eyeballing the error etc. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Mon, Nov 25, 2013 at 3:33 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Pavel Stehule pavel.steh...@gmail.com writes: there is another issue - a simple parser will be really user-unfriendly, and a user-friendly parser will not be simple :( If simple things are hard to implement, get yourself better tools. Each time we get on the topic of improving scripting abilities for our interactive tool, it's always the same problem: having to invent a scripting language with a whole parser is just too much work. Maybe it's time we step back a little and consider real scripting solutions to embed into psql, and pgbench too: I'm thinking (did I miss something?) that Pavel was commenting merely on the parsing of setting unicode border characters, not the wider scripting of psql. (psql scripting is a fun topic to discuss though :-)). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Put json type into alphabetical order in manual table
On Mon, Nov 25, 2013 at 3:07 PM, Andreas Karlsson andr...@proxel.se wrote: Hi, When looking at table 8-1 at http://www.postgresql.org/docs/9.3/static/datatype.html i noticed that all types except for json was in alphabetical order. I have attached a patch which fixes this. By the way should character and character varying be swapped in that table too? I would. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 21, 2013 at 6:43 PM, Shigeru Hanada shigeru.han...@gmail.com wrote: 2013/11/22 Tom Lane t...@sss.pgh.pa.us: Merlin Moncure mmonc...@gmail.com writes: On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. [ shrug... ] So use dblink. For better or worse, the FDW stuff is following the SQL standard's SQL/MED design, which does not do it like that. Pass-through mode mentioned in SQL/MED standard might be what he wants. happen to have a link handy? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Fri, Nov 22, 2013 at 2:23 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello 2013/11/21 Merlin Moncure mmonc...@gmail.com On Thu, Nov 21, 2013 at 1:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello I wrote new styles for psql table borders. http://postgres.cz/wiki/Pretty_borders_in_psql This patch is simple and I think some styles can be interesting for final presentation. great. hm, maybe we could integrate color? (see: http://merlinmoncure.blogspot.com/2012/09/psql-now-with-splash-of-color.html). that is a possible next enhancement - I would like to go forward in small steps, please :) minimally (and independent of the proposed patch) we could introduce something like final regexp filtering, which could be used for this or other purposes. Yeah. A per-field regexp would do the trick. As you have it, I like Peter's idea best. Being able to specify the various character codes makes a lot of sense. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Fri, Nov 22, 2013 at 8:45 AM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Pavel Stehule escribió: 2013/11/21 Peter Eisentraut pete...@gmx.net Maybe make the border setting a string containing the various characters by index. Then everyone can create their own crazy borders. I seriously thought about it, but I'm not sure if it is a good way. How about having a single unicode line style, and then have a different \pset setting to determine exactly what chars to print? This wouldn't allow for programmability, but it seems better UI to me. This proliferation of unicode line style names seems odd. That makes sense to me, especially if you could pass escapes. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] why semicolon after begin is not allowed in postgresql?
On Fri, Nov 22, 2013 at 4:34 PM, Mike Blackwell mike.blackw...@rrd.com wrote: I believe the section you are reading refers to the BEGIN keyword in the procedural language plpgsql, not the SQL 'BEGIN' command. The issue stems from confusing two distinct languages, both of which, along with several more procedural languages, are documented in the same manual. This is an inherited constraint from Oracle pl/sql, which pl/pgsql is, uh, inspired by. In pl/sql, all block-opening constructs (THEN, LOOP, BEGIN) do not get semi-colons. BEGIN is a weird case because it's (quite unfortunately) also the same thing that explicitly opens a transaction in vanilla SQL; you use a semi-colon there as with any standard SQL statement. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
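The distinction side by side (table and function names are hypothetical):

```sql
-- SQL: BEGIN is a statement that opens a transaction, so it takes a
-- semicolon like any other statement.
BEGIN;
UPDATE t SET x = 1;
COMMIT;

-- PL/pgSQL: BEGIN opens a block, like THEN and LOOP, and takes none.
CREATE FUNCTION f() RETURNS void AS $$
BEGIN             -- no semicolon: block opener, not a statement
    IF true THEN  -- no semicolon after THEN either
        NULL;
    END IF;
END;              -- END closes the block; the semicolon terminates it
$$ LANGUAGE plpgsql;
```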
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote: Hmm. All callers of RecoveryInProgress() must be prepared to handle the case that RecoveryInProgress() returns true, but the system is no longer in recovery. No matter what locking we do in RecoveryInProgress(), the startup process might finish recovery just after RecoveryInProgress() has returned. True. What about the attached? It reads the shared variable without a lock or barrier. If it returns 'true', but the system in fact just exited recovery, that's OK. As explained above, all the callers must tolerate that anyway. But if it returns 'false', then it performs a full memory barrier, which should ensure that it sees any other shared variables as it is after the startup process cleared SharedRecoveryInProgress (notably, XLogCtl-ThisTimeLineID). I'd argue that we should also remove the spinlock in StartupXLOG and replace it with a write barrier. Obviously not for performance reasons, but because somebody might add more code to run under that spinlock. Looks good otherwise, although a read memory barrier ought to suffice. This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 9:09 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 09:08:05 -0600, Merlin Moncure wrote: This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. A read barrier is just a compiler barrier on x86. That's good enough for me then. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PERFORM] Cpu usage 100% on slave. s_lock problem.
On Thu, Nov 21, 2013 at 10:37 AM, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 21.11.2013 17:08, Merlin Moncure wrote: On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund and...@2ndquadrant.com wrote: On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote: Hmm. All callers of RecoveryInProgress() must be prepared to handle the case that RecoveryInProgress() returns true, but the system is no longer in recovery. No matter what locking we do in RecoveryInProgress(), the startup process might finish recovery just after RecoveryInProgress() has returned. True. What about the attached? It reads the shared variable without a lock or barrier. If it returns 'true', but the system in fact just exited recovery, that's OK. As explained above, all the callers must tolerate that anyway. But if it returns 'false', then it performs a full memory barrier, which should ensure that it sees any other shared variables as it is after the startup process cleared SharedRecoveryInProgress (notably, XLogCtl-ThisTimeLineID). I'd argue that we should also remove the spinlock in StartupXLOG and replace it with a write barrier. Obviously not for performance reasons, but because somebody might add more code to run under that spinlock. Looks good otherwise, although a read memory barrier ought to suffice. This code is in a very hot code path. Are we *sure* that the read barrier is fast enough that we don't want to provide an alternate function that only returns the local flag? I don't know enough about them to say either way. In my patch, I put the barrier inside the if (!LocalRecoveryInProgress) block. That codepath can only execute once in a backend, so performance is not an issue there. Does that look sane to you? oh right -- certainly! merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Status of FDW pushdowns
On Thu, Nov 21, 2013 at 9:05 AM, Bruce Momjian br...@momjian.us wrote: Where are we on the remaining possible pushdowns for foreign data wrappers, particularly the Postgres one? I know we do WHERE restriction pushdowns in 9.3, but what about join and aggregate pushdowns? Is anyone working on those? I know join pushdowns seem insignificant, but it helps to restrict what data must be passed back because you would only pass back joined rows. By 'insignificant' you mean 'necessary to do any non-trivial real work'. Personally, I'd prefer it if FDW was extended to allow arbitrary parameterized queries like every other database connectivity API ever made ever. But in lieu of that, I'll take as much push down power as possible :-D. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] new unicode table border styles for psql
On Thu, Nov 21, 2013 at 1:09 AM, Pavel Stehule pavel.steh...@gmail.com wrote: Hello I wrote new styles for psql table borders. http://postgres.cz/wiki/Pretty_borders_in_psql This patch is simply and I am think so some styles can be interesting for final presentation. great. hm, maybe we could integrate color? (see: http://merlinmoncure.blogspot.com/2012/09/psql-now-with-splash-of-color.html). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Tue, Nov 19, 2013 at 11:59 AM, Josh Berkus j...@agliodbs.com wrote: On 11/19/2013 08:14 AM, Robert Haas wrote: On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote: I am sure you could also devise a json encoding scheme where white space is significant ;) I don't even have to think hard. If you want your JSON to be human-readable, it's entirely possible that you want it stored using the same whitespace that was present on input. There is a valid use case for normalizing whitespace, too, of course. Given that JSON is a data interchange format, I suspect that there is an extremely large combination of factors which would result in an unimplementably large number of parser settings. For example, I personally would have use for a type which allowed the storage of JSON *fragments*. Therefore I am interested only in supporting two: a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's I'm uncomfortable with the word 'legacy'. This suggests the new type will essentially deprecate the old type. jsonb will likely be pessimal for large serializations. If you're not manipulating and searching the documents, why would you use it? It's going to take more space on disk and memory and should provide little benefit for present *as well as future code*. (note, it will provide extreme benefits for nosql type uses which is of huge strategic importance for the project). json and jsonb APIs should work together cleanly, and the documentation should suggest which type is better for which cases. merlin
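The distinction being argued is essentially whether the stored value is the input text or a parsed structure. A short stdlib illustration of what parse-and-reserialize does to whitespace and duplicate keys — this mimics jsonb-style normalization in Python, and is not any PostgreSQL code:

```python
import json

# Input as a client might send it: extra whitespace and a duplicated key.
doc = '{ "a" : 1,\n  "a" : 2,  "b" : 3 }'

# A text-preserving type (like 9.2/9.3 json) stores this string verbatim.
stored_as_text = doc

# A parsed/binary type (like the proposed jsonb) keeps only the structure:
# whitespace is gone and the last duplicate key wins.
normalized = json.dumps(json.loads(doc), sort_keys=True)
print(normalized)  # {"a": 2, "b": 3}
```

Whether that normalization is a feature or a compatibility break is exactly the disagreement in this subthread.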
Re: [HACKERS] additional json functionality
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan and...@dunslane.net wrote: I don't think any name that doesn't begin with json is acceptable. I could live with jsonb. It has the merit of brevity, but maybe it's a tad too close to json to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type and how exactly is that going to happen? For example, I figure we'd need a record_to_jsonb etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
Re: [HACKERS] additional json functionality
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus j...@agliodbs.com wrote: Merlin, *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API was not strongly bifurcated: this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. Hm, but it would matter wouldn't it...the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of first alternate data type ambiguity again for overloading builtin functions. We really need that method of preferring a specific version of the function ... You'd need explicit jsonb creating functions: record_to_jsonb, array_to_jsonb etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type if you don't count casts.
It's tempting to *not* make those functions as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then get unsimplified json always. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now but you'd run mutation and searching through jsonb... merlin
Re: [HACKERS] additional json functionality
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 11/14/2013 08:17 PM, Merlin Moncure wrote: On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler da...@justatheory.com wrote: On Nov 14, 2013, at 7:07 AM, Merlin Moncure mmonc...@gmail.com wrote: This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that and when serializing you are not in control of the parser. The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? It doesn't, but the row_to_json function has a very clear mechanism of action. And, 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors. the whole hash*() function family was changed based on a 'not documented' premise, so we do have a precedent. In my experience, no JSON parser guarantees key order or duplication. I found one in about two seconds. http://docs.python.org/2/library/json.html object_pairs_hook, if specified, will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do. I am sure you could also devise a json encoding scheme where white space is significant ;) The question is, how much of it should the json *type* support. As discussed in another thread, most of your requirements would be met by having json/row/row set-to-text serializer functions which output json-formatted text. No, that would not work, putting aside the fact it would require rewriting heaps of code. What I do now inside the json wrapping routines is create things like { x: [ {dynamic object}, {dynamic object}, ... ], y: ..., ... } The only way to do it is to build 'dynamic object' into json in advance of the outer xxx_to_json call. The 'dynamic object' is created out of a json builder that takes a paired array -- basically a variant of Andrew's 'json_build' upthread. If the 'json serializer' outputted text, the 'outer' to_json call would then re-escape the object. I can't use hstore for that purpose precisely because of the transformations it does on the object. Stepping back, I'm using json serialization as a kind of 'supercharged crosstab'. To any client that can parse json, json serialization completely displaces crosstabbing -- it's superior in every way. I am, if you may, kind of leading research efforts in the area and I can tell you with absolute certainty that breaking this behavior is a mistake. Forcing hstore-ish output mechanisms removes the ability to handle certain important edge cases that work just fine today. If that ability was taken away, it would be a very bitter pill for me to swallow and would have certain ramifications for me professionally; I went out on a pretty big limb and pushed pg/json aggressively (over strenuous objection) in an analytics product which is now in the final stages of beta testing. I would hate to see the conclusion of the case study be "Ultimately we had to migrate the code back to Hibernate due to compatibility issues."
Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. merlin
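The Python behavior cited above — a parser that does expose key order and duplicates — is easy to verify with the stdlib alone. This is just an illustration of the quoted object_pairs_hook documentation, nothing more:

```python
import json

doc = '{"a": 1, "a": 2}'

# Default decoding: duplicates collapse, last key wins (dict semantics).
print(json.loads(doc))  # {'a': 2}

# With object_pairs_hook, the decoder hands over the ordered pair list,
# duplicates intact -- a parser that *does* see order and duplication.
pairs = json.loads(doc, object_pairs_hook=lambda kv: kv)
print(pairs)  # [('a', 1), ('a', 2)]
```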
Re: [HACKERS] Proof of concept: standalone backend with full FE/BE protocol
On Fri, Nov 15, 2013 at 6:06 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: But: I very, very much agree with the other concerns around this. This should be a patch to fix single user mode, not one to make postgres into a single process database. It's not, and trying to make it by using single user mode for it will start to hinder development of normal postgres because we suddenly need to be concerned about performance and features in situations where we previously weren't. +1 Maybe what needs to happen to this patch is a way to restrict its usage to --single. I'm thinking that postgres --single maybe could be made to fork the server process underneath the psql controller client process transparently. I personally would prefer not to do that. My main non-administrative interest in this mode is doing things like benchmarking protocol overhead. I'm OK with not supporting (and optimizing) for single user code paths but I don't like the idea of building walls that serve no purpose other than to make it difficult for other people to mess around. Just document strenuously that this mode is not intended for application bundling and move on... merlin
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote: Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there. Otherwise, a deduplicated, reordered text representation is generated *) When literal text is cast to json, the binary structure is built up and kept alongside binary mode.
So, if you went: 'select '{a: 1, a: 2}'::json', you'd get the same thing back. (This is how it works now.) But, if you went: 'insert into foo select '{a: 1, a: 2}'::json returning *', you'd get {a: 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. The text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially. *) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... merlin
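As a way to reason about this proposal, the lifecycle of the side-by-side representations can be mocked up in plain Python. Everything here is hypothetical illustration (the class and method names are invented, and nothing below is PostgreSQL code): the raw text rides along until an operation invalidates it, after which only the normalized form can ever be rendered.

```python
import json

class DualJson:
    """Mock of the proposed json type: raw text plus a parsed 'binary' form."""

    def __init__(self, text):
        self.text = text                 # original serialization, verbatim
        self.binary = json.loads(text)   # parsed structure; last dup key wins

    def render(self):
        # Prefer the text mode while it survives; otherwise regenerate a
        # deduplicated, reordered serialization from the parsed form.
        if self.text is not None:
            return self.text
        return json.dumps(self.binary, sort_keys=True)

    def simplify(self):
        # json_simplify(): discard the text mode; it can never be rebuilt.
        self.text = None
        return self

v = DualJson('{"a": 1, "a": 2}')
print(v.render())             # {"a": 1, "a": 2}  (text mode intact)
print(v.simplify().render())  # {"a": 2}          (binary mode only)
```

Storing to a table, equality comparison, or any mutation would call simplify() in this model — which is precisely the state-dependent behavior objected to downthread.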
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan and...@dunslane.net wrote: On 11/15/2013 03:25 PM, Merlin Moncure wrote: On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler da...@justatheory.com wrote: On Nov 15, 2013, at 6:35 AM, Merlin Moncure mmonc...@gmail.com wrote: Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, i think the text one must get the name 'json' due to unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: Haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there.
Otherwise, a deduplicated, reordered text representation is generated *) When literal text is cast to json, the binary structure is built up and kept alongside binary mode. So, if you went: 'select '{a: 1, a: 2}'::json', you'd get the same thing back. (This is how it works now.) But, if you went: 'insert into foo select '{a: 1, a: 2}'::json returning *', you'd get {a: 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. The text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially. *) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs.
I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. Well, maybe so. My main gripe with the 'two types' solutions is that: 1) current type is already in core (that is, not an extension). In hindsight, I think this was a huge mistake. 2) current type has grabbed the 'json' type name and the 'json_xxx' API. 3) current type is getting used all over the place 'Two types' means that (AIUI) you can't mess around with the existing API too much. And the new type (due out in 2016?) will be something of a second-class citizen. The ramifications of dealing with the bifurcation are what make *my* head hurt. Every day the json stuff is getting more and more widely adopted. 9.4 isn't going to drop until 2014 best case and it won't be widely deployed in the enterprise until 2015 and beyond. So you're going to have a huge code base
Re: [HACKERS] additional json functionality
On Fri, Nov 15, 2013 at 2:54 PM, Josh Berkus j...@agliodbs.com wrote: On 11/15/2013 12:25 PM, Merlin Moncure wrote: Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior [snip long detailed explanation of behavior-dependent type] That's not at all workable. Users would be completely unable to predict or understand the JSON type and how it acts. That's not just violating POLS; that's bashing POLS' head in with a shovel. All right: make a new type then, and leave the current one alone please. merlin