On Mon, Dec 23, 2013 at 7:49 AM, Robert Haas <robertmh...@gmail.com> wrote: > I don't think this is a project to rush through. We've lived without > MERGE/UPSERT for several years now, and we can live without it for > another release cycle while we try to reach agreement on the way > forward. I can tell that you're convinced you know the right way > forward here, and you may be right, but I don't think you've convinced > everyone else - maybe not even anyone else.
That may be. Attention from reviewers has been in relatively short supply. Not that that isn't always true. > I wouldn't suggest modeling anything you do on the way hash indexes > using heavyweight locks. That is a performance disaster, not to > mention being basically a workaround for the fact that whoever wrote > the code originally didn't bother figuring out any way that splitting > a bucket could be accomplished in a crash-safe manner, even in theory. > If it weren't for that, we'd be using buffer locks there. Having looked at the code for the first time recently, I'd agree that hash indexes are a disaster. A major advantage of The Lehman and Yao Algorithm, as prominently noted in the paper, is that exclusive locks are only acquired on leaf pages to increase concurrency. Since I only propose to extend this to a heavyweight page lock, and still only for an instant, it seems reasonable to assume that the performance will be acceptable for an initial version of this. It's not as if most places will have to pay any heed to this heavyweight lock - index scans and non-upserting inserts are generally unaffected. We can later optimize performance as we measure a need to do so. Early indications are that the performance is reasonable. Holding value locks for more than an instant doesn't make sense. The reason is simple: when upserting, we're tacitly only really looking for violations on one particular unique index. We just lock them all at once because the implementation doesn't know *which* unique index. So in actuality, it's really no different from existing potential-violation handling for unique indexes, except we have to do some extra work in addition to the usual restart from scratch stuff (iff we have multiple unique indexes). > To be honest, I am still not altogether sold on any part of this > feature. I don't like the fact that it violates MVCC - although I > admit that some form of violation is inevitable in any feature in this > area unless we're content to live with many serialization failures, I > don't like the particular way it violates MVCC Discussions around visibility issues have not been very useful. As I've said, I don't like the term "MVCC violation", because it's suggestive of some classical, codified definition of MVCC, a definition that doesn't actually exist anywhere, even in research papers, AFAICT. So while I understand your concerns around the modifications to HeapTupleSatisfiesMVCC(), and while I respect that we need to be mindful of the performance impact, my position is that if that really is what we need to do, we might as well be honest about it, and express intent succinctly and directly. This is a position that is orthogonal to the proposed syntax, even if that is convenient to my patch. It's already been demonstrated that yes, the MVCC violation can be problematic when we call HeapTupleSatisfiesUpdate(), which is a bug that was fixed by making another modest modification to HeapTupleSatisfiesUpdate(). It is notable that that bug would have still occurred had a would-be-HTSMVCC-invisible tuple been passed through any other means. What problem, specifically, do you envisage avoiding by doing it some other way? What other way do you have in mind? We invested huge effort into more granular FK locking when we had a few complaints about it. I wouldn't be surprised if that effort modestly regressed HeapTupleSatisfiesMVCC(). On the other hand, this feature has been in very strong demand for over a decade, and has a far smaller code footprint. I don't want to denigrate the FK locking stuff in any way - it is a fantastic piece of work - but it's important to have a sense of proportion about these things. In order to make visibility work in the way we require, we're almost always just doing additional checking of infomask bits, and the t_infomask variable is probably already in a CPU register (this is a huge simplification, but is essentially true). Like you, I have noted that HeapTupleSatisfiesMVCC() is a fairly hot routine during profiling before, but it's not *that* hot. It's understandable that you raise these points, but from my perspective it's hard to address your concerns without more concrete objections. > I don't like the > syntax (returns rejects? blech) I suppose it isn't ideal in some ways. On the other hand, it is extremely flexible, with many of the same advantages of SQL MERGE. Importantly, it will facilitate merging as part of conflict resolution on multi-master replication systems, which I think is of considerable strategic importance even beyond having a simple upsert. I would like to see us get this into 9.4, and get MERGE into 9.5. > and I don't like the fact that > getting the locking right, or even getting the semantics right, seems > to be so darned hard. I think we're in real danger of building > something that will be too complex, or just too weird, for users to > use, and too complex to maintain as well. Please don't conflate confusion or uncertainty around alternative approaches with confusion or uncertainty around mine - *far* more time has been spent discussing the former. While I respect the instinct that says we ought to be very conservative around changing anything that the btree AM does, I really don't think my design is itself all that complicated. I've been very consistent even in the face of strong criticism. What I have now is essentially the same design as back in early September. After the initial ON DUPLICATE KEY IGNORE patch in August, I soon realized that value locking and row locking could not be sensibly considered in isolation, and over the objections of others pushed ahead with integrating the two. I believe now as I believed then that value locks need to be cheap to release (or it at least needs to be possible), and that it was okay to drop all value locks when we need to deal with a possible conflict/getting an xid shared lock - if those unlocked pages have separate conflicts on our next attempt, the feature is being badly misused (for upserting) or it doesn't matter because we only need one conclusive "No" answer (for IGNOREing, but also for upserting). I have been trying to focus attention of these aspects throughout this discussion. I'm not sure how successful I was here. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers