On Fri, Jan 13, 2017 at 5:57 AM, Amit Kapila wrote:
> Sure, we can do that way and I agree that it is worth considering.
> Few cases where it can be really costly is when undo chain overflows
> to multiple pages and those pages doesn't exist in cache. I think the
> reason why it is probable is th
On Thu, Jan 12, 2017 at 8:56 PM, Robert Haas wrote:
> On Wed, Jan 11, 2017 at 5:18 AM, Amit Kapila wrote:
>> Moving further, I have thought about consistent reads and different
>> formats for storing undo with pros and cons of each format.
>>
>> Consistent Reads
>>
>>
On Wed, Jan 11, 2017 at 5:18 AM, Amit Kapila wrote:
> Moving further, I have thought about consistent reads and different
> formats for storing undo with pros and cons of each format.
>
> Consistent Reads
>
> If the page is modified by any transaction that is not visib
On Tue, Jan 10, 2017 at 7:28 PM, Amit Kapila wrote:
> On Mon, Jan 9, 2017 at 11:47 PM, Robert Haas wrote:
>>
>> Yes, something like this can be done. You don't really need any new
>> page-level header data, because you can get the XIDs from the TPD
>> entry (or from the page itself if there's on
On Mon, Jan 9, 2017 at 11:47 PM, Robert Haas wrote:
> On Mon, Jan 9, 2017 at 7:50 AM, Amit Kapila wrote:
>> One idea could be that we have some fixed number of
>> slots (i think we can make it variable as well, but for simplicity,
>> lets consider it as fixed) in the page header which will store
On Mon, Jan 9, 2017 at 7:50 AM, Amit Kapila wrote:
> Okay, I see the point. I think here UNDO pointer can be marked
> invalid either during page-pruning or vacuum.
I explicitly want to avoid that, because it means re-dirtying the
page. The UNDO pointer becomes invalid by removing the UNDO log t
On Fri, Jan 6, 2017 at 7:41 PM, Robert Haas wrote:
> On Fri, Jan 6, 2017 at 6:28 AM, Amit Kapila wrote:
>>> Also, I'm thinking the bit could be stored in the line pointer rather
>>> than the tuple, because with this design we don't need
>>> LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD any more. We co
On Fri, Jan 6, 2017 at 6:28 AM, Amit Kapila wrote:
>> Also, I'm thinking the bit could be stored in the line pointer rather
>> than the tuple, because with this design we don't need
>> LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD any more. We could use one
>> bit to indicate dead or not-dead and the s
On Thu, Jan 5, 2017 at 9:25 PM, Robert Haas wrote:
> On Wed, Jan 4, 2017 at 6:05 AM, Amit Kapila wrote:
>> Okay, so this optimization can work only after all the active
>> transactions operating on a page are finished. If that is true, in
>> some cases such a design can consume a lot of CPU trav
On Thu, Jan 5, 2017 at 6:51 AM, Amit Kapila wrote:
> UNDO has to be kept till heap page is marked as all visible. This is
> required to check the visibility of index. Now, I think the page can
> be marked as all visible when we removed corresponding dead entries in
> heap. I think the main poi
On Wed, Jan 4, 2017 at 6:05 AM, Amit Kapila wrote:
> Okay, so this optimization can work only after all the active
> transactions operating on a page are finished. If that is true, in
> some cases such a design can consume a lot of CPU traversing all the
> tuples in a page for un-setting the bit,
On Wed, Jan 4, 2017 at 8:35 PM, Dilip Kumar wrote:
> On Wed, Jan 4, 2017 at 4:35 PM, Amit Kapila wrote:
>> In this new system, I
>> think we can't remove undo entries of heap page till we clear
>> corresponding index entries. I think we need to somehow collect the
>> old values from undo correspo
On Wed, Jan 4, 2017 at 4:35 PM, Amit Kapila wrote:
> In this new system, I
> think we can't remove undo entries of heap page till we clear
> corresponding index entries. I think we need to somehow collect the
> old values from undo corresponding to index and then scan the index
> remove the index
On Tue, Dec 6, 2016 at 9:35 PM, Robert Haas wrote:
> On Mon, Dec 5, 2016 at 4:49 AM, Amit Kapila wrote:
>> I can very well understand the reason for not doing so (IIUC, it is
>> about complexity and time to get it right when we are already trying
>> to solve one big and complex problem of the sys
On Mon, Dec 5, 2016 at 4:49 AM, Amit Kapila wrote:
> I can very well understand the reason for not doing so (IIUC, it is
> about complexity and time to get it right when we are already trying
> to solve one big and complex problem of the system), but saying most
> of the benefit can be realized by
On Thu, Dec 1, 2016 at 8:55 PM, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 12:21 AM, Amit Kapila wrote:
>
> I see what you're going for, but I'm not sure it's worth it. I mean,
> say you just have one bit per index tuple. If it's set, the heap
> tuple is definitely all-visible; if it's clear,
On Fri, Dec 2, 2016 at 5:01 AM, Alexander Korotkov
wrote:
> Idea of storing just one visibility bit in index tuple is a subject of
> serious doubts for me.
>
> 1. When definitely-all-visible isn't set then we have to recheck during
> scanning heap, right?
> But our current recheck (i.e. rechecking
On Thu, Dec 1, 2016 at 6:25 PM, Robert Haas wrote:
> There's a couple of possible designs here, but there is the
> possibility for extra hops in some cases. But there are things we can
> do to mitigate that.
>
> 1. If each tuple has a definitely-all-visible bit, you can check that
> first; if it
On Tue, Nov 29, 2016 at 12:21 AM, Amit Kapila wrote:
>> I think we need to avoid putting the visibility information in the
>> index because that will make the index much bigger.
>
> I agree that there will be an increase in index size, but it shouldn't
> be much if we have transaction information
On Tue, Nov 29, 2016 at 8:21 AM, Amit Kapila
wrote:
> On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas
> wrote:
> > On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila
> wrote:
> >> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas
> wrote:
> >>> Well, my original email did contain a discussion of the need
On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas wrote:
> On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila wrote:
>> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>>> Well, my original email did contain a discussion of the need for
>>> delete-marking. I said that you couldn't do in-place update
On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila wrote:
> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>> Well, my original email did contain a discussion of the need for
>> delete-marking. I said that you couldn't do in-place updates when
>> indexed columns were modified unless the index AM
On Sun, Nov 27, 2016 at 09:19:06AM +0530, Amit Kapila wrote:
> At this point, index scan for value 2 will find index tuple of step-1
> (2) and will conclude 2,def as a right tuple, but actually, that is
> wrong as the step-1 (2) index tuple should not be visible to the user.
> Do you also this as a
On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>
> Well, my original email did contain a discussion of the need for
> delete-marking. I said that you couldn't do in-place updates when
> indexed columns were modified unless the index AMs had support for
> delete-marking, which is the same poi
On Sun, Nov 27, 2016 at 8:26 PM, Tsunakawa, Takayuki
wrote:
> I see. autovacuum is certainly almost unpredictable, at least for those who
> are not aware of its existence and tuning. Recently, one of our customers
> faced the inability to perform INSERTs because of xid wraparound.
> Their work
From: Robert Haas [mailto:robertmh...@gmail.com]
> On Thu, Nov 24, 2016 at 2:32 AM, Tsunakawa, Takayuki
> wrote:
> > IMHO, overall, there should be pros and cons of the current approach and
> the new UNDo one (like Oracle?), depending on the workload. Under
> update-heavy workload, the UNDO metho
On Sat, Nov 26, 2016 at 10:49 PM, Amit Kapila wrote:
> On Fri, Nov 25, 2016 at 11:23 PM, Bruce Momjian wrote:
>> On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
>>> I agree up to a point. I think we need to design our own system as
>>> well as we can, not just copy what others have
On Fri, Nov 25, 2016 at 11:23 PM, Bruce Momjian wrote:
> On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
>> I agree up to a point. I think we need to design our own system as
>> well as we can, not just copy what others have done. For example, the
>> design I sketched will work with
On 24 November 2016 at 23:03, Robert Haas wrote:
>> For snapshot isolation Oracle has yet a *third* copy of the data in a
>> space called the "rollback segment(s)".
>
> My understanding is that this isn't correct. I think the rollback
> segments are what they call the thing that stores UNDO. See
On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
> I agree up to a point. I think we need to design our own system as
> well as we can, not just copy what others have done. For example, the
> design I sketched will work with all of PostgreSQL's existing index
> types. You need to mod
On Fri, Nov 25, 2016 at 8:03 PM, Amit Kapila wrote:
>>
>
> Another way to do is to write UNDO log for split operation (with exact
> record locations on pages that are split) so that you don't need to
> perform this expensive search operation. I can understand writing
> UNDO for split could be cos
On Thu, Nov 24, 2016 at 10:09 PM, Robert Haas wrote:
>
> I don't really understand how a system of this type copes with page
> splits. Suppose there are entries 1000, 2000, ..., 1e6 in the btree.
> I now start two overlapping transactions, one inserting all the
> positive odd values < 1e6 and the
2016-11-25 1:44 GMT+01:00 Robert Haas :
> On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule
> wrote:
> >> I think that the whole emphasis on whether and to what degree this is
> >> like Oracle is somewhat misplaced. I would look at it a different
> >> way. We've talked many times over the years ab
On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule wrote:
>> I think that the whole emphasis on whether and to what degree this is
>> like Oracle is somewhat misplaced. I would look at it a different
>> way. We've talked many times over the years about how PostgreSQL is
>> optimized for aborts. Eve
> I think that the whole emphasis on whether and to what degree this is
> like Oracle is somewhat misplaced. I would look at it a different
> way. We've talked many times over the years about how PostgreSQL is
> optimized for aborts. Everybody that I've heard comment on that issue
> thinks that
On Thu, Nov 24, 2016 at 11:06 AM, Greg Stark wrote:
> Fwiw, Oracle does not use the undo log for snapshot fetches. It's used
> only for transaction rollback and recovery.
>
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)". When you up
On Thu, Nov 24, 2016 at 2:32 AM, Tsunakawa, Takayuki
wrote:
> IMHO, overall, there should be pros and cons of the current approach and the
> new UNDo one (like Oracle?), depending on the workload. Under update-heavy
> workload, the UNDO method may be better. OTOH, under the mostly-INSERT
> wo
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
> nothing about MVCC.
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)".
UNDO and rollback
On Thu, Nov 24, 2016 at 04:06:14PM +, Greg Stark wrote:
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)". When you update a row in a
> block you save the whole block in the rollback segment. When you try
> to access a block you che
On Wed, Nov 23, 2016 at 5:18 PM, Thomas Munro
wrote:
> On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan wrote:
>> * Our behavior with many duplicates in secondary indexes is pretty bad
>> in general, I suspect.
>
> From the pie-in-the-sky department: I believe there are
> snapshot-based systems
On 23 November 2016 at 04:28, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
>> This basic DO-UNDO-REDO protocol has been well-understood for
>> decades.
>
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Ora
On Wed, Nov 23, 2016 at 11:35:38PM -0800, Peter Geoghegan wrote:
> On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki
> wrote:
> > IMHO, overall, there should be pros and cons of the current approach and
> > the new UNDo one (like Oracle?), depending on the workload. Under
> > update-heavy w
On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki
wrote:
> IMHO, overall, there should be pros and cons of the current approach and the
> new UNDo one (like Oracle?), depending on the workload. Under update-heavy
> workload, the UNDO method may be better. OTOH, under the mostly-INSERT
> w
IMHO, overall, there should be pros and cons of the current approach and the
new UNDo one (like Oracle?), depending on the workload. Under update-heavy
workload, the UNDO method may be better. OTOH, under the mostly-INSERT
workload (like data warehouse?), the current method will be better beca
On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan wrote:
> * Our behavior with many duplicates in secondary indexes is pretty bad
> in general, I suspect.
>From the pie-in-the-sky department: I believe there are
snapshot-based systems that don't ever have more than one entry for a
logical row in
On Tue, Nov 22, 2016 at 11:18 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all over. This was what this guy felt in
>>> circa 2001; I'd
Robert Haas wrote:
> To implement this in PostgreSQL, I believe we would need support for
> UNDO.
> - Reading a page that has been recently modified gets significantly
> more expensive; it is necessary to read the associated UNDO entries
> and do a bunch of calculation that is significantly more c
On Tue, Nov 22, 2016 at 8:45 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> The best thing by far about an alternative design like this is that it
>> performs *consistently*.
>
> Really? I think it just moves the issues somewhere else.
Definitely, yes.
* HOT is great and all, but HOT-safety
On Wed, Nov 23, 2016 at 10:07 AM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 8:18 PM, Tom Lane wrote:
> > Ultimately, I doubt that update-in-place buys much that we don't already
> > have with HOT updates (which postdate this old conversation, btw).
> > If you want MVCC semantics, you need
On 23/11/16 16:31, Tom Lane wrote:
Robert Haas writes:
[ Let's invent Oracle-style UNDO logs ]
I dunno. I remember being told years ago, by an ex-Oracle engineer,
that he thought our approach was better. I don't recall all the details
of the conversation but I think his key point was basica
Peter Geoghegan writes:
> The best thing by far about an alternative design like this is that it
> performs *consistently*.
Really? I think it just moves the issues somewhere else.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
On Tue, Nov 22, 2016 at 8:18 PM, Tom Lane wrote:
> Ultimately, I doubt that update-in-place buys much that we don't already
> have with HOT updates (which postdate this old conversation, btw).
> If you want MVCC semantics, you need to hold both versions of the tuple
> *somewhere*. Oracle's soluti
On Wed, Nov 23, 2016 at 9:48 AM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all over. This was what this guy felt in
>>> circa 2001; I'd h
On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
> This basic DO-UNDO-REDO protocol has been well-understood for
> decades.
FWIW, while this is basically true, the idea of repurposing UNDO to be
usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
nothing about MVCC.
--
Pete
On Wed, Nov 23, 2016 at 9:32 AM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> - Reading a page that has been recently modified gets significantly
>>> more expensive; it is necessary to read the associated UNDO entries
>>> and do a bunch of calculation that is sign
Peter Geoghegan writes:
> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>> Oracle spends a lot of time on this, and it's really cache-inefficient
>> because the data is spread all over. This was what this guy felt in
>> circa 2001; I'd have to think that the cache unfriendliness problem is
>>
On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>> - Reading a page that has been recently modified gets significantly
>> more expensive; it is necessary to read the associated UNDO entries
>> and do a bunch of calculation that is significantly more complex than
>> what is required today.
Someon
On Tue, Nov 22, 2016 at 10:41 PM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:39 PM, Robert Haas wrote:
>>> Heikki's been fooling with some ideas that I think have more promise.
>>> I wish he'd get to the point of presenting them publicly rather than
>>> just over beers at conferences.
>>
On Tue, Nov 22, 2016 at 7:39 PM, Robert Haas wrote:
>> Heikki's been fooling with some ideas that I think have more promise.
>> I wish he'd get to the point of presenting them publicly rather than
>> just over beers at conferences.
>
> That would be good, too!
I was told that TED was kind of form
On Tue, Nov 22, 2016 at 10:31 PM, Tom Lane wrote:
> Robert Haas writes:
>> [ Let's invent Oracle-style UNDO logs ]
>
> I dunno. I remember being told years ago, by an ex-Oracle engineer,
> that he thought our approach was better. I don't recall all the details
> of the conversation but I think
Robert Haas writes:
> [ Let's invent Oracle-style UNDO logs ]
I dunno. I remember being told years ago, by an ex-Oracle engineer,
that he thought our approach was better. I don't recall all the details
of the conversation but I think his key point was basically this:
> - Reading a page that ha
61 matches
Mail list logo