On Fri, Jan 13, 2017 at 5:57 AM, Amit Kapila wrote:
> Sure, we can do that way and I agree that it is worth considering.
> Few cases where it can be really costly is when undo chain overflows
> to multiple pages and those pages doesn't exist in cache. I think the
>
On Thu, Jan 12, 2017 at 8:56 PM, Robert Haas wrote:
> On Wed, Jan 11, 2017 at 5:18 AM, Amit Kapila wrote:
>> Moving further, I have thought about consistent reads and different
>> formats for storing undo with pros and cons of each format.
>>
>>
On Wed, Jan 11, 2017 at 5:18 AM, Amit Kapila wrote:
> Moving further, I have thought about consistent reads and different
> formats for storing undo with pros and cons of each format.
>
> Consistent Reads
>
> If the page is modified by any
On Tue, Jan 10, 2017 at 7:28 PM, Amit Kapila wrote:
> On Mon, Jan 9, 2017 at 11:47 PM, Robert Haas wrote:
>>
>> Yes, something like this can be done. You don't really need any new
>> page-level header data, because you can get the XIDs from the
On Mon, Jan 9, 2017 at 11:47 PM, Robert Haas wrote:
> On Mon, Jan 9, 2017 at 7:50 AM, Amit Kapila wrote:
>> One idea could be that we have some fixed number of
>> slots (i think we can make it variable as well, but for simplicity,
>> lets consider
On Mon, Jan 9, 2017 at 7:50 AM, Amit Kapila wrote:
> Okay, I see the point. I think here UNDO pointer can be marked
> invalid either during page-pruning or vacuum.
I explicitly want to avoid that, because it means re-dirtying the
page. The UNDO pointer becomes invalid
On Fri, Jan 6, 2017 at 7:41 PM, Robert Haas wrote:
> On Fri, Jan 6, 2017 at 6:28 AM, Amit Kapila wrote:
>>> Also, I'm thinking the bit could be stored in the line pointer rather
>>> than the tuple, because with this design we don't need
>>>
On Fri, Jan 6, 2017 at 6:28 AM, Amit Kapila wrote:
>> Also, I'm thinking the bit could be stored in the line pointer rather
>> than the tuple, because with this design we don't need
>> LP_UNUSED/LP_NORMAL/LP_REDIRECT/LP_DEAD any more. We could use one
>> bit to indicate
On Thu, Jan 5, 2017 at 9:25 PM, Robert Haas wrote:
> On Wed, Jan 4, 2017 at 6:05 AM, Amit Kapila wrote:
>> Okay, so this optimization can work only after all the active
>> transactions operating on a page are finished. If that is true, in
>> some
On Thu, Jan 5, 2017 at 6:51 AM, Amit Kapila wrote:
> UNDO has to be kept till heap page is marked as all visible. This is
> required to check the visibility of index. Now, I think the page can
> be marked as all visible when we removed corresponding dead entries in
>
On Wed, Jan 4, 2017 at 6:05 AM, Amit Kapila wrote:
> Okay, so this optimization can work only after all the active
> transactions operating on a page are finished. If that is true, in
> some cases such a design can consume a lot of CPU traversing all the
> tuples in a
On Wed, Jan 4, 2017 at 8:35 PM, Dilip Kumar wrote:
> On Wed, Jan 4, 2017 at 4:35 PM, Amit Kapila wrote:
>> In this new system, I
>> think we can't remove undo entries of heap page till we clear
>> corresponding index entries. I think we need to
On Wed, Jan 4, 2017 at 4:35 PM, Amit Kapila wrote:
> In this new system, I
> think we can't remove undo entries of heap page till we clear
> corresponding index entries. I think we need to somehow collect the
> old values from undo corresponding to index and then scan the
On Tue, Dec 6, 2016 at 9:35 PM, Robert Haas wrote:
> On Mon, Dec 5, 2016 at 4:49 AM, Amit Kapila wrote:
>> I can very well understand the reason for not doing so (IIUC, it is
>> about complexity and time to get it right when we are already trying
On Mon, Dec 5, 2016 at 4:49 AM, Amit Kapila wrote:
> I can very well understand the reason for not doing so (IIUC, it is
> about complexity and time to get it right when we are already trying
> to solve one big and complex problem of the system), but saying most
> of the
On Thu, Dec 1, 2016 at 8:55 PM, Robert Haas wrote:
> On Tue, Nov 29, 2016 at 12:21 AM, Amit Kapila wrote:
>
> I see what you're going for, but I'm not sure it's worth it. I mean,
> say you just have one bit per index tuple. If it's set, the heap
On Fri, Dec 2, 2016 at 5:01 AM, Alexander Korotkov
wrote:
> Idea of storing just one visibility bit in index tuple is a subject of
> serious doubts for me.
>
> 1. When definitely-all-visible isn't set then we have to recheck during
> scanning heap, right?
> But our
On Thu, Dec 1, 2016 at 6:25 PM, Robert Haas wrote:
> There's a couple of possible designs here, but there is the
> possibility for extra hops in some cases. But there are things we can
> do to mitigate that.
>
> 1. If each tuple has a definitely-all-visible bit, you can
On Tue, Nov 29, 2016 at 12:21 AM, Amit Kapila wrote:
>> I think we need to avoid putting the visibility information in the
>> index because that will make the index much bigger.
>
> I agree that there will be an increase in index size, but it shouldn't
> be much if we
On Tue, Nov 29, 2016 at 8:21 AM, Amit Kapila
wrote:
> On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas
> wrote:
> > On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila
> wrote:
> >> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas
On Mon, Nov 28, 2016 at 11:01 PM, Robert Haas wrote:
> On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila wrote:
>> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>>> Well, my original email did contain a discussion of the
On Sun, Nov 27, 2016 at 10:44 PM, Amit Kapila wrote:
> On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>> Well, my original email did contain a discussion of the need for
>> delete-marking. I said that you couldn't do in-place updates when
>>
On Sun, Nov 27, 2016 at 09:19:06AM +0530, Amit Kapila wrote:
> At this point, index scan for value 2 will find index tuple of step-1
> (2) and will conclude 2,def as a right tuple, but actually, that is
> wrong as the step-1 (2) index tuple should not be visible to the user.
> Do you also this as
On Mon, Nov 28, 2016 at 4:50 AM, Robert Haas wrote:
>
> Well, my original email did contain a discussion of the need for
> delete-marking. I said that you couldn't do in-place updates when
> indexed columns were modified unless the index AMs had support for
>
On Sun, Nov 27, 2016 at 8:26 PM, Tsunakawa, Takayuki
wrote:
> I see. autovacuum is certainly almost unpredictable, at least for those who
> are not aware of its existence and tuning. Recently, one of our customers
> faced the inability to perform INSERTs
From: Robert Haas [mailto:robertmh...@gmail.com]
> On Thu, Nov 24, 2016 at 2:32 AM, Tsunakawa, Takayuki
> wrote:
> > IMHO, overall, there should be pros and cons of the current approach and
> the new UNDo one (like Oracle?), depending on the workload. Under
>
On Sat, Nov 26, 2016 at 10:49 PM, Amit Kapila wrote:
> On Fri, Nov 25, 2016 at 11:23 PM, Bruce Momjian wrote:
>> On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
>>> I agree up to a point. I think we need to design our own system as
>>>
On Fri, Nov 25, 2016 at 11:23 PM, Bruce Momjian wrote:
> On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
>> I agree up to a point. I think we need to design our own system as
>> well as we can, not just copy what others have done. For example, the
>> design I
On 24 November 2016 at 23:03, Robert Haas wrote:
>> For snapshot isolation Oracle has yet a *third* copy of the data in a
>> space called the "rollback segment(s)".
>
> My understanding is that this isn't correct. I think the rollback
> segments are what they call the
On Thu, Nov 24, 2016 at 12:23:28PM -0500, Robert Haas wrote:
> I agree up to a point. I think we need to design our own system as
> well as we can, not just copy what others have done. For example, the
> design I sketched will work with all of PostgreSQL's existing index
> types. You need to
On Fri, Nov 25, 2016 at 8:03 PM, Amit Kapila wrote:
>>
>
> Another way to do is to write UNDO log for split operation (with exact
> record locations on pages that are split) so that you don't need to
> perform this expensive search operation. I can understand writing
>
On Thu, Nov 24, 2016 at 10:09 PM, Robert Haas wrote:
>
> I don't really understand how a system of this type copes with page
> splits. Suppose there are entries 1000, 2000, ..., 1e6 in the btree.
> I now start two overlapping transactions, one inserting all the
> positive
2016-11-25 1:44 GMT+01:00 Robert Haas :
> On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule
> wrote:
> >> I think that the whole emphasis on whether and to what degree this is
> >> like Oracle is somewhat misplaced. I would look at it a different
>
On Thu, Nov 24, 2016 at 6:20 PM, Pavel Stehule wrote:
>> I think that the whole emphasis on whether and to what degree this is
>> like Oracle is somewhat misplaced. I would look at it a different
>> way. We've talked many times over the years about how PostgreSQL is
>>
> I think that the whole emphasis on whether and to what degree this is
> like Oracle is somewhat misplaced. I would look at it a different
> way. We've talked many times over the years about how PostgreSQL is
> optimized for aborts. Everybody that I've heard comment on that issue
> thinks that
On Thu, Nov 24, 2016 at 11:06 AM, Greg Stark wrote:
> Fwiw, Oracle does not use the undo log for snapshot fetches. It's used
> only for transaction rollback and recovery.
>
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback
On Thu, Nov 24, 2016 at 2:32 AM, Tsunakawa, Takayuki
wrote:
> IMHO, overall, there should be pros and cons of the current approach and the
> new UNDo one (like Oracle?), depending on the workload. Under update-heavy
> workload, the UNDO method may be better.
> FWIW, while this is basically true, the idea of repurposing UNDO to be
> usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
> nothing about MVCC.
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)".
UNDO and
On Thu, Nov 24, 2016 at 04:06:14PM +, Greg Stark wrote:
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)". When you update a row in a
> block you save the whole block in the rollback segment. When you try
> to access a block you
On Wed, Nov 23, 2016 at 5:18 PM, Thomas Munro
wrote:
> On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan wrote:
>> * Our behavior with many duplicates in secondary indexes is pretty bad
>> in general, I suspect.
>
> From the pie-in-the-sky
On 23 November 2016 at 04:28, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
>> This basic DO-UNDO-REDO protocol has been well-understood for
>> decades.
>
> FWIW, while this is basically true, the idea of repurposing UNDO to
On Wed, Nov 23, 2016 at 11:35:38PM -0800, Peter Geoghegan wrote:
> On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki
> wrote:
> > IMHO, overall, there should be pros and cons of the current approach and
> > the new UNDo one (like Oracle?), depending on the
On Wed, Nov 23, 2016 at 11:32 PM, Tsunakawa, Takayuki
wrote:
> IMHO, overall, there should be pros and cons of the current approach and the
> new UNDo one (like Oracle?), depending on the workload. Under update-heavy
> workload, the UNDO method may be better.
IMHO, overall, there should be pros and cons of the current approach and the
new UNDo one (like Oracle?), depending on the workload. Under update-heavy
workload, the UNDO method may be better. OTOH, under the mostly-INSERT
workload (like data warehouse?), the current method will be better
On Wed, Nov 23, 2016 at 6:01 PM, Peter Geoghegan wrote:
> * Our behavior with many duplicates in secondary indexes is pretty bad
> in general, I suspect.
>From the pie-in-the-sky department: I believe there are
snapshot-based systems that don't ever have more than one entry for
On Tue, Nov 22, 2016 at 11:18 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all
Robert Haas wrote:
> To implement this in PostgreSQL, I believe we would need support for
> UNDO.
> - Reading a page that has been recently modified gets significantly
> more expensive; it is necessary to read the associated UNDO entries
> and do a bunch of calculation that is significantly more
On Tue, Nov 22, 2016 at 8:45 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> The best thing by far about an alternative design like this is that it
>> performs *consistently*.
>
> Really? I think it just moves the issues somewhere else.
Definitely, yes.
*
On Wed, Nov 23, 2016 at 10:07 AM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 8:18 PM, Tom Lane wrote:
> > Ultimately, I doubt that update-in-place buys much that we don't already
> > have with HOT updates (which postdate this old conversation, btw).
>
On 23/11/16 16:31, Tom Lane wrote:
Robert Haas writes:
[ Let's invent Oracle-style UNDO logs ]
I dunno. I remember being told years ago, by an ex-Oracle engineer,
that he thought our approach was better. I don't recall all the details
of the conversation but I think
Peter Geoghegan writes:
> The best thing by far about an alternative design like this is that it
> performs *consistently*.
Really? I think it just moves the issues somewhere else.
regards, tom lane
--
Sent via pgsql-hackers mailing list
On Tue, Nov 22, 2016 at 8:18 PM, Tom Lane wrote:
> Ultimately, I doubt that update-in-place buys much that we don't already
> have with HOT updates (which postdate this old conversation, btw).
> If you want MVCC semantics, you need to hold both versions of the tuple
>
On Wed, Nov 23, 2016 at 9:48 AM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> Oracle spends a lot of time on this, and it's really cache-inefficient
>>> because the data is spread all
On Tue, Nov 22, 2016 at 7:01 PM, Robert Haas wrote:
> This basic DO-UNDO-REDO protocol has been well-understood for
> decades.
FWIW, while this is basically true, the idea of repurposing UNDO to be
usable for MVCC is definitely an Oracleism. Mohan's ARIES paper says
On Wed, Nov 23, 2016 at 9:32 AM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>>> - Reading a page that has been recently modified gets significantly
>>> more expensive; it is necessary to read the associated UNDO entries
>>> and
Peter Geoghegan writes:
> On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>> Oracle spends a lot of time on this, and it's really cache-inefficient
>> because the data is spread all over. This was what this guy felt in
>> circa 2001; I'd have to think that
On Tue, Nov 22, 2016 at 7:31 PM, Tom Lane wrote:
>> - Reading a page that has been recently modified gets significantly
>> more expensive; it is necessary to read the associated UNDO entries
>> and do a bunch of calculation that is significantly more complex than
>> what is
On Tue, Nov 22, 2016 at 10:41 PM, Peter Geoghegan wrote:
> On Tue, Nov 22, 2016 at 7:39 PM, Robert Haas wrote:
>>> Heikki's been fooling with some ideas that I think have more promise.
>>> I wish he'd get to the point of presenting them publicly rather
On Tue, Nov 22, 2016 at 7:39 PM, Robert Haas wrote:
>> Heikki's been fooling with some ideas that I think have more promise.
>> I wish he'd get to the point of presenting them publicly rather than
>> just over beers at conferences.
>
> That would be good, too!
I was told
On Tue, Nov 22, 2016 at 10:31 PM, Tom Lane wrote:
> Robert Haas writes:
>> [ Let's invent Oracle-style UNDO logs ]
>
> I dunno. I remember being told years ago, by an ex-Oracle engineer,
> that he thought our approach was better. I don't recall all
Robert Haas writes:
> [ Let's invent Oracle-style UNDO logs ]
I dunno. I remember being told years ago, by an ex-Oracle engineer,
that he thought our approach was better. I don't recall all the details
of the conversation but I think his key point was basically this:
>
61 matches
Mail list logo