Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Amit Kapila
On Mon, Oct 26, 2015 at 12:07 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

>
>
> On Mon, Oct 26, 2015 at 10:19 AM, Amit Kapila 
> wrote:
>>
>>
>> I think Oracle just copies the changed part of old row to rollback
>> segment.
>> Also in Redo logs, it just writes the changed column value (both old and
>> new).  So for the case we are discussing in this thread (one changed
>> column out of 200 columns), Oracle will just write the old value of that
>> column in Redo and then in rollback segment, and write the new value
>> in Redo and then do the in-place update in heap row.
>>
>>
> In that case, readers would pay the penalty for constructing the row.
>

Readers whose snapshot is older than the update transaction need to
pay such a cost; all newer transactions can read directly from the
page.  Also, not all old-snapshot readers have to pay any such cost.

Not only that, such a design has an advantage that the bloat due to
older data won't be there.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Ashutosh Bapat
On Mon, Oct 26, 2015 at 4:09 PM, Amit Kapila 
wrote:

> On Mon, Oct 26, 2015 at 12:07 PM, Ashutosh Bapat <
> ashutosh.ba...@enterprisedb.com> wrote:
>
>>
>>
>> On Mon, Oct 26, 2015 at 10:19 AM, Amit Kapila 
>> wrote:
>>>
>>>
>>> I think Oracle just copies the changed part of old row to rollback
>>> segment.
>>> Also in Redo logs, it just writes the changed column value (both old and
>>> new).  So for the case we are discussing in this thread (one changed
>>> column out of 200 columns), Oracle will just write the old value of that
>>> column in Redo and then in rollback segment, and write the new value
>>> in Redo and then do the in-place update in heap row.
>>>
>>>
>> In that case, readers would pay the penalty for constructing the row.
>>
>
> Readers that have snapshot older than update-transaction needs to
> pay such cost, otherwise all newer transactions can directly read from
> page.  Also not all old-transaction readers have to pay any such cost.
>
>
Can you please explain your last sentence?


>
> Not only that, such a design has an advantage that the bloat due to
> older data won't be there.
>

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Ashutosh Bapat
On Mon, Oct 26, 2015 at 10:19 AM, Amit Kapila 
wrote:

> On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов 
> wrote:
> >
> >
> > - Quote from Kisung Kim (ks...@bitnine.co.kr), on 26.10.2015 at 04:36 -
> >
> > > However, what I want to know is about the update performance difference
> > > between PG and Oracle if there any.
> > > The case I described is for exaggerating the difference between PG and
> > > Oracle.
> > >
> > > I want to explain for our clients that PG's update performance is
> > > comparable to Oracle's.
> > >
> >
> > Oracle is also using MVCC but copies the old row in the rollback segment
> and
> > rewrites the values in-place.
>
> I think Oracle just copies the changed part of old row to rollback segment.
> Also in Redo logs, it just writes the changed column value (both old and
> new).  So for the case we are discussing in this thread (one changed
> column out of 200 columns), Oracle will just write the old value of that
> column in Redo and then in rollback segment, and write the new value
> in Redo and then do the in-place update in heap row.
>
>
In that case, readers would pay the penalty for constructing the row.
PostgreSQL will not incur the cost of reconstruction. Either the writer or
the reader is bound to pay a penalty. If the user's load is reader-heavy it
makes sense to use something like PG; otherwise, something like what is
described above.
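This writer-pays vs. reader-pays tradeoff can be made concrete with a toy cost model. All the unit costs and the `workload_cost` helper below are invented purely for illustration; they are not measurements of either system.

```python
# Toy cost model of the tradeoff discussed above. Copy-on-update (the
# Postgres approach) makes writes expensive (a whole new row version)
# but reads cheap; in-place update with undo (the Oracle approach)
# makes writes cheap but taxes readers whose snapshots force them to
# reconstruct old row versions. All unit costs here are made up.

def workload_cost(reads, writes, read_cost, write_cost):
    """Total cost of a workload mix under the given per-operation costs."""
    return reads * read_cost + writes * write_cost

COPY_ON_UPDATE = {"read_cost": 1.0, "write_cost": 10.0}
IN_PLACE_UNDO = {"read_cost": 2.0, "write_cost": 2.0}

for name, mix in [("reader-heavy", {"reads": 900, "writes": 100}),
                  ("writer-heavy", {"reads": 100, "writes": 900})]:
    print(name,
          "copy-on-update:", workload_cost(**mix, **COPY_ON_UPDATE),
          "in-place+undo:", workload_cost(**mix, **IN_PLACE_UNDO))
```

With these made-up numbers, copy-on-update wins the reader-heavy mix and loses the writer-heavy one, matching the rule of thumb above.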

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Jim Nasby

On 10/25/15 9:36 PM, Kisung Kim wrote:

I want to explain for our clients that PG's update performance is
comparable to Oracle's.


There's really only 2 ways you can answer that. You can either handwave 
the question away ("Yes, update performance is comparable."), or you 
have to do actual benchmarking. Trying to answer this from a theoretical 
standpoint is completely useless because there's an absurd number of 
things that will affect this:


Number of columns
Data types
Size of overall transaction
Percent of transactions that roll back
Size of table
What % of table is updated every day
Underlying hardware
What OS the database is running on
What filesystem the database is running on

... and that's just off the top of my head.

Or to look at it another way, I guarantee you can create a scenario 
where Postgres beats the pants off Oracle, *or vice versa*. So you have 
to either go with an answer along the lines of "For most workloads the 
performance of both databases is similar." or you have to benchmark the 
actual application in question. Most performance issues you find will 
probably be correctable with a moderate amount of work.


To me, the real tradeoff between Postgres and Oracle (or any other 
commercial database) is whether you'd rather spend money on expert 
employees or software contracts.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Gavin Flower

On 27/10/15 11:37, Jim Nasby wrote:

On 10/25/15 9:36 PM, Kisung Kim wrote:

I want to explain for our clients that PG's update performance is
comparable to Oracle's.


There's really only 2 ways you can answer that. You can either 
handwave the question away ("Yes, update performance is comparable."), 
or you have to do actual benchmarking. Trying to answer this from a 
theoretical standpoint is completely useless because there's an absurd 
number of things that will affect this:


Number of columns
Data types
Size of overall transaction
Percent of transactions that roll back
Size of table
What % of table is updated every day
Underlying hardware
What OS the database is running on
What filesystem the database is running on

... and that's just off the top of my head.

Or to look at it another way, I guarantee you can create a scenario 
where Postgres beats the pants off Oracle, *or vice versa*. So you 
have to either go with an answer along the lines of "For most 
workloads the performance of both databases is similar." or you have 
to benchmark the actual application in question. Most performance 
issues you find will probably be correctable with a moderate amount of 
work.


To me, the real tradeoff between Postgres and Oracle (or any other 
commercial database) is whether you'd rather spend money on expert 
employees or software contracts.


And of course, on how you alter the tuning parameters in 
postgresql.conf, like temp_buffers and work_mem.  The 'correct' values 
will depend on your workload and amount of RAM etc.







Re: [HACKERS] questions about PG update performance

2015-10-26 Thread Amit Kapila
On Mon, Oct 26, 2015 at 4:31 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:
>
> On Mon, Oct 26, 2015 at 4:09 PM, Amit Kapila 
> wrote:
>
>> On Mon, Oct 26, 2015 at 12:07 PM, Ashutosh Bapat <
>> ashutosh.ba...@enterprisedb.com> wrote:
>>
>>>
>>>
>>> On Mon, Oct 26, 2015 at 10:19 AM, Amit Kapila 
>>> wrote:


 I think Oracle just copies the changed part of old row to rollback
 segment.
 Also in Redo logs, it just writes the changed column value (both old and
 new).  So for the case we are discussing in this thread (one changed
 column out of 200 columns), Oracle will just write the old value of that
 column in Redo and then in rollback segment, and write the new value
 in Redo and then do the in-place update in heap row.


>>> In that case, readers would pay the penalty for constructing the row.
>>>
>>
>> Readers that have snapshot older than update-transaction needs to
>> pay such cost, otherwise all newer transactions can directly read from
>> page.  Also not all old-transaction readers have to pay any such cost.
>>
>>
> Can you please explain your last sentence?
>
>
At a broad level, it works this way: when a transaction starts, it
makes a note of the current SCN, and then while reading a table or
an index page, it uses that SCN to determine whether the page contains
the effects of transactions that should not be visible to the current
transaction. If the page is found to contain the effects of invisible
transactions, it recreates an older version of the page by undoing
the effects of each such transaction. Once this older version
of the page is recreated, it can be used to fetch rows for transactions
older than the current transaction which has updated the page and
newer than the transaction as of which the page was recreated.  For details
you can read the blog [1] I wrote some time back.


[1] -
http://amitkapila16.blogspot.in/2015/03/different-approaches-for-mvcc-used-in.html
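The scheme described above can be sketched in a few lines. The `Page` and undo-record shapes here are invented for illustration only; real Oracle undo segments and consistent-read buffers are far more involved.

```python
# Minimal sketch of the SCN/undo reconstruction described above.
# Data structures are invented for illustration, not Oracle's actual ones.
import copy

class Page:
    def __init__(self, rows, scn):
        self.rows = rows  # {row_id: value}
        self.scn = scn    # SCN of the last transaction that changed the page

def read_page(page, snapshot_scn, undo_log):
    """Return the version of the page visible at snapshot_scn."""
    if page.scn <= snapshot_scn:
        return page  # newer snapshots read the page directly
    # Older snapshots pay the reconstruction cost: copy the page and
    # apply undo records (newest first) for transactions too new to see.
    view = copy.deepcopy(page)
    for scn, row_id, old_value in sorted(undo_log, reverse=True):
        if scn > snapshot_scn:
            view.rows[row_id] = old_value
    return view

# One row updated in place at SCN 20; its pre-image went to the undo log.
page = Page({1: "new"}, scn=20)
undo = [(20, 1, "old")]

print(read_page(page, 25, undo).rows)  # newer snapshot: {1: 'new'}
print(read_page(page, 10, undo).rows)  # older snapshot: {1: 'old'}
```

Note that reconstruction happens on a private copy, so the current page is never disturbed; only readers with old snapshots do extra work.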



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] questions about PG update performance

2015-10-26 Thread José Luis Tallón

On 10/26/2015 05:49 AM, Amit Kapila wrote:
On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов wrote:

>
>
> - Quote from Kisung Kim (ks...@bitnine.co.kr), on 26.10.2015 at 04:36 -

>
> > However, what I want to know is about the update performance difference

> > between PG and Oracle if there any.
> > The case I described is for exaggerating the difference between PG and
> > Oracle.
> >
> > I want to explain for our clients that PG's update performance is
> > comparable to Oracle's.
> >
>
> Oracle is also using MVCC but copies the old row in the rollback segment and

> rewrites the values in-place.

I think Oracle just copies the changed part of old row to rollback segment.

Also in Redo logs, it just writes the changed column value (both old and
new).  So for the case we are discussing in this thread (one changed
column out of 200 columns), Oracle will just write the old value of that
column in Redo and then in rollback segment, and write the new value
in Redo and then do the in-place update in heap row.


IMV, where Oracle is heavily optimized for "most DML transactions will
commit successfully" and "no long-running transactions shall ever
exist" / "not many transactions will have to read previous
snapshots" (based on PIs), Postgres does not actually make any such
assumptions.


Hence, for long-running transactions / massive concurrency with many
clients reading and writing older snapshots, Postgres will be faster
(less work to do compared with reconstructing rows based on PIs).


Plus, for updates where the size of the NEW row is bigger than the
previous one (think appending text), the overhead is actually greater for
Oracle (plus, they don't compress variable-length values by default / no
TOAST)... so here Postgres would be faster.
For text-intensive workloads, Postgres is measurably faster than
Oracle mostly due to this fact (plus it is much more efficient in its use
of storage/RAM...).




In PostgreSQL, the whole new row is written in heap and diff tuple
(difference of old and new tuple; this optimization is done in 9.4,
commit a3115f0d) in WAL.  I think we can try to optimize and use the
same technique for heap as used for WAL to make PostgreSQL more
efficient for such scenarios, however as of today, my guess is that
PostgreSQL's update would be lagging in this area.


Yup. But see above for a potential reason where it might not be that 
bad, especially after the optimization you mention.



> It is still 2 writes as in Postgres.

The difference is in the amount of data written per write.


Yes, but compressed (for varlena-based datum/data), batched 
(group-commit) so mostly sequential, and non-duplicated (WAL vs REDO+UNDO).


So I guess the difference is quite small nowadays, and differences will 
be heavily influenced by actual workload.



Just my 2 (euro-) cents.


/ J.L.



Re: [HACKERS] questions about PG update performance

2015-10-25 Thread Kisung Kim
2015-10-26 11:12 GMT+09:00 Michael Paquier :

>
>
> On Mon, Oct 26, 2015 at 10:52 AM, Kisung Kim  wrote:
>
>> Because of the internal implementation of MVCC in PG
>> the update of a row is actually a insertion of a new version row.
>> So if the size of a row is huge, then it incurs some overhead compare to
>> in-place update strategy.
>>
>
> Yeah, that's how an UPDATE in Postgres for MVCC usage. The xmax of the old
> row is updated, and a new row is inserted with an xmin equal to the
> previous xmax. So if you update tuple fields one by one the cost is going
> to be high.
>
>
>> Let's assume that a table has 200 columns,
>> and a user updates one of the columns of one row in the table.
>> Then PG will rewrite the whole contents of the updated row
>> including the updated columns and not-updated columns.
>>
>
> When a table has a large number of columns, usually I would say that you
> have a normalization problem and such schemas could be split into a smaller
> set of tables, minimizing the UPDATE cost.
>
>
>> I'm not sure about the implementation of Oracle's update.
>> But if the Oracle can overwrite only the updated column,
>> the performance difference between Oracle and PG in that case may be
>> significant.
>>
>> I researched about this issues in mailing list and google.
>> But I've not found anything related to this issues.
>>
>
> What you are looking at here is columnar storage, Alvaro and 2nd Quadrant
> folks have been doing some work in this area recently:
> http://www.postgresql.org/message-id/20150831225328.GM2912@alvherre.pgsql
> Also, you may want to have a look at cstore_fdw:
> https://github.com/citusdata/cstore_fdw.
> Regards,
> --
> Michael
>

Thank you for your reply.
I already know about the column store and that it is optimized for the case
I described.

However, what I want to know is about the update performance difference
between PG and Oracle, if there is any.
The case I described is meant to exaggerate the difference between PG and
Oracle.

I want to explain to our clients that PG's update performance is
comparable to Oracle's.

Regards,


Re: [HACKERS] questions about PG update performance

2015-10-25 Thread Любен Каравелов

- Quote from Kisung Kim (ks...@bitnine.co.kr), on 26.10.2015 at 04:36 -

> However, what I want to know is about the update performance difference
> between PG and Oracle if there any.
> The case I described is for exaggerating the difference between PG and
> Oracle.
> 
> I want to explain for our clients that PG's update performance is
> comparable to Oracle's.
> 

Oracle is also using MVCC but copies the old row in the rollback segment and
rewrites the values in-place. It is still 2 writes as in Postgres. The
difference is on roll-back and cleaning the row when it is not needed
anymore.

Regards,

--
Luben Karavelov





Re: [HACKERS] questions about PG update performance

2015-10-25 Thread Amit Kapila
On Mon, Oct 26, 2015 at 9:03 AM, Любен Каравелов  wrote:
>
>
> - Quote from Kisung Kim (ks...@bitnine.co.kr), on 26.10.2015 at 04:36 -
>
> > However, what I want to know is about the update performance difference
> > between PG and Oracle if there any.
> > The case I described is for exaggerating the difference between PG and
> > Oracle.
> >
> > I want to explain for our clients that PG's update performance is
> > comparable to Oracle's.
> >
>
> Oracle is also using MVCC but copies the old row in the rollback segment
and
> rewrites the values in-place.

I think Oracle just copies the changed part of the old row to the rollback
segment.  Also, in Redo logs, it just writes the changed column value (both
old and new).  So for the case we are discussing in this thread (one changed
column out of 200 columns), Oracle will just write the old value of that
column in Redo and then in the rollback segment, write the new value in
Redo, and then do the in-place update of the heap row.

In PostgreSQL, the whole new row is written to the heap, and a diff tuple
(the difference between the old and new tuples; this optimization was done
in 9.4, commit a3115f0d) is written to WAL.  I think we can try to optimize
and use the same technique for the heap as is used for WAL to make
PostgreSQL more efficient in such scenarios; however, as of today, my guess
is that PostgreSQL's update would be lagging in this area.
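The prefix/suffix delta idea behind that 9.4 optimization can be sketched as follows. This only illustrates the encoding; the actual WAL record layout differs, and the helper names are invented.

```python
# Sketch of the prefix/suffix delta idea: instead of logging the whole
# new tuple, log only the bytes that differ plus the lengths of the
# shared prefix and suffix. (The idea, not PostgreSQL's WAL format.)

def tuple_delta(old: bytes, new: bytes):
    """Return (prefix_len, suffix_len, changed_bytes) for old -> new."""
    prefix = 0
    while prefix < min(len(old), len(new)) and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < min(len(old), len(new)) - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]

def apply_delta(old: bytes, prefix: int, suffix: int, changed: bytes):
    """Reconstruct the new tuple from the old tuple plus the delta."""
    return old[:prefix] + changed + old[len(old) - suffix:]

# One "column" changed in the middle of a wide row: the delta is tiny
# compared with the full new tuple.
old = b"a" * 100 + b"OLD" + b"b" * 100
new = b"a" * 100 + b"NEW" + b"b" * 100
p, s, changed = tuple_delta(old, new)
assert apply_delta(old, p, s, changed) == new
print(len(new), len(changed))  # 203-byte tuple, 3 bytes actually logged
```

The `min(...) - prefix` bound keeps the prefix and suffix from overlapping when the changed region is small or the tuples differ in length.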

> It is still 2 writes as in Postgres.

The difference is in the amount of data written per write.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] questions about PG update performance

2015-10-25 Thread Kisung Kim
Dear,

I have a question about update performance of PG.

Because of the internal implementation of MVCC in PG,
the update of a row is actually an insertion of a new version of the row.
So if the size of a row is huge, it incurs some overhead compared to an
in-place update strategy.

Let's assume that a table has 200 columns,
and a user updates one of the columns of one row in the table.
Then PG will rewrite the whole contents of the updated row
including the updated columns and not-updated columns.
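As a back-of-the-envelope illustration of that write amplification, assuming (purely for the sketch) 200 fixed-width 8-byte columns:

```python
# With copy-on-update, changing one column of a 200-column row still
# writes a complete new row version to the heap. Column count and width
# are illustrative assumptions, not PostgreSQL's on-disk tuple format.
import struct

NUM_COLS = 200
COL_BYTES = 8  # assume fixed-width 8-byte integer columns

def new_row_version(row, col, value):
    """Copy-on-update: the new version materializes every column."""
    updated = list(row)
    updated[col] = value
    return struct.pack(f"{NUM_COLS}q", *updated)  # serialize all 200 columns

row = list(range(NUM_COLS))
heap_write = new_row_version(row, col=7, value=-1)

print(len(heap_write), "bytes written for an", COL_BYTES, "byte change")
# -> 1600 bytes written for an 8 byte change
```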

I'm not sure about the implementation of Oracle's update.
But if Oracle can overwrite only the updated column,
the performance difference between Oracle and PG in that case may be
significant.

I researched this issue in the mailing list archives and on Google,
but I have not found anything related to it.

Thank you.

Kisung Kim.






(C)Bitnine, Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : ks...@bitnine.co.kr
Office phone : 070-4800-3321
Mobile phone : 010-7136-0834
Fax : 02-568-1332


Re: [HACKERS] questions about PG update performance

2015-10-25 Thread Michael Paquier
On Mon, Oct 26, 2015 at 10:52 AM, Kisung Kim  wrote:

> Because of the internal implementation of MVCC in PG
> the update of a row is actually a insertion of a new version row.
> So if the size of a row is huge, then it incurs some overhead compare to
> in-place update strategy.
>

Yeah, that's how an UPDATE works in Postgres under MVCC. The xmax of the old
row is updated, and a new row is inserted with an xmin equal to the
previous xmax. So if you update tuple fields one by one the cost is going
to be high.
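That xmin/xmax bookkeeping can be sketched as a toy model (greatly simplified: no command ids, no hint bits, no aborted transactions; the class and function names are invented for illustration):

```python
# Toy model of PostgreSQL-style row versioning: an UPDATE stamps the
# old version's xmax and appends a new version whose xmin is the
# updating transaction's id. Visibility is decided per snapshot.

class TupleVersion:
    def __init__(self, data, xmin):
        self.data = data
        self.xmin = xmin  # txid that created this version
        self.xmax = None  # txid that superseded it (None = still live)

def update(versions, new_data, txid):
    """Mark the latest version as superseded and append the new one."""
    versions[-1].xmax = txid
    versions.append(TupleVersion(new_data, xmin=txid))

def visible(version, snapshot_txid):
    """Visible if created at/before the snapshot and not yet superseded
    by a transaction the snapshot can see."""
    return (version.xmin <= snapshot_txid and
            (version.xmax is None or version.xmax > snapshot_txid))

def read(versions, snapshot_txid):
    return next(v.data for v in versions if visible(v, snapshot_txid))

versions = [TupleVersion({"col": "v1"}, xmin=100)]
update(versions, {"col": "v2"}, txid=200)

print(read(versions, 150))  # {'col': 'v1'}  (snapshot predates the update)
print(read(versions, 250))  # {'col': 'v2'}
```

Note that the dead `v1` version stays in the list until something cleans it up, which is the bloat/VACUUM side of the tradeoff discussed in this thread.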


> Let's assume that a table has 200 columns,
> and a user updates one of the columns of one row in the table.
> Then PG will rewrite the whole contents of the updated row
> including the updated columns and not-updated columns.
>

When a table has a large number of columns, usually I would say that you
have a normalization problem and such schemas could be split into a smaller
set of tables, minimizing the UPDATE cost.


> I'm not sure about the implementation of Oracle's update.
> But if the Oracle can overwrite only the updated column,
> the performance difference between Oracle and PG in that case may be
> significant.
>
> I researched about this issues in mailing list and google.
> But I've not found anything related to this issues.
>

What you are looking at here is columnar storage, Alvaro and 2nd Quadrant
folks have been doing some work in this area recently:
http://www.postgresql.org/message-id/20150831225328.GM2912@alvherre.pgsql
Also, you may want to have a look at cstore_fdw:
https://github.com/citusdata/cstore_fdw.
Regards,
-- 
Michael