Re: [PERFORM] Insert vs Update

2015-07-15 Thread Guillaume Lelarge
On Jul 15, 2015, at 11:16 PM, David G. Johnston david.g.johns...@gmail.com
wrote:

 On Wed, Jul 15, 2015 at 4:53 PM, Michael Nolan htf...@gmail.com wrote:

 On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco robert.difa...@gmail.com
wrote:


 Thanks David, my example was a big simplification, but I appreciate
your guidance. The different event types have differing amounts of related
data. Query speed on this schema is not important, it's really the write
speed that matters. So I was just wondering given the INSERT or UPDATE
approach (with no indexed data being changed) if one is likely to be
substantially faster than the other.


 As I understand how ACID compliance is done, updating a record will
require updating any indexes for that record, even if the index keys are
not changing.  That's because any pending transactions still need to be
able to find the 'old' data, while new transactions need to be able to find
the 'new' data.  And ACID also means an update is essentially a
delete-and-insert.


 I might be a bit pedantic here, but what you describe is a byproduct of
the specific implementation that PostgreSQL uses to effect Consistency
(the C in ACID), as opposed to a foregone outcome of being ACID compliant.

 http://www.postgresql.org/docs/9.4/static/mvcc-intro.html

 I'm out of my comfort zone here, but the HOT optimization is designed to
leverage the fact that an update to a row that does not affect indexed
values can leave the index alone: during lookup the index points to the old
tuple, notices that a chain is present, and walks that chain to find the
currently active tuple.


That's true as long as the old and new tuples are stored in the same block.
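
To make same-block updates more likely, the table can reserve free space on
each heap page via the fillfactor storage parameter. A sketch, borrowing the
hypothetical meal_event table name from the examples in this thread:

```sql
-- Leave ~10% of every heap page free so that an updated tuple can land on
-- the same page as the old version and stay HOT-eligible.
ALTER TABLE meal_event SET (fillfactor = 90);
```

Note this only affects pages written after the change; existing pages keep
their current packing until rewritten.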

 In short, if the only index is a PK an update of the row can avoid
touching that index.

 I mentioned that going from NULL to NOT NULL may disrupt this, but I'm
thinking I may have misspoken.

 Also, with separate tables the amount of data to write is going to be
less because you'd have fewer columns on the affected tables.

 While an update is a delete+insert, a delete is mostly just a bit-flip
action - at least mid-transaction.  Depending on volume, though, the
periodic impact of vacuuming may want to be taken into consideration.

-- 
Guillaume


Re: [PERFORM] Insert vs Update

2015-07-15 Thread David G. Johnston
On Wed, Jul 15, 2015 at 12:16 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 First off I apologize if this question has been beaten to death. I've
 looked around for a simple answer and could not find one.

 Given a database that will not have its PKEY or indices modified, is it
 generally faster to INSERT or UPDATE data? And if there is a performance
 difference, is it substantial?

 I have a situation where I can easily do one or the other to the same
 effect. For example, I have a journaling schema with a limited number of
 states for an entry. Currently each state is its own table so I just
 insert them as they occur. But I could easily have a single entry table
 where the row is updated with column information for states (after the
 entry's initial insertion).

 Not a big deal but since it's so easy for me to take either approach I was
 wondering if one was more efficient (for a large DB) than another.


There is a HOT (heap-only tuple) optimization that can occur if only
non-indexed data is altered.  I do not recall the specifics.

Dave


Re: [PERFORM] Insert vs Update

2015-07-15 Thread David G. Johnston
On Wednesday, July 15, 2015, Robert DiFalco robert.difa...@gmail.com
wrote:

 First off I apologize if this question has been beaten to death. I've
 looked around for a simple answer and could not find one.

 Given a database that will not have its PKEY or indices modified, is it
 generally faster to INSERT or UPDATE data? And if there is a performance
 difference, is it substantial?


This seems odd.  If you have an option to update but choose to insert what
becomes of the other record?


Re: [PERFORM] Insert vs Update

2015-07-15 Thread Michael Nolan

On Wed, Jul 15, 2015 at 12:16 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 First off I apologize if this question has been beaten to death. I've
 looked around for a simple answer and could not find one.

 Given a database that will not have its PKEY or indices modified, is it
 generally faster to INSERT or UPDATE data? And if there is a performance
 difference, is it substantial?

 I have a situation where I can easily do one or the other to the same
 effect. For example, I have a journaling schema with a limited number of
 states for an entry. Currently each state is its own table so I just
 insert them as they occur. But I could easily have a single entry table
 where the row is updated with column information for states (after the
 entry's initial insertion).

 Not a big deal but since it's so easy for me to take either approach I was
 wondering if one was more efficient (for a large DB) than another.

 Thanks


If you think of an update as a delete-insert operation (glossing over the
fine points of what has to be done for ACID), it seems pretty clear that an
update involves more work than an insert.  Measuring that impact on
performance is probably a bit more challenging, because it's going to be
dependent on the specific table and the contents of the row, among other
things.
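
Measuring it on a specific schema is straightforward with psql's \timing. A
rough sketch, with a made-up table name and row count - not a rigorous
benchmark, since caching, WAL, and autovacuum all skew single runs:

```sql
-- Rough micro-benchmark sketch; run in psql.
CREATE TABLE bench_event (
    id       bigserial PRIMARY KEY,
    prepared timestamptz
);

\timing on
-- Insert 100k rows with the timestamp column left NULL.
INSERT INTO bench_event (prepared)
SELECT NULL::timestamptz FROM generate_series(1, 100000);

-- Update a non-indexed column on every row (a HOT-eligible update).
UPDATE bench_event SET prepared = NOW();
\timing off
```

Repeat each statement a few times and compare medians rather than trusting
the first run.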
--
Mike Nolan
no...@tssi.com


Re: [PERFORM] Insert vs Update

2015-07-15 Thread Robert DiFalco
On Wed, Jul 15, 2015 at 11:15 AM, David G. Johnston 
david.g.johns...@gmail.com wrote:


 Yes, you are trying to choose between a bunch of one-to-one (optional)
 relationships versus adding additional columns to a table, all of which can
 be null.

 I'd argue that neither option is normal (in the DB normalization sense).

 CREATE TABLE meal (meal_id bigserial)
 CREATE TABLE meal_event_type (meal_event_id bigserial)
 CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at
 timestamptz)

 So now the decision is one of how to denormalize - materialized views
 being one way to do so.  The specific solution would depend in part on the
 final application queries that you need to write.

 If you do want to model the de-normalized form, which I would likely be
 tempted to do given a fixed set of events that do not require additional
 related attributes, the approach would be to place the few event timestamps
 on the main table and UPDATE them to non-null.

 In the normal form you will likely find partial indexes to be quite useful.

 David J.


Thanks David, my example was a big simplification, but I appreciate your
guidance. The different event types have differing amounts of related data.
Query speed on this schema is not important, it's really the write speed
that matters. So I was just wondering given the INSERT or UPDATE approach
(with no indexed data being changed) if one is likely to be substantially
faster than the other.


Re: [PERFORM] Insert vs Update

2015-07-15 Thread David G. Johnston
On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco robert.difa...@gmail.com
wrote:

 The different event types have differing amounts of related data.


On this basis alone I would select the multiple-table version as my
baseline and only consider something different if its performance was
insufficient and I could prove that an alternative arrangement was more
performant.

A single optional date with meta-data embedded in the column name is
usually workable, but if you then have a bunch of other columns with names
like:

preparation_date, preparation_col1, preparation_col2, consumed_col1,
consumed_col2, consumed_date

I would find that to be undesirable.

You may be able to put Table Inheritance to good use here...

I do not know (but doubt) if HOT optimization works when going from NULL to
non-NULL, since the former is stored in a bitmap while the latter occupies
normal relation space, and thus the update would likely end up writing an
entirely new record upon each event category recording.
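
Whether an update actually took the HOT path can be checked empirically via
the statistics collector. A sketch, assuming a table named meal_event:

```sql
-- n_tup_hot_upd counts updates that avoided touching the indexes;
-- compare it against the total update count.
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'meal_event';
```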

David J.


Re: [PERFORM] Insert vs Update

2015-07-15 Thread Robert DiFalco
On Wed, Jul 15, 2015 at 12:32 PM, David G. Johnston 
david.g.johns...@gmail.com wrote:


 You may be able to put Table Inheritance to good use here...

 I do not know (but doubt) if HOT optimization works when going from NULL
 to non-NULL, since the former is stored in a bitmap while the latter
 occupies normal relation space, and thus the update would likely end up
 writing an entirely new record upon each event category recording.

 David J.



Thanks!


Re: [PERFORM] Insert vs Update

2015-07-15 Thread Robert DiFalco
On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston 
david.g.johns...@gmail.com wrote:

 On Wednesday, July 15, 2015, Robert DiFalco robert.difa...@gmail.com
 wrote:

 First off I apologize if this question has been beaten to death. I've
 looked around for a simple answer and could not find one.

 Given a database that will not have its PKEY or indices modified, is it
 generally faster to INSERT or UPDATE data? And if there is a performance
 difference, is it substantial?


 This seems odd.  If you have an option to update but choose to insert what
 becomes of the other record?



Consider the two pseudo-schemas; I'm just making these up for example
purposes:

SCHEMA A
=
meal(id SEQUENCE,user_id, started DEFAULT NOW())
meal_prepared(ref_meal_id, prepared DEFAULT NOW())
meal_abandoned(ref_meal_id, abandoned ...)
meal_consumed(ref_meal_id, consumed ...)
etc.

Then in response to different meal events you always have an insert.

aMealId = INSERT INTO meal(user_id) VALUES (aUserId);

When preparation starts:

INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);

And so on for each event.

Compare that to this:

SCHEMA B
=
meal_event(id, started, prepared, abandoned, consumed, ...)

The start of the meal is an INSERT:

aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());

When preparation starts:

UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;

And so on.

Basically the same data: in one case you always do inserts and add new
tables for new events; in the other case you insert once and then update
for each state, adding columns if you have new states.

As I said this is just an example. But in SCHEMA A you have only inserts,
lots of tables and in SCHEMA B you have a lot of updates and a lot of
possibly NULL columns if certain events don't occur.
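
Spelled out as concrete (hypothetical) DDL, the two approaches might look
like this - abridged, with only a couple of the event tables shown:

```sql
-- SCHEMA A: insert-only, one table per event type.
CREATE TABLE meal (
    id      bigserial PRIMARY KEY,
    user_id bigint NOT NULL,
    started timestamptz NOT NULL DEFAULT NOW()
);
CREATE TABLE meal_prepared (
    ref_meal_id bigint NOT NULL REFERENCES meal (id),
    prepared    timestamptz NOT NULL DEFAULT NOW()
);

-- SCHEMA B: one wide row per meal, updated as events occur.
CREATE TABLE meal_event (
    id        bigserial PRIMARY KEY,
    user_id   bigint NOT NULL,
    started   timestamptz NOT NULL DEFAULT NOW(),
    prepared  timestamptz,
    abandoned timestamptz,
    consumed  timestamptz
);
```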

Is that more clear?

R.


Re: [PERFORM] Insert vs Update

2015-07-15 Thread David G. Johnston
On Wed, Jul 15, 2015 at 1:56 PM, Robert DiFalco robert.difa...@gmail.com
wrote:



 On Wed, Jul 15, 2015 at 10:33 AM, David G. Johnston 
 david.g.johns...@gmail.com wrote:

 On Wednesday, July 15, 2015, Robert DiFalco robert.difa...@gmail.com
 wrote:

 First off I apologize if this question has been beaten to death. I've
 looked around for a simple answer and could not find one.

 Given a database that will not have its PKEY or indices modified, is it
 generally faster to INSERT or UPDATE data? And if there is a performance
 difference, is it substantial?


 This seems odd.  If you have an option to update but choose to insert
 what becomes of the other record?



 Consider the two pseudo-schemas, I'm just making this up for example
 purposes:

 SCHEMA A
 =
 meal(id SEQUENCE,user_id, started DEFAULT NOW())
 meal_prepared(ref_meal_id, prepared DEFAULT NOW())
 meal_abandoned(ref_meal_id, abandoned ...)
 meal_consumed(ref_meal_id, consumed ...)
 etc.

 Then in response to different meal events you always have an insert.

 aMealId = INSERT INTO meal(user_id) VALUES (aUserId);

 When preparation starts:

 INSERT INTO meal_prepared(ref_meal_id) VALUES (aMealId);

 And so on for each event.

 Compare that to this:

 SCHEMA B
 =
 meal_event(id, started, prepared, abandoned, consumed, ...)

 The start of the meal is an INSERT:

 aMealId = INSERT INTO meal_event(user_id, started) VALUES (aUserId, NOW());

 When preparation starts:

 UPDATE meal_event SET prepared = NOW() WHERE id = aMealId;

 And so on.

 Basically the same data, in one case you always do inserts and add new
 tables for new events. In the other case you only insert once and then
 update for each state, then you add columns if you have new states.

 As I said this is just an example. But in SCHEMA A you have only inserts,
 lots of tables and in SCHEMA B you have a lot of updates and a lot of
 possibly NULL columns if certain events don't occur.

 Is that more clear?


Yes, you are trying to choose between a bunch of one-to-one (optional)
relationships versus adding additional columns to a table, all of which can
be null.

I'd argue that neither option is normal (in the DB normalization sense).

CREATE TABLE meal (meal_id bigserial)
CREATE TABLE meal_event_type (meal_event_id bigserial)
CREATE TABLE meal_event (meal_id bigint, meal_event_id bigint, occurred_at
timestamptz)

So now the decision is one of how to denormalize - materialized views
being one way to do so.  The specific solution would depend in part on the
final application queries that you need to write.

If you do want to model the de-normalized form, which I would likely be
tempted to do given a fixed set of events that do not require additional
related attributes, the approach would be to place the few event timestamps
on the main table and UPDATE them to non-null.

In the normal form you will likely find partial indexes to be quite useful.
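
For instance, against the normalized meal_event table above, a partial
index covering a single event type might look like this (the event type id
is a made-up value for illustration):

```sql
-- Index only rows of one event type; smaller and cheaper to maintain
-- than a full index when queries filter on that type.
CREATE INDEX meal_event_prepared_idx
    ON meal_event (meal_id)
    WHERE meal_event_id = 2;  -- hypothetical id for the 'prepared' type
```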

David J.


Re: [PERFORM] Insert vs Update

2015-07-15 Thread David G. Johnston
On Wed, Jul 15, 2015 at 4:53 PM, Michael Nolan htf...@gmail.com wrote:

 On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco robert.difa...@gmail.com
 wrote:


 Thanks David, my example was a big simplification, but I appreciate your
 guidance. The different event types have differing amounts of related data.
 Query speed on this schema is not important, it's really the write speed
 that matters. So I was just wondering given the INSERT or UPDATE approach
 (with no indexed data being changed) if one is likely to be substantially
 faster than the other.


 As I understand how ACID compliance is done, updating a record will
 require updating any indexes for that record, even if the index keys are
 not changing.  That's because any pending transactions still need to be
 able to find the 'old' data, while new transactions need to be able to find
 the 'new' data.  And ACID also means an update is essentially a
 delete-and-insert.


I might be a bit pedantic here, but what you describe is a byproduct of the
specific implementation that PostgreSQL uses to effect Consistency (the C
in ACID), as opposed to a foregone outcome of being ACID compliant.

http://www.postgresql.org/docs/9.4/static/mvcc-intro.html

I'm out of my comfort zone here, but the HOT optimization is designed to
leverage the fact that an update to a row that does not affect indexed
values can leave the index alone: during lookup the index points to the old
tuple, notices that a chain is present, and walks that chain to find the
currently active tuple.

In short, if the only index is a PK an update of the row can avoid touching
that index.

I mentioned that going from NULL to NOT NULL may disrupt this, but I'm
thinking I may have misspoken.

Also, with separate tables the amount of data to write is going to be less
because you'd have fewer columns on the affected tables.

While an update is a delete+insert, a delete is mostly just a bit-flip
action - at least mid-transaction.  Depending on volume, though, the
periodic impact of vacuuming may want to be taken into consideration.

David J.


Re: [PERFORM] Insert vs Update

2015-07-15 Thread Michael Nolan
On Wed, Jul 15, 2015 at 3:16 PM, Robert DiFalco robert.difa...@gmail.com
wrote:


 Thanks David, my example was a big simplification, but I appreciate your
 guidance. The different event types have differing amounts of related data.
 Query speed on this schema is not important, it's really the write speed
 that matters. So I was just wondering given the INSERT or UPDATE approach
 (with no indexed data being changed) if one is likely to be substantially
 faster than the other.


As I understand how ACID compliance is done, updating a record will require
updating any indexes for that record, even if the index keys are not
changing.  That's because any pending transactions still need to be able to
find the 'old' data, while new transactions need to be able to find the
'new' data.  And ACID also means an update is essentially a
delete-and-insert.
--
Mike Nolan


[PERFORM] Insert vs Update

2015-07-15 Thread Robert DiFalco
First off I apologize if this question has been beaten to death. I've
looked around for a simple answer and could not find one.

Given a database that will not have its PKEY or indices modified, is it
generally faster to INSERT or UPDATE data? And if there is a performance
difference, is it substantial?

I have a situation where I can easily do one or the other to the same
effect. For example, I have a journaling schema with a limited number of
states for an entry. Currently each state is its own table so I just
insert them as they occur. But I could easily have a single entry table
where the row is updated with column information for states (after the
entry's initial insertion).

Not a big deal but since it's so easy for me to take either approach I was
wondering if one was more efficient (for a large DB) than another.

Thanks!