Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-11-25 Thread Chris Withers

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-11-24 Thread Chris Withers

Hey Tom,

I appreciate you're busy, but did you ever get a chance to look at this?

On 19/09/2016 08:40, Chris Withers wrote:

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the
range just started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be
surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending
type (inclusive/exclusive) would just be sentinel values)

How would I verify your suspicions?


cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-19 Thread Jeff Janes
On Fri, Sep 16, 2016 at 2:01 AM, Chris Withers 
wrote:

> Hi All,
>
> I have quite a few tables that follow a pattern like this:
>
>  Table "public.my_model"
>   Column |   Type| Modifiers
> +---+---
>   period | tsrange   | not null
>   key| character varying | not null
>   value  | integer   |
> Indexes:
>  "my_model_pkey" PRIMARY KEY, btree (period, key)
>  "my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, key
> WITH =)
> Check constraints:
>  "my_model_period_check" CHECK (period <> 'empty'::tsrange)
>

Try swapping the order of the columns in the exclude constraint.  You want
the more selective criterion to appear first in the index/constraint.
Presumably "key with =" is the most selective, especially if many of your
periods are unbounded.

Cheers,

Jeff


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-19 Thread Chris Withers

On 16/09/2016 15:29, Tom Lane wrote:

Chris Withers  writes:

On 16/09/2016 14:54, Igor Neyman wrote:

So, what is the value for "end ts", when the record is inserted (the range just 
started)?



It's open ended, so the period is [start_ts, )


I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.


If that's so, that's a little disappointing...
(I'd have thought the special case end value (open ended) and the ending 
type (inclusive/exclusive) would just be sentinel values)


How would I verify your suspicions?

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or 
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with a 
start ts running on from that.


Of course, the adds are just inserting new rows.

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.


thats expensive, as it has to reindex that row.   and range indexes are
more expensive than timestamp indexes

modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.


Right, but these rows have no natural primary key. Would it help if I 
just added an auto-incrementing integer key? Would that make a positive 
difference or would it just be a wasted column?



I expect the expensive one is the constraint that ensures no periods
overlap for the given key.I'm not sure how that can be done short of
a full scan for each update/insert.


Indeed, I wonder if making the constraint deferrable might help for the 
bulk case?



it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?


You mean:

PRIMARY KEY, btree (period, key) as opposed to




--
john r pierce, recycling bits in santa cruz




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Tom Lane
Chris Withers  writes:
> On 16/09/2016 14:54, Igor Neyman wrote:
>> So, what is the value for "end ts", when the record is inserted (the range 
>> just started)?

> It's open ended, so the period is [start_ts, )

I've not looked at the GiST range opclass, but I would not be surprised if
having lots of those is pretty destructive to the index's ability to be
selective about && searches.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 14:54, Igor Neyman wrote:


-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce ; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on 
tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period column 
for a row to set its end ts, and then inserting a new row with a start ts running on from 
that.

Of course, the adds are just inserting new rows.

So, what is the value for "end ts", when the record is inserted (the range just 
started)?


It's open ended, so the period is [start_ts, )

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Igor Neyman

-Original Message-
From: pgsql-general-ow...@postgresql.org 
[mailto:pgsql-general-ow...@postgresql.org] On Behalf Of Chris Withers
Sent: Friday, September 16, 2016 6:47 AM
To: John R Pierce ; pgsql-general@postgresql.org
Subject: Re: [GENERAL] performance problems with bulk inserts/updates on 
tsrange with gist-based exclude constrains

On 16/09/2016 10:26, John R Pierce wrote:
> On 9/16/2016 2:23 AM, John R Pierce wrote:
>>
>> wait, what is a tsrange?   the standard textsearch data types in
>> postgres are tsvector and tsquery,
>
> never mind,  I should have known, its a timestamp range.   ...
>
>
> when you do updates, are you changing any of the indexed fields, or 
> just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period column 
for a row to set its end ts, and then inserting a new row with a start ts 
running on from that.

Of course, the adds are just inserting new rows.

cheers,

Chris



So, what is the value for "end ts", when the record is inserted (the range just 
started)?

Regards,
Igor Neyman

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 12:00, John R Pierce wrote:

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?

Yeah, it's a temporal table, so "updates" involve modifying the period
column for a row to set its end ts, and then inserting a new row with
a start ts running on from that.


thats expensive, as it has to reindex that row.   and range indexes are
more expensive than timestamp indexes

modifiyng the primary key is kind of a violation of one of the basic
rules of relational databases as it means the row can't be referenced by
another table.


Right, but these rows have no natural primary key. Would it help if I 
just added an auto-incrementing integer key? Would that make a positive 
difference or would it just be a wasted column?



I expect the expensive one is the constraint that ensures no periods
overlap for the given key.I'm not sure how that can be done short of
a full scan for each update/insert.


Indeed, I wonder if making the constraint deferrable might help for the 
bulk case?



it might actually perform better
if you write the index with the key first as presumably the key is
invariant ?


You mean:

PRIMARY KEY, btree (key1, key2, period)

as opposed to

PRIMARY KEY, btree (period, key)

Interesting, I'd assumed postgres would optimise that under the covers...

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 3:46 AM, Chris Withers wrote:


when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with 
a start ts running on from that. 


thats expensive, as it has to reindex that row.   and range indexes are 
more expensive than timestamp indexes


modifiyng the primary key is kind of a violation of one of the basic 
rules of relational databases as it means the row can't be referenced by 
another table.


I expect the expensive one is the constraint that ensures no periods 
overlap for the given key.I'm not sure how that can be done short of 
a full scan for each update/insert.   it might actually perform better 
if you write the index with the key first as presumably the key is 
invariant ?




--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread Chris Withers

On 16/09/2016 10:26, John R Pierce wrote:

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or
just "value" ?
Yeah, it's a temporal table, so "updates" involve modifying the period 
column for a row to set its end ts, and then inserting a new row with a 
start ts running on from that.


Of course, the adds are just inserting new rows.

cheers,

Chris


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:23 AM, John R Pierce wrote:


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,


never mind,  I should have known, its a timestamp range.   ...


when you do updates, are you changing any of the indexed fields, or just 
"value" ?





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:12 AM, John R Pierce wrote:

  Column |   Type| Modifiers
+---+---
  period | tsrange   | not null 


wait, what is a tsrange?   the standard textsearch data types in 
postgres are tsvector and tsquery,



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] performance problems with bulk inserts/updates on tsrange with gist-based exclude constrains

2016-09-16 Thread John R Pierce

On 9/16/2016 2:01 AM, Chris Withers wrote:

Hi All,

I have quite a few tables that follow a pattern like this:

 Table "public.my_model"
  Column |   Type| Modifiers
+---+---
  period | tsrange   | not null
  key| character varying | not null
  value  | integer   |
Indexes:
 "my_model_pkey" PRIMARY KEY, btree (period, key)
 "my_model_period_key_excl" EXCLUDE USING gist (period WITH &&, 
key WITH =)

Check constraints:
 "my_model_period_check" CHECK (period <> 'empty'::tsrange)

So, a primary key of a period column and one or more other columns 
(usually int or string) and an exclude constraint to prevent overlaps, 
and a check constraint to prevent empty ranges.


However, I'm hitting performance problems on moderate bulk inserts and 
updates, with ~700k rows taking around 13 minutes. Profiling my python 
code suggests that most of the time is being taken by Postgres (9.4 in 
this case...)


What can I do to speed things up? Is there a different type of index I 
can use to achieve the same exclude constraint? Is there something I 
can do to have the index changes only done on the commit of the bulk 
batches? 


if (period,key) is unique, by virtue of being the primary key, then 
whats the point of the exclusion ??


I'm curious, how fast do your insert/updates run if you remove the key 
exclusion and check constraint ?  tsvector operations are a lot more 
complicated than simple matches in indexing





--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general