[GENERAL] Curious planning decision

2017-08-10 Thread Steve Rogerson
I have a query that behaves in an odd way; specifically, it does different
things depending on whether I use "=" or ">=":

mydb=# explain analyze select count(*) from ctable c   where  c.date1 >=
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in 
('TEXT4');
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=104183.81..104183.82 rows=1 width=0) (actual
time=520.221..520.221 rows=1 loops=1)
   ->  Bitmap Heap Scan on ctable c  (cost=83843.79..104161.45 rows=8947
width=0) (actual time=515.833..520.192 rows=540 loops=1)
 Recheck Cond: ((date1 >= '2017-08-10'::date) AND (col2 = 637) AND
((col3)::text = 'TEXT3'::text))
 Rows Removed by Index Recheck: 4894
 Filter: ((col4)::text <> 'TEXT4'::text)
 Rows Removed by Filter: 360
 Heap Blocks: exact=1159
 ->  BitmapAnd  (cost=83843.79..83843.79 rows=10294 width=0) (actual
time=515.629..515.629 rows=0 loops=1)
   ->  Bitmap Index Scan on ctable_date1  (cost=0.00..22534.71
rows=598419 width=0) (actual time=5.204..5.204 rows=37536 loops=1)
 Index Cond: (date1 >= '2017-08-10'::date)
   ->  Bitmap Index Scan on ctable_col2_col3  (cost=0.00..61304.36
rows=1158379 width=0) (actual time=505.479..505.479 rows=1213850 loops=1)
 Index Cond: ((col2 = 637) AND ((col3)::text = 
'TEXT3'::text))
 Planning time: 0.306 ms
 Execution time: 520.283 ms
(14 rows)

mydb=# explain analyze select count(*) from ctable c   where  c.date1 =
'2017-08-10' and c.col2 = '637' and c.col3 = 'TEXT3' and c.col4 not in 
('TEXT4');
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=9929.67..9929.68 rows=1 width=0) (actual time=10.500..10.501
rows=1 loops=1)
   ->  Index Scan using ctable_date1_col3_col4_in on ctable c 
(cost=0.57..9929.05 rows=245 width=0) (actual time=0.062..10.456 rows=540 
loops=1)
 Index Cond: ((date1 = '2017-08-10'::date) AND ((col3)::text =
'TEXT3'::text))
 Filter: (((col4)::text <> 'TEXT4'::text) AND (col2 = 637))
 Rows Removed by Filter: 9059
 Planning time: 0.499 ms
 Execution time: 10.557 ms
(7 rows)

The indexes in question are:
Indexes:

"ctable_date1" btree (date1)
"ctable_col2_col3" btree (col2, col3)
"ctable_date1_col2_col3" btree (date1, col2, col3)

ctable has about 60,000,000 rows.

[ Caveat: I sanitized the query and I might not have it character-perfect,
though I believe it's ok. ]
The date is "today" at the time of writing. The table is analyzed, so pg
should "know" that date1 <= "today", i.e. that there is no data beyond today.

On similar - and more relevant - lines: the same query with a date range, i.e.
as above but with "c.date1 >= '2017-08-01' and c.date1 <= '2017-08-10'", uses
the date1_col2_col3 index and takes about 143ms, whereas "c.date1 >= '2017-07-31'
and c.date1 <= '2017-08-10'" uses the date1 and col2_col3 indexes and takes about 902ms.


Is there a planner option I can tweak that might help?

Steve



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Oddity with time zones.

2017-04-03 Thread Steve Rogerson
# select (now());
  now
---
 2017-04-03 11:57:09.891043+01
(1 row)

sjr_local1db=# select (now() AT TIME ZONE  'UTC');
          timezone
----------------------------
 2017-04-03 10:57:11.714571
(1 row)

sjr_local1db=# select (now() AT TIME ZONE  'UTC') AT TIME ZONE 'UTC';
   timezone
---
 2017-04-03 11:57:14.088515+01
(1 row)



This makes no sense to me.

Steve




Re: [GENERAL] How well does PostgreSQL 9.6.1 support unicode?

2016-12-21 Thread Steve Rogerson
On 21/12/16 05:24, Tom Lane wrote:
> James Zhou  writes:
>>   - *But their sorting order seems to be undefined. Can anyone comment
>>   the sorting rules?*
> 
> Well, it would depend on lc_collate, which you have not told us, and
> it would also depend on how well your platform's strcoll() function
> implements that collation; but you have not told us what platform this
> is running on.

As I understand it, when you first initialise pg with initdb, it inherits the
collation of the process that runs the initdb.
Having said that see:

https://www.postgresql.org/docs/9.6/static/collation.html

"If the operating system provides support for using multiple locales within a
single program (newlocale and related functions), then when a database cluster
is initialized, initdb populates the system catalog pg_collation with
collations based on all the locales it finds on the operating system at the 
time."

So pg is capable, in principle at least, of using any of the locales
available at the time initdb is run.

> 
> Most of the other behaviors you mention are also partly or wholly
> dependent on which software you use with Postgres and whether you've
> correctly configured that software.  So it's pretty hard to answer
> this usefully with only this much info.
> 

The more recent versions of perl (see http://perldoc.perl.org/perlunicode.htm
- maybe other languages too) know not only about code points but also about
"graphemes", so in the appropriate context "LATIN CAPITAL LETTER E WITH ACUTE"
can be considered "equal" to "LATIN CAPITAL LETTER E" together with
"COMBINING ACUTE ACCENT", although they are 1 and 2 unicode characters
respectively. This affects notions of equality as well as collation, and has
implications for pg varchar(N) fields etc.

I would be interested to know what support pg has/will have for graphemes.

Steve





Re: [GENERAL] setting time zone in a function

2016-04-21 Thread Steve Rogerson
On 21/04/16 14:47, Tom Lane wrote:
> Steve Rogerson  writes:
>> I want to set the time zone in a function, a bit like ...
>> CREATE OR REPLACE FUNCTION
>>short_name (timestamp with time zone, varchar) RETURNS varchar  AS $$
>> DECLARE
>> ...
>> BEGIN
>> SET LOCAL TIME ZONE $2;
> 
> Nope, that won't work: in general you can't put parameters into
> utility commands.  Use set_config():
>

That works. Thanks.

>   PERFORM set_config('timezone', $2, true);
> 
> Also, I think "SET LOCAL" has transaction duration, not function duration,
> so you're going to have to work harder than this if you want the setting
> to be local to this function.  SET LOCAL/set_config(true) inside an
> exception block might work --- and you may well want an exception block
> anyway, to trap bad timezone names.  Or you could explicitly save and
> restore the previous setting, which is more code but might be faster
> than an exception block.



I wonder what counts as a valid time zone, I wasn't expecting this:

# set timezone = '==2.77';
SET
# select now();
  now
---
 2016-04-18 09:40:52.089375-77
(1 row)

In my context I'm expecting an Olson-type designation such as "Europe/Madrid";
I guess that's hard to check for. I *think* I can live with the consequences,
or rather let a higher level deal with the problem. I would consider the above
time zone to be invalid.
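For what it's worth, checking for an Olson-type name is straightforward outside pg; a minimal sketch in Python (zoneinfo, 3.9+; assumes the IANA database is installed on the system):

```python
from zoneinfo import available_timezones

def is_olson_name(name: str) -> bool:
    # Membership test against the locally installed IANA (Olson) database.
    return name in available_timezones()

assert is_olson_name("Europe/Madrid")
assert not is_olson_name("==2.77")  # accepted by SET timezone, but not an Olson name
```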


Steve






[GENERAL] setting time zone in a function

2016-04-21 Thread Steve Rogerson
I want to set the time zone in a function, a bit like ...

CREATE OR REPLACE FUNCTION
   short_name (timestamp with time zone, varchar) RETURNS varchar  AS $$
DECLARE
...
BEGIN
SET LOCAL TIME ZONE $2;
SELECT to_char($1, 'TZ')::varchar
...
END $$
LANGUAGE 'plpgsql' VOLATILE;


But as written it doesn't compile, complaining about the "naked" $2. If I quote
the $2 it sets the time zone to the string "$2" literally (I didn't know you
could do that, but that's another issue). Escaping the quotes, either as \' or
'', doesn't compile either.

Help.


Steve




Re: [GENERAL] Postgres and timezones

2016-01-21 Thread Steve Rogerson
On 20/01/16 19:19, Vik Fearing wrote:
> On 01/20/2016 04:24 PM, Steve Rogerson wrote:
>>> Postgres doesn't store original TZ. It does recalculation to local TZ. If you
>>> need original TZ, you have to store it separately.
>>>
>>
>> I know and that's what I'm trying to deal with. Given I know the origin TZ  -
>> as in Europe/Lisbon I'm trying to determine the short name so I can store it.
> 
> I would recommend against storing the abbreviation.  The abbreviations
> are not globally unique and don't follow daylight savings.  If you want
> to store the original time zone, I would use the full name.
> 
> Something like this might be relative to your interests:
> 
> INSERT INTO tbl (ts, tz)
> VALUES ('2016-01-20 00:00', current_setting('TimeZone'));
> 
> This will do the right thing regardless of where the client is (unless
> it's set to "localtime" and then it's useless).
> 

That doesn't work for two reasons.

1. In my application the data comes from several time zones, mostly European
but also Australia and the US, so the "current_setting" is often inappropriate.
2. There are two special times in the year, when the clocks change. The
awkward one is when the clocks go back. For example this year, for me,
2016-10-27 01:30 happens twice for my current setting, once as BST and
once as GMT.

We actually store UTC + the offset interval + the short name. The latter being
mostly for reporting purposes.

The issue is that we can't - from postgres - determine the appropriate short
name directly.

I can by other means though, say via the DateTime module in perl, though
anything that talks to the Olson database on the system would do as well.

The only way I can think of doing it in pg is the two-step (ok, more than two
if I wrap it in a transaction or reset the TZ) method:

# SET TIME ZONE 'Europe/Lisbon';
# SELECT to_char('2016-07-20 00:00'::timestamp with time zone , 'TZ');
 to_char
---------
 WEST
(1 row)
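For comparison, a one-step sketch in Python (zoneinfo; purely an illustration, not the perl DateTime route mentioned above) that gets the DST-aware abbreviation for a given instant and Olson zone:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# tzname() applies the zone's DST rules for the given instant,
# so summer and winter dates yield different abbreviations.
lisbon = ZoneInfo("Europe/Lisbon")
assert datetime(2016, 7, 20, tzinfo=lisbon).tzname() == "WEST"  # summer
assert datetime(2016, 1, 20, tzinfo=lisbon).tzname() == "WET"   # winter
```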

Steve




Re: [GENERAL] Postgres and timezones

2016-01-20 Thread Steve Rogerson
On 20/01/16 15:38, Steve Crawford wrote:
> Is this of any use?
> 
> select * from pg_timezone_names where name = 'Europe/Lisbon';
>      name      | abbrev | utc_offset | is_dst
> ---------------+--------+------------+--------
>  Europe/Lisbon | WET    | 00:00:00   | f
> 

A bit - but what's missing is knowing whether daylight saving is in effect. It
could be WET or WEST, so we need a date/time and a TZ to work it out.


Steve









Re: [GENERAL] Postgres and timezones

2016-01-20 Thread Steve Rogerson
On 20/01/16 13:27, Pavel Stehule wrote:
> 
>
> 
> Postgres doesn't store original TZ. It does recalculation to local TZ. If you
> need original TZ, you have to store it separately.
> 

I know and that's what I'm trying to deal with. Given I know the origin TZ  -
as in Europe/Lisbon I'm trying to determine the short name so I can store it.

I guess I'll have to use something other than pg to do it.

Steve






Re: [GENERAL] Postgres and timezones

2016-01-20 Thread Steve Rogerson
On 20/01/16 12:53, Albe Laurenz wrote:
> Steve Rogerson wrote:
>> Hi, this is wrong:
>>
>> # select to_char('2016-01-20 00:00'::timestamp at time zone 'Europe/Lisbon',
>> 'TZ');
>>  to_char
>> -
>>  GMT
>> (1 row)
>>
>>
>> It should be WET, "Western European Time". Is there something I'm doing 
>> wrong?
> 
> That query will always give you your local timezone.
> 
> Here in Austria I get:
> test=> select to_char('2016-01-20 00:00'::timestamp at time zone 
> 'Asia/Yerevan', 'TZ');
> ┌─────────┐
> │ to_char │
> ├─────────┤
> │ CET     │
> └─────────┘
> (1 row)
> 
> Yours,
> Laurenz Albe
> 
That seems odd, but never mind. I'll ask the direct question then: given the
above, is it possible to determine the short TZ, say WET in my example?

Thinking about it, probably not, as I suspect that pg only stores the offset in
seconds(?) from UTC, so once it has parsed "2016- 'Europe/Lisbon'" it has
lost track of the origin TZ - and in that case what else could "to_char(...,
'TZ')" mean other than the current client TZ?
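The same point in Python terms (a sketch, nothing pg-specific): once a value is just a UTC instant, the abbreviation comes entirely from whichever zone you render it in:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# One and the same instant, rendered in two different zones:
t = datetime(2016, 1, 20, 0, 0, tzinfo=timezone.utc)
assert t.astimezone(ZoneInfo("Europe/Lisbon")).tzname() == "WET"
assert t.astimezone(ZoneInfo("Europe/Vienna")).tzname() == "CET"
```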

Steve




[GENERAL] Postgres and timezones

2016-01-20 Thread Steve Rogerson
Hi, this is wrong:

# select to_char('2016-01-20 00:00'::timestamp at time zone 'Europe/Lisbon',
'TZ');
 to_char
---------
 GMT
(1 row)


It should be WET, "Western European Time". Is there something I'm doing wrong?


Steve




Re: [GENERAL] Unique constraints and indexes.

2016-01-05 Thread Steve Rogerson
On 05/01/16 19:47, Tom Lane wrote:
> Steve Rogerson  writes:
>> Is this a bug? I create a "unique" index, directly but it doesn't add a 
>> unique
>> constraint. Add a unique constraint and it adds the index and the constraint.
> 
> That's operating as designed.  A unique constraint needs an index,
> but not vice versa.


I can see that might be plausible, hence the question, but as a "unique index"
imposes a constraint they seem equivalent. What's the functional difference
between the two situations?

Steve






[GENERAL] Unique constraints and indexes.

2016-01-05 Thread Steve Rogerson
Is this a bug? I create a "unique" index directly, but it doesn't add a unique
constraint; add a unique constraint and it adds both the index and the constraint.
(pg version 9.4.5 on fedora 22, but it also occurs in other versions.)
Functionally I can't see a difference.

mydb=# create table test_table ( f1 bigint, f2 bigint);
CREATE TABLE
mydb=# create unique index test_table_un on test_table (f1, f2);
CREATE INDEX
mydb=# \d test_table
  Table "public.test_table"
 Column |  Type  | Modifiers
--------+--------+-----------
 f1 | bigint |
 f2 | bigint |
Indexes:
"test_table_un" UNIQUE, btree (f1, f2)

mydb=# select conindid, contype, conname from pg_constraint where conname like
'test_table%';
 conindid | contype | conname
----------+---------+---------
(0 rows)



-- --
mydb=# drop table test_table;
DROP TABLE
mydb=# create table test_table ( f1 bigint, f2 bigint);
CREATE TABLE
mydb=# alter table test_table add constraint test_table_un unique (f1,f2);
ALTER TABLE
mydb=# \d test_table
  Table "public.test_table"
 Column |  Type  | Modifiers
--------+--------+-----------
 f1 | bigint |
 f2 | bigint |
Indexes:
"test_table_un" UNIQUE CONSTRAINT, btree (f1, f2)

mydb=# select conindid, contype, conname from pg_constraint where conname like
'test_table%';
 conindid | contype |    conname
----------+---------+---------------
  4284073 | u   | test_table_un
(1 row)

mydb=#




[GENERAL] A question about plans and multi-key pks

2015-05-01 Thread Steve Rogerson
We have a table:

sjr_local1db=# \d cons_refs
   Table "public.cons_refs"
  Column   |          Type          | Modifiers
-----------+------------------------+-----------
 code  | bigint | not null
...
 ref_type  | character varying(20)  | not null
 ref_text  | character varying(128) | not null
Indexes:
"cons_refs_pk" PRIMARY KEY, btree (code, ref_type)
"cons_refs_ref_type_ref_text_in" btree (ref_type, lower(ref_text::text))

I do a select :

ricodb=> explain analyze select * from cons_refs where ref_type = 'some_ref'
AND code = 12345678;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Index Scan using cons_refs_pk on cons_refs  (cost=0.00..37.89 rows=67
width=68) (actual time=0.020..0.021 rows=1 loops=1)
   Index Cond: ((code = 12345678) AND ((ref_type)::text = 'some_ref'::text))
 Total runtime: 0.036 ms
(3 rows)


Why does the plan come up with 67 rows, when by definition there can be at
most one as we are searching on the pk?

[pg 9.0.7 on RHEL 5.7]







Re: [GENERAL] Composite fields and the query planner

2013-05-09 Thread Steve Rogerson

On 05/05/13 15:06, Tom Lane wrote:

> Steve Rogerson  writes:
>> I'm seeing a problem with the query planner not doing what's expected, and I
>> think it is because we are using composite fields. Here is a stripped down
>> example.
>> create type type1 as ( part1 varchar, part2 varchar);
>> create table table1 (field1 type1, field2 varchar);
>> create function get_part1(type1) returns varchar as $$ select ($1).part1 $$
>> language sql;
>> create index i1 on table1 (get_part1(field1));
>> create index i2 on table1 (field2);
>
> I tested this example in HEAD and 9.0.x and didn't see any particular
> problem with rowcount estimates for the get_part1() expression.  You
> do have to have the i1 index in place when the table is analyzed, else
> ANALYZE won't collect any stats about the expression.
>
> 			regards, tom lane

In the end it turned out to be another issue. As best as I can see, what
happened was that if the value of field2 was not in the common-values list,
then it would use the two indexes; otherwise it would use one index and
filter by field2. It takes about 10 times longer to get the two lists and "AND"
them than doing one and filtering. I think we've found a workaround, using a
combined index.


Having said all that, I did discover an unrelated problem with composite
fields. Basically pg doesn't seem to consult the pg_stats data to decide
whether to do a seq scan or an index scan, always saying that it should do an
index scan. I'm not quite sure how it could compare the value that is indexed
with the pg_stats data, so I guess it either decides it's all less than or all
greater than the value you are checking for.



Steve




Re: [GENERAL] Composite fields and the query planner

2013-05-07 Thread Steve Rogerson

On 05/05/13 15:06, Tom Lane wrote:

> Steve Rogerson  writes:
>> I'm seeing a problem with the query planner not doing what's expected, and I
>> think it is because we are using composite fields. Here is a stripped down
>> example.
>
> I tested this example in HEAD and 9.0.x and didn't see any particular
> problem with rowcount estimates for the get_part1() expression.  You
> do have to have the i1 index in place when the table is analyzed, else
> ANALYZE won't collect any stats about the expression.
>
> 			regards, tom lane


I should have said I am using 9.0.7 - also I have "ANALYZEd" all the relevant 
tables. Having said that I am having problems re-creating a rich enough example.


Steve





[GENERAL] Composite fields and the query planner

2013-05-05 Thread Steve Rogerson
I'm seeing a problem with the query planner not doing what's expected, and I 
think it is because we are using composite fields. Here is a stripped down 
example.


create type type1 as ( part1 varchar, part2 varchar);
create table table1 (field1 type1, field2 varchar);
create function get_part1(type1) returns varchar as $$ select ($1).part1 $$ 
language sql;

create index i1 on table1 (get_part1(field1));
create index i2 on table1 (field2);


If we do

select * from table1 where get_part1(field1) > 'val';

it correctly uses index i1, but if I do

select * from table1 where get_part1(field1) > 'val1' and field2 = 'val2';

It - essentially - gets the hits from both indexes and then "ands" them
together, whatever the data distribution. In the case I am interested in, the
data is actually a timestamp, and I am typically looking for things that have
happened in the last little while. Typically I might get about 100 hits out of
about 20,000,000 records. "field2" is a column that could return up to 10% of
the data. If we split field1 into two normal fields and index the first part
naturally, the query planner correctly returns the recent ones and filters
them according to field2.


Naively, it looks to me that the issue is that the pg_stats table holds data
about the columns (it seems even if they are not indexed) and not information
about the values actually stored in the index. As a result the query planner
has no information about the data distribution, so it can't do its job.


For many reasons we don't want to have to split the data into separate parts
unless we really, really have to.



Steve

