Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2012-03-04 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Mon, Jan 30, 2012 at 1:39 AM, Jeff Davis pg...@j-davis.com wrote:
 Marking ready for committer, but please apply my comment fixes at your
 discretion.

 Patch with your comment fixes is attached.

Applied with revisions, some cosmetic, some not so much.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2012-01-29 Thread Jeff Davis
On Tue, 2012-01-24 at 16:07 +0400, Alexander Korotkov wrote:
 Hi!
 
 
 New version of patch is attached.

Thank you for the updates. I have a small patch attached.

The only code change I made was very minor: I changed the constants used
in the penalty function because your version used INFINITE_BOUND_PENALTY
when adding an empty range, and that didn't quite make sense to me. If
I'm mistaken you can leave it as-is.

I also attached range-gist-test.sql, which I used for a performance
test. I mix various types of ranges together in a larger table of 1.1M
tuples. And then I create a smaller table that only contains normal
ranges and empty ranges. There are two tests:
  1. Create an index on the big table
  2. Do a range join (using overlaps rather than equals) where the
smaller table is on the outer side of a nested loop join and an index
scan over the larger table on the inner.

The index creation time reduces by a small amount with the patch, from
around 16s without the patch to around 13s with the patch. The query
time, however, dropped from around 26s to around 14s! Almost 2x speedup
with the patch!

Moreover, looking at the loop timing in the explain analyze output, it
goes from about 7..24 ms per loop down to about 1.5..13 ms per loop.
That seems to indicate that the index distribution is better, with more
queries returning quickly.

So, great work Alexander! Very convincing results.

Marking ready for committer, but please apply my comment fixes at your
discretion.

Regards,
Jeff Davis

PS: the test was run on my workstation (Intel(R) Core(TM) i7-2600 CPU @
3.40GHz) with work_mem=512MB, shared_buffers=512MB, and
checkpoint_segments=32. The rest of the settings were default.




range-gist-comments.patch.gz
Description: GNU Zip compressed data

\timing on

drop table big;
drop table small;

create temp table tmp_foo(i int, ir int8range);
insert into tmp_foo select g % 100, 'empty'::int8range from generate_series(1,5) g;
insert into tmp_foo select g % 100, int8range(NULL,NULL) from generate_series(1,1) g;
insert into tmp_foo select g % 100, int8range(NULL,((random()-0.5)*g*10)::int8) from generate_series(1,2) g;
insert into tmp_foo select g % 100, int8range(((random()-0.5)*g*10)::int8,NULL) from generate_series(1,2) g;
insert into tmp_foo select g % 100,
  int8range(
(g*10 + 10*(random()-0.5))::int8,
(g*10 + 10 + 10*(random()-0.5))::int8 )
  from generate_series(1,100) g;

create table big as select * from tmp_foo order by random();
drop table tmp_foo;

create table tmp_foo(i int, ir int8range);
insert into tmp_foo select g*1000 % 100, 'empty'::int8range from generate_series(1,50) g;
insert into tmp_foo select g*1000 % 100,
  int8range(
(g*10*1000 + 10*(random()-0.5))::int8,
(g*10*1000 + 10 + 10*(random()-0.5))::int8 )
  from generate_series(1,1000) g;

create table small as select * from tmp_foo order by random();
drop table tmp_foo;

vacuum;
vacuum;
vacuum;

create index big_idx on big using gist (ir);

analyze;

set enable_bitmapscan=false;

explain analyze select sum(upper(intersection) - lower(intersection))
from (
  select small.ir * big.ir as intersection from small,big
  where small.ir  big.ir and not lower_inf(big.ir) and not upper_inf(big.ir)
) s;

set enable_bitmapscan to default;
set enable_indexscan=false;

explain analyze select sum(upper(intersection) - lower(intersection))
from (
  select small.ir * big.ir as intersection from small,big
  where small.ir  big.ir and not lower_inf(big.ir) and not upper_inf(big.ir)
) s;

set enable_indexscan to default;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2012-01-29 Thread Alexander Korotkov
On Mon, Jan 30, 2012 at 1:39 AM, Jeff Davis pg...@j-davis.com wrote:

 Thank you for the updates. I have a small patch attached.

 The only code change I made was very minor: I changed the constants used
 in the penalty function because your version used INFINITE_BOUND_PENALTY
 when adding an empty range, and that didn't quite make sense to me. If
 I'm mistaken you can leave it as-is.

 I also attached range-gist-test.sql, which I used for a performance
 test. I mix various types of ranges together in a larger table of 1.1M
 tuples. And then I create a smaller table that only contains normal
 ranges and empty ranges. There are two tests:
  1. Create an index on the big table
  2. Do a range join (using overlaps rather than equals) where the
 smaller table is on the outer side of a nested loop join and an index
 scan over the larger table on the inner.

 The index creation time reduces by a small amount with the patch, from
 around 16s without the patch to around 13s with the patch. The query
 time, however, dropped from around 26s to around 14s! Almost 2x speedup
 with the patch!

 Moreover, looking at the loop timing in the explain analyze output, it
 goes from about 7..24 ms per loop down to about 1.5..13 ms per loop.
 That seems to indicate that the index distribution is better, with more
 queries returning quickly.

 So, great work Alexander! Very convincing results.

Great! Thank you for reviewing this patch!


 Marking ready for committer, but please apply my comment fixes at your
 discretion.

Patch with your comment fixes is attached.

-
With best regards,
Alexander Korotkov.


rangetypegist-0.7.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2012-01-24 Thread Alexander Korotkov
Hi!

New version of patch is attached.
On Thu, Dec 22, 2011 at 11:46 AM, Jeff Davis pg...@j-davis.com wrote:

 A few comments:

 * In range_gist_picksplit, it would be nice to have a little bit more
 intuitive description of what's going on with the nonEmptyCount and
 nonInfCount numbers. For instance, it appears to depend on the fact that
 a range must either be in nonEmptyCount or in nonInfCount. Also, can you
 describe the reason you're multiplying by two and taking the absolute
 value? It seems to work, but I am missing the intuition behind those
 operations.

total_count - 2*nonEmptyCount = (total_count - nonEmptyCount) -
nonEmptyCount = emptyCount - nonEmptyCount
So, it's really not evident. I've simplified it.



 * The penalty function is fairly hard to read still. At a high level, I
 think we're trying to accomplish a few things (in order from most to
 least important):
  - Keep normal ranges separate.
  - Avoid broadening the class of the original predicate (e.g. turning
 single-infinite into double-infinite).
  - Avoid broadening (as determined by subtype_diff) the original
 predicate.
  - Favor adding ranges to narrower original predicates.

 Do you agree? If so, perhaps we can attack those more directly and it
 might be a little more readable.

 Additionally, the arbitrary numbers might become a problem. Can we
 choose better constants there? They would still be arbitrary when
 compared with real numbers derived from subtype_diff, but maybe we can
 still do better than what's there.

I've changes some comments and add constants for penalty values.


 * Regarding the leftover common entries that can go to either side:
 what is the delta measure trying to accomplish? When I work through
 some examples, it seems to favor putting larger common ranges on the
 left (small delta) and smaller common ranges on the right (smaller
 delta). Why is that good? Or did I misread the code?

 Intuitively, I would think that we'd want to push the ranges with lower
 upper bounds to the left and higher lower bounds to the right -- in
 other words, recurse. Obviously, we'd need to make sure it terminated at
 some point, but splitting the common entries does seem like a smaller
 version of the original problem. Thoughts?

That was a bug. Actually, no abs is needed. Indeed it doesn't affect
result significantly.

-
With best regards,
Alexander Korotkov.


rangetypegist-0.6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-21 Thread Jeff Davis
On Wed, 2011-12-14 at 01:04 +0400, Alexander Korotkov wrote:
 Hi!
 
Thank you! Attached a few changes:

* Change ordinal to normal for clarity (at least to me).
* Some comment cleanup
* Change classes_groups to be an enum of SPLIT_LEFT and SPLIT_RIGHT,
rather than using 1 and 2.
* Changed the bounds_lower and bounds_upper variables into
by_lower and by_upper to indicate that the arrays are distinguished
by sort order. It was confusing me to read it otherwise.

A few comments:

* In range_gist_picksplit, it would be nice to have a little bit more
intuitive description of what's going on with the nonEmptyCount and
nonInfCount numbers. For instance, it appears to depend on the fact that
a range must either be in nonEmptyCount or in nonInfCount. Also, can you
describe the reason you're multiplying by two and taking the absolute
value? It seems to work, but I am missing the intuition behind those
operations.

* The penalty function is fairly hard to read still. At a high level, I
think we're trying to accomplish a few things (in order from most to
least important):
  - Keep normal ranges separate.
  - Avoid broadening the class of the original predicate (e.g. turning
single-infinite into double-infinite).
  - Avoid broadening (as determined by subtype_diff) the original
predicate.
  - Favor adding ranges to narrower original predicates.

Do you agree? If so, perhaps we can attack those more directly and it
might be a little more readable.

Additionally, the arbitrary numbers might become a problem. Can we
choose better constants there? They would still be arbitrary when
compared with real numbers derived from subtype_diff, but maybe we can
still do better than what's there.

* Regarding the leftover common entries that can go to either side:
what is the delta measure trying to accomplish? When I work through
some examples, it seems to favor putting larger common ranges on the
left (small delta) and smaller common ranges on the right (smaller
delta). Why is that good? Or did I misread the code?

Intuitively, I would think that we'd want to push the ranges with lower
upper bounds to the left and higher lower bounds to the right -- in
other words, recurse. Obviously, we'd need to make sure it terminated at
some point, but splitting the common entries does seem like a smaller
version of the original problem. Thoughts?

Thank you for the helpful comments! It took me a while to work through
the logic, but I would have been lost completely without the comments
around the double sorting split.

Regards,
Jeff Davis
*** a/src/backend/utils/adt/rangetypes_gist.c
--- b/src/backend/utils/adt/rangetypes_gist.c
***
*** 39,45 
  	((RangeType *) DatumGetPointer(datumCopy(PointerGetDatum(r), \
  			 false, -1)))
  
! /* Minimum accepted ratio of split */
  #define LIMIT_RATIO 0.3
  
  /* Helper macros to place an entry in the left or right group */
--- 39,49 
  	((RangeType *) DatumGetPointer(datumCopy(PointerGetDatum(r), \
  			 false, -1)))
  
! /*
!  * Minimum accepted ratio of split for items of the same class. If the items
!  * are of different classes, it will separate along those lines regardless of
!  * the ratio.
!  */
  #define LIMIT_RATIO 0.3
  
  /* Helper macros to place an entry in the left or right group */
***
*** 66,72 
   * GiST. Each unique combination of properties is a class. CLS_EMPTY cannot be
   * combined with anything else.
   */
! #define CLS_ORDINAL			0 /* Ordinal ranges (no bits set) */
  #define CLS_LOWER_INF		1 /* Lower bound is infinity */
  #define CLS_UPPER_INF		2 /* Upper bound is infinity */
  #define CLS_CONTAIN_EMPTY	4 /* Contains underlying empty ranges */
--- 70,76 
   * GiST. Each unique combination of properties is a class. CLS_EMPTY cannot be
   * combined with anything else.
   */
! #define CLS_NORMAL			0 /* Normal ranges (no bits set) */
  #define CLS_LOWER_INF		1 /* Lower bound is infinity */
  #define CLS_UPPER_INF		2 /* Upper bound is infinity */
  #define CLS_CONTAIN_EMPTY	4 /* Contains underlying empty ranges */
***
*** 76,81 
--- 80,102 
  			   * of properties. CLS_EMPTY doesn't combine with
  			   * anything else, so it's only 2^3 + 1. */
  
+ /*
+  * Auxiliary structure for picksplit based on single sorting.
+  */
+ typedef struct
+ {
+ 	int	index;
+ 	RangeBound			bound;
+ 	TypeCacheEntry	   *typcache;
+ } PickSplitSortItem;
+ 
+ /* place on left or right side of split? */
+ typedef enum
+ {
+ 	SPLIT_LEFT = 0, /* makes initialization to SPLIT_LEFT easier */
+ 	SPLIT_RIGHT
+ } SplitLR;
+ 
  static RangeType *range_super_union(TypeCacheEntry *typcache, RangeType *r1,
  	RangeType *r2);
  static bool range_gist_consistent_int(FmgrInfo *flinfo,
***
*** 97,103  static int sort_item_cmp(const void *a, const void *b);
  static void range_gist_class_split(TypeCacheEntry *typcache,
     GistEntryVector *entryvec,
     GIST_SPLITVEC *v,
!    

Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-21 Thread Jeff Davis
On Tue, 2011-12-20 at 13:22 +0400, Alexander Korotkov wrote:
 Hi!
 
 
 Studying this question little more I found that current approach of
 range indexing can be dramatically inefficient in some cases. It's not
 because of penalty or split implementation, but because of approach
 itself. Mapping intervals to two-dimensional space produce much better
 results in case of high-overlapping ranges and @, @ operators
 with low selectivity. 
 
Thank you for testing this. I agree that your approach is much better
especially dealing with widely varying range sizes, etc. My approach
really only tackled the simple (and hopefully common) case when the
ranges are about the same size.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-20 Thread Alexander Korotkov
Hi!

Studying this question little more I found that current approach of range
indexing can be dramatically inefficient in some cases. It's not because of
penalty or split implementation, but because of approach itself. Mapping
intervals to two-dimensional space produce much better results in case of
high-overlapping ranges and @, @ operators with low selectivity.

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select
(random()*1)::int as l, (random()*1000 + 1)::int s from
generate_series(1,100) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x @
int4range(5000,5010);
 QUERY PLAN


-
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=37.304..37.310 rows=2 loops=1)
   Recheck Cond: (x @ '[5000,5010)'::int4range)
   Buffers: shared hit=767
   -  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=37.288..37.288 rows=2 loops=1)
 Index Cond: (x @ '[5000,5010)'::int4range)
 Buffers: shared hit=765
 Total runtime: 37.385 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x @
box(point(5000,5000),point(5010,5010));
QUERY PLAN


---
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.197..0.206 rows=2 loops=1)
   Recheck Cond: (x @ '(5010,5010),(5000,5000)'::box)
   Buffers: shared hit=5
   -  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.182..0.182 rows=2 loops=1)
 Index Cond: (x @ '(5010,5010),(5000,5000)'::box)
 Buffers: shared hit=3
 Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @
int4range(5000,5990);
QUERY PLAN


---
 Bitmap Heap Scan on range_test  (cost=40.31..2585.65 rows=1000 width=32)
(actual time=4.578..4.603
rows=5 loops=1)
   Recheck Cond: (x @ '[5000,5990)'::int4range)
   Buffers: shared hit=52
   -  Bitmap Index Scan on range_test_idx  (cost=0.00..40.06 rows=1000
width=0) (actual time=4.561..4.561 rows=5 loops=1)
 Index Cond: (x @ '[5000,5990)'::int4range)
 Buffers: shared hit=47
 Total runtime: 4.669 ms
(7 rows)


test=# explain (analyze, buffers) select * from point_test where x @
box(point('-inf'::float,5990),point(5000,'+inf'::float));
QUERY PLAN


---
 Bitmap Heap Scan on point_test  (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.328..0.353 rows=5 loops=1)
   Recheck Cond: (x @ '(5000,inf),(-inf,5990)'::box)
   Buffers: shared hit=8
   -  Bitmap Index Scan on point_test_idx  (cost=0.00..44.11 rows=1000
width=0) (actual time=0.312..0.312 rows=5 loops=1)
 Index Cond: (x @ '(5000,inf),(-inf,5990)'::box)
 Buffers: shared hit=3
 Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from
here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-16 Thread Greg Smith

On 12/13/2011 04:04 PM, Alexander Korotkov wrote:
On Mon, Dec 12, 2011 at 10:41 PM, Jeff Davis pg...@j-davis.com 
mailto:pg...@j-davis.com wrote:


* There's a lot of code for range_gist_penalty. Rather than having
special cases for all combinations of properties in the new an
original,
is it possible to use something a little simpler? Maybe just start the
penalty at zero, and add something for each property of the predicate
range that must be changed. The penalties added might vary, e.g.,
if the
original range has an infinite lower bound, changing it to have an
infinite upper bound might be a higher penalty.

I belive it's possible to make it simplier. I've coded quite 
intuitively. Probably, we should select some representive datasets in 
order to determine which logic is reasonable by tests.


That seems to be a sticking point; you mentioned before that finding 
larger data sets useful for your purposes was hard.


I'm not sure where you'll find data fitting your needs here, but it 
seems difficult to validate all of what you've done so far without it.  
I'm going to mark this one returned and hope you can dig up something 
useful to nail this down.  You might also describe what it is you're 
looking for better and see if anyone else has a suggestion.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-13 Thread Alexander Korotkov
Hi!

On Mon, Dec 12, 2011 at 10:41 PM, Jeff Davis pg...@j-davis.com wrote:

 Thank you. I have attached a patch that's mostly just cleanup to this
 one.

Thanks a lot for cleanup. Path with applied cleanup is attached.


 Comments:

 * You use the term ordinal range quite a lot, which I haven't heard
 before. Is that a mathematical term, or do you mean something more like
 ordinary?

Actually I meant ordinal range to be finite, non-empty and
non-contain-empty range. It's not mathematical term. Probably there is some
better word for that, but my english is not strong enough :).


 * There's a lot of code for range_gist_penalty. Rather than having
 special cases for all combinations of properties in the new an original,
 is it possible to use something a little simpler? Maybe just start the
 penalty at zero, and add something for each property of the predicate
 range that must be changed. The penalties added might vary, e.g., if the
 original range has an infinite lower bound, changing it to have an
 infinite upper bound might be a higher penalty.

I belive it's possible to make it simplier. I've coded quite intuitively.
Probably, we should select some representive datasets in order to determine
which logic is reasonable by tests.

* It looks like LIMIT_RATIO is not always considered. Should it be?

Yes, it's so. In this part I repeat logic of GiST with NULLs. It makes
NULLs to be separated from non-NULLs even if it's produce worse ratio. I'm
not sure about how it should be. It seems to be tradeoff between having
some excess pages and having slightly worse tree.


 * You defined get/set_range_contain_empty, but didn't use them. I think
 this was a merge error, but I removed them. So now there are no changes
 in rangetypes.c.

Ok, thanks.

--
With best regards,
Alexander Korotkov.


rangetypegist-0.5.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-13 Thread Alexander Korotkov
On Sat, Dec 10, 2011 at 6:14 PM, Greg Smith g...@2ndquadrant.com wrote:

 On 12/02/2011 06:48 AM, Alexander Korotkov wrote:

 Rebased with head.


 Could you comment a little more on what changed?  There were a couple of
 areas Tom commented on:

 -General code fixes

 Expensibe usage of Max macro is fixed in 0.5 version of patch.

-pull out and apply the changes related to the RANGE_CONTAIN_EMPTY flag,
 and also remove the  opclass entry

 It's already done by Tom.

-Subdiff issues

 The third one sounded hard to deal with, so presumably nothing there.

As I wrote before, I believe there is some limitation of current GiST
interface. Most likely we're not going to change GiST interface now and
have to do will solution of tradeoff. I think good way to do it is to
select representive datasets and do some tests which will show which logic
is more reasonable. Actually, I need some help with that, because I don't
have enough of datasets.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-12 Thread Jeff Davis
On Fri, 2011-12-02 at 15:48 +0400, Alexander Korotkov wrote:
 Rebased with head.
 
Thank you. I have attached a patch that's mostly just cleanup to this
one.

Comments:

* You use the term ordinal range quite a lot, which I haven't heard
before. Is that a mathematical term, or do you mean something more like
ordinary?

* There's a lot of code for range_gist_penalty. Rather than having
special cases for all combinations of properties in the new an original,
is it possible to use something a little simpler? Maybe just start the
penalty at zero, and add something for each property of the predicate
range that must be changed. The penalties added might vary, e.g., if the
original range has an infinite lower bound, changing it to have an
infinite upper bound might be a higher penalty.

* It looks like LIMIT_RATIO is not always considered. Should it be?

* You defined get/set_range_contain_empty, but didn't use them. I think
this was a merge error, but I removed them. So now there are no changes
in rangetypes.c.

Regards,
Jeff Davis


range_gist_cleanup.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-10 Thread Greg Smith

On 12/02/2011 06:48 AM, Alexander Korotkov wrote:

Rebased with head.


Could you comment a little more on what changed?  There were a couple of 
areas Tom commented on:


-General code fixes
-pull out and apply the changes related to the RANGE_CONTAIN_EMPTY 
flag, and also remove the  opclass entry

-Subdiff issues

The third one sounded hard to deal with, so presumably nothing there.  
I'm not sure if your updated rebase addresses either of those first two 
yet or not, or if it was just fixing bitrot from upstream code changes.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-12-02 Thread Alexander Korotkov
Rebased with head.

--
With best regards,
Alexander Korotkov.


rangetypegist-0.4.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-27 Thread Alexander Korotkov
On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis pg...@j-davis.com wrote:

 There's been some significant change in rangetypes_gist.c, can you
 please rebase this patch?

OK, rebased with head.

--
With best regards,
Alexander Korotkov.


rangetypegist-0.3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-27 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis pg...@j-davis.com wrote:
 There's been some significant change in rangetypes_gist.c, can you
 please rebase this patch?

 OK, rebased with head.

I looked at this patch a bit.  I agree with the aspect of it that says
let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges; I think we really need that in order to make
contained_by searches usable.  However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions.  I see two problems
there:

1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff.  This is not good, because you have
no idea what scale the subtype differences will be expressed on.  The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.

2. It's too large/complicated.  You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size.  I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.

There are a lot of garden-variety coding problems, too, for instance here:

+ *penalty = Max(DatumGetFloat8(FunctionCall2(
+ subtype_diff, orig_lower.val, new_lower.val)), 0.0);

which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument.  But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.

Earlier in the thread you wrote:

 Questions:
 1) I'm not sure about whether we need support of  in GiST, because it
 always produces full index scan (except search for non-empty ranges).

I was thinking the same thing; that opclass entry seems pretty darn
useless.

I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the  opclass entry,
because I think these are uncontroversial and in the nature of
must fix quickly.  The redesign of the penalty and picksplit
functions should be discussed separately.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-27 Thread Alexander Korotkov
On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
 values obtained from subtype_diff.  This is not good, because you have
 no idea what scale the subtype differences will be expressed on.  The
 hard-wired values could be greatly larger than range widths, or greatly
 smaller, resulting in randomly different index behavior.

Current GiST code only compare penalty values of inserting same tuple. And
don't see why it may alters. So, values obtained from subtype_diff
and hard-wired values would be never compared each other.


 2. It's too large/complicated.  You're proposing to add nearly a
 thousand lines to rangetypes_gist.c, and I do not see any reason to
 think that this is so much better than what's there now as to justify
 that kind of increment in the code size.  I saw your performance
 results, but one set of results on an arbitrary (not-real-world) test
 case doesn't prove a lot to me; and in particular it doesn't prove that
 we couldn't do as well with a much smaller and simpler patch.

I've tested double sorting split algorithm itself pretty much on synthetic
datasets. See paper for details. Strategy of separation of different
classes of ranges really need more testing. But obtaining large enough
real-life datasets is pretty *problematic for me.*


 There are a lot of garden-variety coding problems, too, for instance here:

 + *penalty = Max(DatumGetFloat8(FunctionCall2(
 + subtype_diff, orig_lower.val, new_lower.val)), 0.0);

 which is going to uselessly call the subtype_diff function twice most of
 the time (Max() is only a macro), plus you left off the collation
 argument.  But I don't think it's worth worrying about those until the
 big picture is correct, which I feel it isn't yet.

Oh, I see. It will be fixed.

I propose to pull out and apply the changes related to the
 RANGE_CONTAIN_EMPTY flag, and also remove the  opclass entry,
 because I think these are uncontroversial and in the nature of
 must fix quickly.  The redesign of the penalty and picksplit
 functions should be discussed separately.

I think the same.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-27 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
 values obtained from subtype_diff.  This is not good, because you have
 no idea what scale the subtype differences will be expressed on.  The
 hard-wired values could be greatly larger than range widths, or greatly
 smaller, resulting in randomly different index behavior.

 Current GiST code only compare penalty values of inserting same tuple. And
 don't see why it may alters. So, values obtained from subtype_diff
 and hard-wired values would be never compared each other.

I see your point that we only need the penalty values to be comparable
for the same new value, but I don't think that really answers my
objection, because you've had to lobotomize the logic.  As an example,
if we have a new empty range to insert, and all the existing root-page
entries are ordinary finite ranges, this code will throw up its hands
and give them all the same 4.0 penalty value.  Surely it would be better
to attempt to pick the smallest (narrowest) existing range.  But to do
that, you have to pay attention to the subdiff value.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-27 Thread Alexander Korotkov
On Mon, Nov 28, 2011 at 3:00 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I see your point that we only need the penalty values to be comparable
 for the same new value, but I don't think that really answers my
 objection, because you've had to lobotomize the logic.  As an example,
 if we have a new empty range to insert, and all the existing root-page
 entries are ordinary finite ranges, this code will throw up its hands
 and give them all the same 4.0 penalty value.  Surely it would be better
 to attempt to pick the smallest (narrowest) existing range.  But to do
 that, you have to pay attention to the subdiff value.

I believe it's a problem of the current GiST interface. If using subdiff
value as an penalty for insertion of empty range, we have to return 0
penalty for any entry with RANGE_CONTAIN_EMPTY flag. And for plain empty
entry too without any chance to define priority between them. In my opinion
solution is that penalty function should return vector of floats instead of
single float. With current GiST interface we have to do will solution of
handling some cases better and some cases worse. For example, GiST for
boxes also suffers from interface limitation. In many papers I met
recommendation to choose smallest box from boxes with same extention (it's
not a rare situation to have multiple boxes with zero extention) for tuple
insertion. But with current interface, we can't implement it.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-25 Thread Jeff Davis
On Wed, 2011-11-09 at 20:24 +0400, Alexander Korotkov wrote:
 New version of GiST for range types patch is here. This version seems
 to be complete and ready for review.
 
There's been some significant change in rangetypes_gist.c, can you
please rebase this patch?

I like the patch conceptually, though I'm still working through the
details.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-07 Thread Alexander Korotkov
During work on gist for range types I've faced with following problem:

test=# select 'empty'::int4range !?;
ERROR:  operator does not exist: int4range !?
LINE 1: select 'empty'::int4range !?;
  ^
HINT:  No operator matches the given name and argument type(s). You might
need to add explicit type casts.

test=# select 'empty'::int4range ?;
ERROR:  operator does not exist: int4range ?
LINE 1: select 'empty'::int4range ?;
  ^
HINT:  No operator matches the given name and argument type(s). You might
need to add explicit type casts.

So, !? and ? operators are mentioned in documentation, but don't present in
catalog. Are them just missed in the catalog or there is some more serious
problem?

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-07 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 So, !? and ? operators are mentioned in documentation, but don't present in
 catalog. Are them just missed in the catalog or there is some more serious
 problem?

IIRC, Heikki removed them from the final commit.  Sounds like he missed
some documentation.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-07 Thread Alexander Korotkov
First version of GiST for range types patch is here. Comments  refactoring
 testing are coming soon.

--
With best regards,
Alexander Korotkov.


rangetypegist-0.1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-07 Thread Heikki Linnakangas

On 07.11.2011 20:36, Tom Lane wrote:

Alexander Korotkovaekorot...@gmail.com  writes:

So, !? and ? operators are mentioned in documentation, but don't present in
catalog. Are them just missed in the catalog or there is some more serious
problem?


IIRC, Heikki removed them from the final commit.  Sounds like he missed
some documentation.


Yep. Fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-11-04 Thread Jeff Davis
On Thu, 2011-11-03 at 10:37 +0200, Heikki Linnakangas wrote:
 Looking at the most recent patch, I don't actually see any GiST support 
 for the empty and non-empty operators (!? and ?). I don't see how those 
 could be accelerated with GiST, anyway; I think if you want to use an 
 index for those operators, you might as well create a partial or 
 functional index on empty(x).
 
 So I'm actually inclined to remove not only the nonempty function, but 
 also the ? and !? operators. They don't seem very useful, and ? and !? 
 don't feel very intuitive to me, anyway. I'll just leave the empty(x) 
 function.

That's fine with me. At best, they were only marginally useful.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-04 Thread Jeff Davis
On Wed, 2011-11-02 at 21:29 +0200, Heikki Linnakangas wrote:
  +   else if (lower1.infinite || upper1.infinite)
  +   length1 = 1.0/0.0;
 
 That seems wrong. I take it that the point is to set length1 to infinity?

I reworked this in commit (on my private repo, of course):
6197fbffb00f729feba8082136801cdef5ac850e

For the archives, it's essentially taking the difference on the left
side of the range, and the difference on the right side of the range,
and adding them together. There are just a lot of special cases for
infinite boundaries, empty ranges, and the lack of a subtype_diff
function.

I think it's a little closer to what Alexander intended, which I think
is an improvement. It should now be able to recognize that expanding
[10,) into [0,) has a penalty of 10.

 PS. I note the docs still refer to subtype_float. I'll fix that before 
 committing.

Thank you. The only change I found strange was the test that used \c to
reconnect; but I can't say that my solution was any better.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-04 Thread Jeff Davis
On Wed, 2011-11-02 at 22:59 +0200, Heikki Linnakangas wrote:
 This seems to be coming from the selectivity estimation function. The 
 selectivity function for @ is scalargtsel, which is usually used for 
 scalar  and =. That doesn't seem right. But what do we store in the 
 statistics for range types in the first place, and what would be the 
 right thing to do for selectivity estimation?

I'll have to think more about that, and it depends on the operator. It
seems like an easier problem for contains a point than contains
another range or overlaps with another range.

Right now I don't have a very good answer, and even for the contains a
point case I'll have to think about the representation in pg_statistic.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-11-03 Thread Heikki Linnakangas

On 17.10.2011 01:09, Jeff Davis wrote:

On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:

* Do we really need non_empty(anyrange) ? You can just do NOT empty(x)


To make it a searchable (via GiST) condition, I need an operator. I
could either remove that operator (as it's not amazingly useful), or I
could just not document the function but leave the operator there.


Looking at the most recent patch, I don't actually see any GiST support 
for the empty and non-empty operators (!? and ?). I don't see how those 
could be accelerated with GiST, anyway; I think if you want to use an 
index for those operators, you might as well create a partial or 
functional index on empty(x).


So I'm actually inclined to remove not only the nonempty function, but 
also the ? and !? operators. They don't seem very useful, and ? and !? 
don't feel very intuitive to me, anyway. I'll just leave the empty(x) 
function.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-03 Thread Heikki Linnakangas

On 03.11.2011 10:42, Jeff Davis wrote:

On Wed, 2011-11-02 at 22:59 +0200, Heikki Linnakangas wrote:

This seems to be coming from the selectivity estimation function. The
selectivity function for@ is scalargtsel, which is usually used for
scalar  and=. That doesn't seem right. But what do we store in the
statistics for range types in the first place, and what would be the
right thing to do for selectivity estimation?


I'll have to think more about that, and it depends on the operator. It
seems like an easier problem for contains a point than contains
another range or overlaps with another range.

Right now I don't have a very good answer, and even for the contains a
point case I'll have to think about the representation in pg_statistic.


I've committed this now, after some more cleanup. I removed the 
selectivity estimation functions from operators where they were bogus, 
so writing those is a clear TODO. But that can well be done as a 
separate patch.


Thanks!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-03 Thread David E. Wheeler
On Nov 3, 2011, at 4:59 AM, Heikki Linnakangas wrote:

 I've committed this now, after some more cleanup. I removed the selectivity 
 estimation functions from operators where they were bogus, so writing those 
 is a clear TODO. But that can well be done as a separate patch.
 
 Thanks!

Woo! Congrats Jeff. Awesome news. Very excited about this feature. Thanks for 
getting this in, Heikki.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-03 Thread Florian Pflug
On Nov3, 2011, at 18:54 , David E. Wheeler wrote:
 On Nov 3, 2011, at 4:59 AM, Heikki Linnakangas wrote:
 I've committed this now, after some more cleanup. I removed the selectivity 
 estimation functions from operators where they were bogus, so writing those 
 is a clear TODO. But that can well be done as a separate patch.
 
 Thanks!
 
 Woo! Congrats Jeff. Awesome news. Very excited about this feature. Thanks for 
 getting this in, Heikki.

+1. Great work, guys!

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-03 Thread Alexander Korotkov
On Thu, Nov 3, 2011 at 3:59 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 I've committed this now, after some more cleanup. I removed the
 selectivity estimation functions from operators where they were bogus, so
 writing those is a clear TODO. But that can well be done as a separate
 patch.

Cool! Patch with GiST on range types improvements from me will be soon.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-02 Thread Heikki Linnakangas

On 01.11.2011 06:33, Jeff Davis wrote:

On Mon, 2011-10-24 at 15:05 +0400, Alexander Korotkov wrote:


I think implementing subtype_diff for each datatype is ok.  We could
implement some universal function based on minus operator and casting
to double precision. But such solution might be unacceptable in
both predictability (operator and casting function might do not the
things we expect) and performance.


Done.


Thanks, I'm looking into this now.


+   else if (lower1.infinite || upper1.infinite)
+   length1 = 1.0/0.0;


That seems wrong. I take it that the point is to set length1 to infinity?

PS. I note the docs still refer to subtype_float. I'll fix that before 
committing.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 01.11.2011 06:33, Jeff Davis wrote:
 +   else if (lower1.infinite || upper1.infinite)
 +   length1 = 1.0/0.0;

 That seems wrong. I take it that the point is to set length1 to infinity?

Please use get_float[48]_infinity() or get_float[48]_nan(), as
appropriate (I think the latter may be intended here), rather than
making up your own way of getting those values.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-02 Thread Heikki Linnakangas

On 01.11.2011 06:33, Jeff Davis wrote:

On Mon, 2011-10-24 at 15:05 +0400, Alexander Korotkov wrote:


I think implementing subtype_diff for each datatype is ok.  We could
implement some universal function based on minus operator and casting
to double precision. But such solution might be unacceptable in
both predictability (operator and casting function might do not the
things we expect) and performance.


Done.

Everything is complete in this patch with the exception of two optional
things, which I still intend to do but might best be done in a separate
commit:

   * support typmod for ranges
   * support casts between different range types

Both of these things, I believe, require the introduction of an
RangeCoerceExpr, similar to ArrayCoerceExpr. That's fine, but it creates
a rather large diff, so it might be best left for a later commit.


Using the test table from the rangetypes test case:

postgres=#  select * from test_range_gist where 10 @ ir;
ERROR:  unsupported type: 3904

This seems to be coming from the selectivity estimation function. The 
selectivity function for @ is scalargtsel, which is usually used for 
scalar  and =. That doesn't seem right. But what do we store in the 
statistics for range types in the first place, and what would be the 
right thing to do for selectivity estimation?


I'll dig deeper into this tomorrow...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-11-02 Thread Heikki Linnakangas

On 02.11.2011 22:59, Heikki Linnakangas wrote:

I'll dig deeper into this tomorrow...


Forgot to mention: I have pushed what I have done this far to my git 
repository at git://git.postgresql.org/git/users/heikki/postgres.git, if 
you want to take a look. Nothing major, just garden-variety cleanup.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Heikki Linnakangas

On 25.10.2011 19:37, Jeff Davis wrote:

On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:

Hmm, I don't think that's safe. After Oid wraparound, a range type oid
might get reused for some other range type, and the cache would return
stale values. Extremely unlikely to happen by accident, but could be
exploited by an attacker.


Any ideas on how to remedy that? I don't have another plan for making it
perform well. Plugging it into the cache invalidation mechanism seems
like overkill, but I suppose that would solve the problem.


I think we should look at the array-functions for precedent. array_in et 
al cache the information in fn_extra, so that when it's called 
repeatedly in one statement for the same type, the information is only 
looked up once. That's good enough, it covers repeated execution in a 
single query, as well as COPY and comparison calls from index searches, 
for example.



Aren't there a few other cases like this floating around the code?


Not that I know of. That said, I wouldn't be too surprised if there was.


I know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.


True.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Robert Haas
On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
 Hmm, I don't think that's safe. After Oid wraparound, a range type oid
 might get reused for some other range type, and the cache would return
 stale values. Extremely unlikely to happen by accident, but could be
 exploited by an attacker.

 Any ideas on how to remedy that? I don't have another plan for making it
 perform well. Plugging it into the cache invalidation mechanism seems
 like overkill, but I suppose that would solve the problem.

 Aren't there a few other cases like this floating around the code? I
 know the single-xid cache is potentially vulnerable to xid wraparound
 for the same reason.

I believe that we're in trouble with XIDs as soon as you have two
active XIDs that are separated by a billion, because then you could
have a situation where some people think a given XID is in the future
and others think it's in the past.  I have been wondering if we should
have some sort of active guard against that scenario; I don't think we
do at present.

But OID wraparound is not the same as XID wraparound.  It's far more
common, I think, for a single transaction to use lots of OIDs than it
is for it to use lots of XIDs (i.e. have many subtransactions).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Heikki Linnakangas

On 26.10.2011 18:42, Robert Haas wrote:

On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davispg...@j-davis.com  wrote:

Aren't there a few other cases like this floating around the code? I
know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.


I believe that we're in trouble with XIDs as soon as you have two
active XIDs that are separated by a billion, ...


That's not what Jeff is referring to here, though (correct me if I'm 
wrong). He's talking about the one-item cache in 
TransactionIdLogFetch(). You don't need need long-running transactions 
for that to get confused. Specifically, this could happen:


1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
   The row has xmin = 123456, and it is cached as committed in the 
one-item cache by TransactionLogFetch.
2. A lot of time passes. Everything is frozen, and XID wrap-around 
happens. (Session A is idle but not in a transaction, so it doesn't 
inhibit freezing.)

3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
   By coincidence, this transaction was assigned XID 123456.
4. In session A: SELECT * FROM foo WHERE id = 2;
   The one-item cache still says that 123456 committed, so we return 
the tuple inserted by the aborted transaction. Oops.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Robert Haas
On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 26.10.2011 18:42, Robert Haas wrote:

 On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davispg...@j-davis.com  wrote:

 Aren't there a few other cases like this floating around the code? I
 know the single-xid cache is potentially vulnerable to xid wraparound
 for the same reason.

 I believe that we're in trouble with XIDs as soon as you have two
 active XIDs that are separated by a billion, ...

 That's not what Jeff is referring to here, though (correct me if I'm wrong).
 He's talking about the one-item cache in TransactionIdLogFetch(). You don't
 need need long-running transactions for that to get confused. Specifically,
 this could happen:

 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
   The row has xmin = 123456, and it is cached as committed in the one-item
 cache by TransactionLogFetch.
 2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
 (Session A is idle but not in a transaction, so it doesn't inhibit
 freezing.)
 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
   By coincidence, this transaction was assigned XID 123456.
 4. In session A: SELECT * FROM foo WHERE id = 2;
   The one-item cache still says that 123456 committed, so we return the
 tuple inserted by the aborted transaction. Oops.

Oh, hmm.  That sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mié oct 26 13:19:47 -0300 2011:
 On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

  1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
    The row has xmin = 123456, and it is cached as committed in the one-item
  cache by TransactionLogFetch.
  2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
  (Session A is idle but not in a transaction, so it doesn't inhibit
  freezing.)
  3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
    By coincidence, this transaction was assigned XID 123456.
  4. In session A: SELECT * FROM foo WHERE id = 2;
    The one-item cache still says that 123456 committed, so we return the
  tuple inserted by the aborted transaction. Oops.
 
 Oh, hmm.  That sucks.

Can this be solved by simple application of the Xid epoch?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Jeff Davis
On Wed, 2011-10-26 at 12:19 -0400, Robert Haas wrote:
  1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
The row has xmin = 123456, and it is cached as committed in the one-item
  cache by TransactionLogFetch.
  2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
  (Session A is idle but not in a transaction, so it doesn't inhibit
  freezing.)
  3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
By coincidence, this transaction was assigned XID 123456.
  4. In session A: SELECT * FROM foo WHERE id = 2;
The one-item cache still says that 123456 committed, so we return the
  tuple inserted by the aborted transaction. Oops.

Yes, that's the scenario I was talking about.

 Oh, hmm.  That sucks.

It didn't seem like a major concern a while ago:
http://archives.postgresql.org/message-id/28107.1291264...@sss.pgh.pa.us

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 That's not what Jeff is referring to here, though (correct me if I'm 
 wrong). He's talking about the one-item cache in 
 TransactionIdLogFetch(). You don't need need long-running transactions 
 for that to get confused. Specifically, this could happen:

 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
 The row has xmin = 123456, and it is cached as committed in the 
 one-item cache by TransactionLogFetch.
 2. A lot of time passes. Everything is frozen, and XID wrap-around 
 happens. (Session A is idle but not in a transaction, so it doesn't 
 inhibit freezing.)
 3. In session B: BEGIN: INSERT INTO foo (id) VALUES (2); ROLLBACK;
 By coincidence, this transaction was assigned XID 123456.
 4. In session A: SELECT * FROM foo WHERE id = 2;
 The one-item cache still says that 123456 committed, so we return 
 the tuple inserted by the aborted transaction. Oops.

I think this is probably a red herring, because it's impossible for a
session to remain totally idle for that long --- see sinval updating.
(If you wanted to be really sure, we could have sinval reset clear
the TransactionLogFetch cache, but I doubt it's worth the trouble.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I believe that we're in trouble with XIDs as soon as you have two
 active XIDs that are separated by a billion, because then you could
 have a situation where some people think a given XID is in the future
 and others think it's in the past.  I have been wondering if we should
 have some sort of active guard against that scenario; I don't think we
 do at present.

Sure we do.  It's covered by the XID wraparound prevention logic.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-25 Thread Jeff Davis
On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
 Hmm, I don't think that's safe. After Oid wraparound, a range type oid 
 might get reused for some other range type, and the cache would return 
 stale values. Extremely unlikely to happen by accident, but could be 
 exploited by an attacker.
 

Any ideas on how to remedy that? I don't have another plan for making it
perform well. Plugging it into the cache invalidation mechanism seems
like overkill, but I suppose that would solve the problem.

Aren't there a few other cases like this floating around the code? I
know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-25 Thread Alexander Korotkov
On Mon, Oct 24, 2011 at 3:05 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 If we allow user to specify own gist_penalty function, then such function
 should deal with:
 1) GiST-specific data structures such as GISTENTRY.
 2) Decomposing ranges using range_deserialize.
 3) Inifinities, which we could handle in general penalty functions.
 Thats why I prefere to implement subtype_diff.

I forgot another agument for having subtype_diff:
4) In my picksplit algorithm it would be more natural to use subtype_diff
for measuring overlap than use penalty function.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-24 Thread Alexander Korotkov
Hi!

On Mon, Oct 17, 2011 at 12:38 PM, Jeff Davis pg...@j-davis.com wrote:

  I started implementing subtype_diff, and I noticed that it requires
  defining an extra function for each range type. Previously, the numeric
  types could just use a cast, which was convenient for user-defined range
  types.
 
  If you have any other ideas to make that cleaner, please let me know.
  Otherwise I'll just finish implementing subtype_diff.

I think implementing subtype_diff for each datatype is ok.  We could
implement some universal function based on minus operator and casting to
double precision. But such solution might be unacceptable in both
*predictability
(operator and casting function might do not the things we expect) and
performance.*

I'm beginning to think that we should just allow the user to specify
 their own gist_penalty function. Specifying just the subtype_diff
 doesn't save much time, and it can only be limiting. Additionally, it's
 harder for users to understand the purpose of the function.

If we allow user to specify own gist_penalty function, then such function
should deal with:
1) GiST-specific data structures such as GISTENTRY.
2) Decomposing ranges using range_deserialize.
3) Inifinities, which we could handle in general penalty functions.
Thats why I prefere to implement subtype_diff.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-24 Thread Heikki Linnakangas

On 17.10.2011 01:09, Jeff Davis wrote:

On Sat, 2011-10-15 at 01:46 +0300, Heikki Linnakangas wrote:

* The binary i/o format includes the length of the lower and upper
bounds twice, once explicitly in range_send, and second time within the
send-function of the subtype. Seems wasteful.


Any ideas how to fix that? How else do I know how much the underlying
send function will consume?


Oh, never mind. I was misreading the code, it's not sending the length 
twice.



* range_constructor_internal - I think it would be better to move logic
to figure out the the arguments into the callers.


Done.


The comment above range_constructor0() is now outdated.


* The gist support functions frequently call range_deserialize(), which
does catalog lookups. Isn't that horrendously expensive?


Yes, it was. I have introduced a cached structure that avoids syscache
lookups when it's the same range as the last lookup (the common case).


Hmm, I don't think that's safe. After Oid wraparound, a range type oid 
might get reused for some other range type, and the cache would return 
stale values. Extremely unlikely to happen by accident, but could be 
exploited by an attacker.



* What exactly is canonical function supposed to return? It's not clear
what format one should choose as the canonical format when writing a
custom range type. And even for the built-in types, it would be good to
explain which types use which canonical format (I saw the discussions on
that, so I understand that might still be subject to change).


The canonical function is just supposed to return a new range such that
two equal values will have equal representations. I have listed the
built-in discrete range types and their canonical form.

As far as describing what a custom range type is supposed to use for the
canonical form, I don't know which part is exactly unclear. There aren't
too many rules to defining one -- the only guideline is that ranges of
equal value going in should be put in one canonical representation.


Ok. The name canonical certainly hints at that, but it would be good 
to explicitly state that guideline. As the text stands, it would seem 
that a canonical function that maps [1,7] to [1,8), and also vice 
versa, [1,8) to [1,7], would be valid. That would be pretty silly, 
but it would be good to say something like The canonical output for two 
values that are equal, like [1,7] and [1,8), must be equal. It doesn't 
matter which representation you choose to be the canonical one, as long 
as two equal values with different formattings are always mapped to the 
same value with same formatting


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-17 Thread Jeff Davis
On Sun, 2011-10-16 at 14:43 -0700, Jeff Davis wrote:
 On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:
 
  The first thing caught my eye in existing GiST code is idea of
  subtype_float. float8 has limited precision and can't respresent, for
  example, varlena values good enough. Even if we have large int8 value
  we can loose lower bits, but data distribution can be so that these
  bits are valuable. Wouldn't it better to have function like
  subtype_diff_float which returns difference between two values of
  subtype as an float? Using of such function could make penalty more
  sensible to even small difference between values, and accordingly more
  relevant.
  
 I started implementing subtype_diff, and I noticed that it requires
 defining an extra function for each range type. Previously, the numeric
 types could just use a cast, which was convenient for user-defined range
 types.
 
 If you have any other ideas to make that cleaner, please let me know.
 Otherwise I'll just finish implementing subtype_diff.

I'm beginning to think that we should just allow the user to specify
their own gist_penalty function. Specifying just the subtype_diff
doesn't save much time, and it can only be limiting. Additionally, it's
harder for users to understand the purpose of the function.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-16 Thread Jeff Davis
On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:

 The first thing caught my eye in existing GiST code is idea of
 subtype_float. float8 has limited precision and can't respresent, for
 example, varlena values good enough. Even if we have large int8 value
 we can loose lower bits, but data distribution can be so that these
 bits are valuable. Wouldn't it better to have function like
 subtype_diff_float which returns difference between two values of
 subtype as an float? Using of such function could make penalty more
 sensible to even small difference between values, and accordingly more
 relevant.
 
I started implementing subtype_diff, and I noticed that it requires
defining an extra function for each range type. Previously, the numeric
types could just use a cast, which was convenient for user-defined range
types.

If you have any other ideas to make that cleaner, please let me know.
Otherwise I'll just finish implementing subtype_diff.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-14 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 * Have you tested this on an architecture with strict alignment? I don't 
 see any alignment bugs, but I think there's a lot of potential for them..

Well, fwiw, this patch passes its regression tests on HPPA, except for this
which seems more to do with the already-noted unacceptable dependency on
non-C collations:

*** /home/postgres/pgsql/src/test/regress/expected/rangetypes.out   Fri Oct 
14 21:19:19 2011
--- /home/postgres/pgsql/src/test/regress/results/rangetypes.outFri Oct 
14 21:50:11 2011
***
*** 842,857 
  --
  create type textrange_c as range(subtype=text, collation=C);
  create type textrange_en_us as range(subtype=text, collation=en_US);
  select textrange_c('a','Z') @ 'b'::text;
  ERROR:  range lower bound must be less than or equal to range upper bound
  select textrange_en_us('a','Z') @ 'b'::text;
!  ?column? 
! --
!  t
! (1 row)
! 
  drop type textrange_c;
  drop type textrange_en_us;
  --
  -- Test out polymorphic type system
  --
--- 842,858 
  --
  create type textrange_c as range(subtype=text, collation=C);
  create type textrange_en_us as range(subtype=text, collation=en_US);
+ ERROR:  collation en_US for encoding SQL_ASCII does not exist
  select textrange_c('a','Z') @ 'b'::text;
  ERROR:  range lower bound must be less than or equal to range upper bound
  select textrange_en_us('a','Z') @ 'b'::text;
! ERROR:  function textrange_en_us(unknown, unknown) does not exist
! LINE 1: select textrange_en_us('a','Z') @ 'b'::text;
!^
! HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
  drop type textrange_c;
  drop type textrange_en_us;
+ ERROR:  type textrange_en_us does not exist
  --
  -- Test out polymorphic type system
  --

==


Also, I notice you forgot to update serial_schedule:

diff --git a/src/test/regress/serial_schedule b/src/test/regress/serial_schedule
index bb654f9c612970ef777e8cc39369a91e343f6afc..2e87d9eefd6fbb70a5603bc000ffe833
5945201f 100644
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
*** test: txid
*** 18,23 
--- 18,24 
  test: uuid
  test: enum
  test: money
+ test: rangetypes
  test: strings
  test: numerology
  test: point


regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Thom Brown
On 11 October 2011 02:14, Florian Pflug f...@phlo.org wrote:
 On Oct10, 2011, at 20:06 , Thom Brown wrote:
 Okay, a real example of why discrete should be '[]' and continuous
 should be '[)'.

 If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
 precisely it either becomes free or is available to someone else, so
 it can be booked 11:00 to 12:00 without conflict.

 If you have raffle tickets numbered 1 to 100 (int4range), and you ask
 for tickets 9 to 11, no-one else can use 11 as it aligns with the last
 one you bought.

 So for me, it's intuitive for them to behave differently.  So yes,
 default behaviour would vary between types, but I didn't previously
 read anything on default_flags, so I don't know where that comes into
 it.  Shouldn't it be the case that if a type has a canonical function,
 it's entirely inclusive, otherwise it's upper boundary is exclusive?

 First, there's the type date, which in my book is discrete. So we'd make
 date ranges closed by default, not half-open. And there's timestamp, which
 is continuous so we'd make its default half-open. That doesn't seem exactly
 intuitive to me.

Ah yes, I agree there.  Okay, I see your point.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Mon, Oct 10, 2011 at 10:25:18PM -0700, Jeff Davis wrote:
 On Tue, 2011-10-11 at 03:14 +0200, Florian Pflug wrote:
  Maybe ranges over discrete types are slightly more likely to be
  closed, and ranges over continuous types slightly more likely to
  be open. Still, I very much doubt that the skew in the
  distribution is large enough to warrant the confusion and
  possibility of subtle bugs we introduce by making the semantics of
  a range type's constructor depend on the definition of the range
  and/or base type.
 
 I think you persuaded me on the consistency aspect.
 
 I'm wondering whether to do away with the default argument entirely,
 and just force the user to always specify it during construction. It
 seems like a shame that such pain is caused over the syntax, because
 in a perfect world it wouldn't be a bother to specify it at all. I
 even considered using prefix/postfix operators to try to make it
 nicer, but it seems like every idea I had was just short of
 practical. Maybe a few extra characters at the end aren't so bad.
 
 I'd like to hear from some potential users though to see if anyone
 recoils at the common case.

I'd recoil at not having ranges default to left-closed, right-open.
The use case for that one is so compelling that I'm OK with making it
the default from which deviations need to be specified.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Florian Pflug
On Oct11, 2011, at 14:43 , David Fetter wrote:
 I'd recoil at not having ranges default to left-closed, right-open.
 The use case for that one is so compelling that I'm OK with making it
 the default from which deviations need to be specified.

The downside of that is that, as Tom pointed out upthread, we cannot
make [) the canonical representation of ranges. It'd require us to
increment the right boundary of a closed range, but that incremented
boundary might no longer be in the base type's domain.

So we'd end up with [) being the default for range construction,
but [] being the canonical representation, i.e. what you get back
when SELECTing a range (over a discrete base type).

Certainly not the end of the world, but is the convenience of being
able to write somerange(a, b) instead of somerange(a, b, '[)') really
worth it? I kind of doubt that...

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Tue, Oct 11, 2011 at 03:20:05PM +0200, Florian Pflug wrote:
 On Oct11, 2011, at 14:43 , David Fetter wrote:
  I'd recoil at not having ranges default to left-closed,
  right-open.  The use case for that one is so compelling that I'm
  OK with making it the default from which deviations need to be
  specified.
 
 The downside of that is that, as Tom pointed out upthread, we cannot
 make [) the canonical representation of ranges. It'd require us to
 increment the right boundary of a closed range, but that incremented
 boundary might no longer be in the base type's domain.
 
 So we'd end up with [) being the default for range construction, but
 [] being the canonical representation, i.e. what you get back when
 SELECTing a range (over a discrete base type).
 
 Certainly not the end of the world, but is the convenience of being
 able to write somerange(a, b) instead of somerange(a, b, '[)')
 really worth it? I kind of doubt that...

You're making a persuasive argument for the latter based solely on the
clarity.  If people see that 3rd element in the DDL, or need to
provide it, it's *very* obvious what's going on.

Cheers,
David (who suspects that having a syntax like somerange[a,b) just
won't work with the current state of parsers, etc.)
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
  Certainly not the end of the world, but is the convenience of being
  able to write somerange(a, b) instead of somerange(a, b, '[)')
  really worth it? I kind of doubt that...

 You're making a persuasive argument for the latter based solely on the
 clarity.  If people see that 3rd element in the DDL, or need to
 provide it, it's *very* obvious what's going on.

 That was how I originally thought, but we're also providing built-in
 range types like tsrange and daterange. I could see how if the former
 excluded the endpoint and the latter included it, it could be confusing.

 We could go back to having different constructor names for different
 inclusivity; e.g. int4range_cc(1,10). That at least removes the
 awkwardness of typing (and seeing) '[]'.

The cure seems worse than the disease.  What is so bad about '[]'?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Tue, Oct 11, 2011 at 12:09:01PM -0400, Robert Haas wrote:
 On Tue, Oct 11, 2011 at 12:03 PM, Jeff Davis pg...@j-davis.com wrote:
  On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
   Certainly not the end of the world, but is the convenience of being
   able to write somerange(a, b) instead of somerange(a, b, '[)')
   really worth it? I kind of doubt that...
 
  You're making a persuasive argument for the latter based solely on the
  clarity.  If people see that 3rd element in the DDL, or need to
  provide it, it's *very* obvious what's going on.
 
  That was how I originally thought, but we're also providing built-in
  range types like tsrange and daterange. I could see how if the former
  excluded the endpoint and the latter included it, it could be confusing.
 
  We could go back to having different constructor names for different
  inclusivity; e.g. int4range_cc(1,10). That at least removes the
  awkwardness of typing (and seeing) '[]'.
 
 The cure seems worse than the disease.  What is so bad about '[]'?

Nothing's bad about '[]' per se.  What's better, but possibly out of
the reach of our current lexing and parsing system, would be things
like:

[1::int, 10)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:12 PM, David Fetter da...@fetter.org wrote:
 Nothing's bad about '[]' per se.  What's better, but possibly out of
 the reach of our current lexing and parsing system, would be things
 like:

 [1::int, 10)

That's been discussed before.  Aside from the parser issues (which are
formidable) it would break brace-matching in most if not all commonly
used editors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
 The cure seems worse than the disease.  What is so bad about '[]'?

OK, so we stick with the 3-argument form. Do we have a default for the
third argument, or do we scrap it to avoid confusion?

There were some fairly strong objections to using '[]' as the default or
having the default vary between types. So, the only real option
remaining, if we do have a default, is '[)'.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 06:28 -0700, David Fetter wrote:
  Certainly not the end of the world, but is the convenience of being
  able to write somerange(a, b) instead of somerange(a, b, '[)')
  really worth it? I kind of doubt that...
 
 You're making a persuasive argument for the latter based solely on the
 clarity.  If people see that 3rd element in the DDL, or need to
 provide it, it's *very* obvious what's going on.

That was how I originally thought, but we're also providing built-in
range types like tsrange and daterange. I could see how if the former
excluded the endpoint and the latter included it, it could be confusing.

We could go back to having different constructor names for different
inclusivity; e.g. int4range_cc(1,10). That at least removes the
awkwardness of typing (and seeing) '[]'.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Robert Haas
On Tue, Oct 11, 2011 at 12:30 PM, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2011-10-11 at 12:09 -0400, Robert Haas wrote:
 The cure seems worse than the disease.  What is so bad about '[]'?

 OK, so we stick with the 3-argument form. Do we have a default for the
 third argument, or do we scrap it to avoid confusion?

 There were some fairly strong objections to using '[]' as the default or
 having the default vary between types. So, the only real option
 remaining, if we do have a default, is '[)'.

I think using '[)' is fine.  At some level, this is just a question of
expectations.  If you expect that int4range(1,4) will create a range
that includes 4, well, you're wrong.  Once you get used to it, it will
seem normal, and you'll know that you need to write
int4range(1,4,'[]') if that's what you want.  As long as the system is
designed around a set of consistent and well-thought-out principles,
people will get used to the details.  I don't see that the idea of a
half-open range over a discrete-valued type is particularly confusing
- we use them all the time in programming, when we make the end
pointer point to the byte following the end of the array, rather than
the last element - but even if it is, overall design consistency
trumps what someone may find to be the absolutely perfect behavior in
some particular case.  And saving typing is nearly always good -
unless it creates a LOT more confusion than I think this will.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Jeff Davis
On Tue, 2011-10-11 at 12:40 -0400, Robert Haas wrote:
 I think using '[)' is fine.  At some level, this is just a question of
 expectations.  If you expect that int4range(1,4) will create a range
 that includes 4, well, you're wrong.  Once you get used to it, it will
 seem normal, and you'll know that you need to write
 int4range(1,4,'[]') if that's what you want.  As long as the system is
 designed around a set of consistent and well-thought-out principles,
 people will get used to the details.  I don't see that the idea of a
 half-open range over a discrete-valued type is particularly confusing
 - we use them all the time in programming, when we make the end
 pointer point to the byte following the end of the array, rather than
 the last element - but even if it is, overall design consistency
 trumps what someone may find to be the absolutely perfect behavior in
 some particular case.  And saving typing is nearly always good -
 unless it creates a LOT more confusion than I think this will.

That sounds very reasonable to me.

Tom made an observation about '[1,INT_MAX]' thowing an error because
canonicalization would try to increment INT_MAX. But I'm not
particularly disturbed by it. If you want a bigger range, use int8range
or numrange -- the same advice we give to people who want unsigned
types. Or, for people who really need the entire range of signed int4
exactly, they can easily make their own range type that canonicalizes to
'[]'.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 Tom made an observation about '[1,INT_MAX]' thowing an error because
 canonicalization would try to increment INT_MAX. But I'm not
 particularly disturbed by it. If you want a bigger range, use int8range
 or numrange -- the same advice we give to people who want unsigned
 types. Or, for people who really need the entire range of signed int4
 exactly, they can easily make their own range type that canonicalizes to
 '[]'.

I agree we shouldn't contort the entire design to avoid that corner
case.  We should, however, make sure that the increment *does* throw
an error, and not just silently overflow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread David Fetter
On Tue, Oct 11, 2011 at 12:18:18PM -0400, Robert Haas wrote:
 On Tue, Oct 11, 2011 at 12:12 PM, David Fetter da...@fetter.org wrote:
  Nothing's bad about '[]' per se.  What's better, but possibly out
  of the reach of our current lexing and parsing system, would be
  things like:
 
  [1::int, 10)
 
 That's been discussed before.  Aside from the parser issues (which
 are formidable) it would break brace-matching in most if not all
 commonly used editors.

That being the situation, ubiquitous support for the natural syntax
looks like it's a decade away, minimum. :(

Trying to be cheery,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-11 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Oct11, 2011, at 14:43 , David Fetter wrote:
 I'd recoil at not having ranges default to left-closed, right-open.
 The use case for that one is so compelling that I'm OK with making it
 the default from which deviations need to be specified.

I agree with David on this.

 The downside of that is that, as Tom pointed out upthread, we cannot
 make [) the canonical representation of ranges.

Yeah, we certainly *can* do that, we just have to allow ranges that
include the last element of the domain to be corner cases that require
special handling.  If we don't want to just fail, we have to
canonicalize them to closed instead of open ranges.  It does not follow
that the default on input has to be closed.

Note that the INT_MAX case is probably not the worst issue in practice.
What is going to be an issue is ranges over enum types, where having the
last element being part of the range is a much more likely use-case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Thom Brown
On 2 October 2011 20:05, Jeff Davis pg...@j-davis.com wrote:
 On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
 Looking at the patch, I noticed that it's possible to specify the default
 boundaries ([], [), (] or ()) per individual float type with the
 DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
 do more harm then good - it makes it impossible to deduce the meaning of
 e.g. numericrange(1.0, 2.0) without looking up the definition of 
 numericrange.

 I suggest we pick one set of default boundaries, ideally '[)' since that
 is what all the built-in canonization functions produce, and stick with it.

 Done.

 Also, made the range parsing even more like records with more code
 copied verbatim. And fixed some parsing tests along the way.

I don't know if this has already been discussed, but can you explain
the following:

postgres=# select '[1,8]'::int4range;
 int4range
---
 [1,9)
(1 row)

It seems unintuitive to represent a discrete range using an exclusive
upper bound.  While I agree that the value itself is correct, it's
representation looks odd.  Is it necessary?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Jeff Davis
On Mon, 2011-10-10 at 14:27 +0100, Thom Brown wrote:
 I don't know if this has already been discussed, but can you explain
 the following:
 
 postgres=# select '[1,8]'::int4range;
  int4range
 ---
  [1,9)
 (1 row)
 
 It seems unintuitive to represent a discrete range using an exclusive
 upper bound.  While I agree that the value itself is correct, it's
 representation looks odd.  Is it necessary?

The canonicalize function (specified at type creation time) allows you
to specify the canonical output representation. So, I can change the
canonical form for discrete ranges to use '[]' notation if we think
that's more expected.

But then int4range(1,8) would still mean int4range(1,8,'[)') and
therefore '[1,7]'. I used to have a default_flags parameter that could
also be specified at type creation time that would control the default
third parameter (the parameter that controls inclusivity) of the
constructor. However, I removed the default_flags parameter because,
as Florian pointed out, it's better to have a consistent output from the
constructor.

I'm open to suggestions, including potentially bringing back
default_flags.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Mon, 2011-10-10 at 14:27 +0100, Thom Brown wrote:
 I don't know if this has already been discussed, but can you explain
 the following:
 
 postgres=# select '[1,8]'::int4range;
 int4range
 ---
 [1,9)
 (1 row)
 
 It seems unintuitive to represent a discrete range using an exclusive
 upper bound.  While I agree that the value itself is correct, it's
 representation looks odd.  Is it necessary?

 The canonicalize function (specified at type creation time) allows you
 to specify the canonical output representation. So, I can change the
 canonical form for discrete ranges to use '[]' notation if we think
 that's more expected.

What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
fail with an integer overflow.  I suppose you could canonicalize it to
an unbounded range, but that seems unnecessarily surprising.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Florian Pflug
On Oct10, 2011, at 18:53 , Tom Lane wrote:
 What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
 fail with an integer overflow.  I suppose you could canonicalize it to
 an unbounded range, but that seems unnecessarily surprising.

That is a very good point. Canonicalizing to an unbounded range doesn't work,
because, as it stands, the ranges '[1, INT_MAX]' and '[1,)' are *not* equal. So
the only remaining option is to canonicalize to the closed form always.

I still think we should strive for consistency here, so let's also make
'[]' the default flags for the range constructors.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Jeff Davis
On Mon, 2011-10-10 at 12:53 -0400, Tom Lane wrote:
  The canonicalize function (specified at type creation time) allows you
  to specify the canonical output representation. So, I can change the
  canonical form for discrete ranges to use '[]' notation if we think
  that's more expected.
 
 What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
 fail with an integer overflow.  I suppose you could canonicalize it to
 an unbounded range, but that seems unnecessarily surprising.

So, are you suggesting that I canonicalize to '[]' then? That seems
reasonable to me, but there's still some slight awkwardness because
int4range(1,10) would be '[1,9]'.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Thom Brown
On 10 October 2011 18:31, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-10-10 at 12:53 -0400, Tom Lane wrote:
  The canonicalize function (specified at type creation time) allows you
  to specify the canonical output representation. So, I can change the
  canonical form for discrete ranges to use '[]' notation if we think
  that's more expected.

 What if I write '[1,INT_MAX]'::int4range?  The open-parenthesis form will
 fail with an integer overflow.  I suppose you could canonicalize it to
 an unbounded range, but that seems unnecessarily surprising.

 So, are you suggesting that I canonicalize to '[]' then? That seems
 reasonable to me, but there's still some slight awkwardness because
 int4range(1,10) would be '[1,9]'.

Why?  int4range(1,10,'[]') returns:

 int4range
---
 [1,11)
(1 row)

Which if corrected to display the proposed way would just be '[1,10]'.
 So the default boundaries should be '[]' as opposed to '[)' as it is
now.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Jeff Davis
On Mon, 2011-10-10 at 19:22 +0200, Florian Pflug wrote:
 I still think we should strive for consistency here, so let's also make
 '[]' the default flags for the range constructors.

For continuous ranges I don't think that's a good idea. Closed-open is a
very widely-accepted convention and there are good reasons for it -- for
instance, it's good for specifying contiguous-but-non-overlapping
ranges.

So, I think we either need to standardize on '[)' or allow different
default_flags for different types. Or, always specify the inclusivity in
the constructor (hopefully in a convenient way).

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Jeff Davis
On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
  So the default boundaries should be '[]' as opposed to '[)' as it is
 now.

Would that vary between range types? In other words, do I bring back
default_flags?

If not, I think a lot of people will object. The most common use-case
for range types are for continuous ranges like timestamps. And (as I
pointed out in reply to Florian) there are good reasons to use the '[)'
convention for those cases.

Regards,
Jeff Davis





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Thom Brown
On 10 October 2011 18:45, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
  So the default boundaries should be '[]' as opposed to '[)' as it is
 now.

 Would that vary between range types? In other words, do I bring back
 default_flags?

 If not, I think a lot of people will object. The most common use-case
 for range types are for continuous ranges like timestamps. And (as I
 pointed out in reply to Florian) there are good reasons to use the '[)'
 convention for those cases.

I'm proposing it for discrete ranges.  For continuous ranges, I guess
it makes sense to have up to, but not including.  The same boundary
inclusivity/exclusivity thing seems unintuitive for discrete ranges.
This has the downside of inconsistency, but I don't think that's
really a solid argument against it since their use will be different
anyway.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Thom Brown
On 10 October 2011 18:53, Thom Brown t...@linux.com wrote:
 On 10 October 2011 18:45, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-10-10 at 18:39 +0100, Thom Brown wrote:
  So the default boundaries should be '[]' as opposed to '[)' as it is
 now.

 Would that vary between range types? In other words, do I bring back
 default_flags?

 If not, I think a lot of people will object. The most common use-case
 for range types are for continuous ranges like timestamps. And (as I
 pointed out in reply to Florian) there are good reasons to use the '[)'
 convention for those cases.

 I'm proposing it for discrete ranges.  For continuous ranges, I guess
 it makes sense to have up to, but not including.  The same boundary
 inclusivity/exclusivity thing seems unintuitive for discrete ranges.
 This has the downside of inconsistency, but I don't think that's
 really a solid argument against it since their use will be different
 anyway.

Okay, a real example of why discrete should be '[]' and continuous
should be '[)'.

If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
precisely it either becomes free or is available to someone else, so
it can be booked 11:00 to 12:00 without conflict.

If you have raffle tickets numbered 1 to 100 (int4range), and you ask
for tickets 9 to 11, no-one else can use 11 as it aligns with the last
one you bought.

So for me, it's intuitive for them to behave differently.  So yes,
default behaviour would vary between types, but I didn't previously
read anything on default_flags, so I don't know where that comes into
it.  Shouldn't it be the case that if a type has a canonical function,
it's entirely inclusive, otherwise it's upper boundary is exclusive?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Florian Pflug
On Oct10, 2011, at 20:06 , Thom Brown wrote:
 Okay, a real example of why discrete should be '[]' and continuous
 should be '[)'.
 
 If you book a meeting from 09:00 to 11:00 (tsrange), at 11:00
 precisely it either becomes free or is available to someone else, so
 it can be booked 11:00 to 12:00 without conflict.
 
 If you have raffle tickets numbered 1 to 100 (int4range), and you ask
 for tickets 9 to 11, no-one else can use 11 as it aligns with the last
 one you bought.
 
 So for me, it's intuitive for them to behave differently.  So yes,
 default behaviour would vary between types, but I didn't previously
 read anything on default_flags, so I don't know where that comes into
 it.  Shouldn't it be the case that if a type has a canonical function,
 it's entirely inclusive, otherwise it's upper boundary is exclusive?

I'm not convinced by this. The question here is not whether which types
of ranges we *support*, only which type we consider to be more *canonical*,
and whether the bounds provided to a range constructor are inclusive or
exclusive by *default*.

Maybe ranges over discrete types are slightly more likely to be closed,
and ranges over continuous types slightly more likely to be open. Still,
I very much doubt that the skew in the distribution is large enough to
warrant the confusion and possibility of subtle bugs we introduce by making
the semantics of a range type's constructor depend on the definition of the
range and/or base type. Especially since we're talking about only *6* extra
characters to communicate the intended inclusivity/exclusivity of the bounds
to the range constructor.

Also, the distinction between types for which ranges should obviously
be closed, and those for which they should obviously be half-open is nowhere
as clear-cut as it seems at first sight.

First, there's the type date, which in my book is discrete. So we'd make
date ranges closed by default, not half-open. And there's timestamp, which
is continuous so we'd make its default half-open. That doesn't seem exactly
intuitive to me.

Second, there's int4 and float8, one discrete, one continuous. So would we
make int4range(1, 2) include 2, but float8range(1.0, 2.0) *not* include 2.0?

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Florian Pflug
On Oct10, 2011, at 19:41 , Jeff Davis wrote:
 On Mon, 2011-10-10 at 19:22 +0200, Florian Pflug wrote:
 I still think we should strive for consistency here, so let's also make
 '[]' the default flags for the range constructors.
 
 For continuous ranges I don't think that's a good idea. Closed-open is a
 very widely-accepted convention and there are good reasons for it -- for
 instance, it's good for specifying contiguous-but-non-overlapping
 ranges.

It really depends on what you're using ranges for. Yeah, if you're convering
something with ranges (like mapping things to a certain period of time, or
an area of space), then half-open ranges are probably very common.

If, OTOH, you're storing measurement with error margins, then open ranges,
i.e. '()', are probably what you want.

I still firmly believe that consistency trumps convenience here. Specifying
the range boundaries' exclusivity/inclusivity explicitly is quite cheap...

 So, I think we either need to standardize on '[)' or allow different
 default_flags for different types. Or, always specify the inclusivity in
 the constructor (hopefully in a convenient way).

In the light of Tom's argument, my pick would be '[]'. It's seem strange
to normalize ranges over discrete types to closed ranges, yet make the
construction function expect open boundaries by default.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-10 Thread Jeff Davis
On Tue, 2011-10-11 at 03:14 +0200, Florian Pflug wrote:
 Maybe ranges over discrete types are slightly more likely to be closed,
 and ranges over continuous types slightly more likely to be open. Still,
 I very much doubt that the skew in the distribution is large enough to
 warrant the confusion and possibility of subtle bugs we introduce by making
 the semantics of a range type's constructor depend on the definition of the
 range and/or base type.

I think you persuaded me on the consistency aspect.

I'm wondering whether to do away with the default argument entirely, and
just force the user to always specify it during construction. It seems
like a shame that such pain is caused over the syntax, because in a
perfect world it wouldn't be a bother to specify it at all. I even
considered using prefix/postfix operators to try to make it nicer, but
it seems like every idea I had was just short of practical. Maybe a few
extra characters at the end aren't so bad.

I'd like to hear from some potential users though to see if anyone
recoils at the common case.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-09 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Sat, 2011-10-08 at 12:44 -0700, Jeff Janes wrote:
 When I apply this to head, make check fails with:
 
 create type textrange_en_us as range(subtype=text, collation=en_US);
 + ERROR:  collation en_US for encoding SQL_ASCII does not exist

 Thank you for pointing that out. I think I need to remove those before
 commit, but I just wanted them in there now to exercise that part of the
 code.

 Is there a better way to test collations like that?

You could add some code that assumes particular collations are present
into collate.linux.utf8.sql.  It's not going to work to depend on
anything beyond C locale being present in a mainline regression test
case.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-08 Thread Jeff Davis
On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:

 The first thing caught my eye in existing GiST code is idea of
 subtype_float. float8 has limited precision and can't respresent, for
 example, varlena values good enough. Even if we have large int8 value
 we can loose lower bits, but data distribution can be so that these
 bits are valuable. Wouldn't it better to have function like
 subtype_diff_float which returns difference between two values of
 subtype as an float? Using of such function could make penalty more
 sensible to even small difference between values, and accordingly more
 relevant.

The reason I did it that way is for unbounded ranges. With
subtype_diff_float, it's difficult for the GiST code to differentiate
between [10,) and [10,), because infinity minus anything is
infinity. But when inserting the range [100,200), the penalty for the
first one should be zero and the second one should have some positive
penalty, right?

Maybe we can still use subtype_diff_float by calling it on various pairs
of bounds to come up with a reasonable cost?

I'm open to suggestion. I'd just like to make sure that unbounded ranges
are a part of the consideration.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-08 Thread Alexander Korotkov
On Sat, Oct 8, 2011 at 1:01 PM, Jeff Davis pg...@j-davis.com wrote:

 On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:

  The first thing caught my eye in existing GiST code is idea of
  subtype_float. float8 has limited precision and can't respresent, for
  example, varlena values good enough. Even if we have large int8 value
  we can loose lower bits, but data distribution can be so that these
  bits are valuable. Wouldn't it better to have function like
  subtype_diff_float which returns difference between two values of
  subtype as an float? Using of such function could make penalty more
  sensible to even small difference between values, and accordingly more
  relevant.

 The reason I did it that way is for unbounded ranges. With
 subtype_diff_float, it's difficult for the GiST code to differentiate
 between [10,) and [10,), because infinity minus anything is
 infinity. But when inserting the range [100,200), the penalty for the
 first one should be zero and the second one should have some positive
 penalty, right?

I meant that penalty can be determined as sum of difference of old and new
bounds of range, i.e. penalty = subtype_diff_float(new_lower, old_lower)
+ subtype_diff_float(old_upper, new_upper).
When we insert [100,200) into [10,+inf), union([100,200), [10,+inf))
= [10,+inf), so penalty =  subtype_diff_float(10,10)
+  subtype_diff_float(+inf, +inf) = 0 + 0 = 0.
When we insert [100,200) into [10,), union([100,200), [10,+inf))
= [100,+inf), so penalty =  subtype_diff_float(100,10)
+  subtype_diff_float(+inf, +inf) = 99900 + 0 = 99900.

But, there are still the problem, when we'are inserting open interval when
there is no such open intervals yet. For example, we're going to insert
[0,+inf), while root page contains [0,10), [10,20), [20,30). Each penalty
will be infinity, while it seems to be better to insert it into [0,10). But,
it seems to me to be general limitation of current GiST interface, when we
have to express penalty in a single float.

--
With best regards,
Alexander Korotkov.


Re: GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-08 Thread Jeff Davis
On Sat, 2011-10-08 at 18:43 +0400, Alexander Korotkov wrote:

 I meant that penalty can be determined as sum of difference of old and
 new bounds of range, i.e. penalty = subtype_diff_float(new_lower,
 old_lower) + subtype_diff_float(old_upper, new_upper). 
 When we insert [100,200) into [10,+inf), union([100,200), [10,+inf))
 = [10,+inf), so penalty =  subtype_diff_float(10,10)
 +  subtype_diff_float(+inf, +inf) = 0 + 0 = 0.
 When we insert [100,200) into [10,), union([100,200), [10,
 +inf)) = [100,+inf), so penalty =  subtype_diff_float(100,10)
 +  subtype_diff_float(+inf, +inf) = 99900 + 0 = 99900.
 
OK, I like that. I will make the change.

 But, there are still the problem, when we'are inserting open interval
 when there is no such open intervals yet. For example, we're going to
 insert [0,+inf), while root page contains [0,10), [10,20), [20,30).
 Each penalty will be infinity, while it seems to be better to insert
 it into [0,10). But, it seems to me to be general limitation of
 current GiST interface, when we have to express penalty in a single
 float.

That seems like an acceptable limitation. I don't think my solution
handles it any better.

Regards,
Jeff Davis

 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-08 Thread Jeff Janes
On Sun, Oct 2, 2011 at 12:05 PM, Jeff Davis pg...@j-davis.com wrote:
 On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
 Looking at the patch, I noticed that it's possible to specify the default
 boundaries ([], [), (] or ()) per individual float type with the
 DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
 do more harm then good - it makes it impossible to deduce the meaning of
 e.g. numericrange(1.0, 2.0) without looking up the definition of 
 numericrange.

 I suggest we pick one set of default boundaries, ideally '[)' since that
 is what all the built-in canonization functions produce, and stick with it.

 Done.

 Also, made the range parsing even more like records with more code
 copied verbatim. And fixed some parsing tests along the way.

When I apply this to head, make check fails with:

  create type textrange_en_us as range(subtype=text, collation=en_US);
+ ERROR:  collation en_US for encoding SQL_ASCII does not exist

Is this a problem?  e.g. will it break the build-farm?

make check LANG=en_US does pass

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-08 Thread Jeff Davis
On Sat, 2011-10-08 at 12:44 -0700, Jeff Janes wrote:
 When I apply this to head, make check fails with:
 
   create type textrange_en_us as range(subtype=text, collation=en_US);
 + ERROR:  collation en_US for encoding SQL_ASCII does not exist
 
 Is this a problem?  e.g. will it break the build-farm?
 
 make check LANG=en_US does pass

Thank you for pointing that out. I think I need to remove those before
commit, but I just wanted them in there now to exercise that part of the
code.

Is there a better way to test collations like that?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


GiST for range types (was Re: [HACKERS] Range Types - typo + NULL string constructor)

2011-10-07 Thread Alexander Korotkov
On Fri, Oct 7, 2011 at 7:41 AM, Jeff Davis pg...@j-davis.com wrote:

 I'd prefer to include it in the initial patch. If the current GiST code
 is going to be replaced, then there's not much sense reviewing/testing
 it.

 You may need to consider unbounded and empty ranges specially. I made an
 attempt to do so in the current GiST code, and you might want to take a
 look at that first. I'm not particularly attached to my approach, but we
 should do something reasonable with unbounded and empty ranges.


The first thing caught my eye in existing GiST code is idea of
subtype_float. float8 has limited precision and can't respresent, for
example, varlena values good enough. Even if we have large int8 value we can
loose lower bits, but data distribution can be so that these bits are
valuable. Wouldn't it better to have function like subtype_diff_float which
returns difference between two values of subtype as an float? Using of such
function could make penalty more sensible to even small difference between
values, and accordingly more relevant.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-06 Thread Alexander Korotkov
Hi, Jeff!

Heikki has recently commited my patch about picksplit for GiST on points and
boxes:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f3bd86843e5aad84585a57d3f6b80db3c609916
I would like to try this picksplit method on ranges. I believe that it might
be much more efficient on highly overlapping ranges than current picksplit.
Range types patch isn't commited, yet. So, what way of work publishing is
prefered in this case? Add-on patch, separate branch in git or something
else?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-06 Thread Jeff Davis
On Thu, 2011-10-06 at 23:26 +0400, Alexander Korotkov wrote:
 Hi, Jeff!
 
 
 Heikki has recently commited my patch about picksplit for GiST on
 points and boxes:
 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=7f3bd86843e5aad84585a57d3f6b80db3c609916
 I would like to try this picksplit method on ranges. I believe that it
 might be much more efficient on highly overlapping ranges than current
 picksplit. Range types patch isn't commited, yet. So, what way of work
 publishing is prefered in this case? Add-on patch, separate branch in
 git or something else?

Sounds great! Thank you.

The repo is available here:

http://git.postgresql.org/gitweb/?p=users/jdavis/postgres.git;a=log;h=refs/heads/rangetypes

I'd prefer to include it in the initial patch. If the current GiST code
is going to be replaced, then there's not much sense reviewing/testing
it.

You may need to consider unbounded and empty ranges specially. I made an
attempt to do so in the current GiST code, and you might want to take a
look at that first. I'm not particularly attached to my approach, but we
should do something reasonable with unbounded and empty ranges.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-02 Thread Florian Pflug
On Oct2, 2011, at 08:12 , Jeff Davis wrote:
 Done. Now range types more closely resemble records in parsing behavior.
 Patch attached.

Cool!

Looking at the patch, I noticed that it's possible to specify the default
boundaries ([], [), (] or ()) per individual float type with the
DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
do more harm then good - it makes it impossible to deduce the meaning of
e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.

I suggest we pick one set of default boundaries, ideally '[)' since that
is what all the built-in canonization functions produce, and stick with it.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-02 Thread Jeff Davis
On Sun, 2011-10-02 at 11:32 +0200, Florian Pflug wrote:
 Looking at the patch, I noticed that it's possible to specify the default
 boundaries ([], [), (] or ()) per individual float type with the
 DEFAULT_FLAGS clause of CREATE TYPE .. AS RANGE. I wonder if that doesn't
 do more harm then good - it makes it impossible to deduce the meaning of
 e.g. numericrange(1.0, 2.0) without looking up the definition of numericrange.
 
 I suggest we pick one set of default boundaries, ideally '[)' since that
 is what all the built-in canonization functions produce, and stick with it.

That sounds reasonable to me. Unless someone objects, I'll make the
change in the next patch.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-22 Thread Jeff Davis
On Thu, 2011-09-22 at 02:31 +0200, Florian Pflug wrote:
 My personal favourite would be '0', since it resembles the symbol used
 for empty sets in mathematics, and we already decided to use mathematical
 notation for ranges.
 
 If we're concerned that most of our users won't get that, then 'empty'
 would be a viable alternative I think.
 
 From a consistency POV it'd make sense to use a bracket-based syntax
 also for empty ranges. But the only available options would be '()' and '[]',
 which are too easily confused with '(,)' and '[,]' (which we already
 decided should represent the full range).

Yes, I think () is too close to (,).

Brainstorming so far:
 0   : simple, looks like the empty set symbol
 empty   : simple
 empty : a little more obvious that it's special
   : visually looks empty
 -   : also looks empty
 {}  : mathematical notation, but doesn't quite fit ranges

I don't have a strong opinion. I'd be OK with any of those.

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Jeff Davis
On Mon, 2011-09-19 at 18:32 +0200, Florian Pflug wrote:
 No, but more similar the format are the easier it gets to at least factor
 the hairy parts of such a parser into a common subroutine. Assume that we
 don't support NULL as an alias for INF. What would then be the result of
 
   '[A,NULL)'::textrange?

I think that the range input should *parse* NULL in a similar way, but
reject it. So, to make it the range between two definite strings, you'd
do:

  '[A,NULL)'::textrange

which would be equal to textrange('A','NULL','[)'). Without the quotes,
it would detect the NULL, and give an error. Open to suggestion here,
though.

 Presumably, it'd be the same as textrange('A','NULL','[)'). Which think
 is a bit surprising, since '[A,NULL]'::text[] produces ARRAY['A',NULL],
 *NOT* ARRAY['A','NULL'].
 
 BTW, we currently represent infinity for floating point values as
 'Infinity', not 'INF'. Shouldn't we do the same for ranges, i.e. make
 
   int4range(0,NULL,'[)')::text
 
 return 
 
   '[0,Infinity)'?

I'm open to that, if you think it's an improvement I'll do it (but we
should probably pick one identifiable string and stick with it). What
I'd like to avoid is adding to the NULL/infinity confusion.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Jeff Davis
On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
 What I really
 care about is that we don't talk ourselves into needing a zillion
 constructor functions.  Making things work with a single constructor
 function seems to me to simplify life quite a bit, and allowing there
 seems essential for that.

I think we pretty much all agree on that. However, you did see the note
about the difficulty of using default parameters in built-in functions,
right?

I ultimately ended up with 4 constructors, each with the same name but
0, 1, 2, and 3 parameters. Suggestions welcome.

 (I am also vaguely wondering what happens if if you have a text
 range is (nubile, null) ambiguous?)

There are a few ways to handle that. I would lean toward parsing the
NULL as a special keyword, and then rejecting it (does it matter if it's
upper case?).

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Florian Pflug
On Sep21, 2011, at 09:23 , Jeff Davis wrote:
 On Mon, 2011-09-19 at 18:32 +0200, Florian Pflug wrote:
 No, but more similar the format are the easier it gets to at least factor
 the hairy parts of such a parser into a common subroutine. Assume that we
 don't support NULL as an alias for INF. What would then be the result of
 
  '[A,NULL)'::textrange?
 
 I think that the range input should *parse* NULL in a similar way, but
 reject it. So, to make it the range between two definite strings, you'd
 do:
 
  '[A,NULL)'::textrange
 
 which would be equal to textrange('A','NULL','[)'). Without the quotes,
 it would detect the NULL, and give an error. Open to suggestion here,
 though.

Hm, that seems like a reasonable compromise. As long as range types and
arrays agree on the same basic lexical rules regarding quoting and whitespace
(i.e. that spaces outside of double-quotes are non-significant, that keywords
like NULL and INF/Infinity are case-insensitive, ...) I'm happy I guess.

 BTW, we currently represent infinity for floating point values as
 'Infinity', not 'INF'. Shouldn't we do the same for ranges, i.e. make
 
  int4range(0,NULL,'[)')::text
 
 return 
 
  '[0,Infinity)'?
 
 I'm open to that, if you think it's an improvement I'll do it (but we
 should probably pick one identifiable string and stick with it). What
 I'd like to avoid is adding to the NULL/infinity confusion.

I've thought about this some more, and came to realize that the question
here really is whether

  floatrange(0, 'Infinity'::float, '[)')

and

  floatrange(0, NULL, '[)')

are the same thing or not. If they're not, then obviously using Infinity
to represent omitted bounds is going to be very confusing. If they are,
then using Infinity seems preferable. Maybe boundaries should be restricted
to numeric float values (i.e. +/-Infinity and NaN should be rejected), though
I dunno if the range type infrastructure supports that. Thoughts?

best regards,
Florian Pflug




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Robert Haas
On Wed, Sep 21, 2011 at 3:29 AM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
 What I really
 care about is that we don't talk ourselves into needing a zillion
 constructor functions.  Making things work with a single constructor
 function seems to me to simplify life quite a bit, and allowing there
 seems essential for that.

 I think we pretty much all agree on that. However, you did see the note
 about the difficulty of using default parameters in built-in functions,
 right?

 I ultimately ended up with 4 constructors, each with the same name but
 0, 1, 2, and 3 parameters. Suggestions welcome.

 (I am also vaguely wondering what happens if if you have a text
 range is (nubile, null) ambiguous?)

 There are a few ways to handle that. I would lean toward parsing the
 NULL as a special keyword, and then rejecting it (does it matter if it's
 upper case?).

Boy, that seems really weird to me.  If you're going to do it, it
ought to be case-insensitive, but I think detecting the case only for
the purpose of rejecting it is probably a mistake.  I mean, if
(nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
really seem like they ought to be any different.  Otherwise, anyone
who wants to construct these strings programatically is going to need
to escape everything and always write (cat,dog) or however you do
that, and that seems like an unnecessary imposition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Florian Pflug
On Sep21, 2011, at 14:00 , Robert Haas wrote:
 On Wed, Sep 21, 2011 at 3:29 AM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-09-19 at 12:26 -0400, Robert Haas wrote:
 (I am also vaguely wondering what happens if if you have a text
 range is (nubile, null) ambiguous?)
 
 There are a few ways to handle that. I would lean toward parsing the
 NULL as a special keyword, and then rejecting it (does it matter if it's
 upper case?).
 
 Boy, that seems really weird to me.  If you're going to do it, it
 ought to be case-insensitive, but I think detecting the case only for
 the purpose of rejecting it is probably a mistake.  I mean, if
 (nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
 really seem like they ought to be any different.

But that's exactly how arrays behave too. '{null,nutty}' is interpreted
as ARRAY[NULL,'nutty'] while '{nubile,nutty}' is interpreted as
ARRAY['nubile','nutty'].

 Otherwise, anyone
 who wants to construct these strings programatically is going to need
 to escape everything and always write (cat,dog) or however you do
 that, and that seems like an unnecessary imposition.

Unless you fully depart from what arrays you, you'll have to do that anyway
because leading and trailing spaces aren't considered to be significant in
non-quoted elements. In other words, '( cat , dog )' represents
textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').

Also, as long as we need to recognize at least one special value meaning
a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
around the need for quotes in the general case. Well, expect making the
representation of

  range(X, NULL, '[)') be '[X)',

the one of

  range(NULL, X, '(]') be '(X]'

and the one of

  range(NULL, NULL, '()') be '()',

but I'm not sure that's an improvement. And even if it was, you'd still
need to quote X if it contained one of (,),[,] or ,. So most
client would probably still choose to quote unconditionally, instead of
detecting whether it was necessary or not.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Robert Haas
On Wed, Sep 21, 2011 at 8:41 AM, Florian Pflug f...@phlo.org wrote:
 Boy, that seems really weird to me.  If you're going to do it, it
 ought to be case-insensitive, but I think detecting the case only for
 the purpose of rejecting it is probably a mistake.  I mean, if
 (nubile, nutty) is OK, then (nubile, null) and (null, nutty) don't
 really seem like they ought to be any different.

 But that's exactly how arrays behave too. '{null,nutty}' is interpreted
 as ARRAY[NULL,'nutty'] while '{nubile,nutty}' is interpreted as
 ARRAY['nubile','nutty'].

Oh.  Well, never mind then.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Tom Lane
Florian Pflug f...@phlo.org writes:
 On Sep21, 2011, at 14:00 , Robert Haas wrote:
 Otherwise, anyone
 who wants to construct these strings programatically is going to need
 to escape everything and always write (cat,dog) or however you do
 that, and that seems like an unnecessary imposition.

 Unless you fully depart from what arrays you, you'll have to do that anyway
 because leading and trailing spaces aren't considered to be significant in
 non-quoted elements. In other words, '( cat , dog )' represents
 textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').

Keep in mind that the array I/O behavior is widely considered to suck.
When we defined the record I/O behavior, we did not emulate that
whitespace weirdness, nor a number of other weirdnesses.  I would argue
that ranges ought to model their I/O behavior on records not arrays,
because that's not as much of a legacy syntax.

 Also, as long as we need to recognize at least one special value meaning
 a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
 around the need for quotes in the general case.

Right.  In the record case, we used an empty string for NULL, and then
had to insist on quotes for actual empty strings.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Florian Pflug
On Sep21, 2011, at 16:41 , Tom Lane wrote:
 Florian Pflug f...@phlo.org writes:
 On Sep21, 2011, at 14:00 , Robert Haas wrote:
 Otherwise, anyone
 who wants to construct these strings programatically is going to need
 to escape everything and always write (cat,dog) or however you do
 that, and that seems like an unnecessary imposition.
 
 Unless you fully depart from what arrays you, you'll have to do that anyway
 because leading and trailing spaces aren't considered to be significant in
 non-quoted elements. In other words, '( cat , dog )' represents
 textrange('cat', 'dog', '()'), *not* textrange(' cat ', ' dog ', '()').
 
 Keep in mind that the array I/O behavior is widely considered to suck.
 When we defined the record I/O behavior, we did not emulate that
 whitespace weirdness, nor a number of other weirdnesses.  I would argue
 that ranges ought to model their I/O behavior on records not arrays,
 because that's not as much of a legacy syntax.

Interesting. Funnily enough, I always assumed it was the other way around.
Probably because I don't care much for the empty-unquoted-string-is-NULL
behaviour of records. But leaving that personal opinion aside, yeah, in that
case we should make range I/O follow record I/O.

 Also, as long as we need to recognize at least one special value meaning
 a non-existing bound ('INF' or 'Infinity' or whatever), I don't see a way
 around the need for quotes in the general case.
 
 Right.  In the record case, we used an empty string for NULL, and then
 had to insist on quotes for actual empty strings.

Hm, so we'd have

  '(X,)' for range(X, NULL, '()'),
  '(,X)' for range(NULL, X, '()') and
  '(,)' for range(NULL, NULL, '()').

We'd then have the choice of either declaring

  '(X,]' to mean '(X,)',
  '[,X)' to mean '(,X)' and
  '[,]' to mean '(,)'

or to forbid the use of '[' and ']' for unspecified bounds.

(Leaving out the ',' in the case of only one bound as in my reply to Robert's
mail somewhere else in this thread doesn't actually work, since it'd be 
ambiguous whether '(X)' means range(X, NULL, '()') or range(NULL, X, '()').)

One nice property is that, apart from the different brackets used, this
representation is identical to the one used by records while still avoiding
the infinity vs. NULL confusion.

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Jeff Davis
On Wed, 2011-09-21 at 17:20 +0200, Florian Pflug wrote:
 Hm, so we'd have
 
   '(X,)' for range(X, NULL, '()'),
   '(,X)' for range(NULL, X, '()') and
   '(,)' for range(NULL, NULL, '()').
 
 We'd then have the choice of either declaring
 
   '(X,]' to mean '(X,)',
   '[,X)' to mean '(,X)' and
   '[,]' to mean '(,)'
 
 or to forbid the use of '[' and ']' for unspecified bounds.

Right now, I just canonicalize it to round brackets if infinite. Seems
pointless to reject it, but I can if someone thinks it's better.

 (Leaving out the ',' in the case of only one bound as in my reply to Robert's
 mail somewhere else in this thread doesn't actually work, since it'd be 
 ambiguous whether '(X)' means range(X, NULL, '()') or range(NULL, X, '()').)
 
 One nice property is that, apart from the different brackets used, this
 representation is identical to the one used by records while still avoiding
 the infinity vs. NULL confusion.

OK, I like that. Slightly strange to require quoting empty strings, but
not stranger than the alternatives.

While we're at it, any suggestions on the text representation of an
empty range?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Jeff Davis
On Wed, 2011-09-21 at 13:24 +0200, Florian Pflug wrote:
 I've thought about this some more, and came to realize that the question
 here really is whether
 
   floatrange(0, 'Infinity'::float, '[)')
 
 and
 
   floatrange(0, NULL, '[)')
 
 are the same thing or not.

The unbounded side of a range is never equal to a value in the data
type's domain, so no, it's not the same.

I think that we pretty much settled on just using an empty string for
infinity in the other thread, right? So that makes this a non-issue.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Range Types - typo + NULL string constructor

2011-09-21 Thread Florian Pflug
On Sep21, 2011, at 19:00 , Jeff Davis wrote:
 While we're at it, any suggestions on the text representation of an
 empty range?

My personal favourite would be '0', since it resembles the symbol used
for empty sets in mathematics, and we already decided to use mathematical
notation for ranges.

If we're concerned that most of our users won't get that, then 'empty'
would be a viable alternative I think.

From a consistency POV it'd make sense to use a bracket-based syntax
also for empty ranges. But the only available options would be '()' and '[]',
which are too easily confused with '(,)' and '[,]' (which we already
decided should represent the full range).

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >