Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-06 Thread Greg Stark

Joe Conway [EMAIL PROTECTED] writes:

> I'd think given the preceding, it would make more sense to throw an error
> whenever trying to access an element greater than the length.

For an analogous situation in SQL I would propose

select (select foo from bar where xyz);

if there are no records in bar it returns NULL. Only if there are multiple
records in bar or some sort of error in the subquery does it produce an error.
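
A minimal sketch of that behaviour (table "bar" here is hypothetical):

create table bar (foo int, xyz boolean);
select (select foo from bar where xyz);   -- bar is empty: returns NULL
insert into bar values (1, true);
insert into bar values (2, true);
select (select foo from bar where xyz);   -- two qualifying rows: error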

Does SQL-99 not say anything about this case? It seems like the kind of thing
a standard should specify.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-03 Thread Greg Stark

Joe Conway [EMAIL PROTECTED] writes:

> Greg Stark wrote:
>> This really ought to work, it obviously shouldn't allow you to set a[5] and
>> then surreptitiously move it to a[1]. But nor should it generate an error,
>> since I may well have a specific meaning for a[5] and may be planning to fill
>> in a[1]..a[4] later.
>> The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
>> null. This could be implemented by actually storing the NULLs or else storing
>> some notation that's used to adjust the base of the index to save space.
>
> I agree. I had always envisioned something exactly like that once we supported
> NULL elements. As far as the implementation goes, I think it would be very
> similar to tuples -- a null bitmask that would exist if any elements are NULL.

Well, you might still want to store an internal "all indexes below this are
null" marker. That way "update foo set a[1000]=1" doesn't require storing even
a bitmap for the first 999 elements, though it might make maintaining the
bitmap kind of a pain. Maintaining the bitmap might be kind of a pain anyways,
though, because unlike tuples the array size isn't constant.

> A related question is how to deal with non-existing array elements. Until now,
> you could do:

I would have to think about it some more, but my first reaction is that
looking up [0] should generate an error if there can never be a valid entry at
[0]. But looking up indexes above the highest index should return NULL.

There are two broad use cases I see for arrays. Using them to represent tuples
where a[i] means something specific for each i, and using them to represent
sets where order doesn't matter.

In the former case I might want to initialize my column to an empty array and
set only the relevant elements as needed. In that case, returning NULL for
entries that haven't been set yet, whether they're above or below the last
entry set, is most consistent.

In the latter case you really don't want to be looking up anything past the
end and don't want to be storing NULLs at all. So it doesn't really matter
what the behaviour is for referencing elements past the end, but you might
conceivably want to write code like "while (e = a[i++]) ...".




Incidentally I'm using both of these models in my current project. 

I use text[] to represent localized strings, str[1] is always English and
str[2] is always French. When I need to expand to more languages I'll add
str[3] for Spanish or whatever else. It would be a problem if I stored
something in str[2] and then found it in str[1] later. And it could be a bit
awkward to have to prefill str[3] everywhere in the whole database when the
time comes. Having it just silently return NULL would be more convenient.
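
A sketch of that first model (table and column names hypothetical):

create table messages (id int primary key, str text[]);
insert into messages values (1, array['Hello', 'Bonjour']);
-- str[3] (Spanish) hasn't been filled in yet; silently returning NULL
-- for it is the convenient behaviour:
select str[1] as english, str[2] as french, str[3] as spanish from messages;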

I also use arrays for sets in a cache table. In that case there would never be
NULLs and the arrays are variable sized. Sometimes with thousands of entries.
The purpose of the cache table is to speed things up so storing the arrays
densely is important.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-03 Thread Joe Conway
Greg Stark wrote:
> Joe Conway [EMAIL PROTECTED] writes:
>> I agree. I had always envisioned something exactly like that once we supported
>> NULL elements. As far as the implementation goes, I think it would be very
>> similar to tuples -- a null bitmask that would exist if any elements are NULL.
> Well, you might still want to store an internal "all indexes below this are
> null" marker. That way "update foo set a[1000]=1" doesn't require storing even
> a bitmap for the first 999 elements, though it might make maintaining the
> bitmap kind of a pain. Maintaining the bitmap might be kind of a pain anyways,
> though, because unlike tuples the array size isn't constant.
I don't think it will be worth the complication to do other than a 
straight bitmap -- at least not the first attempt.

>> A related question is how to deal with non-existing array elements. Until now,
>> you could do:
> I would have to think about it some more, but my first reaction is that
> looking up [0] should generate an error if there can never be a valid entry at
> [0]. But looking up indexes above the highest index should return NULL.
> There are two broad use cases I see for arrays. Using them to represent tuples
> where a[i] means something specific for each i, and using them to represent
> sets where order doesn't matter.
> In the former case I might want to initialize my column to an empty array and
> set only the relevant elements as needed. In that case, returning NULL for
> entries that haven't been set yet, whether they're above or below the last
> entry set, is most consistent.
Maybe, but you're still going to need to explicitly set the real upper 
bound element in order for the length/cardinality to be correct. In 
other words, if you really want an array with elements 1 to 1000, but 2 
through 1000 are NULL, you'll need to explicitly set A[1000] = NULL; 
otherwise we'll have no way of knowing that you really want 1000 
elements. Perhaps we'll want some kind of array_init function to create 
an array of a given size filled with all NULL elements (or even some 
arbitrary constant element).
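
For illustration, usage of such a function (purely hypothetical, it doesn't
exist yet) might look like:

update foo set a = array_init(1000, NULL);  -- 1000 elements, all NULL
update foo set a[5] = 5;                    -- a[1]..a[4] remain NULL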

I'd think given the preceding, it would make more sense to throw an 
error whenever trying to access an element greater than the length.

> In the latter case you really don't want to be looking up anything past the
> end and don't want to be storing NULLs at all. So it doesn't really matter
> what the behaviour is for referencing elements past the end, but you might
> conceivably want to write code like "while (e = a[i++]) ...".
See reasoning above. And if you did somehow wind up with a real
NULL element in this scenario, you'd never know about it. The looping
could always be:
  while (i++ <= length)
or
  for (i = 1; i <= length; i++)

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> array_lower() and array_upper() are returning NULL for a non-null
> input, the empty array, even though lower and upper bounds are known
> just as well as they are for any other sized array. They are behaving
> as if there's something unknown about the empty array that makes it
> hard to provide a lower bound or upper bound.

Hm.  My idea of the index lower bound is the smallest index for which
there is an array member --- so I agree with Joe that it's not very
well defined for an empty array.  We could pick some arbitrary value,
such as zero, for the LB and UB to be reported for an empty array, but
I can't see any way that you could justify them except as arbitrary
choices.

I think that maybe we ought to question these two properties:
* empty array is different from NULL ... really?  Why?
* storing a value into an element of a NULL array yields
  a NULL array instead of a singleton array.
IIRC the latter is presently true because we couldn't figure out
just what dimensionality to assign, but it might be easier to agree on
that than to resolve these other arguments...

regards, tom lane



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

> Hm.  My idea of the index lower bound is the smallest index for which
> there is an array member --- so I agree with Joe that it's not very
> well defined for an empty array.

Hm. The problem is that they have other implications. Like the upper bound is
one less than the index an element added to the upper end will get, and
similarly the lower bound is one more than the index that would be assigned to
an element added on the low end.

Currently there is a lower bound and upper bound in the implementation even
for empty arrays. I have empty arrays in my table that have a lower bound of
0, and they behave slightly differently than arrays with lower bounds of 1.

> I think that maybe we ought to question these two properties:
>   * empty array is different from NULL ... really?  Why?
>   * storing a value into an element of a NULL array yields
>     a NULL array instead of a singleton array.

Well that breaks other things. Then lots of functions have to become
non-strict to work properly because they should have valid output when passed
null values. Ick.


I'm leaning towards suggesting that postgres should follow sql-99 here and
normalize all array indexes to have a lower bound of 1. Then array_lower and
array_upper become entirely unnecessary. Instead we just have array_length
which is exactly equivalent to my idea of array_upper.
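
Under that rule the relationship becomes trivial; a sketch of what I mean:

-- with the lower bound pinned at 1, array_length(a) == array_upper(a, 1)
select array_upper(array[10,20,30], 1);   -- 3, which is also the length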

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
> Well that breaks other things. Then lots of functions have to become
> non-strict to work properly because they should have valid output when passed
> null values. Ick.
> ...
> I'm leaning towards suggesting that postgres should follow sql-99 here and
> normalize all array indexes to have a lower bound of 1.

That would break even more things, no?

On the other hand, it'd get rid of the problem that we presently face
with dump/restore of arrays that don't have lower bound 1.  Because
pg_dump doesn't do anything to mark such values, they'll end up with
lower bound 1 after reload anyway.  The fact that we haven't heard lots
of squawks about that suggests to me that not many people are using such
arrays at present ...

regards, tom lane



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Joe Conway
Greg Stark wrote:
> array_lower() and array_upper() are returning NULL for a non-null input, the
> empty array, even though lower and upper bounds are known just as well as they
> are for any other sized array. They are behaving as if there's something
> unknown about the empty array that makes it hard to provide a lower bound or
> upper bound.
Sorry, but I still disagree. There *is* something unknown about the 
lower and upper bound of an empty array because there are no bounds.

> I know it's possible to work around this special case, but I'm saying it's odd
> to have an irregularity in the interface. What justification is there for
> breaking the invariant length = upper-lower+1 ?
I don't see the spec-defined CARDINALITY as a workaround. It defines
length as the number of elements in the array. When the array is empty,
that value is clearly 0. Nothing strange about it.

> Yes I read the examples you gave, but you a) had to work around the nit with a
> special case in your function and b) still have corner cases where one of the
> invariants I named fails, namely:
>
> test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from
> (select '{}'::int[] as a, array[1,2] as b) as x;
>  array_upper | ?column?
> -------------+----------
>            2 |
> (1 row)
OK, you got me with this corner case. But using what you described as 
the result int_aggregate would give you in this case (-1), you would get 
an even stranger answer (-1 + 2 = 1) that would still need to be worked 
around.

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Joe Conway
Tom Lane wrote:
> I think that maybe we ought to question these two properties:
> * empty array is different from NULL ... really?  Why?
I think this makes sense, similar to the difference between '' and NULL.

> * storing a value into an element of a NULL array yields
>   a NULL array instead of a singleton array.
Same argument. If you think of text as an array of characters, you get
this analogy (sort of):

regression=# create table s1(f1 int, f2 text);
CREATE TABLE
regression=# insert into s1 values(1, null);
INSERT 164679 1
regression=# select f1, substr(f2, 1, 1) is null from s1;
 f1 | ?column?
----+----------
  1 | t
(1 row)
regression=# update s1 set f2 = 'a' || substr(f2, 2);
UPDATE 1
regression=# select f1, substr(f2, 1, 1) is null from s1;
 f1 | ?column?
----+----------
  1 | t
(1 row)
Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Joe Conway
Tom Lane wrote:
> Greg Stark [EMAIL PROTECTED] writes:
>> I'm leaning towards suggesting that postgres should follow sql-99 here and
>> normalize all array indexes to have a lower bound of 1.
> That would break even more things, no?
>
> On the other hand, it'd get rid of the problem that we presently face
> with dump/restore of arrays that don't have lower bound 1.  Because
> pg_dump doesn't do anything to mark such values, they'll end up with
> lower bound 1 after reload anyway.  The fact that we haven't heard lots
> of squawks about that suggests to me that not many people are using such
> arrays at present ...
The more I think about it, the more I like it. Does everyone else agree 
that a lower bound of 1 complies with the spec?

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Joe Conway
Greg Stark wrote:
> I'm leaning towards suggesting that postgres should follow sql-99 here and
> normalize all array indexes to have a lower bound of 1. Then array_lower and
> array_upper become entirely unnecessary. Instead we just have array_length
> which is exactly equivalent to my idea of array_upper.
Now we finally have something to agree on ;-)

I do think this is the way to go, but it is a significant hit to 
backward compatibility. Same is true for supporting NULL elements of 
arrays -- maybe we should bite the bullet and make both changes at the 
same time?

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

> That would break even more things, no?
>
> On the other hand, it'd get rid of the problem that we presently face
> with dump/restore of arrays that don't have lower bound 1.  Because
> pg_dump doesn't do anything to mark such values, they'll end up with
> lower bound 1 after reload anyway.  The fact that we haven't heard lots
> of squawks about that suggests to me that not many people are using such
> arrays at present ...

You have to be using not only arrays, but the new 7.4 functions provided to
manipulate them. In fact I think you have to be using array_prepend
specifically. But even there, since it's not a mutator, it's really not that
surprising that the elements of the brand-new array it returns should have
new indexes.
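
A quick illustration of the bounds array_prepend hands back (as I understand
the 7.4 behaviour):

select array_lower(array_prepend(0, array[1,2,3]), 1);  -- 0, not 1
select array_upper(array_prepend(0, array[1,2,3]), 1);  -- 3, for 4 elements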

In fact I suspect there are more people with hidden bugs where they depend on
arrays starting at 1. This type of bug is insidious since it's hard to test
for: your application might never generate an array with a lower bound other
than 1 until someone adds some new code using array_prepend somewhere, and all
of a sudden you get strange behaviours from unrelated code.

I can have the honour of being the first squawker like you describe, but my
problem was only evidence that having such non-normalized arrays at all was
surprising. I was using int_aggregate.c which generates non-standard arrays
with lower bounds of 0. My code assumed array_upper()+1 == length. After I
dumped and restored all my counts were off by one.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Greg Stark
Joe Conway [EMAIL PROTECTED] writes:

> Greg Stark wrote:
>> I'm leaning towards suggesting that postgres should follow sql-99 here and
>> normalize all array indexes to have a lower bound of 1. Then array_lower and
>> array_upper become entirely unnecessary. Instead we just have array_length
>> which is exactly equivalent to my idea of array_upper.
>
> Now we finally have something to agree on ;-)
>
> I do think this is the way to go, but it is a significant hit to backward
> compatibility. Same is true for supporting NULL elements of arrays -- maybe we
> should bite the bullet and make both changes at the same time?

In fact on further thought I think they *have* to be done together.

I forgot that your code did something else cool allowing updates to extend
arrays by directly updating elements outside the current bounds. Ie:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[1] = 1;
UPDATE 1
slo=> select * from test;
  a
-----
 {1}
(1 row)

But if we normalize array indexes to start at 1 then this makes it hard to
fill in the array starting at higher values. For example:

slo=> update test set a = '{}';
UPDATE 1
slo=> update test set a[5] = 5;
UPDATE 1
slo=> select a[5] from test;
 a
---
 5
(1 row)

This really ought to work, it obviously shouldn't allow you to set a[5] and
then surreptitiously move it to a[1]. But nor should it generate an error,
since I may well have a specific meaning for a[5] and may be planning to fill
in a[1]..a[4] later.

The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
null. This could be implemented by actually storing the NULLs or else storing
some notation that's used to adjust the base of the index to save space.


One thing that can't be made to work like it does now is extending the array
on the low end indefinitely:

slo=> update test set a[1] = 1;
UPDATE 1
slo=> update test set a[0] = 0;
UPDATE 1
slo=> update test set a[-1] = -1;
UPDATE 1
slo=> select * from test;
    a
----------
 {-1,0,1}
(1 row)


If this all looks familiar it's because Perl, and other languages, also behave
this way:

bash-2.05b$ perl -e '@a = (); $a[10]=10; print join(",",@a),"\n"'
,,,,,,,,,,10
bash-2.05b$ perl -e '@a = (); $a[-1]=-1; print join(",",@a),"\n"'
Modification of non-creatable array value attempted, subscript -1 at -e line 1.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Greg Stark

Joe Conway [EMAIL PROTECTED] writes:

> Greg Stark wrote:
>> array_lower() and array_upper() are returning NULL for a non-null input, the
>> empty array, even though lower and upper bounds are known just as well as they
>> are for any other sized array. They are behaving as if there's something
>> unknown about the empty array that makes it hard to provide a lower bound or
>> upper bound.
>
> Sorry, but I still disagree. There *is* something unknown about the lower and
> upper bound of an empty array because there are no bounds.

So I understand your point of view now. But I think the intuitive meaning here
for lower/upper bound as the lowest/highest index where an element is present
is only a part of the picture.

Lower and upper bound are also related to other properties, like where
array_prepend and array_append place things. And of course the array length.

So to give a practical example, say I was implementing a stack using an array.
I push things on by extending the array by storing in arr[array_upper(arr)+1].
(Incidentally I don't think we actually have any way to shorten an array, do
we?) As soon as I pop off the last element I lose my stack pointer. I need a
special case in my code to handle pushing elements in when the array is empty.
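
A sketch of the push, with a hypothetical stack table:

-- push: write one past the current upper bound
update stack set arr[array_upper(arr,1)+1] = 42;
-- once arr is empty, array_upper(arr,1) is NULL, so the subscript is NULL
-- and the push has nowhere to go -- hence the special case in my code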

In reality array_append() would work fine. It's only array_upper() that's
going out of its way to make things weird. There's still an upper bound,
array_append() knows it, array_upper() just hides this value from the user.


> I don't see the spec-defined CARDINALITY as a workaround. It defines length as
> the number of elements in the array. When the array is empty, that value is
> clearly 0. Nothing strange about it.

The workaround is having to handle that case with a special-case if statement.
If array_lower(), array_upper(), and array_length()/CARDINALITY are all
defined in a consistent way, it doesn't seem like there ought to be any
special cases in the implementations. There should be a simple, rigid
mathematical relationship between them, namely: upper - lower + 1 = length.
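
For any non-empty array the relationship already holds; a sketch:

select array_upper(a,1) - array_lower(a,1) + 1 as length
from (select array[10,20,30] as a) as t;   -- 3
-- but with a = '{}'::int[] both bounds are NULL, so this yields NULL, not 0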



>> test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from
>> (select '{}'::int[] as a, array[1,2] as b) as x;
>>  array_upper | ?column?
>> -------------+----------
>>            2 |
>> (1 row)
>
> OK, you got me with this corner case. But using what you described as the
> result int_aggregate would give you in this case (-1), you would get an even
> stranger answer (-1 + 2 = 1) that would still need to be worked around.

No actually, 1 would be the correct answer: the original array would have
indexes ranging over [0,-1] and the new array would have indexes ranging over
[0,1], ie, two elements. The only strangeness is the unusual lower bound, which
isn't the default for postgres arrays constructed from string literals.
Personally I prefer the zero-based indexes, but as neither SQL-foo nor
backwards compatibility agree with me here I'll give that up as a lost cause
:)

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-02 Thread Joe Conway
Greg Stark wrote:
> This really ought to work, it obviously shouldn't allow you to set a[5] and
> then surreptitiously move it to a[1]. But nor should it generate an error,
> since I may well have a specific meaning for a[5] and may be planning to fill
> in a[1]..a[4] later.
> The logical thing to do, I think, is pre-fill the entries a[1]..a[4] with
> null. This could be implemented by actually storing the NULLs or else storing
> some notation that's used to adjust the base of the index to save space.
I agree. I had always envisioned something exactly like that once we 
supported NULL elements. As far as the implementation goes, I think it 
would be very similar to tuples -- a null bitmask that would exist if 
any elements are NULL.

A related question is how to deal with non-existing array elements. 
Until now, you could do:

regression=# select f[0] from (select array[1,2]) as t(f);
 f
---

(1 row)

Even though index 0 does not exist, you get a NULL value returned 
instead of an ERROR. I'd think if we hardwire a lower bound of 1, this 
should produce an ERROR. Similarly:

regression=# select f[3] from (select array[1,2]) as t(f);
 f
---

(1 row)

Should this produce an ERROR instead of returning NULL once existing 
array elements can be NULL?

> One thing that can't be made to work like it does now is extending the array
> on the low end indefinitely:
> slo=> update test set a[1] = 1;
> UPDATE 1
> slo=> update test set a[0] = 0;
> UPDATE 1
> slo=> update test set a[-1] = -1;
> UPDATE 1
Right. In the new world order we're describing, the latter two examples 
would have to produce errors.

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Joe Conway
Josh Berkus wrote:
> I'm noticing some inconsistent behavior regarding empty arrays and IS NULL
> status. For example:
>
> net_test=# select array_upper('{}'::INT[], 1) IS NULL;
>  ?column?
> ----------
>  t
> (1 row)
This is correct. There are no dimensions to an empty array by 
definition. The only other way to handle this would be an ERROR. I 
followed the lead of (the pre-existing function) array_dims() when 
creating array_upper() and array_lower().

> net_test=# select '{}'::INT[] IS NULL;
>  ?column?
> ----------
>  f
> (1 row)
This is also correct, and completely orthogonal to the first example. 
There is a difference between an empty array and NULL, just like there 
is between an empty string and NULL.
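
The string analogue, for comparison:

select ''::text is null;    -- f: empty, but not NULL
select null::text is null;  -- t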

> I feel that this is confusing; an empty array should be considered NULL
> everywhere or nowhere.
As I said above, that makes no more sense than saying '' == NULL

> For that matter, the new array declaration syntax does not support
> empty arrays:
> net_test=# select ARRAY[ ]::INT[];
> ERROR:  syntax error at or near "]" at character 15
This is a known issue, and will not be easily fixed. We discussed it at 
some length last June/July. See especially:

http://archives.postgresql.org/pgsql-hackers/2003-06/msg01174.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01195.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01196.php
http://archives.postgresql.org/pgsql-hackers/2003-06/msg01298.php
Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Josh Berkus
Joe,

> This is correct. There are no dimensions to an empty array by
> definition. The only other way to handle this would be an ERROR. I
> followed the lead of (the pre-existing function) array_dims() when
> creating array_upper() and array_lower().

What about a 0? That seems more consistent to me. If the array is empty,
its dimensions are not NULL, meaning "unknown", but in fact zero elements,
which is a known value. The way it works now, array_upper on a NULL array
produces the same results as array_upper on an empty-but-non-null array.

Or is there some concept I'm missing?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Joe Conway
Josh Berkus wrote:
> What about a 0? That seems more consistent to me. If the array
> is empty, its dimensions are not NULL, meaning "unknown", but in
> fact zero elements, which is a known value.
They cannot be 0 because 0 is a real index. They are undefined, because 
an empty array has no dimensions. It is entirely possible to have a real 
array that starts at a lower bound of 0 (or even an upper bound of 0).

regression=# select f[0] from (select 99 || array[1,2,3] as f) as t;
 f
----
 99
(1 row)
regression=# create table a1 (f int[]);
CREATE TABLE
regression=# insert into a1 values('{}');
INSERT 18688045 1
regression=# update a1 set f[0] = 99;
UPDATE 1
regression=# select array_upper(f,1) from a1;
 array_upper
-------------
           0
(1 row)
> The way it works now, array_upper on a NULL array produces the same
> results as array_upper on an empty-but-non-null array.
Sure, and in both cases array_upper is undefined because there are no 
array dimensions to speak of. I guess you might argue that array_upper, 
array_lower, and array_dims should all produce an ERROR on null input 
instead of NULL. But that would have been an un-backward compatible 
change for array_dims at the time array_lower and array_upper were 
created. I don't really believe they should throw an ERROR on an empty 
array though.

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Josh Berkus
Joe,

> Sure, and in both cases array_upper is undefined because there are no
> array dimensions to speak of. I guess you might argue that array_upper,
> array_lower, and array_dims should all produce an ERROR on null input
> instead of NULL. But that would have been an un-backward compatible
> change for array_dims at the time array_lower and array_upper were
> created. I don't really believe they should throw an ERROR on an empty
> array though.

OK, I understand the logic now.   Thanks.

I guess this is another case where we're haunted by the ANSI committee's
failure to define both an "unknown" and a "not applicable" value instead of
the unitary NULL.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Greg Stark
Josh Berkus [EMAIL PROTECTED] writes:

> Joe,
>
>> This is correct. There are no dimensions to an empty array by
>> definition. The only other way to handle this would be an ERROR. I
>> followed the lead of (the pre-existing function) array_dims() when
>> creating array_upper() and array_lower().
>
> What about a 0? That seems more consistent to me. If the array is empty,
> its dimensions are not NULL, meaning "unknown", but in fact zero elements,
> which is a known value. The way it works now, array_upper on a NULL array
> produces the same results as array_upper on an empty-but-non-null array.
>
> Or is there some concept I'm missing?

I would certainly second that. Consider all that making it NULL breaks:

length(a) != array_upper(a)-array_lower(a)

array_upper(a||b) == array_upper(a)+length(b)

If someone implements pop and push it sure makes things weird that push
doesn't always increment the length and pop doesn't decrement the length until 0.

Perhaps you're worried that you have to pick an arbitrary lower and upper bound
and, strangely, that the upper bound would actually be one less than the lower
bound, such as [1,0]. However this isn't really any different from the normal
case. All arrays in postgres have arbitrary lower bounds.

Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
makes arrays that don't violate these invariants. For empty arrays the
dimensions are [0,-1].

This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
a standard function that constructs your style arrays my app would break. I
guess I'll have to add a coalesce(...,0) to my array_length() function to work
around it. Which only seems like strong evidence it's the wrong behaviour.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Joe Conway
Greg Stark wrote:
> length(a) != array_upper(a)-array_lower(a)
[You really meant array_upper(a) - array_lower(a) + 1, I'd guess.]

length(A) is a missing function at the moment; the spec actually calls 
it CARDINALITY. Once available, you would use it to determine array 
length. SQL2003 says:

  The result of <cardinality expression> is the number of elements of
  the result of the <collection value expression>.
So, when A is an empty array, CARDINALITY(A) = 0, by definition.

> array_upper(a||b) == array_upper(a)+length(b)
Same here; this would be:

array_upper(a || b) == array_upper(a) + CARDINALITY(b)

and would work just fine. Note that if array-a is NULL, then the spec 
defines a || b as NULL. See section 6.35:

  2) If <array concatenation> is specified, then let AV1 be the value of
     <array value expression 1> and let AV2 be the value of
     <array primary>.
     Case:
     a) If either AV1 or AV2 is the null value, then the result of the
        array concatenation is the null value.
     b) If the sum of the cardinality of AV1 and the cardinality of AV2
        is greater than IMDC, then an exception condition is raised:
        data exception - array data, right truncation.
     c) Otherwise, the result is the array comprising every element of
        AV1 followed by every element of AV2.
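
A quick check of the current behaviour against those rules (a sketch; the
results are what I'd expect, not verified here):

select null::int[] || array[1,2];   -- NULL, per case a)
select '{}'::int[] || array[1,2];   -- {1,2}, per case c)
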
> If someone implements pop and push it sure makes things weird that push
> doesn't always increment the length and pop doesn't decrement the length until 0.
I have no idea what you're trying to say here. Current behavior 
certainly increments length by one when you push an element (which is 
what array || element effectively does). An empty array has length 0 
before pushing an element on to it, and length 1 afterward. Pushing an 
element onto a NULL array yields NULL, which is not explicitly defined 
by the spec (that I can find), but is certainly consistent with the above.

As far as array_pop is concerned, we discussed the fact that it makes no 
sense in the context of Postgres arrays -- see the archives from last 
year in May.

> Perhaps you're worried that you have to pick an arbitrary lower and upper bound
> and, strangely, that the upper bound would actually be one less than the lower
> bound, such as [1,0]. However this isn't really any different from the normal
> case. All arrays in postgres have arbitrary lower bounds.
Again, I have no idea what you mean here.


> Fwiw, the int_aggregate function from int_aggregate.c in the contrib section
> makes arrays that don't violate these invariants. For empty arrays the
> dimensions are [0,-1].
Seems rather arbitrary to me. As I said to Josh, an empty array has 
undefined bounds, literally.

> This isn't hypothetical for me. Whenever I end up replacing int_aggregate with
> a standard function that constructs your style arrays my app would break. I
> guess I'll have to add a coalesce(...,0) to my array_length() function to work
> around it. Which only seems like strong evidence it's the wrong behaviour.
Sorry, but these are not "my style" arrays, they are Berkeley style ;-).

Anyway, CARDINALITY is what you really need -- hopefully I'll be able to
find time to address that and some other array items before the 7.5
freeze. In the meantime, if you have a custom array_length function
already, why not make it return 0 for empty arrays -- then your problems
disappear:

create or replace function array_length(anyarray)
returns int as '
 select
  case
   when $1 = ''{}'' then 0
   else array_upper($1, 1) - array_lower($1, 1) + 1
  end
' language sql;
CREATE FUNCTION
regression=# select array_length(array[1,2,3]);
 array_length
--------------
            3
(1 row)
regression=# select array_length('{}'::int4[]);
 array_length
--------------
            0
(1 row)
regression=# select array[1,2,3] || '{}'::int4[];
 ?column?
----------
 {1,2,3}
(1 row)
regression=# select array_upper(array[1,2,3], 1) +
array_length('{}'::int4[]);
 ?column?
----------
        3
(1 row)

HTH,

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Greg Stark


Joe Conway [EMAIL PROTECTED] writes:

> Same here; this would be:
>
> array_upper(a || b) == array_upper(a) + CARDINALITY(b)
>
> and would work just fine. Note that if array-a is NULL, then the spec defines a
> || b as NULL. See section 6.35:

Why are you talking about when a is NULL? The original question was for when a
was an empty array. That's not an unknown value, it's known to be an array
containing no elements.

It sounds like this is the same type of confusion that led to Oracle treating
empty strings as NULL, which causes no end of headaches.

> Anyway, CARDINALITY is what you really need -- hopefully I'll be able to find
> time to address that and some other array items before the 7.5 freeze. In the
> meantime, if you have a custom array_length function already, why not make it
> return 0 for empty arrays -- then your problems disappear:
>
> create or replace function array_length(anyarray)
> returns int as '
>   select
>     case
>       when $1 = ''{}'' then 0
>       else array_upper($1, 1) - array_lower($1, 1) + 1
>     end
> ' language sql;

My argument was that having to write a special case here makes it pretty clear
the idea of equating {} with NULL is the wrong interface.

-- 
greg




Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Joe Conway
Greg Stark wrote:
> Joe Conway [EMAIL PROTECTED] writes:
>> Same here; this would be:
>>
>> array_upper(a || b) == array_upper(a) + CARDINALITY(b)
>>
>> and would work just fine. Note that if array-a is NULL, then the spec defines a
>> || b as NULL. See section 6.35:
> Why are you talking about when a is NULL? The original question was for when a
> was an empty array. That's not an unknown value, it's known to be an array
> containing no elements.
Did you even look at my examples at the end of the post? I showed your 
example, with an empty array, handled correctly. The mention of a NULL 
array was only for completeness.

> It sounds like this is the same type of confusion that led to Oracle treating
> empty strings as NULL, which causes no end of headaches.
ISTM that you're the one who's confused. There is a very clear 
distinction between a NULL array and an empty array in the present 
implementation. They are *not* treated the same:

regression=# select '{}'::int[], NULL::int[];
 int4 | int4
------+------
 {}   |
(1 row)
regression=# select '{}'::int[] is NULL, NULL::int[] is NULL;
 ?column? | ?column?
----------+----------
 f        | t
(1 row)
You seem to be saying that because the output of certain functions that 
operate on empty arrays is NULL, it somehow implies that the array is 
being treated as NULL -- that's just plain incorrect.

> My argument was that having to write a special case here makes it pretty clear
> the idea of equating {} with NULL is the wrong interface.
But they're not being equated (see above), so I don't see where there's 
an issue.

Joe



Re: [HACKERS] Inconsistent behavior on Array Is Null?

2004-04-01 Thread Greg Stark

Joe Conway [EMAIL PROTECTED] writes:

> You seem to be saying that because the output of certain functions that operate
> on empty arrays is NULL, it somehow implies that the array is being treated as
> NULL -- that's just plain incorrect.

Not incorrect, but a sign something weird is going on. It should only happen
if the functions really are looking for some property that isn't known
for that peculiar value of their parameters: iscomputable(n) perhaps, or
sqrt(-1) if we abuse the rules a bit. But array_upper for a non-null array?

array_lower() and array_upper() are returning NULL for a non-null input, the
empty array, even though lower and upper bounds are known just as well as they
are for any other sized array. They are behaving as if there's something
unknown about the empty array that makes it hard to provide a lower bound or
upper bound.

slo=> select array_lower('{}'::int[],1), array_upper('{}'::int[],1);
 array_lower | array_upper
-------------+-------------
             |
(1 row)

I know it's possible to work around this special case, but I'm saying it's odd
to have an irregularity in the interface. What justification is there for
breaking the invariant length = upper-lower+1 ?



Yes I read the examples you gave, but you a) had to work around the nit with a
special case in your function and b) still have corner cases where one of the
invariants I named fails, namely: 

> Same here; this would be:
>
> array_upper(a || b) == array_upper(a) + CARDINALITY(b)
>
> and would work just fine. Note that if array-a is NULL, then the spec
> defines a || b as NULL. See section 6.35:

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select 
array[1,2] as a, array[1,2] as b) as x;
 array_upper | ?column?
-------------+----------
           4 |        4
(1 row)

test=# select array_upper(a||b, 1), array_upper(a,1)+array_length(b) from (select 
'{}'::int[] as a, array[1,2] as b) as x;
 array_upper | ?column?
-------------+----------
           2 |
(1 row)



-- 
greg




[HACKERS] Inconsistent behavior on Array Is Null?

2004-03-30 Thread Josh Berkus
Joe,

First off, pardon me if these are known things which are already fixed in CVS.  
Also, let me again thank you for all the work on Arrays in 7.4; I've been 
able to tremendously simplify quite a number of procedures in my databases 
thanks to the new array support.

Now, the issue:
I'm noticing some inconsistent behavior regarding empty arrays and IS NULL
status. For example:

net_test=# select array_upper('{}'::INT[], 1);
 array_upper
-------------

(1 row)

net_test=# select array_upper('{}'::INT[], 1) IS NULL;
 ?column?
----------
 t
(1 row)

net_test=# select '{}'::INT[] IS NULL;
 ?column?
----------
 f
(1 row)

I feel that this is confusing; an empty array should be considered NULL 
everywhere or nowhere. For that matter, the new array declaration syntax 
does not support empty arrays:

net_test=# select ARRAY[ ]::INT[];
ERROR:  syntax error at or near "]" at character 15
net_test=# select ARRAY[]::INT[];
ERROR:  syntax error at or near "]" at character 14
net_test=#

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco

