subject:"\[HACKERS\] VARIANT \/ ANYTYPE datatype"

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Robert Haas

On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 It seems to me a reasonable way to implement VARIANT would be to have
 a data type called VARIANT that stores an OID of the inner type at the
 beginning, followed by the binary data.

That's likely to be how it gets implemented, but you seem to have
missed the point of some of the discussion upthread: the big problem
with that is that someone might type DROP TYPE foo, and when they
do, you need an efficient way to figure out whether foo is in use
inside an instance of the variant type anywhere in the system.  The
devil is in the details...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Andrew Dunstan




On 05/11/2011 07:53 AM, Robert Haas wrote:

On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
joeyadams3.14...@gmail.com  wrote:

It seems to me a reasonable way to implement VARIANT would be to have
a data type called VARIANT that stores an OID of the inner type at the
beginning, followed by the binary data.

That's likely to be how it gets implemented, but you seem to have
missed the point of some of the discussion upthread: the big problem
with that is that someone might type DROP TYPE foo, and when they
do, you need an efficient way to figure out whether foo is in use
inside an instance of the variant type anywhere in the system.  The
devil is in the details...



Actually, I thought that was the variant (pun intended) Tom objected to, 
and wanted declared variant types instead, precisely so we could know 
where we'd need to look if a type was dropped. Using that you might 
still use the type oid at the beginning of the field, but there are 
other possibilities too (e.g. a single byte indicating which known 
variant is in use).


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Joseph Adams

On Wed, May 11, 2011 at 7:53 AM, Robert Haas robertmh...@gmail.com wrote:
 That's likely to be how it gets implemented, but you seem to have
 missed the point of some of the discussion upthread: the big problem
 with that is that someone might type DROP TYPE foo, and when they
 do, you need an efficient way to figure out whether foo is in use
 inside an instance of the variant type anywhere in the system.  The
 devil is in the details...

Sorry, I missed that.  That in mind, I think I would lean more toward
the union proposal as well.  Can anyone think of a case where VARIANT
would be more useful?

As for using one or two bytes to store the type of a UNION, that
creates a problem when you want to extend the union in the future.
That is, if a UNION is simply a collection of possible types values of
the UNION type can hold.

If UNION is implemented more like a tagged union:

CREATE TYPE token AS TAGGED UNION (identifier TEXT, keyword TEXT,
number INT);

Then the problem of altering it is much like the problem of altering an ENUM.

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote:
 Examples of open union types could be number, which all the numeric types
 compose, and so you can know say that you can use the generic numeric
 operators on values you have simply if their types compose the number union
 type, and it still works if more numeric types appear later.  Likewise, the
 string open union could include both text and blob, as both support
 catenation and substring matches or extraction, for example.

 This would aid to operator overloading in a generic way, letting you use the
 same syntax for different types, but allowing types to mix is optional; eg,
 you could support add(int,int) and add(real,real) without supporting
 add(int,real) etc but the syntax add(x,y) is shared, and you do this
 while still having a strong type system; allowing the mixing is optional
 case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) = a - a - a

I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).

To express the (+) example in PostgreSQL, it would be tempting to simply say:

add(real, real) returns real

However, what if each real is a different type (e.g. INT and FLOAT).
Is that allowed?  In the Haskell example above, (+) constraints both
of its arguments to the same type.  In ad-hoc syntax, it would look
like this in PostgreSQL:

real anyelement = add(anyelement, anyelement) returns anyelement

Another thing to consider: attempting to use a type class as a column
type, e.g.:

CREATE TABLE foo (n real);

Normally in Haskell, type information is passed implicitly as
parameters (hence the term parametric polymorphism), rather than
carried alongside values (like in object-oriented languages).  In the
case above, the type information would have to be carried with each
value.  Haskell actually supports this, but under a somewhat-weird
extension called Existential types (see
http://www.haskell.org/haskellwiki/Existential_type#Examples for an
example).  It isn't terribly useful in Haskell, and I don't think it
will be in PostgreSQL either.


Joey Adams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Alvaro Herrera

Excerpts from Simon Riggs's message of mar may 10 17:57:20 -0400 2011:
 On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org 
 wrote:

  both Oracle and MS-SQL have it
 
 Do they? What types are they called?

ANYTYPE

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Robert Haas

On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net 
 wrote:
 Examples of open union types could be number, which all the numeric types
 compose, and so you can know say that you can use the generic numeric
 operators on values you have simply if their types compose the number union
 type, and it still works if more numeric types appear later.  Likewise, the
 string open union could include both text and blob, as both support
 catenation and substring matches or extraction, for example.

 This would aid to operator overloading in a generic way, letting you use the
 same syntax for different types, but allowing types to mix is optional; eg,
 you could support add(int,int) and add(real,real) without supporting
 add(int,real) etc but the syntax add(x,y) is shared, and you do this
 while still having a strong type system; allowing the mixing is optional
 case-by-case.

 Coming from a Haskell perspective, this is a great idea, but I don't
 think the union feature should be used to implement it.

I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
numeric.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan


Robert Haas wrote:

On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
joeyadams3.14...@gmail.com wrote:

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan dar...@darrenduncan.net wrote:

Examples of open union types could be number, which all the numeric types
compose, and so you can know say that you can use the generic numeric
operators on values you have simply if their types compose the number union
type, and it still works if more numeric types appear later.  Likewise, the
string open union could include both text and blob, as both support
catenation and substring matches or extraction, for example.

This would aid to operator overloading in a generic way, letting you use the
same syntax for different types, but allowing types to mix is optional; eg,
you could support add(int,int) and add(real,real) without supporting
add(int,real) etc but the syntax add(x,y) is shared, and you do this
while still having a strong type system; allowing the mixing is optional
case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.


I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
numeric.


In the case of the open union numeric, one point is that users or extensions 
could come up with new types that add themselves to the union, for example say a 
type for complex numbers (I didn't see a built-in such).


But I'm just citing numeric as an example; there would be a lot more in 
practice, potentially one for every individual type, so for example if operators 
were defined for the open union rather than for the base type, then 
users/extensions could define their own types and easily declare you can use it 
like this type but its different in some important way, which may just be an 
implementation difference.  Operations that don't care about the differences can 
just be written against the open union type where they just work and those that 
do care can be more specific.


Joseph Adams said:

Coming from a Haskell perspective, this is a great idea, but I don't
think the union feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:

data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:

(+) :: (Num a) = a - a - a


While closed unions would often be used for your first examlpe, I see they're 
still useful for type classes as well.  Especially in the face of open unions 
being available, the closed unions let users say, no, I don't really want the 
meaning of this union type to change just because someone else declares a new 
type in scope (that adds itself to the union).  For example, one could declare a 
system_numeric closed union type that only includes Pg built-in numerics and 
users of that can be confident that nothing about it will change later unless 
the definition of system_numeric itself or the types it unions are changed.  But 
open unions would be preferred in places they wouldn't cause trouble, where you 
want to allow easier user extensibility.



I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).


I think that Haskell and other functional languages have a lot to teach 
relational DBMSs and I see them as being highly compatible.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Darren Duncan


To follow-up my earlier comments ...

I suspect for practical purposes we may want to limit the scope of some type 
features.


For example, the greatest benefits for open union / mixin types is with 
routines/operators, not so much with tables.


So, Pg could choose to support open unions but only for routines, where the 
declared types of table columns must still be other types.  Then you don't have 
to worry about searching your data for where one might be used, but only search 
your system catalog for routines/operators that use it.


But closed unions should be supported in tables too.

-- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-11 Thread Tom Lane

Darren Duncan dar...@darrenduncan.net writes:
 But I'm just citing numeric as an example; there would be a lot more
 in practice, potentially one for every individual type, so for example
 if operators were defined for the open union rather than for the base
 type, then users/extensions could define their own types and easily
 declare you can use it like this type but its different in some
 important way, which may just be an implementation difference.
 Operations that don't care about the differences can just be written
 against the open union type where they just work and those that do
 care can be more specific.

I'm just an old-school abstract data type hacker, but I don't see
anything in what you're saying that doesn't work today in our existing
type system: with overloaded and/or polymorphic operators and functions
you can get all those effects.  Maybe there would be some small gain in
ability to share code for tasks that fall between single-data-type and
works-for-anything cases, but it looks like a pretty marginal
improvement from here; probably not worth the cost and compatibility
implications of a major overhaul of the type system.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Bruce Momjian

Darren Duncan wrote:
 To follow-up, an additional feature that would be useful and resembles union 
 types is the variant where you could declare a union type first and then 
 separately other types could declare they are a member of the union.  I'm 
 talking about loosely what mixins or type-roles or interfaces etc are in 
 other 
 languages.  The most trivial example would be declaring an ENUM-alike first 
 and 
 then separately declaring the component values where the latter declare they 
 are 
 part of the ENUM, and this could make it easier to add or change ENUM values. 
 But keep in mind that this is a distinct concept from what we're otherwise 
 talking about as being union types. -- Darren Duncan

Should this be a TODO item?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Alvaro Herrera

Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:
 Darren Duncan wrote:
  To follow-up, an additional feature that would be useful and resembles 
  union 
  types is the variant where you could declare a union type first and then 
  separately other types could declare they are a member of the union.  I'm 
  talking about loosely what mixins or type-roles or interfaces etc are in 
  other 
  languages.  The most trivial example would be declaring an ENUM-alike first 
  and 
  then separately declaring the component values where the latter declare 
  they are 
  part of the ENUM, and this could make it easier to add or change ENUM 
  values. 
  But keep in mind that this is a distinct concept from what we're otherwise 
  talking about as being union types. -- Darren Duncan
 
 Should this be a TODO item?

The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Darren Duncan


Alvaro Herrera wrote:

Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:

Darren Duncan wrote:
To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan

Should this be a TODO item?


The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.


From the perspective of users, the single greatest distinction between these 2 
kinds of unions is being closed versus being open, and that is the primary 
reason to choose one over the other.


A closed union is the C-style, where the union type declares what other types or 
values it ranges over.  The closed union is best when the union definer can 
reasonably assume that the union won't either ever or would rarely be changed, 
and in particular can assume that application or database code would have 
knowledge of the parts that it deals specially with, so it can be assumed that 
if the closed union type ever is changed then any code designed to use it may be 
changed at the same time.


A good example for a closed union would be a boolean type which just ranges over 
the two singletons false and true or an order type which ranges just over the 
three singletons decrease, same, increase.  Or a type which enumerates the 7 
days of the week, as this is unlikely to change in the life of a system.


An open union is the mixin style, where the component types declare they are 
part of the union.  The open union is best when it is likely that there would be 
either user-defined or extension-defined new types for the union to come along 
later, and we want to have code that can be generic or polymorphic for any types 
that can be used in particular ways.


Examples of open union types could be number, which all the numeric types 
compose, and so you can know say that you can use the generic numeric operators 
on values you have simply if their types compose the number union type, and it 
still works if more numeric types appear later.  Likewise, the string open union 
could include both text and blob, as both support catenation and substring 
matches or extraction, for example.


This would aid to operator overloading in a generic way, letting you use the 
same syntax for different types, but allowing types to mix is optional; eg, you 
could support add(int,int) and add(real,real) without supporting 
add(int,real) etc but the syntax add(x,y) is shared, and you do this while 
still having a strong type system; allowing the mixing is optional case-by-case.


Supporting the open union is closer to supporting ANYTYPE while the closed union 
isn't so much.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Simon Riggs

On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:

 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.
 I note that this has been requested in the past:
 http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

I think its a reasonably common use case.

Would it be possible to do this with a typed hstore? Seems easier to
add something there than it would be to add the VARIANT type as
discussed here.


 both Oracle and MS-SQL have it

Do they? What types are they called?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Eric McKeeth

On Tue, May 10, 2011 at 3:57 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org
 wrote:

  A customer came to us with this request: a way to store any data in a
  column.  We've gone back and forth trying to determine reasonable
  implementation restrictions, safety and useful semantics for them.
  I note that this has been requested in the past:
  http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

 I think its a reasonably common use case.

 Would it be possible to do this with a typed hstore? Seems easier to
 add something there than it would be to add the VARIANT type as
 discussed here.


  both Oracle and MS-SQL have it

 Do they? What types are they called?

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


MS SQL Server calls it's variant type 'sql_variant', but it's limited to a
subset of the data types they support. Basically, it can store any numeric
type, or any binary or text type with a constrained length. No timestamps,
geometry, XML, user-defined types, etc. allowed. So it's not really as much
of an any value type as it might look on the surface. Don't know any
details of Oracle's implementation.

-Eric

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Joseph Adams

It seems to me a reasonable way to implement VARIANT would be to have
a data type called VARIANT that stores an OID of the inner type at the
beginning, followed by the binary data.  When you say
pg_typeof(somevariant), you'll get 'variant'.  Instead, you'd use a
function like this:

variant_typeof(VARIANT) returns REGTYPE

VARIANT could then be casted to/from other types.  I think the rules
for converting to/from variant should be a little stronger than normal
PostgreSQL casting rules.  For example:

SELECT '12345'::TEXT::INT; -- allowed
SELECT '12345'::TEXT::VARIANT::INT; -- not allowed
SELECT '12345'::TEXT::VARIANT::TEXT::INT; -- allowed

I'm not sure how conversions to/from VARIANT could be implemented
other than creating conversion functions for every type.  It'd be nice
if we could avoid that.

A more permissive way to convert out of VARIANT might be to have a
function like this:

variant_unwrap(VARIANT) returns TEXT -- user casts to desired type

I suppose the in/out functions could prefix the value with the type
name and a colon:

 SELECT '12345'::INT::VARIANT;
   variant
-
integer:12345

The VARIANT type, or similar, would be useful for the JSON data type
I've been intermittently working on, as it would allow us to create a
function like this:

from_json(JSON) returns VARIANT

from_json would unwrap a JSON string/number/bool/null/array,
converting it to a VARIANT whose inner type is TEXT / (INT or BIGINT
or DOUBLE or NUMERIC) / BOOL / [null] / ARRAY.  In the [null] case,
from_json would actually return NULL (I see no need for VARIANT to
wrap nulls).

This is rather type-safe compared to what I currently have:

from_json(JSON) returns TEXT -- user casts to desired type

By returning variant, we can then cast to the desired type, and if the
cast is invalid, a type error will occur.


Joey Adams

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Joseph Adams

On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
joeyadams3.14...@gmail.com wrote:
 The VARIANT type, or similar, would be useful for the JSON data type
 I've been intermittently working on, as it would allow us to create a
 function like this:

    from_json(JSON) returns VARIANT

This occurred to me: if PostgreSQL functions could return ANYELEMENT,
then we could just say:

from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value

Likewise, the variant conversion functions (which could be invoked
automatically) could be done this way:

to_variant(ANYELEMENT) returns VARIANT
from_variant(VARIANT) returns ANYELEMENT

However, I'm not familiar enough with the innards of PostgreSQL's type
system to know if returning ANYELEMENT would be possible and make
sense.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Pavel Stehule

2011/5/11 Joseph Adams joeyadams3.14...@gmail.com:
 On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
 joeyadams3.14...@gmail.com wrote:
 The VARIANT type, or similar, would be useful for the JSON data type
 I've been intermittently working on, as it would allow us to create a
 function like this:

    from_json(JSON) returns VARIANT

 This occurred to me: if PostgreSQL functions could return ANYELEMENT,
 then we could just say:

    from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value

 Likewise, the variant conversion functions (which could be invoked
 automatically) could be done this way:

    to_variant(ANYELEMENT) returns VARIANT
    from_variant(VARIANT) returns ANYELEMENT

 However, I'm not familiar enough with the innards of PostgreSQL's type
 system to know if returning ANYELEMENT would be possible and make
 sense.


it isn't possible. You can have a ANYELEMENT on input and output, or
ANYELEMENT on input, but you ca't have ANYELEMENT just on output. The
analogy for ANYELEMENT is C++ template.

Regards

Pavel Stehule

VARIANT is maybe good idea - it can simplify a PL/pgSQL code - but
though anybody about using this type in planner?


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Alvaro Herrera

Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:

 I see VARIANT/ANYTYPE as the most general case of supporting union types, 
 which, 
 say, could have more specific examples of allow any number or date here but 
 nothing else.  If VARIANT is supported, unions in general ought to be also.

Okay, so aside from the performance (storage reduction) gained, there's
this argument for having variant/union types.  It seems to me that this
is indeed possible to build.  Completely general VARIANT, though, is
rather complex.  A declared union, where you specify exactly which types
can be part of the union, can be catalogued, so that the system knows
exactly where to look when a type needs to be modified.  A general
VARIANT however looks complex to me to solve.

The problem is this: if an user attempts to drop a type, and this type
is used in a variant somewhere, we would lose the stored data.  So the
drop needs to be aborted.  Similarly, if we alter a type (easy example:
a composite type) used in a variant, we need to cascade to modify all
rows using that composite.

If the unions that use a certain type are catalogued, we at least know
what tables to scan to cascade.

In a general variant, the system catalogs do not have the information of
what type each variant masquerades as.  We would need to examine the
variant's masqueraded types on each insert; if the current type is not
found, add it.  This seems a bit expensive.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Andrew Dunstan




On 05/06/2011 04:08 PM, Alvaro Herrera wrote:

Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:


I see VARIANT/ANYTYPE as the most general case of supporting union types, which,
say, could have more specific examples of allow any number or date here but
nothing else.  If VARIANT is supported, unions in general ought to be also.

Okay, so aside from the performance (storage reduction) gained, there's
this argument for having variant/union types.  It seems to me that this
is indeed possible to build.  Completely general VARIANT, though, is
rather complex.  A declared union, where you specify exactly which types
can be part of the union, can be catalogued, so that the system knows
exactly where to look when a type needs to be modified.  A general
VARIANT however looks complex to me to solve.

The problem is this: if an user attempts to drop a type, and this type
is used in a variant somewhere, we would lose the stored data.  So the
drop needs to be aborted.  Similarly, if we alter a type (easy example:
a composite type) used in a variant, we need to cascade to modify all
rows using that composite.

If the unions that use a certain type are catalogued, we at least know
what tables to scan to cascade.

In a general variant, the system catalogs do not have the information of
what type each variant masquerades as.  We would need to examine the
variant's masqueraded types on each insert; if the current type is not
found, add it.  This seems a bit expensive.





So how is a declared union going to look and operate? Something like this?

CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
CREATE TABLE bar (myunion foo);
INSERT INTO bar (myunion) VALUES (ival=1), (tval='some text');
UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;


Something like that could actually be quite nice for a number of purposes.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Darren Duncan's message of miÃ© may 04 15:33:33 -0300 2011:
 I see VARIANT/ANYTYPE as the most general case of supporting union types, 
 which, 
 say, could have more specific examples of allow any number or date here but 
 nothing else.  If VARIANT is supported, unions in general ought to be also.

 Okay, so aside from the performance (storage reduction) gained, there's
 this argument for having variant/union types.  It seems to me that this
 is indeed possible to build.  Completely general VARIANT, though, is
 rather complex.

Yeah.  I have no particular objection to a UNION over a specified set of
types, but am not very happy with the idea of an unconstrained union.

Also, a UNION declaration would allow attaching a field name to each
containable type, as I see Andrew just posted about.  That seems like
potentially a good idea.

 In a general variant, the system catalogs do not have the information of
 what type each variant masquerades as.  We would need to examine the
 variant's masqueraded types on each insert; if the current type is not
 found, add it.  This seems a bit expensive.

Not to mention race-condition-prone.  How do you ensure someone is not
inserting another instance of the variant, with some previously not used
content type, while this is going on?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Merlin Moncure

On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote:


 On 05/06/2011 04:08 PM, Alvaro Herrera wrote:

 Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:

 I see VARIANT/ANYTYPE as the most general case of supporting union types,
 which,
 say, could have more specific examples of allow any number or date here
 but
 nothing else.  If VARIANT is supported, unions in general ought to be
 also.

 Okay, so aside from the performance (storage reduction) gained, there's
 this argument for having variant/union types.  It seems to me that this
 is indeed possible to build.  Completely general VARIANT, though, is
 rather complex.  A declared union, where you specify exactly which types
 can be part of the union, can be catalogued, so that the system knows
 exactly where to look when a type needs to be modified.  A general
 VARIANT however looks complex to me to solve.

 The problem is this: if an user attempts to drop a type, and this type
 is used in a variant somewhere, we would lose the stored data.  So the
 drop needs to be aborted.  Similarly, if we alter a type (easy example:
 a composite type) used in a variant, we need to cascade to modify all
 rows using that composite.

 If the unions that use a certain type are catalogued, we at least know
 what tables to scan to cascade.

 In a general variant, the system catalogs do not have the information of
 what type each variant masquerades as.  We would need to examine the
 variant's masqueraded types on each insert; if the current type is not
 found, add it.  This seems a bit expensive.

 So how is a declared union going to look and operate? Something like this?

    CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
    CREATE TABLE bar (myunion foo);
    INSERT INTO bar (myunion) VALUES (ival=1), (tval='some text');
    UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;


 Something like that could actually be quite nice for a number of purposes.

using your hypothetical example, could you cast types to the union?

select 1::int::foo;

record types would presumably work as well?  you could do a lot of
*really* neat stuff that way...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Darren Duncan


Thanks for picking up on my mentioning union types; I wasn't sure if anyone did.

Merlin Moncure wrote:

On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan and...@dunslane.net wrote:


On 05/06/2011 04:08 PM, Alvaro Herrera wrote:

Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:


I see VARIANT/ANYTYPE as the most general case of supporting union types,
which,
say, could have more specific examples of allow any number or date here
but
nothing else.  If VARIANT is supported, unions in general ought to be
also.

Okay, so aside from the performance (storage reduction) gained, there's
this argument for having variant/union types.  It seems to me that this
is indeed possible to build.  Completely general VARIANT, though, is
rather complex.  A declared union, where you specify exactly which types
can be part of the union, can be catalogued, so that the system knows
exactly where to look when a type needs to be modified.  A general
VARIANT however looks complex to me to solve.

The problem is this: if an user attempts to drop a type, and this type
is used in a variant somewhere, we would lose the stored data.  So the
drop needs to be aborted.  Similarly, if we alter a type (easy example:
a composite type) used in a variant, we need to cascade to modify all
rows using that composite.

If the unions that use a certain type are catalogued, we at least know
what tables to scan to cascade.

In a general variant, the system catalogs do not have the information of
what type each variant masquerades as.  We would need to examine the
variant's masqueraded types on each insert; if the current type is not
found, add it.  This seems a bit expensive.

So how is a declared union going to look and operate? Something like this?

   CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
   CREATE TABLE bar (myunion foo);
   INSERT INTO bar (myunion) VALUES (ival=1), (tval='some text');
   UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;


Something like that could actually be quite nice for a number of purposes.


using your hypothetical example, could you cast types to the union?

select 1::int::foo;

record types would presumably work as well?  you could do a lot of
*really* neat stuff that way...


Like with other respondents to this topic, I consider it much more useful and 
important, as well as practical, to support explicitly defined type unions than 
a wide-open ANYTYPE.


As for how it would operate:  The example above implies a union type 
implementation that is like C's concept of such, where you have to explicitly 
state how you want the value bit pattern to be interpreted, by naming ival/etc, 
rather than the DBMS just knowing that a particular value is of a specific type, 
because per-value stored metadata says so (like with SQLite).


While that might be best in practice for implementation, I had envisioned 
something more like set unions, so you could instead do it like this:


CREATE TYPE foo AS UNION (int, text, timestamptz):
CREATE TABLE bar (myunion foo);
INSERT INTO bar (myunion) VALUES (1), ('some text');
UPDATE bar SET myunion = CURRENT_TIMESTAMP;

Unlike a record type, where multiple attributes may have the same time, 
presumably with a union, they would all be distinct, and so you could use the 
type name itself to refer to each option; you don't have to make up ival or 
whatever ... unless there are situations where types don't have names.


When doing operations that are type-generic, such as equality test or 
assignment, especially assignment, between 2 things that are both declared to be 
of type foo, you could just do it with no verbosity same as if you were doing 2 
int or text or whatever.


When doing operations specific to int or text or whatever, or if you are 
assigning a foo-declared thing to an int/text-declared thing, then you'd want an 
explicit cast or assertion, such as select myunion::int + 3 as answer from bar 
where is_a(myunion,'int').


Whether you want an explicit cast to go the other way, I would assume you don't 
need to, like when you have DOMAINs; eg, I would expect the 4th line above to 
just work, because the system knows the type of CURRENT_TIMESTAMP and it knows 
that this is a member of the union type of myunion.  I see a UNION type as being 
like a DOMAIN type in reverse.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-06 Thread Darren Duncan

To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-05 Thread Merlin Moncure

On Wed, May 4, 2011 at 8:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
 As a followup idea there exists the desire to store records as records
 and not text representation of same (given differing record types, of
 course), for which it'd be more worthwhile.

 Maybe.  The conventional wisdom is that text representation of data is
 more compact than PG's internal representation by a significant factor
 --- our FAQ says up to 5x, in fact.  I know that that's including row
 overhead and indexes and so on, but I still don't find it to be a given
 that you're going to win on space with this sort of trick.

I've done a lot of testing of the text vs binary format on the wire
format...not exactly the same set of issues, but pretty close since
you have to send all the oids, lengths, etc.   Conventional wisdom is
correct although overstated for this topic.  Even in truly
pathological cases for text, for example in sending multiple levels of
redundant escaping in complex structures, the text format will almost
always be smaller.  For 'typical' data it can be significantly
smaller.  Two exceptions most people will run into are bytea obviously
and the timestamp family of types where binary style manipulation is a
huge win both in terms of space and performance.

For complex data (say 3+ levels of composites stacked in arrays),
binary type formats are much *faster*, albeit larger, via binary as
long as you are not bandwidth constrained, and presumably they would
be as well for variants. Perhaps even more so, because some of the
manipulations made converting tuple storage to binary wire formats
don't have to happen.  That said, while there are use cases for
sending highly structured data over the wire, I can't think of any for
direct storage on a table in variant type scenarios, at least not yet
:-).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-05 Thread Jim Nasby

On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote:
 I'm far from convinced that storing deltas per column rather than per record 
 is a win anyway. I don't have hard numbers to hand, but my vague recollection 
 is that my tests showed it to be a design that used more space.

It depends on how many fields you're changing in one go and how wide the table 
is. It's also a PITA to identify what fields actually changed if you're storing 
everything. In the case of logging, I'd say that what's really needed is a way 
to store a table record that has an indicator of what fields actually changed 
(and possibly not storing anything for fields that didn't change). That table 
record would need to also deal with changes to the underlying table structure.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-05 Thread Andrew Dunstan




On 05/05/2011 01:00 PM, Jim Nasby wrote:

On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote:

I'm far from convinced that storing deltas per column rather than per record is 
a win anyway. I don't have hard numbers to hand, but my vague recollection is 
that my tests showed it to be a design that used more space.

It depends on how many fields you're changing in one go and how wide the table 
is. It's also a PITA to identify what fields actually changed if you're storing 
everything.


No it's not. Instead of storing OLD/NEW, store a base record and a delta 
record (an hstore with just the changed fields) for an update. This 
saves space and means you only have to calculate what changed once.




cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera

Hello,

A customer came to us with this request: a way to store any data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.

The main idea is to be able to store column values in an audit table
like this:

change_time timestamptz
table_name  name
column_name name
old_value   variant
new_value   variant
So per-column changes, which is much more convenient than the regular
idea of storing the whole NEW and/or OLD record(s).

Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

My thought was that a variant type would store the datum as
typiddata so that it would be possible to identify the datatype
stored in each column/row and interpret adequately, calling the
appropriate output function etc.  On input it would be limited to come
only from inside the system, not from the outside world, as that would
have obvious security implications; so it'd be similar to pg_node_tree
in that regard.

Now this has obvious limitations: first, any query that tries to extract
data would need to include a cast of the variant value to the
appropriate type, so that the type can be resolved early.  Thus,
trying to extract rows of different types would be forbidden.

Also, there would be a security problem with a datum storing something
whose datatype later changes (consider a user-defined record type or
things like that).  My first reaction was to do something like 
CREATE TYPE foo VARIANT OF (int, text, timestamptz);
and then you could declare old_value with type foo, which would only
allow values of the declared types.  This makes it easy to catalogue
used types in any variant, and thus easy to restrict modifying or
dropping types that are used in some variant.  However, this idea was
rejected by the customer due to the unusability: you would have to
remember to edit the variant to add the new type anytime you added a new
column to a table, which would be cumbersome.

What the customer suggested was to have a new fork, which stores
type OIDs of datatypes used in the variant.  Then when a type is to be
altered, all forks would be scanned to determine if the type is used,
and raise an error if so.  I rejected that idea as unworkable.

However, as a middle ground we agreed that we could allow a declared
variant to store any pinned type without restrictions; those can't be
changed or dropped so there's no safety concern.  Storing other types
(user-defined types, records and so on) would require some safety net.

Before spending too much time exploring a detailed design, I'd like to
hear some opinions on the basic idea.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Kevin Grittner

Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 
 A customer came to us with this request: a way to store any data
 in a column.
 
+1
 
More than once (usually in replication, interface, or audit
situations) I've had to create a table with one column each of a
number of different data types, only one of which will be used in
each row.  It would make life much simpler for programming such
things if an unknown type could be stored.  I could either cast
the types in the rows based on the related identifier column, but it
would be even nicer if they could be read unknown and pg_typeof or
something similar could be used to control flow based on the type.
 
As one example, only *last night* I created the following table as
part of the development of our shop's next generation of software. 
(Yes I know about Slony, Londiste, and others -- we need to roll
something out which integrates with existing systems, without
production disruption, over the next few years.  This does require a
custom solution.)
 
  Table public.DbTranOpValue
 Column |   Type| Modifiers
+---+---
 countyNo   | CountyNoT   | not null
 backendPid | integer   | not null
 tranStart  | TimestampT  | not null
 logRecordSeqNo | LogRecordSeqNoT | not null
 columnName | ColumnNameT | not null
 isAfter| BooleanT| not null
 textValue  | text  |
 intValue   | bigint|
 numericValue   | numeric   |
 binaryValue| bytea |
 booleanValue   | BooleanT|
 dateValue  | DateT   |
 timeValue  | TimeT   |
 timestampValue | TimestampT  |
Indexes:
DbTranOpValue_pkey PRIMARY KEY, btree
  (backendPid, tranStart, logRecordSeqNo, columnName)
 
I would much rather have had a value column of unknown type.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane

Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.

Yes, it seems rather messy.

 The main idea is to be able to store column values in an audit table
 like this:
 old_value variant
 new_value variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.

If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
it could perhaps be used in pg_statistic, in place of the rather ugly
anyarray hack that's there now.  But I note that nothing above the level
of C code can do anything very useful with the contents of pg_statistic,
and I'm not sure that having an official type would change that.

 However, as a middle ground we agreed that we could allow a declared
 variant to store any pinned type without restrictions; those can't be
 changed or dropped so there's no safety concern.

If you're going to accept that there are restrictions, I don't see that
there is a good argument against your thought of a declared union
type.  At least then it's clear what the restrictions are.  I am firmly
against exposing the notion of pinned types in any user-visible SQL
semantics, and even more firmly against the idea of creating fundamental
functionality that only works for built-in types and can't be extended
to add-on types.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Darren Duncan


Alvaro Herrera wrote:

A customer came to us with this request: a way to store any data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.


I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
say, could have more specific examples of allow any number or date here but 
nothing else.  If VARIANT is supported, unions in general ought to be also.


The most effective way of supporting VARIANT or union types in general is having 
an implementation where in the general case each value in the database knows its 
own data type rather than the data type being provided by a context such as what 
table column it is in.  For example, if rather than storing a data value 
directly we store a 2-attribute struct naming a data type and pointing to or 
holding the data value.


See how SQLite works as an example of how VARIANTs or unions could work, 
although that on its own would need to be made more comprehensive for Pg.


I claim ignorance as to how Pg currently implements these matters.

Where VARIANT/union types are supported by default, declaring more specific 
types is just a type constraint and an optimization.


Of course, when we know the type of a column/etc isn't going to be VARIANT or 
some other union type, then a simple optimization allows us to just store the 
value and have its type provided by context rather than the struct.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Merlin Moncure

On Wed, May 4, 2011 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.

 Yes, it seems rather messy.

 The main idea is to be able to store column values in an audit table
 like this:
 old_value     variant
 new_value     variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

The problem with text is that for composite/table types you lose the
ability to cast back when the structure changes.   Most of the
auditing implementations I've done cast new/old to text and drop it
directly into a single column on the audit record.  It works ok, but
you have to messily update the text later when the table changes.  For
non-variant composites you can add columns down the line and it works
ok in dependent records without too much fuss.

I think though that getting this to work such that type dependency is
via row/column instead of just column is going to be tough.  Outside
of composites, I don't see much advantages vs the text route,
performance maybe?

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Andrew Dunstan




On 05/04/2011 01:36 PM, Tom Lane wrote:



The main idea is to be able to store column values in an audit table
like this:
old_value   variant
new_value   variant
Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.



Yeah, I've been down this road once or twice, and I think that's the $64 
question.


I wrote a custom audit app two or three years ago. After several 
iterations the customer and I found that using an hstore for the old/new 
(or old record / changeset, which is what we actually use) was the most 
suitable for our use.


I think if we did this we'd need to add some sort of is_type()  and 
typeof() functions for variant objects.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Merlin Moncure

On Wed, May 4, 2011 at 2:55 PM, Andrew Dunstan and...@dunslane.net wrote:
 On 05/04/2011 01:36 PM, Tom Lane wrote:

 The main idea is to be able to store column values in an audit table
 like this:
 old_value       variant
 new_value       variant
 Currently, they use text for old_value and new_value, but this is, of
 course, not very satisfactory.

 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.


 Yeah, I've been down this road once or twice, and I think that's the $64
 question.

 I wrote a custom audit app two or three years ago. After several iterations
 the customer and I found that using an hstore for the old/new (or old record
 / changeset, which is what we actually use) was the most suitable for our
 use.

yeah -- +1 on that method.  I think it's really the right way to go
with the recent hstore enhancements.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:

  The main idea is to be able to store column values in an audit table
  like this:
  old_valuevariant
  new_valuevariant
  Currently, they use text for old_value and new_value, but this is, of
  course, not very satisfactory.
 
 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

Storage efficiency.  These people have really huge databases; small
changes in how tight things are packed makes a large difference for
them.  (For example, we developed a type to store SHA-2 digests in a
more compact way than bytea mainly because of this reason.  Also, at
some time they also wanted to apply compression to hstore keys and
values.)

As someone commented downthread, they also want to have things such as a
typeof operator.  It could be used in (say) a plpgsql function to
choose different branches of code.

Things go wild when you think about using VARIANT as a function
parameter type.  Given how complex are our current function resolution
rules I'm not really sold on making this work at all, so initially I'm
aiming at just raising an error in that case.

 If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
 it could perhaps be used in pg_statistic, in place of the rather ugly
 anyarray hack that's there now.  But I note that nothing above the level
 of C code can do anything very useful with the contents of pg_statistic,
 and I'm not sure that having an official type would change that.

Well, we could get rid of ugly hacks that are in various places in the
backend code to cope with this.  Nor sure how useful it'd be for doing
things at the SQL level.

  However, as a middle ground we agreed that we could allow a declared
  variant to store any pinned type without restrictions; those can't be
  changed or dropped so there's no safety concern.
 
 If you're going to accept that there are restrictions, I don't see that
 there is a good argument against your thought of a declared union
 type.  At least then it's clear what the restrictions are.  I am firmly
 against exposing the notion of pinned types in any user-visible SQL
 semantics, and even more firmly against the idea of creating fundamental
 functionality that only works for built-in types and can't be extended
 to add-on types.

The argument was it's too cumbersome to use (which makes sense: users
are certain to forget to add the new type to the declared union when
they add a new column to the table, possibly causing a run-time error if
the audit trigger is in place and attempts to load a new datum into the
log table.)

I understand the reluctancy to use pinned-ness in a user-visible way.
Back to the drawing board.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of miÃ© may 04 14:36:44 -0300 2011:
 Just out of curiosity, what actual functionality gain would ensue over
 just using text?  It seems like doing anything useful with the audit
 table contents would still require casting the column to text, or the
 moral equivalent of that.

 Storage efficiency.  These people have really huge databases; small
 changes in how tight things are packed makes a large difference for
 them.  (For example, we developed a type to store SHA-2 digests in a
 more compact way than bytea mainly because of this reason.  Also, at
 some time they also wanted to apply compression to hstore keys and
 values.)

Hmm.  The prototypical case for this would probably be a 4-byte int,
which if you add an OID to it so you can resolve the type is going to
take 8 bytes, plus you are going to need a length word because there is
really no alternative to the VARIANT type being varlena overall, which
makes it 9 bytes if you're lucky on alignment and up to 16 if you're
not.  That is not shorter than the average length of the text
representation of an int.  The numbers don't seem a lot better for
8-byte quantities like int8, float8, or timestamp.  It might be
marginally worthwhile for timestamp, but surely this is a huge amount of
effort to substitute for thinking of a more compact text representation
for timestamps.

Pardon me for being unconvinced.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread David E. Wheeler

On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:

 As someone commented downthread, they also want to have things such as a
 typeof operator.  It could be used in (say) a plpgsql function to
 choose different branches of code.

FWIW, pg_typeof(any) has been in core since 9.0.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Andrew Dunstan




On 05/04/2011 07:05 PM, Tom Lane wrote:

Alvaro Herreraalvhe...@commandprompt.com  writes:

Excerpts from Tom Lane's message of miÃ© may 04 14:36:44 -0300 2011:

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.

Storage efficiency.  These people have really huge databases; small
changes in how tight things are packed makes a large difference for
them.  (For example, we developed a type to store SHA-2 digests in a
more compact way than bytea mainly because of this reason.  Also, at
some time they also wanted to apply compression to hstore keys and
values.)

Hmm.  The prototypical case for this would probably be a 4-byte int,
which if you add an OID to it so you can resolve the type is going to
take 8 bytes, plus you are going to need a length word because there is
really no alternative to the VARIANT type being varlena overall, which
makes it 9 bytes if you're lucky on alignment and up to 16 if you're
not.  That is not shorter than the average length of the text
representation of an int.  The numbers don't seem a lot better for
8-byte quantities like int8, float8, or timestamp.  It might be
marginally worthwhile for timestamp, but surely this is a huge amount of
effort to substitute for thinking of a more compact text representation
for timestamps.

Pardon me for being unconvinced.




I'm far from convinced that storing deltas per column rather than per 
record is a win anyway. I don't have hard numbers to hand, but my vague 
recollection is that my tests showed it to be a design that used more space.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane

David E. Wheeler da...@kineticode.com writes:
 On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:
 As someone commented downthread, they also want to have things such as a
 typeof operator.  It could be used in (say) a plpgsql function to
 choose different branches of code.

 FWIW, pg_typeof(any) has been in core since 9.0.

But note that pg_typeof() would surely say VARIANT if applied to such
a datatype.  You'd need some other, VARIANT-specific function that knew
enough to dig into the value at runtime and get the OID therein.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Alvaro Herrera

Excerpts from Tom Lane's message of mié may 04 20:05:54 -0300 2011:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
  Just out of curiosity, what actual functionality gain would ensue over
  just using text?  It seems like doing anything useful with the audit
  table contents would still require casting the column to text, or the
  moral equivalent of that.
 
  Storage efficiency.  These people have really huge databases; small
  changes in how tight things are packed makes a large difference for
  them.  (For example, we developed a type to store SHA-2 digests in a
  more compact way than bytea mainly because of this reason.  Also, at
  some time they also wanted to apply compression to hstore keys and
  values.)
 
 Hmm.  The prototypical case for this would probably be a 4-byte int,
 which if you add an OID to it so you can resolve the type is going to
 take 8 bytes, plus you are going to need a length word because there is
 really no alternative to the VARIANT type being varlena overall, which
 makes it 9 bytes if you're lucky on alignment and up to 16 if you're
 not.

Hmm, I was hoping that we could skip making it varlena at least in some
cases ... perhaps that's hopeless, in which case we'll have to reassess
the whole idea.  Still there's the new functionality argument, though.

As a followup idea there exists the desire to store records as records
and not text representation of same (given differing record types, of
course), for which it'd be more worthwhile.

 It might be
 marginally worthwhile for timestamp, but surely this is a huge amount of
 effort to substitute for thinking of a more compact text representation
 for timestamps.

Surely if it's just for timestamps it'd be a waste of time.

 Pardon me for being unconvinced.

:-)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-04 Thread Tom Lane

Alvaro Herrera alvhe...@commandprompt.com writes:
 As a followup idea there exists the desire to store records as records
 and not text representation of same (given differing record types, of
 course), for which it'd be more worthwhile.

Maybe.  The conventional wisdom is that text representation of data is
more compact than PG's internal representation by a significant factor
--- our FAQ says up to 5x, in fact.  I know that that's including row
overhead and indexes and so on, but I still don't find it to be a given
that you're going to win on space with this sort of trick.

Another point here is that once the values in question get to be any
significant number of bytes apiece, both text and the proposed VARIANT
representation could be subject to compression, which would greatly
reduce any nominal advantage of one over the other.  Any
representational inefficiency in either would just be grist for the
compressor.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

39 matches

Mail list logo