RE: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-10 Thread David Bennett

The only problem with 'enum' is that all of the possible values must be
specified at CREATE time.  A logical extension to this would be to allow for
'dynamic extensions' to the list.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, July 03, 2001 4:49 PM
To: Rod Taylor
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [HACKERS] New SQL Datatype RECURRINGCHAR


"Rod Taylor" <[EMAIL PROTECTED]> writes:
> This is rather like MySQL's enum.

Yes.  If we were going to do anything like this, I'd vote for stealing
the "enum" API, lock stock and barrel --- might as well be compatible.

regards, tom lane


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



RE: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-10 Thread David Bennett

>> It's apparent that there is a lot of duplicate space used in the storage
>> of this information.  The idea is if order.status was stored as a
>> RECURRINGCHAR
>> then the only data stored for the row would be a reference to the value
of
>> the column. The actual values would be stored in a separate lookup table.

>You should instead have another table with two columns, order_status_id
>and order_status_desc, and join with it to get your data.

The idea is to simplify the process of storing and accessing the data.
Joins required
a deeper knowledge of the relational structure.  This also complicates
application
programming,  two tables must be maintained instead of just one.

>> select distinct {RECURRINGCHAR} from {table}
>>
>>can be radically optimized

> select distinct order_status_desc from order_status_lookup

Again the idea is to simplify.  Reduce the number of tables required to
represent a business model.

>>  - Eliminates use of joins and extended knowledge of data relationships
>>   for adhoc users.

> For adhoc users, you can create a view so they won't be aware of joins.

Now we have a master table,  a lookup table AND a view?
even more complication

>> It is often an advantage to actually store an entire word representing a
>> business meaning as the value of a column (as opposed to a reference
>> number or mnemonic abbreviation ).  This helps to make the system
>> 'self documenting' and adds value to users who are performing adhoc
>> queries on the database.

> No, that is against good database design and any database normalization.

I would like to hear your argument on this.  I don't see how optimizing
the storage of reference value breaks a normalization rule.

--Dave


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



RE: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-10 Thread David Bennett

> various disagreements and "quotes"...

I agree that you disagree :)

RECURRINGCHAR does not break normal form.  It simply optimizes the storage
of reference values (recurring keys).  This allows for the use of  'long
words' as reference values with a great deal of system storage savings and a
boost in performance in certain circumstances.  This is more a form of
'compression' then anything else, as a matter of fact,  this is very similar
to the LZ78 family of substitutional compressors.

  http://www.faqs.org/faqs/compression-faq/part2/section-1.html

The advantage here is that we are targeting a normalized value in it's
atomic state,  The recurrence rate of this these values is extremely high
which allows us to store this data in a very small space and optimize the
access to this data by using the 'dictionary' that we create.

>What if tomorrow you will need to change text name for "OPEN" status to
>"OPEN_PENDING_SOMETHING"? With your design, you will need to update all
>rows in the table changing it. With normalized design, you just update the
>lookup table. Etc, etc.

In either model you would:

update master_table set status='OPEN_PENDING_SOMETHING' where status='OPEN'

This would not change,  in fact,  even in a normalized design you wouldn't
change the lookup table (parent) key.  Perhaps you are misunderstanding my
initial concept.  The MySQL 'enum' is close.  However,  it is static and
requires you to embed business data (your key list) in the DDL.  The idea I
have here is to dynamically extend this list as needed.  I am not saying
that the value can't relate to a parent (lookup) table.  It's just not
necessary if the value is all that is needed.

--Dave (Hoping some other SQL developers are monitoring this thread :)


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



RE: [HACKERS] New SQL Datatype RECURRINGCHAR

2001-07-10 Thread David Bennett

Alex,

I think I fully understand your position. Let me put wrap up our
conversation so far.


Given the application requirements:

  1) contacts have a type.

  2) new types must be added on the fly as needed.

  3) types names rarely change.

  4) the number of contacts should scale to support millions of records.

  5) the number of types will be limited to under 64k

  6) Users must be able to easily query contacts with readable types.


-
In a nutshell you are recommending:
-

  create table contact_type (
code  int2,
typechar(16),
PRIMARY KEY ( code )
  );

  create table contact (
number  serial,
namechar(32),
type  int2,
PRIMARY KEY ( number ),
FOREIGN KEY ( type ) REFERENCES contact_type ( code )
  );

  create view contact_with_readble_type as (
select c.number as number,
   c.name as name,
   t.type as type
from
   contact c,
   contact_type t
  );

* To build a type lookup table:

  1) Select type and code from contact_type
  2) Build UI object which displays type and returns code

* In order to insert a new record with this model:

  1) Look up to see if type exists
  2) Insert new type
  3) Get type ID
  4) Insert contact record

* The adhoc query user is now faced with
  the task of understanding 3 data tables.

-
With recurringchar you could do this easily as:
-

  create table contact (
number  serial,
namechar(32),
type  recurringchar1,
PRIMARY KEY ( number ),
  );

* To build a type lookup table:

  1) Select distinct type from contact (optimized access to recurringchar
dictionary)
  2) Build UI object which displays and returns type.

* In order to insert a new record with this model:

  1) Insert contact record

* The adhoc query user has one data table.

-

Granted,  changing the value of contact_type.type would require edits to the
contact records.
It may be possible to add simple syntax to allow editing of a 'recurringchar
dictionary' to
get around isolated problem which would only exist in certain applications.

Actually,  maybe 'dictionary' or 'dictref' would be a better name for the
datatype.



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html