Re: [SQL] Re: Restriction by grouping problem.

2001-07-27 Thread Philip Warner

At 11:47 27/07/01 +1000, Philip Warner wrote:
>
>...the FROM statement would be a lot nicer if your had a table of SIDs, or
>at least had a good way of generating the list of SIDs.
>

A final (& possibly best) option, is:

Select 
  MDT.GETFILE,
  [anything else from 'best' record]
From
(Select Distinct SID From LOGS) as S,
(Select * From LOGS L Where L.SID = S.SID 
 Order By DATETIME Asc Limit 1) as MDT

But unfortunately PG 7.1 does not allow this - the second select can not
reference the first select. I have not tried it in 7.1.2.





Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



[SQL] nullif BUG???

2001-07-27 Thread guard

I try

> select nullif(NULL,5)

++
| case   |
++
|  |
++



--




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



Re: [SQL] nullif BUG???

2001-07-27 Thread Josh Berkus

Guard, 

> > select nullif(NULL,5)
> 
> ++
> | case   |
> ++
> |  |
> ++
> 

Er... what were you expecting, exactly?

Except for IS NULL (and COALESCE, which uses IS NULL) any operation
involving a NULL is also NULL.

-Josh


__AGLIO DATABASE SOLUTIONS___
   Josh Berkus
  Complete information technology  [EMAIL PROTECTED]
   and data management solutions   (415) 565-7293
  for law firms, small businessesfax 621-2533
and non-profit organizations.  San Francisco












---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[SQL] Re: D308-E9AF-4C11 : CONFIRM from pgsql-sql (subscribe)

2001-07-27 Thread Gonzo Rock

A Question for those of you who consider yourself crack Database Designers.

I am currently moving a large database(100+Tables) into pgSQL... with the intention of 
deploying against 'any' SQL database in the future. The development side will be 
rigorously using Standard SQL constructs with no unique/proprietary extensions.

My question concerns establishing the relationships.

Currently Relationships between tables are established via a Unique Integer ID like 
this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
 PartTypeID   CustomerID   PartID
*PartType*Customer PartTypeID
  Address  CustomerID
  *PartNumber(2FieldPrimaryKey)
  *PartRevision(2FieldPrimaryKey)
   PartName
  

HOWEVER; I have read lots of texts describing the Relational Design should be instead 
like this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
*PartType*Customer PartType  
  Address *PartNumber(2FieldPrimaryKey)
  *PartRevison(2FieldPrimaryKey)
   PartName
   Customer
  
Both Techniques have a unique foreign key back to the parent tables but one uses 
No.Meaningful.Info.Integer.Data for the ForeignKey while the second uses 
Human.Understandable.ForeignKeys

Is one recommended over the other??? Sure appreciate the commentary before I get in 
too deep with all these tables.

Thanks!



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[SQL] Database Design Question

2001-07-27 Thread Gonzo Rock

A Question for those of you who consider yourself crack Database Designers.

I am currently moving a large database(100+Tables) into pgSQL... with the intention of 
deploying against 'any' SQL database in the future. The development side will be 
rigorously using Standard SQL constructs with no unique/proprietary extensions.

My question concerns establishing the relationships.

Currently Relationships between tables are established via a Unique Integer ID like 
this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
 PartTypeID   CustomerID   PartID
*PartType*Customer PartTypeID
  Address  CustomerID
  *PartNumber(2FieldPrimaryKey)
  *PartRevision(2FieldPrimaryKey)
   PartName
  

HOWEVER; I have read lots of texts describing the Relational Design should be instead 
like this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
*PartType*Customer PartType  
  Address *PartNumber(2FieldPrimaryKey)
  *PartRevison(2FieldPrimaryKey)
   PartName
   Customer
  
Both Techniques have a unique foreign key back to the parent tables but one uses 
No.Meaningful.Info.Integer.Data for the ForeignKey while the second uses 
Human.Understandable.ForeignKeys

Is one recommended over the other??? Sure appreciate the commentary before I get in 
too deep with all these tables.

Thanks!



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



RE: [SQL] Database Design Question

2001-07-27 Thread Jimmie Fulton

I'm by no means a crack database designer, but I do have my ideas on this
subject.  I prefer that every table has a unique integer (auto-incrementing)
primary key.  Why?  Consistency.  With the alternative, some tables may not
have a clear-cut candidate for a unique id.  In your example, you had
"customer".  How many "John Smith"s would it take before we decide that is
not a good identifier.  On the other hand, some tables would have perfectly
logical identifiers.  Part numbers, SSNs  So, you would need to create
some tables with integer primary keys, and others would have some other
natural identifier.  That to me is inconsistent.  Every table should be,
IMHO, predictable in it's definition of a primary key.  I don't even have to
guess what the names of my primary keys are either because the are all named
_ID.  Always.  I've only come up with these thoughts on my own,
and have not extensively tried the other way, so I'd be interested in
hearing other's ideas for the other side.

Thanks,

Jimmie Fulton
Systems Administrator
Environmental Health & Safety Office
Emory University School Of Medicine


-Original Message-
From: Gonzo Rock [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 27, 2001 2:03 PM
To: [EMAIL PROTECTED]
Subject: [SQL] Database Design Question


A Question for those of you who consider yourself crack Database Designers.

I am currently moving a large database(100+Tables) into pgSQL... with the
intention of deploying against 'any' SQL database in the future. The
development side will be rigorously using Standard SQL constructs with no
unique/proprietary extensions.

My question concerns establishing the relationships.

Currently Relationships between tables are established via a Unique Integer
ID like this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
 PartTypeID   CustomerID   PartID
*PartType*Customer PartTypeID
  Address  CustomerID
  *PartNumber(2FieldPrimaryKey)
  *PartRevision(2FieldPrimaryKey)
   PartName
  

HOWEVER; I have read lots of texts describing the Relational Design should
be instead like this:

*=APrimaryKey

 PartTypesCustomer Parts
 - -
*PartType*Customer PartType  
  Address *PartNumber(2FieldPrimaryKey)
  *PartRevison(2FieldPrimaryKey)
   PartName
   Customer
  
Both Techniques have a unique foreign key back to the parent tables but one
uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second
uses Human.Understandable.ForeignKeys

Is one recommended over the other??? Sure appreciate the commentary before I
get in too deep with all these tables.

Thanks!



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



RE: [SQL] Database Design Question

2001-07-27 Thread Jimmie Fulton

You would not necessarily want clients searching by the integer-based key.
That is used internally by the database or application.  Clients wouldn't
know what the keys are anyway because they are not used for anything except
referential integrity/relation of data.  You would have Part_ID, and
Part_Number.  Part_ID is defining the relationship and Part_Number is  human
readable for searching.  If you don't want your clients to see the Part_ID,
create a view that excludes it and make them use the views for queries.  As
stated already by James, you can still make Part_Number unique even if a
primary key already exists.

Hope this makes sense/is helpful,

Jimmie Fulton
Systems Administrator
Environmental Health & Safety Office
Emory University School Of Medicine


-Original Message-
From: Gonzo Rock [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 27, 2001 4:22 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: RE: [SQL] Database Design Question


OK... Fair Enough... Good Points indeed y'all.

Well... What about the problem of users trying to Query the Database??

You know... like when using Crystal Reports or something?.

SELECT * from HistoryTable
WHERE PartID = SomeInteger

vs

SELECT * from HistoryTable
WHERE PartNum = 12345636 AND PartRev = C

How are they supposed to know What the PartID is ??

Anyway, that I why I was considering changing... current users always have
trouble peering into the database... They don't quite get it.




At 02:31 PM 7/27/01 -0400, Mike Mascari wrote:
>I prefer using unique integer ids generated from sequences rather than
>keys composed of meaningful values.
>
>Advantages:
>
>Client side applications can store/handle the unique integer ids more
>readily than having to deal with composite primary keys composed of
>varying data types. For example, I can stuff the id associated with a
>particular record easily in list boxes, combo boxes, edit controls, etc.
>via SetItemData() or some other appropriate method. Its a bit more
>complicated to track database records via composite keys of something
>like: part no, vendor no, vendor group.
>
>Updating the data doesn't require cascading updates. If you use keys
>with meaning, the referential integrity constraints must support
>cascading updates so if the key changes in the primary table the change
>is cascaded to all referencing tables as well. Earlier versions of most
>databases (Access, Oracle, etc.) only provided cascading deletes under
>the assumption you would be using sequence generated keys.
>
>Downside:
>
>Many queries might require more joins against the primary table to fetch
>the relevant information associated with the numerical id, whereas keys
>composed of solely the values with which they are associated might not
>require the joins, which will speed some applications. I now have some
>queries with 20-way joins. But PostgreSQL provides a way to explicitly
>set the path the planner will choose and so the execution of the query
>is instantaneous. I'm not sure about other databases. In earlier
>versions, I had to denormalize a bit solely for performance reasons.
>
>In the past, I used to use composite keys and switched to the purely
>sequence generated path and don't regret it at all. Of course, you'll
>still have a unique constraint on the what-would-have-been meaningful
>primary key.
>
>Hope that helps,
>
>Mike Mascari
>[EMAIL PROTECTED]
>
>Gonzo Rock wrote:
>> 
>> A Question for those of you who consider yourself crack Database
Designers.
>> 
>> I am currently moving a large database(100+Tables) into pgSQL... with the
intention of deploying against 'any' SQL database in the future. The
development side will be rigorously using Standard SQL constructs with no
unique/proprietary extensions.
>> 
>> My question concerns establishing the relationships.
>> 
>> Currently Relationships between tables are established via a Unique
Integer ID like this:
>> 
>> *=APrimaryKey
>> 
>>  PartTypesCustomer Parts
>>  - -
>>  PartTypeID   CustomerID   PartID
>> *PartType*Customer PartTypeID
>>   Address  CustomerID
>>   *PartNumber(2FieldPrimaryKey)
>>   *PartRevision(2FieldPrimaryKey)
>>PartName
>> 
>> 
>> HOWEVER; I have read lots of texts describing the Relational Design
should be instead like this:
>> 
>> *=APrimaryKey
>> 
>>  PartTypesCustomer Parts
>>  - -
>> *PartType*Customer PartType
>>   Address *PartNumber(2FieldPrimaryKey)
>>   *PartRevison(2FieldPrimaryKey)
>>PartName
>>Customer
>> 
>> Both Techniques have a unique foreign key back to the parent tables but
one uses No.Meaningful.Info.Integer.Data for the ForeignKey while the second
uses Human.Understandable.ForeignKeys
>> 
>> Is one recommended over the other??? Sure appreciate the commentary
befo

Re: [SQL] Re: Restriction by grouping problem.

2001-07-27 Thread Josh Berkus

Phillip,

> What it effectively does is executes the second SELECT for each row
> of the
> first select, a bit like a column-select expression, but it allows
> access
> to all rows.

Ah!  I knew that there had to be a simple way to do what you want to do
in PGSQL:

SELECT MIN(getfile) as bestfile
FROM logs,
(SELECT SID, MAX(logtime) as lasttime FROM logs) MDT
WHERE logs.SID = MDT.SID
AND logs.logtime = MDT.lasttime
GROUP BY SID

This gives you the "getfile" from the logs with the latest timestamp.
In the event of two with the exact same logtime, it selects the first
one alphabetically so that you don't get two files for one SID.

-Josh

__AGLIO DATABASE SOLUTIONS___
   Josh Berkus
  Complete information technology  [EMAIL PROTECTED]
   and data management solutions   (415) 565-7293
  for law firms, small businessesfax 621-2533
and non-profit organizations.  San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [SQL] Re: Restriction by grouping problem.

2001-07-27 Thread Philip Warner

At 17:19 27/07/01 -0700, Josh Berkus wrote:
>Phillip,
>
>> What it effectively does is executes the second SELECT for each row
>> of the
>> first select, a bit like a column-select expression, but it allows
>> access
>> to all rows.
>
>Ah!  I knew that there had to be a simple way to do what you want to do
>in PGSQL:

The first two examples I sent do it 'simply' in PG. Unfortunately the
example you give will only return one row, and since Jeff wanted one
gettime for each SID, we'd need to modify both your and my first solution to:

Select SID,Min(GETFILE) From 
   (Select Min(DATETIME),SID From LOGS Group by SID) as MDT,
LOGS L
Where
   L.SID = MDT.SID
   And L.DATETIME = MDT.DATETIME


This will work, but will not solve the larger problem of 'give me all the
attrs of of the record with the least datetime for a given SID'. Jeff does
not need this, but it is a common problem. Simply using min/max works for
one attr, but fails to provide a consistent view of multiple attrs. PG has
no elegant single-statement solution to this problem. The SQL solution I
use in other systems is, as I mentioned earlier:

Select
  [any list of attrs from the 'best' record]
From
(Select Distinct SID From LOGS) as S, -- Get the grouping attrs
(Select * From LOGS L Where L.SID = S.SID  -- For each group do a
'row-select'
 Order By DATETIME Asc Limit 1) as MDT

The second select can be thought of as analagous to a
column-select-expression, but allowing access to all attrs of the resulting
row.

If you want f1,f2,and f3 from LOGS, then a similar result would be achieved
(inelegantly) by:

Select
  (Select F1 From LOGS L Where L.SID = S.SID 
 Order By DATETIME Asc Limit 1) as F1,
  (Select F2 From LOGS L Where L.SID = S.SID 
 Order By DATETIME Asc Limit 1) as F2,
  (Select F3 From LOGS L Where L.SID = S.SID 
 Order By DATETIME Asc Limit 1) as F3
From
(Select Distinct SID From LOGS) as S, -- Get the grouping attrs

Assuming DATETIME is unique then F1, F2, F3 will all come from the same row
and you will have a consistent record.

I have not checked, but I'd guess that PG will create a cross block with
four entries, whereas the original syntax above should just use 2 entries.
Like I said, it's just an optimizer hint.



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 0500 83 82 82 | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[SQL] Why does this plpgslq always return 1?

2001-07-27 Thread John Oakes

Can anyone tell me why this always return 1?  Thanks!

CREATE FUNCTION passrate(date, date, text) RETURNS float AS '

DECLARE
 begindate ALIAS FOR $1;
 enddate ALIAS FOR $2;
 passfail ALIAS FOR $3; 
 ret float;
 countp float;
 counttotal float;
 

BEGIN

 SELECT INTO countp COUNT(*)
 FROM benchmark
 WHERE passfail = passfail;

 SELECT INTO counttotal COUNT(*)
 FROM benchmark;

 ret := countp / counttotal;
 RETURN ret;
END;'
LANGUAGE 'plpgsql';


John


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])



[SQL] RE: Database Design Question

2001-07-27 Thread Joel Burton

On Fri, 27 Jul 2001, Jimmie Fulton wrote:

> I'm by no means a crack database designer, but I do have my ideas on this
> subject.  I prefer that every table has a unique integer (auto-incrementing)
> primary key.  Why?  Consistency.  With the alternative, some tables may not
> have a clear-cut candidate for a unique id.  In your example, you had
> "customer".  How many "John Smith"s would it take before we decide that is
> not a good identifier.  On the other hand, some tables would have perfectly
> logical identifiers.  Part numbers, SSNs  So, you would need to create
> some tables with integer primary keys, and others would have some other
> natural identifier.  That to me is inconsistent.  Every table should be,
> IMHO, predictable in it's definition of a primary key.  I don't even have to
> guess what the names of my primary keys are either because the are all named
> _ID.  Always.  I've only come up with these thoughts on my own,
> and have not extensively tried the other way, so I'd be interested in
> hearing other's ideas for the other side.

For large scale databases, there are theories aplenty about proper naming,
etc., and these should be be investigated. For small/simple databases,
this might be overkill in complexity and learning curve.

I teach a series of classes on small-scale database design for nonprofit
organizations, and recommend some simple rules:

 * for tables where there is no candidate key (ie a Person table where
there is no SSN, etc.), use the table name + "id", and use a SERIAL-type.

 * for tables where there is a candidate key, and that candidate key meets
all the usual criteria (always present, always unique, changes very
rarely), use the table name + "code", and use the appropriate type
(text/int/whatever), UNLESS

 * there exists a very common name for this piece of info. Rather than
calling a SSN a "personcode" (in the above example), call it the SSN

 * always put the primary key first in the table

Why not always use a SERIAL integer? It's a pain when a perfectly good
value exists. For example, in a table keeping track of US states, their
capitals, and governors, does it really make sense to code Maryland as
"45", when a memorable, unique, unlikely-to-change code already exists
("md")? Using a random number when a real-world code could do only forces
your user to do that lookup themselves.

[apologies to the international readers: Maryland is a state in the USA,
and "MD" is the postal code abbreviation for it]

I think that you could make some basic rules that would give you a system
that is intuitive and easy-to-rememeber, _without_ sacrificing clarity for
rigidity.

hth,
-- 
Joel Burton   <[EMAIL PROTECTED]>
Director of Information Systems, Support Center of Washington


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [SQL] Why does this plpgslq always return 1?

2001-07-27 Thread Stephan Szabo

On Fri, 27 Jul 2001, John Oakes wrote:

> Can anyone tell me why this always return 1?  Thanks!
> 
> CREATE FUNCTION passrate(date, date, text) RETURNS float AS '
> 
> DECLARE
>  begindate ALIAS FOR $1;
>  enddate ALIAS FOR $2;
>  passfail ALIAS FOR $3; 
>  ret float;
>  countp float;
>  counttotal float;
>  
> 
> BEGIN
> 
>  SELECT INTO countp COUNT(*)
>  FROM benchmark
>  WHERE passfail = passfail;

Ehhh? Were you expecting that to substitue
for *one* of those passfails and not the
other? ;)

Rename the alias variable to something else.

> 
>  SELECT INTO counttotal COUNT(*)
>  FROM benchmark;
> 
>  ret := countp / counttotal;
>  RETURN ret;
> END;'
> LANGUAGE 'plpgsql';


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly