Thomas
Jason Hihn wrote:
-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Greg Spiegelberg Sent: Wednesday, October 08, 2003 3:11 PM To: PgSQL Performance ML Subject: Re: [PERFORM] Compare rows
Josh Berkus wrote:
Greg,probably right ...
Hmmm ... if few of those columns are NULL, then you areThe data represents metrics at a point in time on a system for network, disk, memory, bus, controller, and so-on. Rx, Tx, errors, speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole
process from data gathering, methods of temporarily storing then
loading to the database. Initially, 37+ tables were in use but
the one big-un has saved us over 3.4 minutes.
this is probably the most normalized design. If, however,many of columns
are NULL the majority of the time, then the design you shouldbe using is a
vertial child table, of the form ( value_type | value ).between instances
Such a vertical child table would also make your comparison
*much* easier, as it could be executed via a simple4-table-outer-join and 3
where clauses. So even if you don't have a lot of NULLs, youprobably want
to consider this.You lost me on that one. What's a "vertical child table"?
Parent table Fkey | Option | Value
------------------+--------+-------
| OS | Solaris
| DISK1 | 30g
^^^^^^^^ ^^^-- values fields are values in a column rather than 'fields'
Statistically, about 6% of the rows use more than 200 of the columns, 27% of the rows use 80-199 or more columns, 45% of the rows use 40-79 columns and the remaining 22% of the rows use 39 or less of the columns. That is a lot of NULLS. Never gave that much thought.
To ensure query efficiency, hide the NULLs and simulate the multiple tables I have a boatload of indexes, ensure that every query makees use of an index, and have created 37 views. It's worked pretty well so far
then I'd suggestThe reason for my initial question was this. We save changes only.If re-designing the table per the above is not a possibility,
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.
that you locate 3-5 columns that:or less of the
1) are not NULL for any row;
2) combined, serve to identify a tiny subset of rows, i.e. 3%
table.There are always, always, always 7 columns that contain data.
Then put a multi-column index on those columns, and do yourcomparison.
Hopefully the planner should pick up on the availablity of theindex and scan
only the rows retrieved by the index. However, there is the distinct possibility that the presence of 637 WHERE criteria willconfuse the planner,
causing it to resort to a full table seq scan; in that case,you will want to
use a subselect to force the issue.That's what I'm trying to avoid is a big WHERE (c1,c2,...,c637) <> (d1,d2,...,d637) clause. Ugly.
Or, as Joe Conway suggested, you could figure out some kind ofvalue hash that
uniquely identifies your rows.
I've given that some though and though appealing I don't think I'd care to spend the CPU cycles to do it. Best way I can figure to accomplish it would be to generate an MD5 on each row without the timestamp and store it in another column, create an index on the MD5 column, generate MD5 on each line I want to insert. Makes for a simple WHERE...
Okay. I'll give it a whirl. What's one more column, right?
Greg
-- Greg Spiegelberg Sr. Product Development Engineer Cranel, Incorporated. Phone: 614.318.4314 Fax: 614.431.8388 Email: [EMAIL PROTECTED] Cranel. Technology. Integrity. Focus.
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
---------------------------(end of broadcast)--------------------------- TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html