Re: [PERFORM] Compare rows

Thomas Swan Fri, 10 Oct 2003 05:30:49 -0700

I took this approach with a former company in designing an dynamic e-commerce system. This kept the addition of new products from requiring an alteration of the schema. With an ORB manager and cache control the performance was not significantly, but the automatic extensibility and the ease of maintainabilty was greatly enhanced.

Thomas

Jason Hihn wrote:

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Greg
Spiegelberg
Sent: Wednesday, October 08, 2003 3:11 PM
To: PgSQL Performance ML
Subject: Re: [PERFORM] Compare rows
Josh Berkus wrote:
Greg,
The data represents metrics at a point in time on a system for
network, disk, memory, bus, controller, and so-on.  Rx, Tx, errors,
speed, and whatever else can be gathered.
We arrived at this one 642 column table after testing the whole process from data gathering, methods of temporarily storing then loading to the database. Initially, 37+ tables were in use but the one big-un has saved us over 3.4 minutes.
Hmmm ... if few of those columns are NULL, then you are
probably right ...

this is probably the most normalized design. If, however,

many of columns

are NULL the majority of the time, then the design you should

be using is a

vertial child table, of the form ( value_type | value ).

Such a vertical child table would also make your comparison

between instances

*much* easier, as it could be executed via a simple

4-table-outer-join and 3

where clauses. So even if you don't have a lot of NULLs, you

probably want

to consider this.

You lost me on that one. What's a "vertical child table"?
Parent table Fkey | Option | Value ------------------+--------+------- | OS | Solaris | DISK1 | 30g ^^^^^^^^ ^^^-- values fields are values in a column rather than 'fields'
Statistically, about 6% of the rows use more than 200 of the columns,
27% of the rows use 80-199 or more columns, 45% of the rows use 40-79
columns and the remaining 22% of the rows use 39 or less of the columns.
That is a lot of NULLS.  Never gave that much thought.
To ensure query efficiency, hide the NULLs and simulate the multiple
tables I have a boatload of indexes, ensure that every query makees use
of an index, and have created 37 views.  It's worked pretty well so
far
The reason for my initial question was this. We save changes only. In other words, if system S has row T1 for day D1 and if on day D2 we have another row T1 (excluding our time column) we don't want to save it.

If re-designing the table per the above is not a possibility,

then I'd suggest

that you locate 3-5 columns that: 1) are not NULL for any row; 2) combined, serve to identify a tiny subset of rows, i.e. 3%

or less of the

table.

There are always, always, always 7 columns that contain data.

Then put a multi-column index on those columns, and do your

comparison.

Hopefully the planner should pick up on the availablity of the

index and scan

only the rows retrieved by the index. However, there is the distinct possibility that the presence of 637 WHERE criteria will

confuse the planner,

causing it to resort to a full table seq scan; in that case,

you will want to

use a subselect to force the issue.

That's what I'm trying to avoid is a big WHERE (c1,c2,...,c637) <> (d1,d2,...,d637) clause. Ugly.

Or, as Joe Conway suggested, you could figure out some kind of

value hash that

uniquely identifies your rows.
I've given that some though and though appealing I don't think I'd care
to spend the CPU cycles to do it.  Best way I can figure to accomplish
it would be to generate an MD5 on each row without the timestamp and
store it in another column, create an index on the MD5 column, generate
MD5 on each line I want to insert.  Makes for a simple WHERE...
Okay. I'll give it a whirl. What's one more column, right?

Greg
--
Greg Spiegelberg
 Sr. Product Development Engineer
 Cranel, Incorporated.
 Phone: 614.318.4314
 Fax:   614.431.8388
 Email: [EMAIL PROTECTED]
Cranel. Technology. Integrity. Focus.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: [PERFORM] Compare rows

Reply via email to