Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Josh Berkus
Francisco,

> I think I understand your point; however, it would be very laborious to
> find out, after you do all the development, that you need to de-normalize.

Not terribly.   Views and Rules are good for this.
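
(Purely as an illustration, with invented table and column names: a view can give the application the old single-table shape even after the storage underneath is reorganized, and a rule can redirect writes, so little application code has to change.)

    -- hypothetical example: "orders" later gets split into orders + order_detail,
    -- but existing queries keep working against a view with the old shape
    CREATE VIEW orders_full AS
    SELECT o.order_id, o.customer_id, o.order_date,
           d.shipping_notes, d.internal_comments
      FROM orders o
      LEFT JOIN order_detail d ON d.order_id = o.order_id;

    -- writes through the view can be redirected to the real tables with a rule
    CREATE RULE orders_full_ins AS ON INSERT TO orders_full DO INSTEAD (
        INSERT INTO orders (order_id, customer_id, order_date)
        VALUES (NEW.order_id, NEW.customer_id, NEW.order_date);
        INSERT INTO order_detail (order_id, shipping_notes, internal_comments)
        VALUES (NEW.order_id, NEW.shipping_notes, NEW.internal_comments)
    );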

> In your experience, at which point would it actually help to do this
> de-normalization in PostgreSQL? I know there are numerous factors, but any
> feedback based on previous experiences would help.

My experience?   If you're running on good hardware, it's completely 
unnecessary to vertically partition the table.   The only thing I'd do would 
be to look for columns which are frequently NULL and can be grouped together, 
and spin those off into a sub-table.   That is, if you have 4 columns which 
are generally either all null or all filled, and are all null for 70% of 
records, then those 4 could make a nice child table.
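
(A rough sketch of that, with made-up names — the four often-NULL columns move to a child table keyed on the parent's primary key, and a child row is inserted only when there is actually something to store:)

    CREATE TABLE orders (
        order_id    integer PRIMARY KEY,
        customer_id integer NOT NULL,
        order_date  date    NOT NULL
        -- ... plus the other columns that are (nearly) always filled
    );

    -- hypothetical group of 4 columns that are all NULL for ~70% of records
    CREATE TABLE order_shipping (
        order_id    integer PRIMARY KEY REFERENCES orders (order_id),
        carrier     varchar(22),
        tracking_no varchar(35),
        shipped_on  date,
        ship_cost   numeric(10,2)
    );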

-- 
-Josh Berkus

__AGLIO DATABASE SOLUTIONS___
Josh Berkus
   Complete information technology      [EMAIL PROTECTED]
   and data management solutions        (415) 565-7293
   for law firms, small businesses      fax 621-2533
   and non-profit organizations.        San Francisco




Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Francisco J Reyes
On Fri, 1 Aug 2003, Josh Berkus wrote:

> My attitude toward these normalization vs. performance issues is consistently
> the same:  First, verify that you have a problem.   That is, build the
> database with everything in one table (or with child tables for Nullable
> fields, as above) and try to run your application.  If performance is
> appalling, *then* take denormalization steps to improve it.

I think I understand your point; however, it would be very laborious to
find out, after you do all the development, that you need to de-normalize.

In your experience, at which point would it actually help to do this
de-normalization in PostgreSQL? I know there are numerous factors, but any
feedback based on previous experiences would help.

Right now the work I am doing is only for company internal use. If I were
ever to do work that outside users would have access to, then I would be
looking at 100+ concurrent users.



Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Josh Berkus
Ron,

> You snipped out too much, because that's exactly what I said...
> Another way of writing it: only split the table if some of the fields
> are not unitary to the entity.

Sorry!  No offense meant.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Ron Johnson
On Fri, 2003-08-01 at 12:44, Josh Berkus wrote:
> Francisco,
> 
> > Yes, all fields belong to the same entity. I used 100 as an example; it may
> > be something like 60 to 80 fields (there are two tables in question). I
> > don't formally do 3rd normal form, but for the most part I follow
> > the general concepts of normalization.
> >
> > > If not, then good design says to split the table.
> 
> Actually, no, it doesn't.   If all 60-80 fields are unitary and required 
> characteristics of the row-entity, normalization says keep them in one table.

You snipped out too much, because that's exactly what I said...
Another way of writing it: only split the table if some of the fields
are not unitary to the entity.

> The only time NF would recommend splitting the table is for fields which are 
> frequently NULL for reasons other than missing data entry.  For those, you'd
> create a child table.  While this is good 4NF, it's impractical in
> PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow 
> indeed.

Good to know.

> My attitude toward these normalization vs. performance issues is consistently
> the same:  First, verify that you have a problem.   That is, build the 
> database with everything in one table (or with child tables for Nullable 
> fields, as above) and try to run your application.  If performance is 
> appalling, *then* take denormalization steps to improve it.

The OP was not talking about denormalizing ...

It was: will vertically partitioning a table increase performance?
And the answer is "sometimes".

-- 
Ron Johnson, Jr.        Home: [EMAIL PROTECTED]
Jefferson, LA  USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
 because I hate vegetables!"
                                                        unknown





Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Josh Berkus
Francisco,

> Yes, all fields belong to the same entity. I used 100 as an example; it may
> be something like 60 to 80 fields (there are two tables in question). I
> don't formally do 3rd normal form, but for the most part I follow
> the general concepts of normalization.
>
> > If not, then good design says to split the table.

Actually, no, it doesn't.   If all 60-80 fields are unitary and required 
characteristics of the row-entity, normalization says keep them in one table.

The only time NF would recommend splitting the table is for fields which are 
frequently NULL for reasons other than missing data entry.  For those, you'd
create a child table.  While this is good 4NF, it's impractical in
PostgreSQL, where queries with several LEFT OUTER JOINs tend to be very slow 
indeed.
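
(For illustration, with invented names: getting the full row back then takes one LEFT OUTER JOIN per child table, and it is that stack of outer joins that tends to get slow:)

    -- hypothetical layout: an orders table plus two optional child tables
    SELECT o.order_id, o.order_date,
           s.carrier, s.tracking_no,
           b.po_number, b.billing_notes
      FROM orders o
      LEFT OUTER JOIN order_shipping s ON s.order_id = o.order_id
      LEFT OUTER JOIN order_billing  b ON b.order_id = o.order_id;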

My attitude toward these normalization vs. performance issues is consistently 
the same:  First, verify that you have a problem.   That is, build the 
database with everything in one table (or with child tables for Nullable 
fields, as above) and try to run your application.  If performance is 
appalling, *then* take denormalization steps to improve it.

I'm frequently distressed by the number of developers who make questionable 
design decisions "for performance reasons" without every verifying that they 
were, in fact, improving performance ...
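
(One concrete way to do that verification in PostgreSQL is EXPLAIN ANALYZE, which runs the query and reports actual row counts and timings; the query below is just an invented placeholder for comparing the single-table and split layouts on real data.)

    -- hypothetical query: run it against both layouts and compare actual times
    EXPLAIN ANALYZE
    SELECT order_id, customer_id, order_date
      FROM orders
     WHERE customer_id = 12345;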

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Ron Johnson
On Fri, 2003-08-01 at 12:14, Francisco J Reyes wrote:
> On Fri, 1 Aug 2003, Ron Johnson wrote:
> 
> > Do all 100 fields *really* all refer to the same *one* entity,
> > with no repeating values, etc?
> 
> Yes, all fields belong to the same entity. I used 100 as an example; it may
> be something like 60 to 80 fields (there are two tables in question). I
> don't formally do 3rd normal form, but for the most part I follow
> the general concepts of normalization.

Woo hoo!!

> Yes, some of the fields are varchars. 5 fields are varchar(22) and 3 are
> longer (35, 58, 70). The total row length is a little over 400 characters in
> FoxPro. In PostgreSQL it may be less than 300 (i.e. FoxPro uses an ASCII
> representation for numbers, so to store "1234567" it uses 7 bytes, whereas
> in PostgreSQL I can just make it an int and use 4 bytes).

But I'd only split if these big fields are rarely used.  Note that
VARCHAR(xx) does not pad with trailing spaces the way fixed-width CHAR does,
so that also is a factor.

-- 
Ron Johnson, Jr.        Home: [EMAIL PROTECTED]
Jefferson, LA  USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
 because I hate vegetables!"
                                                        unknown





Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Francisco J Reyes
On Fri, 1 Aug 2003, Ron Johnson wrote:

> Do all 100 fields *really* all refer to the same *one* entity,
> with no repeating values, etc?

Yes, all fields belong to the same entity. I used 100 as an example; it may
be something like 60 to 80 fields (there are two tables in question). I
don't formally do 3rd normal form, but for the most part I follow
the general concepts of normalization.

> If not, then good design says to split the table.

The original data was in FoxPro tables, and I have made better-normalized
tables in PostgreSQL.


> Also, if it's a high-activity table, but you only rarely need fields
> 60-90, then splitting them out to their own table might be useful
> (especially if some of those fields are large *CHAR or TEXT).

Yes, some of the fields are varchars. 5 fields are varchar(22) and 3 are
longer (35, 58, 70). The total row length is a little over 400 characters in
FoxPro. In PostgreSQL it may be less than 300 (i.e. FoxPro uses an ASCII
representation for numbers, so to store "1234567" it uses 7 bytes, whereas
in PostgreSQL I can just make it an int and use 4 bytes).



Re: [PERFORM] How number of columns affects performance

2003-08-01 Thread Ron Johnson
On Fri, 2003-08-01 at 11:08, Francisco Reyes wrote:
> If a table which will be heavily used has numerous fields, yet only a
> handful of them will be used heavily, would it make sense performance-wise
> to split it?
> 
> Example
> Table 1
> Field 1
> 
> Field 100
> 
> Table 2
> References Field 1 of table1
> .
> 
> Table n
> References Field 1 of table 1
> 
> So Table 1 basically will be referenced by many tables, and most of the
> time only a handful of fields of Table 1 are needed. I don't have exact
> numbers, but let's say that more than 60% of queries to Table 1 use only
> 20 fields or less.
> 
> If I split Table 1, then the second table will basically be 1-to-1 with
> Table 1.

Do all 100 fields *really* all refer to the same *one* entity,
with no repeating values, etc?
If not, then good design says to split the table.

Also, if it's a high-activity table, but you only rarely need fields
60-90, then splitting them out to their own table might be useful
(especially if some of those fields are large *CHAR or TEXT).
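
(A rough sketch of that kind of split, with invented names: the handful of frequently used columns stay in the main table, and the rarely used, wider ones go into a 1-to-1 side table sharing the same primary key:)

    CREATE TABLE item (
        item_id integer PRIMARY KEY,
        name    varchar(58),
        price   numeric(10,2)
        -- ... the ~20 frequently used columns
    );

    -- the rarely used / wide columns, fetched only when actually needed
    CREATE TABLE item_detail (
        item_id     integer PRIMARY KEY REFERENCES item (item_id),
        long_notes  text,
        legacy_code varchar(70)
        -- ... the rest of the 60-80 columns
    );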

> I have this similar scenario for two tables. One is close to 1 million
> records and the other is about 300,000 records.
> 
> Programming-wise it is much easier to only have one table, but I am just
> concerned about performance.
> 
> Most access to these tables will be indexed, with some occasional
> sequential scans. The number of concurrent users now is probably 10 or less.
> I expect to grow to 20+ concurrent connections. Will this be more of an
> issue if I had hundreds/thousands of users?
> 
-- 
Ron Johnson, Jr.        Home: [EMAIL PROTECTED]
Jefferson, LA  USA

"I'm not a vegetarian because I love animals, I'm a vegetarian
 because I hate vegetables!"
                                                        unknown


