Re: [PERFORM] Filesystem Direct I/O and WAL sync option

2007-07-05 Thread Dimitri

Gregory, thanks for the good questions! :))
They shed more light on my throughput here :))

The running OS is Solaris 9 (the customer is still not ready to upgrade to
Solaris 10), and I think the main sync issue is coming from the old UFS
implementation... UFS mounted with the 'forcedirectio' option uses
different sync logic and also accepts concurrent writes to the same file,
which gives a higher performance level here. I really did not expect such
a big gain, so I did not think to rerun the same test with direct I/O on
and fsync=on too. To my big surprise, it also reached 2800 tps, the same
as with fsync=off !!! So the initial question is no longer valid :))
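
For reference, a minimal sketch of checking the two PostgreSQL settings in
play in this comparison (the listed wal_sync_method values are only
examples; what is actually available depends on the platform):

-- how PostgreSQL flushes WAL on this system
SHOW fsync;            -- on / off: whether commits wait for WAL to reach stable storage
SHOW wal_sync_method;  -- e.g. open_datasync, fdatasync, fsync, open_sync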

Also, my tests are executed just to validate the server + storage
capabilities, and honestly it's a real pity to see them used under an old
Solaris version :)) But well, at least we know what kind of performance
they may expect currently, and they can think about migration before the
end of this year...

Seeing at least 10,000 random writes/sec on the storage subsystem during a
live database test was very pleasant for the customer and makes them feel
comfortable about their production...

Thanks a lot for all your help!

Best regards!
-Dimitri

On 7/4/07, Gregory Stark [EMAIL PROTECTED] wrote:


Dimitri [EMAIL PROTECTED] writes:

 Yes Gregory, that's why I'm asking, because I'm jumping from 1800
 transactions/sec to 2800 transactions/sec! And that's a more than
 significant performance increase :))

wow. That's kind of suspicious though. Does the new configuration take
advantage of the lack of the filesystem cache by increasing the size of
shared_buffers? Even then I wouldn't expect such a big boost unless you got
very lucky with the size of your working set compared to the two sizes of
shared_buffers.

It seems likely that somehow this change is not providing the same
guarantees as fsync. Perhaps fsync is actually implementing IDE write
barriers and the sync mode is just flushing buffers to the hard drive
cache and then returning.

What transaction rate do you get if you just have a single connection
streaming inserts in autocommit mode? What kind of transaction rate do
you get with both sync mode on and fsync=on in Postgres?

And did you say this was with a battery-backed cache? In theory
fsync=on/off shouldn't make much difference at all with a battery-backed
cache. Stranger and stranger.

--
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] improvement suggestions for performance design

2007-07-05 Thread tfinneid
Hi

I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve the
desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently 
between 6 and 20 seconds, depending on the setup. In the future X could
become as low as 3 seconds. Y can, within the next 5-10 years, become as
high as 200 000.

That means that, for example, every 6 seconds 100 000 attributes need to
be written to the database.

At the same time, somewhere between 5 and 20 users need to read parts of
those newly written attributes, maybe 30 000 attributes in total.

This continues for the duration of the field operation, which could be
18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.

Now here is how I suggest doing this:

1- the tables

table attribute_values:
    id          int
    attr_type   int  ( references attribute_types(id) )
    posX        int
    posY        int
    data_type   int
    value       varchar(50)

table attribute_types:
    id          int
    name        varchar(200);



2- function

   a function that receives arrays of data and inserts each attribute;
   perhaps one array per attribute field (type, posX, posY, data_type,
   value), so five arrays as IN parameters to the function (a minimal
   sketch follows below)

3- java client

   the client receives the data from a CORBA request, and splits it
   into, say, 4 equally sized blocks and executes 4 threads that insert
   each block (this seems to be more efficient than just using one
   thread.)
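
For illustration, a minimal sketch of what such an array-based insert
function could look like in PL/pgSQL (the function and parameter names are
made up; it assumes the attribute_values table above and five parallel
arrays of equal length, and ignores the id column for brevity):

create or replace function insert_attributes(
    p_attr_type  int[],
    p_posx       int[],
    p_posy       int[],
    p_data_type  int[],
    p_value      varchar[]
) returns void as $$
declare
    i int;
begin
    -- the five arrays are assumed to be parallel and of equal length
    for i in 1 .. coalesce(array_upper(p_attr_type, 1), 0) loop
        insert into attribute_values (attr_type, posX, posY, data_type, value)
        values (p_attr_type[i], p_posx[i], p_posy[i], p_data_type[i], p_value[i]);
    end loop;
end;
$$ language plpgsql;

-- example call:
-- select insert_attributes(array[1,2], array[10,11], array[20,21],
--                          array[4,4], array['a','b']);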

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of attribute
gets its own table with all attributes in one row. But I am not sure if
that is any more efficient than one attribute per row since I pass
everything to the function as an array.
With the above design a change in attribute types only requires changing
the data in a table instead of having to modify the client, the function
and the tables.

- I am also wondering if writing the client and function in C would create
a more efficient solution.

any comments?

PS: I am currently running Postgres 8.1, but could probably use 8.2 if it
is needed for functionality or performance reasons. It will run on a SPARC
machine with Solaris 10 and perhaps 4-6 processors, as many GB of RAM as
necessary, and SCSI disks (perhaps in RAID 0).

regards

thomas




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread Heikki Linnakangas
I would strongly suggest that you use a proper relational schema,
instead of storing everything in two tables. I don't know your
application, but a schema like that is called an Entity-Attribute-Value
(EAV) schema (though your entity seems to be just posx and posy), and it
should raise a big red flag in the mind of any database designer. In
particular, constructing queries against an EAV schema is a major pain in
the ass. This has been discussed before on the PostgreSQL lists as well;
you might want to search and read the previous discussions.


Ignoring the EAV issue for a moment, it's hard to give advice without 
knowing what kind of queries are going to be executed. Are the lookups 
always going to be by id? By posx/posy perhaps? By attribute?


[EMAIL PROTECTED] wrote:

Hi

I have the following scenario for a database that I need to design, and
would like some hints on what to improve or do differently to achieve the
desired performance goal, disregarding hardware and postgres tuning.

The premise is an attribute database that stores about 100 different
attribute types as attribute values. Every X seconds, Y number of new
attribute values are stored in the database. X is constant and currently 
between 6 and 20 seconds, depending on the setup. In the future X could

become as low as 3 seconds. Y can, within the next 5-10 years, become as
high as 200 000.

That means that for example, every 6 seconds 100 000 attributes needs to
be written to the database.

At the same time, somewhere between 5-20 users needs to read parts of
those newly written attributes, maybe in total 30 000 attributes.

This continues for the duration of the field operation, which could be
18hrs a day for 6 weeks. So the total db size is up towards 200 gigs.

Now here is how I suggest doing this:

1- the tables

table attribute_values:
    id          int
    attr_type   int  ( references attribute_types(id) )
    posX        int
    posY        int
    data_type   int
    value       varchar(50)

table attribute_types:
    id          int
    name        varchar(200);



2- function

   a function that receives an array of data and inserts each attribute.
   perhaps one array per attribute data (type, posX, posY, data_type,
   value), so five arrays as IN parameters to the function

3- java client

   the client receives the data from a corba request, and splits it
   into, say 4 equally sized blocks and executes 4 threads that insert
   each block (this seems to be more efficient than just using one
   thread.)

Now I am wondering if this is the most efficient way of doing it?

- I know that I could group the attributes so that each type of attribute
gets its own table with all attributes in one row. But I am not sure if
that is any more efficient than one attribute per row since I pass
everything to the function as an array.
With the above design a change in attribute types only requires changing
the data in a table instead of having to modify the client, the function
and the tables.

- I am also wondering if writing the client and function in C would create
a more efficient solution.

any comments?

ps, I am currently running postgres 8.1, but could probably use 8.2 if it
is needed for functionality or performance reasons. It will run on a sparc
machine with solaris 10 and perhaps 4-6 processors, as many GB of RAM as
necessary and SCSI disks ( perhaps in raid 0 ).



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread tfinneid
 I would strongly suggest that you use a proper relational schema,
 instead of storing everything in two tables. I don't know your
 application, but a schema like that is called an Entity-Attribute-Value
 (though your entity seems to be just posx and posy) and it should raise
 a big red flag in the mind of any database designer. In particular,
 constructing queries against an EAV schema is a major pain in the ass.
 This has been discussed before on postgresql lists as well, you might
 want to search and read the previous discussions.

I get your point, but the thing is that the attributes have no particular
relation to each other, other than belonging to the same attribute groups.
There are no specific rules that state that certain attributes are always
used together, such as with an address record. It depends on what
attributes the operator wants to study. This is why I don't find any
reason to group the attributes into separate tables and columns.

I am still looking into the design of the tables, but I need to get a
proper test harness running before I can start ruling things out. And part
of that is, for example, finding efficient ways of transferring the insert
data from the client to the db, instead of just single-command inserts.
This is where bulk transfer by arrays would probably be preferable.

 Ignoring the EAV issue for a moment, it's hard to give advice without
 knowing what kind of queries are going to be executed. Are the lookups
 always going to be by id? By posx/posy perhaps? By attribute?

the query will be by attribute type and posx/y. So for position x,y, give
me the following attributes...
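
As a rough illustration of that access pattern against the proposed
attribute_values table (the literal values and the index name are made up
for the example):

-- all values of the requested attribute types at position (x, y)
select attr_type, data_type, value
from attribute_values
where posX = 117
  and posY = 42
  and attr_type in (1, 2, 3);

-- an index matching this access path could look like:
create index attr_values_pos_type_idx
    on attribute_values (posX, posY, attr_type);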

thomas




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread Y Sidhu

On 7/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


 I would strongly suggest that you use a proper relational schema,
 instead of storing everything in two tables. I don't know your
 application, but a schema like that is called an Entity-Attribute-Value
 (though your entity seems to be just posx and posy) and it should raise
 a big red flag in the mind of any database designer. In particular,
 constructing queries against an EAV schema is a major pain in the ass.
 This has been discussed before on postgresql lists as well, you might
 want to search and read the previous discussions.

I get your point, but the thing is the attributes have no particular
relation to each other, other than belonging to same attribute groups.
There are no specific rules that states that certain attributes are always
used together, such as with an address record. It depends on what
attributes the operator wants to study. This is why I don't find any
reason to group the attributes into separate tables and columns.

I am still looking into the design of the tables, but I need to get at
proper test harness running before I can start ruling things out. And a
part of that, is for example, efficient ways of transferring the insert
data from the client to the db, instead of just single command inserts.
This is where bulk transfer by arrays probably would be preferable.

 Ignoring the EAV issue for a moment, it's hard to give advice without
 knowing what kind of queries are going to be executed. Are the lookups
 always going to be by id? By posx/posy perhaps? By attribute?

the query will be by attribute type and posx/y. So for position x,y, give
me the following attributes...

thomas




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org



I don't know much about this EAV stuff, except to say that my company is in
a situation with a lot of adds and bulk deletes, and I wish the tables had
been designed with partitioning in mind. That is, if you know roughly how
much data (order of magnitude) each table will hold or will pass through
(adds and deletes), you may want to design the table with partitioning in
mind. I have not done any partitioning myself, so I cannot give you details,
but I can tell you that mass deletes are a breeze because you just drop that
part of the table - I think it is a sub-table. And that alleviates table
bloat and excessive vacuuming.

Good luck.

--
Yudhvir Singh Sidhu
408 375 3134 cell


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread tfinneid
 On 7/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 I don't know much about this EAV stuff. Except to say that my company is
 in
 a situation with a lot of adds and bulk deletes and I wish the tables were
 designed with partitioning in mind. That is if you know how much, order of
 magnitude, data each table will hold or will pass through (add and
 delete),
 you may want to design the table with partitioning in mind. I have not
 done
 any partitioning so I cannot give you details but can tell you that mass
 deletes are a breeze because you just drop that part of the table. I
 think
 it is a sub table. And that alleviates table bloat and excessive
 vacuuming.

By partitioning, do you mean some sort of internal db table partitioning
scheme or just me dividing the data into different tables?

There won't be many deletes, but there might of course be some.
Additionally, because of the performance requirements, there won't be time
to run vacuum in between the inserts, except in non-operational periods,
which will only be a couple of hours during the day. So vacuum will have to
be scheduled at those times, instead of at the normal intervals.
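
For example, the routine maintenance could simply be kicked off from cron
during that window (the table name is illustrative):

-- run during the non-operational window, e.g. from cron:
--   0 3 * * *  psql -d mydb -c "VACUUM ANALYZE attribute_values;"
vacuum analyze attribute_values;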



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread Y Sidhu

On 7/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:


 On 7/5/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 I don't know much about this EAV stuff. Except to say that my company is
 in
 a situation with a lot of adds and bulk deletes and I wish the tables
were
 designed with partitioning in mind. That is if you know how much, order
of
 magnitude, data each table will hold or will pass through (add and
 delete),
 you may want to design the table with partitioning in mind. I have not
 done
 any partitioning so I cannot give you details but can tell you that mass
 deletes are a breeze because you just drop that part of the table. I
 think
 it is a sub table. And that alleviates table bloat and excessive
 vacuuming.

By partitioning, do you mean some sort of internal db table partitioning
scheme or just me dividing the data into different tables?

There won't be many deletes, but there might of course be some.
Additionally, because of the performance requirements, there won't be time
to run vacuum in between the inserts, except in non-operational periods,
which will only be a couple of hours during the day. So vacuum will have to
be scheduled at those times, instead of at the normal intervals.



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate



Internal db table partitioning. Check out:
http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html

--
Yudhvir Singh Sidhu
408 375 3134 cell


Re: [PERFORM] Query is taking 5 HOURS to Complete on 8.1 version

2007-07-05 Thread smiley2211

Hello all,

I've made the changes to the view to use UNION ALL and the WHERE NOT IN
suggestions... the query now takes a little under 3 hours instead of 5 --
here is the EXPLAIN ANALYZE:

*

 
QUERY PLAN  
  
   
-
---
 Limit  (cost=100013612.76..299939413.70 rows=1 width=8) (actual
time=10084289.859..10084289.861 rows=1 loops=1)
   -  Subquery Scan people_consent  (cost=100013612.76..624068438343.99
rows=3121 width=8) (actual time=10084289.853..10084289.853 rows=1 loops=1)
 -  Append  (cost=100013612.76..624068438312.78 rows=3121
width=815) (actual time=10084289.849..10084289.849 rows=1 loops=1)
   -  Nested Loop  (cost=100013612.76..100013621.50 rows=2
width=815) (actual time=10084289.846..10084289.846 rows=1 loops=1)
 -  Unique  (cost=100013612.76..100013612.77 rows=2
width=8) (actual time=10084289.817..10084289.817 rows=1 loops=1)
   -  Sort  (cost=100013612.76..100013612.77 rows=2
width=8) (actual time=10084289.814..10084289.814 rows=1 loops=1)
 Sort Key: temp_consent.id
 -  Unique 
(cost=100013612.71..100013612.73 rows=2 width=36) (actual
time=10084245.195..10084277.468 rows=7292 loops=1)
   -  Sort 
(cost=100013612.71..100013612.72 rows=2 width=36) (actual
time=10084245.191..10084254.425 rows=7292 loops=1)
 Sort Key: id, daterecorded,
answer
 -  Append 
(cost=100013515.80..100013612.70 rows=2 width=36) (actual
time=10083991.226..10084228.613 rows=7292 loops=1)
   -  HashAggregate 
(cost=100013515.80..100013515.82 rows=1 width=36) (actual
time=10083991.223..10083998.046 rows=3666 loops=1)
 -  Nested Loop 
(cost=10060.61..100013515.80 rows=1 width=36) (actual
time=388.263..10083961.330 rows=3702 loops=1)
   -  Nested
Loop  (cost=10060.61..100013511.43 rows=1 width=36) (actual
time=388.237..10083897.268 rows=3702 loops=1)
 - 
Nested Loop  (cost=10060.61..100013507.59 rows=1 width=24) (actual
time=388.209..10083833.870 rows=3702 loops=1)
  
-  Nested Loop  (cost=10060.61..100013504.56 rows=1 width=24) (actual
time=388.173..10083731.122 rows=3702 loops=1)

Join Filter: (inner.question_answer_id = outer.id)

-  Nested Loop  (cost=60.61..86.33 rows=1 width=28) (actual
time=13.978..114.768 rows=7430 loops=1)

  
-  Index Scan using answers_answer_un on answers a  (cost=0.00..5.01 rows=1
width=28) (actual time=0.084..0.088 rows=1 loops=1)


Index Cond: ((answer)::text = 'Yes'::text)

  
-  Bitmap Heap Scan on questions_answers qa  (cost=60.61..81.23 rows=7
width=16) (actual time=13.881..87.112 rows=7430 loops=1)


Recheck Cond: ((qa.answer_id = outer.id) AND (((qa.question_tag)::text =
'consentTransfer'::text) OR ((qa.question_tag)::text = 'share
WithEval'::text)))


-  BitmapAnd  (cost=60.61..60.61 rows=7 width=0) (actual
time=13.198..13.198 rows=0 loops=1)

  
-  Bitmap Index Scan on qs_as_answer_id  (cost=0.00..5.27 rows=649 width=0)
(actual time=9.689..9.689 rows=57804 loops=1)


Index Cond: (qa.answer_id = outer.id)

  
-  BitmapOr  (cost=55.08..55.08 rows=6596 width=0) (actual
time=2.563..2.563 rows=0 loops=1)

  

Re: [PERFORM] Query is taking 5 HOURS to Complete on 8.1 version

2007-07-05 Thread Alvaro Herrera
smiley2211 wrote:
 
 Hello all,
 
 I've made the changes to view to use UNION ALL and the where NOT IN
 suggestions...the query now takes a little under 3 hours instead of 5 --
 here is the EXPLAIN ANALYZE:

It seems you have disabled nested loops --- why?  Try turning them back
on and let us see the EXPLAIN ANALYZE again.
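
For reference, a minimal sketch of re-enabling them for the session before
re-running the query (enable_nestloop is the relevant planner setting):

-- re-enable nested loop joins for this session, then re-run the plan
set enable_nestloop = on;
explain analyze select ... ;  -- the query against the rewritten view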

It would be extremely helpful if you saved it in a file and attached it
separately, so that the indentation and whitespace are not mangled by your
email system. It would be a lot more readable that way.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread Heikki Linnakangas

[EMAIL PROTECTED] wrote:

I would strongly suggest that you use a proper relational schema,
instead of storing everything in two tables. I don't know your
application, but a schema like that is called an Entity-Attribute-Value
(though your entity seems to be just posx and posy) and it should raise
a big red flag in the mind of any database designer. In particular,
constructing queries against an EAV schema is a major pain in the ass.
This has been discussed before on postgresql lists as well, you might
want to search and read the previous discussions.


I get your point, but the thing is the attributes have no particular
relation to each other, other than belonging to same attribute groups.
There are no specific rules that states that certain attributes are always
used together, such as with an address record. It depends on what
attributes the operator wants to study. This is why I don't find any
reason to group the attributes into separate tables and columns.


ISTM that a properly normalized schema would look something like this:

create table position (
  posX int not null,
  posY int not null,
  primary key (posX, posY)
);

create table colour (
  posX int not null,
  posY int not null,
  colour varchar(50) not null,
  primary key (posX, posY),
  foreign key (posX, posY) references position (posX, posY)
);

create table population (
  posX int not null,
  posY int not null,
  population int not null,
  primary key (posX, posY),
  foreign key (posX, posY) references position (posX, posY)
);

where colour and population are examples of attributes you want to 
store. If you have 100 different attributes, you'll have 100 tables like 
that. That may sound like a lot, but it's not.


This allows you to use proper data types for the attributes, as well as
constraints and all the other goodies a good relational data model gives you.


It also allows you to build proper indexes on the attributes. For 
example, if you store populations as text, you're going to have a hard 
time building an index that allows you to query for positions with a 
population between 100-2000 efficiently.
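
To make that concrete, a minimal sketch against the population table from
the example above (the index name and the range are made up):

-- with population stored as an integer, a plain btree index supports range scans
create index population_pop_idx on population (population);

select posX, posY
from population
where population between 100 and 2000;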


These are all imaginary examples, but what I'm trying to point out here
is that a proper relational schema allows you to manage and query your
data much more easily and with more flexibility, and allows for future
extensions.


A normalized schema will also take less space, which means less I/O and
more performance, because there's no need to store metadata like the
data_type and attr_type on every row. For performance reasons, you might
actually want to not store the position table at all in the above schema.


An alternative design would be to have a single table, with one column 
per attribute:


create table position (
  posX int not null,
  posY int not null,
  colour varchar(50),
  population int,
  ...
  primary key (posX, posY)
)

This is more space-efficient, especially if you have a lot of attributes
on the same coordinates. You can easily add and drop columns as needed,
using ALTER TABLE.
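
For example (the added column is purely illustrative):

-- attributes come and go as ordinary columns in this design
alter table position add column temperature int;
alter table position drop column colour;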



I am still looking into the design of the tables, but I need to get at
proper test harness running before I can start ruling things out. And a
part of that, is for example, efficient ways of transferring the insert
data from the client to the db, instead of just single command inserts.
This is where bulk transfer by arrays probably would be preferable.


Before you start fiddling with functions, I'd suggest that you try 
batching the inserts with the JDBC PreparedStatement batch facility.


Splitting the inserts into multiple threads in your application sounds
messy. The inserts would have to be in separate transactions, for
example. Presumably your CORBA ORB will spawn multiple threads for you
when there are a lot of requests coming in, so the overall throughput
should be the same with a single thread per request.


BTW, I concur with Y Sidhu that with data volumes as high as you have,
partitioning is a good idea. It's a lot easier to manage twenty 10 GB table
partitions than one 200 GB table. For example, VACUUM, CLUSTER, and CREATE
INDEX can be done partition by partition, instead of as a single huge
operation that runs for hours. Though if you choose to have just one
table per attribute type, each table might be conveniently small by
nature, so that no partitioning is required.
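
As a rough sketch of what that could look like with 8.1/8.2-style
inheritance partitioning (table, column and partition names are
illustrative; a real setup would also route inserts to the right child
table and enable constraint_exclusion):

create table attr_log (
    posX      int not null,
    posY      int not null,
    attr_type int not null,
    value     varchar(50),
    created   date not null
);

-- one child table per month, constrained so the planner can skip it
create table attr_log_2007_07 (
    check (created >= date '2007-07-01' and created < date '2007-08-01')
) inherits (attr_log);

create table attr_log_2007_08 (
    check (created >= date '2007-08-01' and created < date '2007-09-01')
) inherits (attr_log);

-- dropping an old partition replaces a mass DELETE plus VACUUM
drop table attr_log_2007_07;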


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] improvement suggestions for performance design

2007-07-05 Thread Thomas Finneid


Heikki Linnakangas wrote:

ISTM that a properly normalized schema would look something like this:

create table position (
  posX int not null,
  posY int not null,
  primary key (posX, posY)
);

create table colour (
  posX int not null,
  posY int not null,
  colour varchar(50) not null,
  primary key (posX, posY),
  foreign key (posX, posY) references position (posX, posY)
);

create table population (
  posX int not null,
  posY int not null,
  population int not null,
  primary key (posX, posY),
  foreign key (posX, posY) references position (posX, posY)
);


I agree that this is a way it could be done.

where colour and population are examples of attributes you want to 
store. If you have 100 different attributes, you'll have 100 tables like 
that. That may sound like a lot, but it's not.


In any case, there is no point in having one table per attribute, as 
some attributes are logically grouped and can therefore be grouped 
together in one table. Since there are 5-10 groups of attributes, 5-10 
tables would be enough.




This allows you to use proper data types for the attributes, as well as 
constraints and all the other goodies a good relational data model gives 
you


It also allows you to build proper indexes on the attributes. For 
example, if you store populations as text, you're going to have a hard 
time building an index that allows you to query for positions with a 
population between 100-2000 efficiently.


Performing queries on the attribute value is of no interest, so that 
does not matter.


These are all imaginary examples, but what I'm trying to point out here 
is that a proper relational schema allows you to manage and query your 
data much more easily and with more flexibility, allows for future 
extensions.


They have been treating their data this way for the last 20 years, and 
there is nothing on the horizon that tells either them or me that it 
will be any different for the next 10 years. So I am not sure I need to 
plan for that.


A normalized schema will also take less space, which means less I/O and 
more performance, 


That is what I am trying to find out, if it is true for this scenario as 
well.


because there's no need to store metadata like the 
data_type, attr_type on every row. 


data_type and attr_type are not decorative metadata; they are actively 
used as query parameters for each attribute. If they were not there, I 
would not be able to perform the queries I need to do.


For performance reasons, you might

actually want to not store the position-table at all in the above schema.

An alternative design would be to have a single table, with one column 
per attribute:


create table position (
  posX int not null,
  posY int not null,
  colour varchar(50),
  population int,
  ...
  primary key (posX, posY)
)

This is more space-efficient, especially if you have a lot of attributes 
on same coordinates. You can easily add and drop columns as needed, 
using ALTER TABLE.



I am still looking into the design of the tables, but I need to get at
proper test harness running before I can start ruling things out. And a
part of that, is for example, efficient ways of transferring the insert
data from the client to the db, instead of just single command inserts.
This is where bulk transfer by arrays probably would be preferable.


Before you start fiddling with functions, I'd suggest that you try 
batching the inserts with the JDBC PreparedStatement batch facility.


I have done that; now I need to have something to compare it against, 
preferably a function written in PL/pgSQL and one in C.
So, any other suggestions on how to efficiently bulk transfer the data to 
the db for insertion?


Splitting the inserts into multiple threads in your application sounds 
messy. 


Well, it has been tested and shown to make Postgres perform much 
better, i.e. 100 000 inserts split between 4 threads performed much 
faster than with a single thread alone.


BTW, I concur with Y Sidhu that with data volumes as high as you have, 
partitioning is a good idea. 


Yes, I will be looking into to it.

regards

thomas


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-07-05 Thread Magnus Hagander
Tom Lane wrote:
 PFC [EMAIL PROTECTED] writes:
 What version of PostgreSQL are you using?
 
  I think newbies should be pushed a bit to use the latest versions,
 
 How about pushed *hard*?  I'm constantly amazed at the number of people
 who show up in the lists saying they installed 7.3.2 or whatever random
 version they found in a dusty archive somewhere. "Please upgrade" is at
 least one order of magnitude more valuable configuration advice than
 anything else we could tell them.

(picking up an old thread while at a boring wait at the airport.. anyway)

I keep trying to think of more and better ways to do this :-) Perhaps we
should put some text on the bug reporting form (and in the documentation
about bug reporting) that's basically "don't bother reporting a bug
unless you're on the latest release in a branch, and at least make sure
you're on one of the major releases listed on www.postgresql.org"?

Seems reasonable?

//Magnus



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Direct I/O

2007-07-05 Thread lai yoke hman

Hello,
How can I tell whether my PostgreSQL 8 is using direct I/O or buffered I/O?
If it is using buffered I/O, how can I enable direct I/O? What is the
performance difference between them?
This is urgent, thanks.
_
Windows Live Spaces is here! It’s easy to create your own personal Web site. 
http://spaces.live.com/?mkt=en-my

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Performance of PostgreSQL and Oracle

2007-07-05 Thread lai yoke hman

Hello,
I have seen some performance testing indicating that PostgreSQL 8 is
apparently faster at writing data, while Oracle 10g seems to be better at
reading data from the database. Can anyone tell me why? Or has anyone done
a performance benchmark on them before?
This is urgent.
Thanks.
_
Call friends with PC-to-PC calling for free!
http://get.live.com/messenger/overview

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance of PostgreSQL and Oracle

2007-07-05 Thread Tom Lane
lai yoke hman [EMAIL PROTECTED] writes:
 I have seen some performance testing indicates that apparently the PostgreSQL 
 8 is faster in writing data while seems like Oracle 10g is better in reading 
 data from database, can any one tell me why? Or is there anyone done 
 performance benchmark on them before?

You won't find anything particularly definitive on this, because Oracle
forbids publishing benchmarks of their software.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-07-05 Thread Harald Armin Massa

Magnus,

don't bother reporting a bug

unless you're on the latest in a branch, and at least make sure you're
on one of the major releases listed on www.postgresql.org?

Seems reasonable?



absolutely. Should be standard practice.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2007 will take place in Vilnius, Lithuania from Monday 9th July
to Wednesday 11th July. See you there!