[PERFORM] Need help in setting optimal configuration for a huge database.

2005-10-22 Thread Kishore B
Hi All,

I am Kishore, a freelance developer of J2EE applications.

We switched to PostgreSQL recently because of the advantages it has over the commercial databases we had been using. All went well until recently, when we began working on an application that needs to maintain a huge database.


I am describing the problem we are facing below. Can you please take a look at the case and help me configure PostgreSQL?


We have only two tables of any size: one contains 97% of the data, and the other contains 2.8%. All other tables hold the remaining 0.2% and are designed to support these two big tables. Currently we have 9 million records in the first table and 0.2 million records in the second. We need to insert into the bigger table almost every second, throughout the application's lifetime. In addition, we receive at least 200,000 records a day at a fixed time.

We are facing a critical situation because of the performance of the database. Even a basic query like select count(*) from bigger_table takes about 4 minutes to return.

The system configuration is as follows:

Database  : PostgreSQL 7.3
OS        : Red Hat Linux
Processor : Athlon
Memory    : 2 GB

We expect that at least 200 active connections need to be maintained throughout the day. Can anyone please suggest the best configuration to satisfy the above requirements?
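As an aside: PostgreSQL answers an unqualified count(*) with a full scan of the table, so on a 9-million-row table it is expensive by design. When an approximate figure is enough, a common workaround -- a sketch, not something proposed in this thread -- is to read the planner's row estimate, which VACUUM and ANALYZE keep reasonably fresh:

    -- Approximate row count from planner statistics instead of a full scan.
    -- Accuracy depends on how recently the table was VACUUMed / ANALYZEd.
    SELECT reltuples::bigint AS approx_row_count
    FROM pg_class
    WHERE relname = 'bigger_table';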

Thanks in advance. 

Thank you,
Kishore.



[Attachment: postgresql.conf]



Re: [PERFORM] Need help in setting optimal configuration for a huge database.

2005-10-22 Thread Kishore B
Hi Gunderson, 

> > Can I set the effective_cache_size to 200000?
> Yes, that should work fine.

Do you mean that I can set effective_cache_size to use 1.5 GB out of the 2 GB of memory in the current system?

Can I set sort_mem to 3072? We need to generate reports which make heavy use of GROUP BY and ORDER BY clauses.

Please note further that we need to execute up to 10 data-centric queries at any given time. Based on these specifications and the 2 GB of available memory, how would you suggest I allocate memory to the following configuration parameters?


shared_buffers (current setting: 48000, i.e. 375 MB)
sort_mem (current setting: 2048 kB, i.e. 2 MB)
effective_cache_size (current setting: 131072, i.e. 1 GB)
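For reference, here is one plausible shape for these settings on a dedicated 2 GB server. The values below are illustrative assumptions, not recommendations from this thread; in PostgreSQL 7.3, shared_buffers and effective_cache_size are measured in 8 kB pages, and sort_mem in kB.

    # postgresql.conf sketch -- illustrative values only
    shared_buffers = 48000          # 48000 x 8 kB = 375 MB shared buffer cache
    sort_mem = 4096                 # 4 MB per sort; each concurrent sort gets its own
    effective_cache_size = 131072   # 131072 x 8 kB = 1 GB; planner hint, no allocation
    max_connections = 200           # matches the expected 200 active connections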


On 10/23/05, Steinar H. Gunderson [EMAIL PROTECTED] wrote:

[please send replies to the list, not to me directly]

On Sun, Oct 23, 2005 at 03:19:39AM +0530, Kishore B wrote:
>> You definitely want to upgrade this if you can.
>> Memory : 2 GB
> We can move up to 12 GB if need be.

I was referring to your PostgreSQL version, not your RAM. More RAM is almost always an improvement, but for your data set, 2GB sounds quite good. (700k rows is not really a "huge database", BTW -- I've seen people here have several billion rows a _day_.)

> For now, let us set the configuration parameters for 2GB. I failed to
> mention earlier that we have a dedicated server for the database.
> Can I set the effective_cache_size to 200000?

Yes, that should work fine.

> Can I set the sort_mem size to 4096?

This depends a bit on the queries you're running. Remember that for each and every sort you do, one of these (measured in 8kB buffers) will get allocated. Some tuning of your queries against this would probably be useful.

> Will the performance suffer, if I set these parameters too high?

Yes, you can easily run into allocating too much RAM with too high sort_mem, which could kill your performance. Overestimating effective_cache_size is AFAIK not half as bad, though -- it is merely a hint to the planner, it does not actually allocate memory.

/* Steinar */
--
Homepage: http://www.sesse.net/
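To make Steinar's warning concrete, a rough worst-case calculation (my arithmetic, not a figure from the thread):

    -- sort_mem can be set per session and is measured in kB (PostgreSQL 7.3).
    SET sort_mem = 4096;   -- 4 MB for each sort or hash step in this session

    -- Worst case: 200 connections x one 4 MB sort each = 800 MB of RAM
    -- for sorts alone, on top of shared_buffers -- enough to push a
    -- 2 GB machine into swap.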



[PERFORM] Need help in setting optimal configuration for a huge database.

2005-10-24 Thread Kishore B


Hi All,

[The message body repeats the 2005-10-22 post above verbatim.]

I am also attaching the configuration file that we are currently using.
Can anyone please suggest the best configuration to satisfy the above requirements?
Thanks in advance. 

Thank you,
Kishore.


[Attachment: postgresql.conf]



[PERFORM] Why Index is not working on date columns.

2005-10-25 Thread Kishore B
Hi All, 

Thank you very much for your help in configuring the database.

Can you guys please take a look at the following query and let me know why the index is not considered in the plan?

Here is an extract of the condition string of the query that does take transaction_date in the index condition:

    where (account.id = search_engine.account_fk)
      and (account.status = 't'
           and account.id = '40288a820726362f0107263c55d3')
      and (search_engine.id = conversion.search_engine_fk
           and conversion.event_type = 'daily_spend'
           and conversion.tactic = 'PPC'
           and conversion.transaction_date between '2005-01-01 00:00:00'
                                               and '2005-10-31 23:59:59')
    group by account.id;

Plan:

    -> Index Scan using conversion_all on conversion
         (cost=0.00..6.02 rows=1 width=98)
       Index Cond: (((tactic)::text = 'PPC'::text)
         AND ((event_type)::text = 'daily_spend'::text)
         AND (transaction_date >= '2005-01-01 00:00:00'::timestamp without time zone)
         AND (transaction_date <= '2005-10-31 23:59:59'::timestamp without time zon (..)


Here is an extract of the condition string of the query that does not take transaction_date in the index condition:

    where (account.status = 't'
           and account.id = search_engine.account_fk
           and account.id = '40288a820726362f0107263c55d3')
      and (search_engine.id = conversion.search_engine_fk
           and conversion.tactic = 'PPC'
           and conversion.transaction_date >= '2005-01-01 00:00:00'
           and conversion.transaction_date <= '2005-10-31 23:59:59')
    group by account.id;

Plan:

    -> Index Scan using conv_evnt_tact_trans_date_sefk on conversion
         (cost=0.00..6.02 rows=1 width=132)
       Index Cond: ((("outer".id)::text = (conversion.search_engine_fk)::text)
         AND ((conversion.tactic)::text = 'PPC'::text))
       Filter: ((transaction_date >= '2005-01-01 00:00:00'::timestamp without time zone)
         AND (transaction_date <= '2005-10-31 23:59:59'::timestamp without time zone))

I have the following indexes defined on the columns.
conv_evnt_tact_trans_date_sefk : (search_engine_fk, tactic, event_type, transaction_date);
conv_all : (tactic, event_type, transaction_date);

I was really confused when I saw these plans. Both queries use the same columns in the WHERE clause, yet the optimizer picks a different index in each case.
Second, even though transaction_date is constrained in the second query as well, why is that constraint not used as an index condition?
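One likely explanation, offered here as a hedged aside rather than an answer from the thread: a B-tree scan can use a column in its index condition only while every earlier column in the index is constrained by equality. The second query has no condition on event_type, so in conv_evnt_tact_trans_date_sefk the match stops at event_type and transaction_date can only be applied as a filter. An index whose column order fits the second query's shape (hypothetical name, not one defined in the thread) would look like:

    -- Hypothetical index matching the second query's shape:
    -- equality on search_engine_fk and tactic, then the date range.
    CREATE INDEX conv_sefk_tact_trans_date
        ON conversion (search_engine_fk, tactic, transaction_date);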

Thanks in advance.

Thank you,
Kishore.


Re: [PERFORM] Why Index is not working on date columns.

2005-10-25 Thread Kishore B
Hi Tom,

Thank you for your response.

> I surmise that you are testing on toy tables and extrapolating to what
> will happen on larger tables.

The tables participating here contain more than 8 million records as of now, and 200K records are added to them every day.


Thank you,
Kishore.


On 10/25/05, Tom Lane [EMAIL PROTECTED] wrote:

Kishore B [EMAIL PROTECTED] writes:
> Can you guys please take a look at the following query and let me know
> why the index is not considered in the plan?

Considered and used are two different things. The two examples you give have the same estimated cost (within two decimal places), so the planner sees no particular reason to choose one over the other.

I surmise that you are testing on toy tables and extrapolating to what will happen on larger tables. This is an unjustified assumption. Create a realistic test data set, ANALYZE it, and then see if the planner chooses indexes you like.

regards, tom lane
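A minimal sketch of the check Tom suggests, using the table and columns from earlier in the thread (the query is simplified; the plan you get will depend on your data):

    -- Refresh planner statistics, then look at the plan actually chosen.
    ANALYZE conversion;

    EXPLAIN
    SELECT count(*)
    FROM conversion
    WHERE tactic = 'PPC'
      AND transaction_date >= '2005-01-01 00:00:00'
      AND transaction_date <= '2005-10-31 23:59:59';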


[PERFORM] Why different execution times for different instances for the same query?

2005-10-25 Thread Kishore B
Hi All, 

We executed a single query that returned very fast on the first run. But when I executed the same query multiple times, it gave strange results: it did not come back at all.

When I checked the processes running on the system, I observed that multiple instances of postmaster were running, all consuming very large amounts of memory. I could also observe that the memory was shared in a roughly uniform distribution across them.

Please let me know if anybody has experienced the same, and how they resolved it.

Thank you,
Kishore.