Re: [PERFORM] Hardware Configuration and other Stuff

2015-03-18 Thread Pavel Stehule
Hi

2015-03-18 20:07 GMT+01:00 Vivekanand Joshi vjo...@zetainteractive.com:

 Hi Team,



 I don't know which section this question comes under, so I am posting
 it to both the Admin and Performance mailing lists. Apologies in
 advance.



 Objective:



 We are planning to use PostgreSQL instead of Netezza for our data
 warehouse as well as our database solutions. Right now, we have all our
 clients in one Netezza box. What we are thinking of is migrating each
 client to a dedicated PostgreSQL server. We will start with one of the
 clients; if that works successfully, we will migrate all the clients
 one by one. The objective is to get better performance than our
 existing solution. We are hopeful of that mainly for two reasons.
 Firstly, each client will have a dedicated server with good hardware,
 instead of one server shared by all the clients. Secondly, we can spend
 on hardware much more easily than on a proprietary appliance.




It depends heavily on the use case. Netezza is an extremely optimized
OLAP column-store database; PostgreSQL is an OLTP-optimized row-store
database. You will never get the same performance on OLAP queries from
Postgres. I don't think dedicated hardware can close that gap. If you
use Netezza well, it is 10-100x faster than Postgres.

You can try Postgres with cstore_fdw, or maybe MonetDB, which could be
an even better fit.
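
A minimal cstore_fdw sketch, just to show the shape of it (the table and
column names are made up, and cstore_fdw has to be built and installed
first):

    -- one-time setup: load the extension and create a column-store server
    CREATE EXTENSION cstore_fdw;
    CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

    -- a hypothetical fact table stored in columnar format with compression
    CREATE FOREIGN TABLE client_events (
        event_date  date,
        client_id   int,
        revenue     numeric
    )
    SERVER cstore_server
    OPTIONS (compression 'pglz');

Analytic queries that scan a few columns of a wide table are where this
pays off; point lookups and frequent updates are not its strength.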

Regards

Pavel


 I am hoping this community can help us work out what infrastructure
 and hardware would help us achieve our goal.



 Here are a few statistics which might act as a starting point.



 Availability: High (24*7).

 User Data : 700 GB, which will increase to 1.5 TB in the next 2-3 years.

 Number of User Databases : 2 (one is the main database; the other is
 used only for working tables, which get deleted every 48 hours)

 Number of tables : 200 (in the main database), (2000-3000 in working
 database)

 Size of top 5 biggest tables : 20-40 GB

 No of users concurrently accessing the system : 5-6 with write access,
 10 with read access.

 No of User Queries running on the system in a day : ~80K

 Read-only Queries (Select): ~60K

 Write queries: ~20K

 Data Import Queries: ~1K

 Typical Business Day : 18-20 hours.



 I can pass on a few complex queries to let you guys know what we are doing.



 Here are a few questions:



 1.) I don't need a load-balancing solution. It must be a
 high-availability setup, and I can work with asynchronous replication.
 The most important thing is that recovery should be as fast as possible.

 What approach would you recommend?



 2.) Recommendations on indexes, WAL, and tablespaces. I am not asking
 which keys I should index, but rather for a high-level approach to
 managing them. This might come across as a weird question to many, but
 please excuse me for being a novice.



 *Most Important Question:*



 3.) What would be the ideal hardware configuration for this
 requirement? I know there is no one-stop answer for this, but let's
 take it as a starting point. We can come to a proper conclusion after a
 discussion.



 What are the best on-line resources/books which can tell us about the
 hardware requirements?



 Warm Regards,


 Vivekanand Joshi
 +919654227927




 185 Madison Ave. New York, NY 10016

 www.zetainteractive.com





Re: [PERFORM] Hardware Configuration and other Stuff

2015-03-18 Thread Tomas Vondra
Hi,

On 18.3.2015 20:07, Vivekanand Joshi wrote:
 Hi Team,
 
 I don't know which section this question comes under, so I am posting
 it to both the Admin and Performance mailing lists. Apologies in
 advance.

Let's keep this in pgsql-performance.

 
 Objective:
 
 We are planning to use PostgreSQL instead of Netezza for our data
 warehouse as well as our database solutions. Right now, we have all our
 clients in one Netezza box. What we are thinking of is migrating each
 client to a dedicated PostgreSQL server. We will start with one of the
 clients; if that works successfully, we will migrate all the clients
 one by one. The objective is to get better performance than our
 existing solution. We are hopeful of that mainly for two reasons.
 Firstly, each client will have a dedicated server with good hardware,
 instead of one server shared by all the clients. Secondly, we can spend
 on hardware much more easily than on a proprietary appliance.


OK.

 I am hoping this community can help us work out what infrastructure
 and hardware would help us achieve our goal.
 
 Here are a few statistics which might act as a starting point.
 
 Availability: High (24*7).
 
 User Data : 700 GB, which will increase to 1.5 TB in the next 2-3 years.

How do you measure the amount of data? Is that the amount of data before
loading, size of the database, or what?

Also, is this a single client (thus placed on a single box), or multiple
clients?

 Number of User Databases : 2 (one is the main database; the other is
 used only for working tables, which get deleted every 48 hours)

You mentioned 700GB of data - is that just the main database, or both
databases?

 
 Number of tables : 200 (in the main database), (2000-3000 in working 
 database)
 
 Size of top 5 biggest tables : 20-40 GB
 
 No of users concurrently accessing the system : 5-6 with write access,
 10 with read access.
 
 No of User Queries running on the system in a day : ~80K
 
 Read-only Queries (Select): ~60K
 
 Write queries: ~20K
 
 Data Import Queries: ~1K
 
 Typical Business Day : 18-20 hours.

So is this a typical batch environment where you do the loads at night,
with no loads during the day? That might be possible with clients on
dedicated boxes, and it would allow various optimizations.
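
For instance (just to illustrate the kind of pattern a dedicated load
window makes possible; the table and file names are placeholders):

    -- a typical nightly-load pattern: bulk COPY instead of row-by-row INSERTs
    BEGIN;
    TRUNCATE staging_events;                          -- hypothetical working table
    COPY staging_events FROM '/data/events.csv' CSV;  -- server-side bulk load
    COMMIT;
    ANALYZE staging_events;                           -- refresh planner statistics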

 
 I can pass on a few complex queries to let you guys know what we are
 doing.
 
 Here are a few questions:
  
 1.) I don't need a load-balancing solution. It must be a
 high-availability setup, and I can work with asynchronous replication.
 The most important thing is that recovery should be as fast as possible.
 
 What approach would you recommend?

Streaming replication. I would probably start with sync replication.
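
A rough sketch of what that looks like with 9.x streaming replication
(the host and user below are placeholders, and the values are
illustrative, not tuned recommendations):

    # postgresql.conf on the primary
    wal_level = hot_standby            # ship enough WAL for a hot standby
    max_wal_senders = 3                # allow replication connections
    synchronous_standby_names = '*'    # any connected standby may be sync

    # recovery.conf on the standby
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'

For async replication you would simply leave synchronous_standby_names
empty; recovery speed then depends mostly on how quickly you can promote
the standby (pg_ctl promote).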

 2.) Recommendations on indexes, WAL, and tablespaces. I am not asking
 which keys I should index, but rather for a high-level approach to
 managing them. This might come across as a weird question to many, but
 please excuse me for being a novice.

Not sure what exactly you are looking for - there are a lot of things,
and many of them depend on what hardware you plan to use.

The simplest indexing strategy is to design the indexes along with the
schema, and then evaluate them against real queries (collect slow
queries - create suitable indexes - repeat).
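
As a sketch of that loop (the threshold and the index below are only
examples; the table and columns are hypothetical):

    -- log every statement slower than 1 second, then watch the logs
    ALTER SYSTEM SET log_min_duration_statement = '1s';  -- 9.4+; otherwise edit postgresql.conf
    SELECT pg_reload_conf();

    -- build an index for a slow query without blocking concurrent writes
    CREATE INDEX CONCURRENTLY idx_orders_customer_date
        ON orders (customer_id, order_date);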

 
 3.) What would be the ideal hardware configuration for this
 requirement? I know there is no one-stop answer for this, but let's
 take it as a starting point. We can come to a proper conclusion after a
 discussion.

This is a very complex question, to be honest. I assume you're looking
at regular servers; in that case, a good server with not that many CPUs
(say, 32 cores seems to be enough for your workload), plenty of RAM, and
a good disk system to handle the load would be a good start.

 What are the best on-line resources/books which can tell us about
 the hardware requirements?

I'd say these two books would be helpful:

(1)
https://www.packtpub.com/big-data-and-business-intelligence/postgresql-9-high-availability-cookbook

   - explains capacity planning etc.

(2)
https://www.packtpub.com/big-data-and-business-intelligence/postgresql-90-high-performance

   - a good book about PostgreSQL performance tuning

regards

-- 
Tomas Vondra                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance