Nick,

> I reckon do a system scan first, and parse the current PostgreSQL conf
> file to figure out what the settings are. Also back it up with a date
> and time appended to the end to make sure there is a backup before
> overwriting the real conf file. Then a bunch of questions. What sort of
> questions would need to be asked and which parameters would these
> questions affect? So far, and from my limited understanding of the .conf
> file, I reckon there should be the following

Hmmm ... but I do think that there should be a file to store the user's 
previous answers.   That way, the script can easily be re-run to fix config 
issues.
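
For both of those steps, something like this rough sketch (Python here just for 
illustration; the conf path and the answers-file name are made up) might work:

import os
import shutil
import time

CONF_PATH = "/var/lib/pgsql/data/postgresql.conf"        # assumed location
ANSWERS_PATH = "/var/lib/pgsql/data/tuning_answers.txt"  # made-up file name

def backup_conf(conf_path=CONF_PATH):
    # Copy the conf file aside with the date and time appended,
    # so the real file is never overwritten without a backup.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = "%s.%s" % (conf_path, stamp)
    shutil.copy2(conf_path, backup_path)
    return backup_path

def load_answers(path=ANSWERS_PATH):
    # Previous answers, if any, so a re-run can offer them as defaults.
    answers = {}
    if os.path.exists(path):
        for line in open(path):
            if "=" in line:
                key, value = line.split("=", 1)
                answers[key.strip()] = value.strip()
    return answers

def save_answers(answers, path=ANSWERS_PATH):
    out = open(path, "w")
    for key in sorted(answers):
        out.write("%s = %s\n" % (key, answers[key]))
    out.close()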

> Here is your config of your hardware as detected. Is this correct ?
>
>     This could potentially be several questions, i.e. one for proc, mem,
> os, hdd etc
>     Would affect shared_buffers, sort_mem, effective_cache_size,
> random_page_cost

Actually, I think this would break down into:
-- Are Proc & Mem correct?  If not, type in correct values
-- Is OS correct? If not, select from list
-- Your HDD: is it:
        1) IDE
        2) Fast multi-disk SCSI or low-end RAID
        3) Medium-to-high-end RAID

Other things, we don't care about.
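
To make that concrete, here is a rough sketch of how those three answers might 
feed the settings we care about (the RAM fractions and random_page_cost numbers 
are just placeholders, not recommendations):

def settings_from_hardware(ram_mb, hdd_type):
    # ram_mb: corrected RAM in MB; hdd_type: 1 = IDE,
    # 2 = fast multi-disk SCSI / low-end RAID, 3 = medium-to-high-end RAID.
    settings = {}
    # shared_buffers and effective_cache_size are in 8KB pages.
    settings["shared_buffers"] = int(ram_mb * 0.10) * 1024 // 8
    settings["effective_cache_size"] = int(ram_mb * 0.50) * 1024 // 8
    # sort_mem is per-sort, in KB; keep the default slice small.
    settings["sort_mem"] = int(ram_mb * 0.02) * 1024
    # Better disk subsystems justify a lower random_page_cost.
    settings["random_page_cost"] = {1: 4.0, 2: 3.0, 3: 2.0}[hdd_type]
    return settings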

> How was PostgreSQL compiled?
>
>     This would be parameters such as the block size and a few other
> compile time parameters. If we can get to some of these read-only
> parameters than that would make this step easier, certainly for the new
> recruits amongst us.

Actually, from my perspective, we shouldn't bother with this; if an admin 
knows enough to set an alternate block size for PG, then they know enough to 
tweak the Conf file by hand.  I think we should just issue a warning that 
this script:
1) does not work for anyone who is using non-default block sizes;
2) may not work well for anyone using unusual locales, optimization flags, or 
other non-default compile options except for language interfaces;
3) cannot produce good settings for embedded systems;
4) will not work well for systems which are extremely low on disk space, 
memory, or other resources.
        Basically, the script only really needs to work for the people who are 
installing PostgreSQL with the default options or from RPM on regular server 
or workstation machines with plenty of disk space for normal database 
purposes.  People who have more complicated setups can read the darned 
documentation and tune the conf file by hand.

> Is PostgreSQL the only thing being run on this computer?

First, because it affects a couple of other variables:

What kind of database server are you expecting to run?
A) Web Server (many small fast queries from many users, and not much update 
activity)
B) Online Transaction Processing (OLTP) database (many small updates 
constantly from many users; think "accounting application").
C) Online Analytical Reporting (OLAP) database (a few large and complicated 
read-only queries aggregating large quantities of data for display)
D) Data Transformation tool (loading large amounts of data to process, 
transform, and output to other software)
E) Mixed-Use Database Server (a little of all of the above)
F) Workstation (installing this database on a user machine which also has a 
desktop, does word processing, etc.)

If the user answers anything but (F), then we ask:

Will you be running any other significant software on this server, such as a 
web server, a Java runtime engine, or a reporting application? (yes|no)

If yes, then:

How much memory do you expect this other software, in total, to regularly use 
while PostgreSQL is in use?  (# in MB; should offer default of 50% of the RAM 
scanned).
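
In other words, the memory question boils down to something like this (the 25% 
workstation carve-out is a guess on my part; the 50% default is the one 
mentioned above):

def usable_ram_mb(detected_ram_mb, db_type, other_software,
                  other_software_mb=None):
    # db_type is the A-F answer above; other_software is the yes/no answer.
    if db_type == "F":
        # Workstation: leave most of the machine to the desktop apps.
        return int(detected_ram_mb * 0.25)
    if other_software:
        if other_software_mb is None:
            # Offer the default of 50% of the RAM we scanned.
            other_software_mb = detected_ram_mb * 0.50
        return max(int(detected_ram_mb - other_software_mb), 32)
    return detected_ram_mb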

> How are the clients going to connect?
>
>     i.e. TCP or Unix sockets

We should warn them that they will still need to configure pg_hba.conf.

> How many clients can connect to this database at once?
>
>     Affects max_connections

Should add a parenthetical comment that for applications which use pooled 
connections or intermittent connections, such as Web applications, the number 
of concurrent connections is often much lower than the number of concurrent 
users.
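
So the script would basically just take that answer (hopefully a count of 
connections, not of users) and add a little headroom for administrative 
sessions; the exact headroom below is arbitrary:

def max_connections_setting(expected_concurrent_connections):
    # Pooled or intermittent web applications usually need far fewer
    # connections than they have users; the question text should say so.
    return max(expected_concurrent_connections + 2, 8)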

> How many databases and how many tables in each database are going to be
> present?
>
>     Affects max_fsm_pages, checkpoint_segments, checkpoint_timeout

Also need to ask if they have an idea of the total size of all databases, in 
MB or GB, which has a stronger relationship to those variables.

Also, this will give us a chance to check the free space on the PGDATA 
partition, and kick the user out with a warning if there is not at least 
2xExpected Size available.
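
The free-space check and the size-based settings could look roughly like this 
(the max_fsm_pages and checkpoint formulas here are placeholders; somebody 
with better numbers should fill them in):

import os
import sys

def check_free_space(pgdata, expected_db_mb):
    # Refuse to proceed if the PGDATA partition does not have at least
    # twice the expected database size free.
    st = os.statvfs(pgdata)
    free_mb = st.f_bavail * st.f_frsize // (1024 * 1024)
    if free_mb < 2 * expected_db_mb:
        sys.exit("Only %d MB free on %s; need at least %d MB (2x expected size)."
                 % (free_mb, pgdata, 2 * expected_db_mb))

def settings_from_size(expected_db_mb):
    settings = {}
    # Roughly one FSM entry per 8KB page for half the database, capped.
    settings["max_fsm_pages"] = min(expected_db_mb * 1024 // 8 // 2, 500000)
    # Bigger databases can afford more WAL segments between checkpoints.
    settings["checkpoint_segments"] = 3 if expected_db_mb < 500 else 10
    settings["checkpoint_timeout"] = 300
    return settings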

> Do you want to vacuum your database regularly?
>
>     Initial question for cron job
>
> It is recommended that you vacuum analyze every night, do you want to do
> this?
> It is also recommended that you vacuum full every month, do you want to
> do this?

Depends on size/type of database.  For large OLTP databases, I recommend 
vacuum as often as every 5 minutes, analyze every hour, and Vacuum Full + 
Reindex once a week.   For a workstation database, your frequencies are 
probably OK.
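
For the cron side of it, the script could emit /etc/cron.d-style lines along 
these lines (times, user, and database name are only examples, and the weekly 
reindex is left out since the right command depends on the installation):

def vacuum_cron_lines(db_type, dbname="mydb"):
    # Large OLTP database: vacuum every 5 minutes, analyze hourly,
    # vacuum full weekly (reindex would go alongside the weekly job).
    if db_type == "B":
        return [
            "*/5 * * * * postgres vacuumdb -q %s" % dbname,
            "0 * * * *   postgres vacuumdb -q -z %s" % dbname,
            "0 3 * * 0   postgres vacuumdb -q -f %s" % dbname,
        ]
    # Workstation or lightly used database: nightly analyze, monthly full.
    return [
        "0 2 * * * postgres vacuumdb -q -z %s" % dbname,
        "0 3 1 * * postgres vacuumdb -q -f %s" % dbname,
    ]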

-- 
Josh Berkus
Aglio Database Solutions
San Francisco
