Madison Kelly wrote:
Hi all,
I am new to the list and I didn't want to seem rude at all so I wanted to ask if this was okay first.
No problem. Reading your message below, you might want to try the performance list, but general is a good place to start.
I have a program written in Perl that uses a PostgreSQL database as the backend. The program works, but the performance is really bad. I have been reading as much as I can on optimizing performance, but it still isn't reasonable. At one point my program could process 175,000 records in 16min 10sec on a Pentium III 650MHz test machine with 448MB of RAM. Since then I have moved to a Pentium III 1GHz system with 512MB of RAM and tried a lot of things to get the performance up, but now it is substantially slower and I can't figure out what I am doing wrong.
A few places to start:
1. VACUUM FULL
   This will make sure any unused space is reclaimed.
2. ANALYZE
   This will recalculate stats for the tables.
3. Basic performance tuning:
   http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
There's also a good guide to the postgresql.conf file on varlena.com.
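For reference, steps 1 and 2 are plain SQL commands you can run from psql. A minimal sketch ("mytable" here is just a placeholder for one of your own tables):

  -- Reclaim space held by dead rows left behind by UPDATEs and DELETEs
  VACUUM FULL mytable;

  -- Recalculate the planner's statistics for the table
  ANALYZE mytable;

  -- Or do both for every table in the database in one pass
  VACUUM FULL ANALYZE;

Bear in mind that VACUUM FULL takes an exclusive lock on each table while it works on it, so run it when nothing else needs the database.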
Would it be appropriate to ask for help on my program on this list? Full disclosure: the program won't initially be GPL'ed because it is for my company, but it will be released free to home users and the source code will be made available (similar to other split-license programs). Once my company makes its money back, I think they will fully GPL it (I am on my boss's case about it :p ).
No problem - what you license your software under is your concern. Once you've taken the basic steps described above, try to pick out a specific query that you think is too slow and provide:
1. PostgreSQL version
2. Basic hardware info (as you have)
3. Sizes of tables
4. Output of EXPLAIN ANALYZE <query here>
EXPLAIN ANALYZE actually runs the query and shows how much work PG estimated it would be and how much it actually turned out to be.
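For example (just a sketch; substitute one of your own slow queries - the table and column names here are made up):

  EXPLAIN ANALYZE
  SELECT *
    FROM mytable
   WHERE some_column = 'some value';

Comparing the estimated row counts against the actual ones for each step of the plan is usually the quickest way to spot trouble, e.g. a sequential scan over a large table where you expected an index scan.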
HTH
Thank you very much!! I am using PostgreSQL 7.4 on a stock install of Fedora Core 2 on my IBM ThinkPad A22m (P3 1GHz, 512MB RAM, not the fastest HDD). The drive carrier I am using is connected via USB 2.0 and holds a few different hard drives, the fastest being a couple of Barracuda 7200.7 drives (2MB cache, 7,200rpm). I described the program in my reply to Martijn, so here is some of the code (code not related to PostgreSQL snipped; let me know if posting it would help - sorry for the wrapping...):
=-[ Calling the database ]-=
# Open the connection to the database
my $DB = DBI->connect("DBI:Pg:dbname=$db_name", "$user") || die("Connect error (Is PostgreSQL running?): $DBI::errstr");
# Prepare the select statements before using them for speed:
$select_sth = $DB->prepare("SELECT null FROM file_dir WHERE file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die "$DBI::errstr";
$select_up = $DB->prepare("UPDATE file_dir SET file_perm=?, file_own_user=?, file_own_grp=?, file_size=?, file_mod_date=?, file_mod_time=?, file_mod_time_zone=?, file_exist=? WHERE file_src_uuid=? AND file_parent_dir=? AND file_name=?") || die "$DBI::errstr";
$select_in = $DB->prepare("INSERT INTO file_dir ( file_src_uuid, file_name, file_dir, file_parent_dir, file_perm, file_own_user, file_own_grp, file_size, file_mod_date, file_mod_time, file_mod_time_zone, file_backup, file_restore, file_display, file_exist ) VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )") || die "$DBI::errstr";
# Set the 'file_exist' flag to 'false'; existing files will be reset to 'true' as the scan finds them.
$DB->do("UPDATE file_dir SET file_exist='f' WHERE file_src_uuid='$file_src_uuid'") || die "$DBI::errstr";
# Start scanning the drive
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) || die "$DBI::errstr";
if ( $num > 0 )
{
$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name) || die "$DBI::errstr";
}
else
{
$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist) || die "$DBI::errstr";
}
# We need to grab the existing file settings for the special file '/.'
$DBreq=$DB->prepare("SELECT file_backup, file_restore, file_display FROM file_dir WHERE file_parent_dir='/' AND file_name='.' AND file_src_uuid='$file_src_uuid'") || die $DBI::errstr;
$file_backup=$DBreq->execute();
@file_backup=$DBreq->fetchrow_array();
$file_backup=$file_backup[0];
$file_restore=$file_backup[1];
$file_display=$file_backup[2];
# Jump into the re-entrant subroutine to scan directories and sub-dirs
&list_files($real_dir, $exclude_list_num, $relative_dir, $file_backup, $file_restore, $file_display);
# Inside the sub routine
# Does the directory/file/symlink already exist? (there are three of these for each file type)
$num = $select_sth->execute($file_src_uuid,$relative_dir,$file_name) || die "$DBI::errstr";
if ( $num > 0 )
{
$select_up->execute($file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_exist,$file_src_uuid,$file_parent_dir,$file_name) || die "$DBI::errstr";
}
else
{
# The file did not exist so we will use the passed parent settings for the 'file_backup' flag and leave the 'file_display' flag set to 'f'
$select_in->execute($file_src_uuid,$file_name,$file_dir,$file_parent_dir,$file_perm,$file_own_user,$file_own_grp,$file_size,$file_mod_date,$file_mod_time,$file_mod_time_zone,$file_backup,$file_restore,$file_display,$file_exist) || die "$DBI::errstr";
}
# If this was a file, I loop and process the next file in the directory; if it was a directory, I re-enter the subroutine to process its contents, then pick up where I left off when I fall back out.
# Returning from the final subroutine and finishing up
$DB->do("VACUUM ANALYZE");
=-[ finished DB related source code ]-=
Here is the schema for the 'file_dir' table which I hit repeatedly here:
=-[ file_dir table and index schemas ]-=
CREATE TABLE file_dir ( -- Used to store info on every file on source partitions
file_id serial unique, -- make this 'bigserial' if there may be more than 2 billion files in the database
file_src_uuid varchar(40) not null, -- the UUID of the source partition hosting the original file
file_org_uuid varchar(40), -- the UUID that the file came from (when the file was moved by TLE-BU)
file_name varchar(255) not null, -- Name of the file or directory
file_dir bool not null, -- t = is directory, f = file
file_parent_dir varchar(255) not null, -- if directory '/foo/bar', parent is '/foo', if file '/foo/bar/file', parent is '/foo/bar'. The mount directory is treated as '/' so any directories below it will be ignored for this record.
file_perm varchar(10) not null, -- file or directory permissions
file_own_user varchar(255) not null, -- The file's owning user (by name, not UID!!)
file_own_grp varchar(255) not null, -- The file's owning group (by name, not GID!!)
file_size bigint not null, -- File size in bytes
file_mod_date varchar(12) not null, -- File's last edited date
file_mod_time varchar(20) not null, -- File's last edited time
file_mod_time_zone varchar(6) not null, -- File's last edited time zone
file_backup boolean not null default 'f', -- 't' = Include in backup jobs, 'f' = Do not include in backup jobs
file_restore boolean not null default 'f', -- 't' = Include in restore jobs, 'f' = Do not include in restore jobs
file_display boolean not null default 'f', -- 't' = display, 'f' = hide
file_exist boolean default 't' -- Used to catch files deleted since the last scan. Before a rescan, every file on a given src_uuid is set to 'f' (deleted); as each file is found it is reset to 't' (exists), and anything still 'f' at the end of the scan has been deleted, so we remove its record.
);
-- CREATE INDEX file_dir_idx ON file_dir (file_src_uuid,file_name,file_parent_dir);
=-[ Finish file_dir table and index schemas ]-=
Thanks so much!!
Madison