Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Greg Stark

Vitaly Belman [EMAIL PROTECTED] writes:

   With all that said, VMWare is badly suited for running a database,
   while CoLinux can be run as a service (didn't try it yet though),
   VMWare always sits there, it is slow to go up, slow to go down and
   generally feels like a system hog.

Uhm, it sounds like you're using VMWare Workstation? VMWare has a range of
different versions including some that are specifically targeted towards
server situations. I think they had the idea that hosting companies would run
hundreds of virtual machines on a server and provide their hosting clients
with a virtual machine to play with.

That said, I'm curious why the emulated servers performed better than the
Native Windows port. My first thought is that they probably aren't syncing
every write to disk so effectively they're defeating the fsyncs, allowing the
host OS to buffer disk writes.

I would be curious to see better stats on things like a pgbench run which
would give some idea of the context switch efficiency, and a large select or
update, which would give some idea of the i/o throughput. Really there's no
excuse for the Windows port to be slower than an emulator. Barring effects
like the disk caching I mentioned, it should far outpace the emulators.

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL and Kernel 2.6.x

2004-06-02 Thread Rajesh Kumar Mallah
V i s h a l Kashyap @ [Sai Hertz And Control Systems] wrote:
Dear all,
Have anyone compiled PostgreSQL with kernel 2.6.x if YES
1. Was their any performance gains
Else
1. Is it possible
2. What problems would keeping us away from compiling on kernel 2.6
We run pgsql on 2.6.6 there was upto 30% improvement in performance
for certain queries. None, everything works just fine.
Regds
Mallah.
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Jan Wieck
Using VMware myself quite extensively, I wonder what the disk 
configuration was that you created for the VM. Where the disks 
preallocated and did you make sure that they are contiguous on the NTFS 
filesystem? Did you install the VMware tools in the guest operating system?

What did you use to measure the performance?
Jan
On 6/1/2004 6:56 PM, Vitaly Belman wrote:
Hello pgsql-performance,
  I was using the native windows PostgreSQL 7.5Dev and was adviced by
  several people to use an emulated PostgreSQL instead, as it is just
  a beta.
  Well, I give it a whirl and tried both commercial VMWare and the
  freeweare open-source CoLinux, both work under Windows and both
  emulate Linux, that's a quick review of my experience with them, may
  someone in need learn from it.
  This might be not the best place for such a post, but since the
  subject was brought up here, I'll post it here as well. If someone
  thinks it should be posted somewhere else, let me know.
  Installation  Configuration
  
  
  VMWare:

  On the bright side, the installation went quite smoothly, VMWare
  configured all the network stuff by itself and I had no trouble
  using the net right away. On the grim side, the installation itself
  took ages, compared to the plug  play feel of CoLinux.
  Installing PostgreSQL on VMWare was quite straightforward, just as
  the the PostgreSQL documention goes.
  CoLinux:
  As I said, with CoLinux the installation itself goes very quickly.
  To get Linux running you need to download practically less than 20mb
  which include the distribution (Debian in my case) and the CoLinux
  setup. Configuring CoLinux took a bit longer than VMWare, yet, not
  long as I thought it would take. In fact, it can be very easy if you
  just follow the documention of CoLinux Wiki stuff, there are some
  very easy to follow tutorials there.
  Installing PostgreSQL on CoLinux proved a little more difficult
  (again, Debian), but I posted a quick tutorial that should smooth
  the process: http://www.colinux.org/wiki/index.php/PostgreSQL.
  Performance
  ---
  This was a totally subjective test (especially since one of the
  participants is in a beta stage), yet, that's what I tested and that's
  what I needed to know.
  To make the test as fair as possible, I did an exact dump of the
  same database. I ran the SQLs (around 10) in the same order on all
  of them and repeated the test several times. I also did an EXPLAIN
  on the queries to make sure all the databases work on the query the
  same way. It wasn't a full test though, I didn't test mass select
  load, nor inserts, nor work under heavy load, nor I tried different
  types of joins. All I did was to run some heavy (in execution time)
  queries. So you should take these tests just for what they are.
  That's what I got:
  The native window port performed poorly lagging
  30%-50% behind the VMWare/CoLinux solutions in execution times,
  rather sad, but not unexpected, I guess.
  CoLinux and VMWare give AROUND the same results, yet CoLinux did
  give slightly better performance (I'd say 5%-10%) but with such
  slight improvement and inconsistency I wouldn't count it as much.
  Conclusion
  --
  With all that said, VMWare is badly suited for running a database,
  while CoLinux can be run as a service (didn't try it yet though),
  VMWare always sits there, it is slow to go up, slow to go down and
  generally feels like a system hog.
  I'll go on with CoLinux for now and hope it will act as good as it
  looks.
  http://www.vmware.com/
  http://www.colinux.org/
  Thanks to Bryan and Matthew for their advices regarding the emulations.
Regards,
 Vitaly Belman
 
 ICQ: 1912453
 AIM: VitalyB1984
 MSN: [EMAIL PROTECTED]
 Yahoo!: VitalyBe

---(end of broadcast)---
TIP 8: explain analyze is your friend

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 That said, I'm curious why the emulated servers performed better than the
 Native Windows port. My first thought is that they probably aren't syncing
 every write to disk so effectively they're defeating the fsyncs, allowing the
 host OS to buffer disk writes.

It would be fairly easy to check this by repeating the comparisons with
fsync = off in postgresql.conf.  A performance number that doesn't
change much would be a smoking gun ;-).

The native port hasn't had any performance testing done on it yet, and
I wouldn't be surprised to hear of a gotcha or two.  Perhaps with the
recent schedule change there will be some time for performance tuning
before we go beta.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Trigger Function

2004-06-02 Thread Duane Lee - EGOVX
Title: RE: [PERFORM] Trigger  Function





Thanks for the response. I was pretty sure it couldn't be done the way I wanted to but felt I would ask anyway.


Thanks again,
Duane


-Original Message-
From: Mike Nolan [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 01, 2004 3:04 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Trigger  Function



 My problem is I defined the before and after
 fields in the audit table as TEXT and when I try to move NEW or OLD into
 these fields I get the error NEW used in query that is not in a rule. 


You're trying to insert record data into a text field, that doesn't work.
OLD and NEW can be used as either record identifiers (as in RETURN OLD)
or column qualifiers (as in OLD.colname), but you can't intermingle them.


I don't think postgres (pl/pgsql) has row-to-variable and variable-to-row 
functions like serialize and unserialize, that's probably what you'd need. 
It would probably be necessary to write something like that in C, since 
at this point pl/perl cannot be used for trigger functions. 


I've not tried using pl/php yet, the announcement for it says it can be 
used for trigger functions. 


My first thought is that even if there was a serialize/unserialize 
capabiity you might be able to write something using it that creates 
the log entry but not anything that allows you to query the log for 
specific column or row entries.


It would probably require a MAJOR extension of SQL to add it to pg,
as there would need to be qualifiers that can be mapped to specific
tables and columns. Even if we had that, storing values coming from 
multiple tables into a single audit table would present huge challenges.


I've found only two ways to implement audit logs:


1. Have separate log tables that match the structure of
 the tables they are logging.


2. Write a trigger function that converts columns to something you can
 store in a common log table. (I've not found a way to do this without
 inserting one row for each column being logged, though.)
--
Mike Nolan





Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Matthew Nuzum
I have colinux running on a Fedora Core 1 image.  I have the rhdb 3 (or
PostgreSQL RedHat Edition 3) on it running.  Here are tests with fsync on
and off:
  FSYNC OFF   FSYNC ON  RUN
136.9142.0  124.5149.1  1
122.1126.7  140.1169.7  2
125.7148.7  147.4180.4  3
103.3136.7  136.8166.3  4
126.5146.1  152.3187.9  5
114.4133.3  144.8176.7  6
124.0146.5  143.3175.0  7
121.7166.8  147.8180.5  8
127.3151.8  146.7180.0  9
124.6143.0  137.2167.5  10
--
122.7144.2  142.1173.3  AVG

I hope those numbers' formatting come through all right.  

This computer is an AMD Athlon 900MHz with 448MB Ram running XP Pro SP1
This is using Colinux 0.60 (not the recently released 0.61) and 96MB of RAM
allocated to linux.

The computer was idle but it was running Putty, Excel and Task Manager
during the process.  (I prefer to use Putty to SSH into the virtual computer
than to run the fltk console)

It occurs to me that the fsync may be performed to the linux filesystem, but
this filesystem is merely a file on the windows drive.  Would Windows cache
this file?  It's 2GB in size, so if it did, it would only be able to cache
part of it.

I'd like to run a more difficult test personally.  It seems like this test
goes too fast to be very useful.

If someone would like me to try something more specific, e-mail me right
away and I'll do it.  I must leave my office at 4:15 EDT and will not return
until Friday, although I can do another test on my home computer Thursday.

Matthew Nuzum   | Makers of Elite Content Management System
www.followers.net   | View samples of Elite CMS in action
[EMAIL PROTECTED]   | http://www.followers.net/portfolio/


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:pgsql-performance-
 [EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Wednesday, June 02, 2004 11:25 AM
 To: Greg Stark
 Cc: Vitaly Belman; [EMAIL PROTECTED]; Bryan Encina; Matthew
 Nuzum
 Subject: Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux
 
 Greg Stark [EMAIL PROTECTED] writes:
  That said, I'm curious why the emulated servers performed better than
 the
  Native Windows port. My first thought is that they probably aren't
 syncing
  every write to disk so effectively they're defeating the fsyncs,
 allowing the
  host OS to buffer disk writes.
 
 It would be fairly easy to check this by repeating the comparisons with
 fsync = off in postgresql.conf.  A performance number that doesn't
 change much would be a smoking gun ;-).
 
 The native port hasn't had any performance testing done on it yet, and
 I wouldn't be surprised to hear of a gotcha or two.  Perhaps with the
 recent schedule change there will be some time for performance tuning
 before we go beta.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Leeuw van der, Tim
Hi All,

I think it would actually be interesting to see the performance of the Cygwin version 
for these same benchmarks, then we've covered all ways to run PostgreSQL on Windows 
systems. (I expect though that performance of Cygwin-PostgreSQL will improve 
considerably when an updated version is released that uses Cygwin native IPC instead 
of the ipc-daemon.)

regards,

--Tim

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Context Switching issue: Spinlock doesn't fix.

2004-06-02 Thread Josh Berkus
Folks,

I've been testing varying SPINS_PER_DELAY in a client's installation of 
PostgreSQL against a copy of a production database, to test varying this 
statistic as a way of fixing the issue.   

It does not seem to work.

I've tested all of the following graduated levels:

100 (the original)
250
500
1000
2000
5000
1
2
3
5

None of these quantities seem to make any difference at all in the number of 
context switches -- neither down nor up.   Seems to me like this is a dead 
end.   Does anyone have test results that show otherwise?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Postgres query optimization with varchar fields

2004-06-02 Thread W.H. van Atteveldt
Dear reader, 

I am investigating whether it is useful to directly query a database
containing a rather large text corpus (order of magnitude 100k - 1m
newspaper articles, so around 100 million words), or whether I should
use third party text indexing services. I want to know things such as:
how often is a certain word (or pattern) mentioned in an article and how
often it is mentioned with the condition that another word is nearby
(same article or n words distant).

I created a table listing the words one word per row, and created an
index on the word and wordnr columns. An example query would be:

simple: select articleid, count(*) as count from words w where articleid
in (select id from articles where batchid in (84,85,100,101,118,121))
and (word like 'PATTERN')  group by articleid
complex: select articleid, count(*) as count from words w where
articleid in (select id from articles where batchid in
(84,85,100,101,118,121)) and (word like 'PATTERN')  and exists (select
* from words w2 where w.articleid = w2.articleid and (word like
'PATTERN2')) group by articleid

According to the diagnostics, the database does use the indices for the
query, but it is still rather slow (around 10 minutes for a 'simple
query', x seconds for a complex one)

It is important that the complex query only counts instances where the
PATTERN is found and PATTERN2 only functions as a criterium and does not
add to the count.

My questions are: (technical details provided below)
- Does anyone disagree with the general setup?
- Is there a more sensible way to phrase my SQL?
- Any other ideas to improve performance?

Thanks,

Wouter van Atteveldt
Free University Amsterdam

--

Technicalities:

I am using a Postgresql 7.4.1 database on a linux machine (uname -a:
Linux swpc450.cs.vu.nl 2.4.22-1.2115.nptl #1 Wed Oct 29 15:31:21 EST
2003 i686 athlon i386 GNU/Linux). The table of interest is: (lemma, pos,
simplepos currently not used)

Table public.words
   Column   |  Type  |   Modifiers
++--
-
 id | integer| not null default
nextval('public.words_id_seq'::text)
 articleid  | integer| not null
 sentencenr | integer| not null
 word   | character varying(255) | not null
 lemma  | character varying(255) |
 pos| character varying(255) |
 simplepos  | character(1)   |
 wordnr | integer| not null
 parnr  | integer| not null
Indexes:
words_pkey primary key, btree (id)
words_aid btree (articleid)
words_word btree (word)
words_word_ptrn btree (word varchar_pattern_ops)
words_wordnr btree (wordnr)

Query plans:

anoko= explain select articleid, count(*) as count from words w where
articleid in (select id from articles where batchid in
(84,85,100,101,118,121)) and (word like 'integratie%')  group by
articleid;
 
QUERY PLAN


--
 HashAggregate  (cost=937959.21..937959.22 rows=2 width=4)
   -  Hash IN Join  (cost=95863.70..937816.01 rows=28640 width=4)
 Hash Cond: (outer.articleid = inner.id)
 -  Index Scan using words_word_ptrn on words w
(cost=0.00..836604.62 rows=208886 width=4)
   Index Cond: (((word)::text ~=~ 'integratie'::character
varying) AND ((word)::text ~~ 'integratif'::character varying))
   Filter: ((word)::text ~~ 'integratie%'::text)
 -  Hash  (cost=94998.60..94998.60 rows=146041 width=4)
   -  Index Scan using articles_batchid, articles_batchid,
articles_batchid, articles_batchid, articles_batchid, articles_batchid
on articles  (cost=0.00..94998.60 rows=146041 width=4)
 Index Cond: ((batchid = 84) OR (batchid = 85) OR
(batchid = 100) OR (batchid = 101) OR (batchid = 118) OR (batchid =
121))

explain select articleid, count(*) as count from words w where articleid
in (select id from articles where batchid in (84,85,100,101,118,121))
and (word like 'PATTERN')  and exists (select * from words w2 where
w.articleid = w2.articleid and (word like 'PATTERN2')) group by
articleid
anoko- ;
 
QUERY PLAN


--
 GroupAggregate  (cost=168253089.23..168254556.46 rows=1 width=4)
   -  Merge IN Join  (cost=168253089.23..168254484.85 rows=14320
width=4)
 Merge Cond: (outer.articleid = inner.id)
 -  Sort  (cost=168144438.23..168144699.33 rows=104443 width=4)
   

Re: [PERFORM] Hardware opinions wanted

2004-06-02 Thread Andrew Hammond
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
I'm working through the aquisition process for a quad Opteron box right
now. I'll be benchmarking it against a quad processor p630 as well as a
quad Xeon after we get it and posting results here. But that's about a
month or two from now.
I expect that the results will be strongly in favour of the Opetron,
especially the price / performance since the Opteron box is being quoted
at about half the price of the p630 systems.
One thing you may wish to consider is going with lots of 10kRPM SATA
disks instead of 15kRPM SCSI disks. Two companies that I'm aware of
offer quad Opteron solutions with SATA raid:
http://www.quatopteron.com/
http://alltec.com/home.php
Andrew Hammond
DBA - Afilias
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iD8DBQFAvjYKgfzn5SevSpoRAvq0AJkBDXOKL52HXg43mQ6rXe/i9RzFkQCfYQn8
HpHP2U0jvjfYIvihNLFLbzA=
=LyqB
-END PGP SIGNATURE-
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Shachar Shemesh
Greg Stark wrote:
That said, I'm curious why the emulated servers performed better than the
Native Windows port. My first thought is that they probably aren't syncing
every write to disk so effectively they're defeating the fsyncs, allowing the
host OS to buffer disk writes.
 

I havn't tested it, and it's certanly possible. However, please bear in 
mind that it is also possible that it just gives better performance.

The reason this may be possible is that the emulation layer gets the CPU 
(and other resources) from the OS in bulk, and decides on it's own how 
to allocate it to the various processes running within the emulation. 
Inparticular, this on it's own is done using the stock Linux kernel. 
As Postgresql works sufficiently better on Linux than on Windows, this 
yields better performance.

Again - speculation only. Someone should defenitely make sure that no 
caching takes place where it shouldn't.

As a side note, I have had a chance to talk to Dan Aloni (coLinux 
maintainer) about running PostgreSQL on coLinux. He said that he knows 
that this particular use is high on people's priority list, but he feels 
it is totally unsafe to run a production database on alpha grade 
software. Then again, free software projects being what they are, this 
is usually what a maintainer would say.

Shachar
--
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Pl/Pgsql Functions running simultaneously

2004-06-02 Thread Marcus Whitney
Hello all,

  I have an import function that I have been working on for some time now, and 
it performed well up until recently.  It is doing a lot, and because the 
queries are not cached, I am not sure if that is what the problem is.  If a 
function takes a while, does it lock any of the tables it is accessing, even 
for SELECT?

Below is the bulk of the function:

-- set sql statement variables
create_import_file_sql := ''COPY '' || container_table || '' ('' || 
filtered_container_columns  || '') TO '' || 
quote_literal(formatted_import_file) || '' WITH NULL AS '' ||  
null_single_quotes;
upload_to_import_table_sql := ''COPY '' || import_table || '' ('' || 
field_names || '') FROM '' || quote_literal(formatted_import_file) || '' WITH 
NULL AS '' ||  null_single_quotes;
clean_personalization_fields_sql := ''UPDATE '' || import_table || '' 
SET emma_member_email = btrim(emma_member_email, '' || 
quote_literal(quoted_single_quote) || '') , emma_member_name_first = 
btrim(emma_member_name_first, '' || quote_literal(quoted_single_quote) || 
'') ,   emma_member_name_last = btrim(emma_member_name_last, '' || 
quote_literal(quoted_single_quote) || '') ;'';
clean_personalization_fields_sql2 := ''UPDATE '' || import_table || '' 
SET emma_member_email = btrim(emma_member_email) , emma_member_name_first = 
btrim(emma_member_name_first) ,   emma_member_name_last = 
btrim(emma_member_name_last) ;'';
set_account_id_sql := ''UPDATE '' || import_table || '' SET 
emma_account_id = '' || account_id;
set_default_active_status_sql := ''UPDATE '' || import_table || '' SET 
emma_member_status_id = 1'';
set_errors_for_null_email_sql := ''UPDATE '' || import_table || '' SET 
emma_member_status_id = 2 WHERE emma_member_email IS NULL'';
record_null_email_count_sql := ''UPDATE '' || import_history_table || 
'' SET emma_import_null_email_count = (SELECT COUNT(*) FROM '' || 
import_table || '' WHERE emma_member_email IS NULL) WHERE 
emma_import_history_id ='' || import_history_id;
set_errors_for_invalid_email_sql := ''UPDATE '' || import_table || ''  
SET emma_member_status_id = 2  WHERE emma_member_email !~* '' || email_regex;
record_invalid_email_count_sql := ''UPDATE '' || import_history_table 
|| ''  SET emma_import_invalid_email_count = ( SELECT COUNT(*) FROM '' || 
import_table || ''  WHERE emma_member_email !~* '' || email_regex || '' ) 
WHERE emma_import_history_id ='' || import_history_id;
get_dupes_in_import_sql := ''SELECT emma_member_email, 
emma_member_status_id FROM '' || import_table || '' GROUP BY 
emma_member_email, emma_member_status_id having count(*)  1'';
insert_dupes_sql := ''INSERT  INTO '' || dupe_table || '' SELECT * 
FROM '' || import_table || '' WHERE LOWER(emma_member_email) = LOWER('' || 
member_table || ''.emma_member_email)'';
record_table_dupe_count_sql := ''UPDATE '' || import_history_table || 
'' SET emma_import_table_dupe_email_count = (SELECT COUNT(*) FROM '' || 
import_table || '' WHERE emma_member_email = LOWER('' || member_table || 
''.emma_member_email)) WHERE emma_import_history_id ='' || import_history_id;
remove_dupes_from_import_table_sql := ''DELETE FROM '' || import_table 
|| '' WHERE LOWER(emma_member_email) = LOWER('' || member_table || 
''.emma_member_email)'';
create_clean_import_file_sql := ''COPY '' || import_table || '' TO '' 
|| quote_literal(clean_import_file) || '' WITH NULL AS '' ||  
null_single_quotes;
create_members_groups_ids_file_sql := ''COPY '' || import_table || 
'' (emma_member_id) TO '' || quote_literal(members_groups_ids_file) || '' 
WITH NULL AS '' ||  null_single_quotes;
empty_import_table_sql := ''TRUNCATE '' || import_table;
upload_clean_import_sql := ''COPY '' || member_table || '' FROM '' || 
quote_literal(clean_import_file) || '' WITH NULL AS '' ||  
null_single_quotes;
upload_members_groups_ids_sql := ''COPY '' || members_groups_ids_table 
|| '' (emma_member_id) FROM '' || quote_literal(members_groups_ids_file) || 
'' WITH NULL AS '' ||  null_single_quotes;
empty_members_groups_ids_sql := ''TRUNCATE '' || 
members_groups_ids_table;
empty_members_dupes_sql := ''TRUNCATE '' || dupe_table;
vacuum_sql := ''VACUUM '' || member_table || ''; VACUUM '' || 
import_table || ''; VACUUM '' || container_table || ''; VACUUM '' || 
members_groups_table || ''; VACUUM '' || members_groups_ids_table || ''; 
VACUUM '' || dupe_table;

-- BEGIN ACTIVITY
-- Create the filtered import file with the
EXECUTE create_import_file_sql;
-- Load data from the filtered file to the import table
EXECUTE upload_to_import_table_sql;
-- Set account id in import table
EXECUTE set_account_id_sql;
-- Set the status of all the records to 1
EXECUTE set_default_active_status_sql;
-- Clean personalization data
EXECUTE 

Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Gary Doades
On 2 Jun 2004 at 16:45, Merlin Moncure wrote:

 
 'better' does not mean 'faster'.  Win32 has a pretty decent journaling
 filesytem (ntfs) and a good I/O subsystem which includes IPC.  Process
 management is poor compared to newer linux kernels but this is
 unimportant except in extreme cases.  Right now the win32 native does
 not sync() (but does fsync()).  So, the performance is somewhere between
 fsync = off and fsync = on (probably much closer to fsync = on).  It is
 reasonable to assume that the win32 port will outperform the unix
 versions at many tasks (at the expense of safety) until the new sync()
 code is put in.
 
 If tested on the same source base, 40-60% differences can only be coming
 from the I/O subsystem.  There are other factors which aren't clear from
 this exchange like what version of gcc, etc.
 

Hmm, interesting.

I've been running the Win32 port for a couple of weeks now. Using the 
same database as a Linux 2.6 system. Same processor and memory 
but different disks.

Linux system has 10K rpm SCSI disks
Windows has 7200 rpm serial ATA disks.

When a lot of IO is involved the performance differences are very mixed 
as I would expect. Sometimes Windows wins, sometimes Linux.

BUT, very consistently, when NO IO is involved then the Win32 port is 
always around 20% slower than Linux. In cases where the EXPLAIN 
ANALYZE results are different I have disregarded. In all the cases that 
the EXPLAIN ANALYZE results are the same and no IO is involved the 
Win32 port is slower.

Currently I am putting this down to the build/gcc differences. I can't see 
why there should be this difference otherwise. (memory 
management??)

Regards,
Gary.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Inherited Tables Performance

2004-06-02 Thread Marcus Whitney
Is there any crossover in performance with sibling inherited tables?  

For Ex. 

if I have a parent table called : people

A child of 'people' called: Adults
and
A child of 'people' called: Kids

Does the work I do to Adults, namely copies, huge updates and such ever affect 
the performance of Kids?

Thanks.


-- 
marcus whitney

chief architect : cold feet creative

www.coldfeetcreative.com

800.595.4401
 


cold feet presents emma

email marketing for discriminating

organizations everywhere

visit www.myemma.com

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Greg Stark
Matthew Nuzum [EMAIL PROTECTED] writes:

 I have colinux running on a Fedora Core 1 image.  I have the rhdb 3 (or
 PostgreSQL RedHat Edition 3) on it running.  Here are tests with fsync on
 and off:
   FSYNC OFF FSYNC ON  RUN
 136.9  142.0  124.5149.1  1
 122.1  126.7  140.1169.7  2
 125.7  148.7  147.4180.4  3
 103.3  136.7  136.8166.3  4
 126.5  146.1  152.3187.9  5
 114.4  133.3  144.8176.7  6
 124.0  146.5  143.3175.0  7
 121.7  166.8  147.8180.5  8
 127.3  151.8  146.7180.0  9
 124.6  143.0  137.2167.5  10
 --
 122.7  144.2  142.1173.3  AVG
 
 I hope those numbers' formatting come through all right.  

No, they didn't. You used tabs? Are they four space tabs or 8 space tabs?
I assume 4 space tabs, but then what is the meaning of the four columns?
You have two columns for each fsync setting? One's under Windows and one's
under Vmware? Which is which?

 It occurs to me that the fsync may be performed to the linux filesystem, but
 this filesystem is merely a file on the windows drive.  Would Windows cache
 this file?  It's 2GB in size, so if it did, it would only be able to cache
 part of it.

Well VMWare certainly doesn't know that the linux process called fsync. For
all it knows the Linux kernel just schedule the i/o because it felt it was
time.

So the question is how does VMWare alway handle i/o normally. Does it always
handle i/o from the Guest OS synchronously or does it buffer it via the
Host OS's i/o system. 

I'm actually not sure which it does, it could be doing something strange. But
does seem most likely that it lets Windows buffer the writes, or does so
itself. It might also depend on whether you're using raw disks or a virtual
disk file. Undoable disks would throw another wrench in the works entirely.

Note that caching isn't really the question. It doesn't have to cache the
entire 2GB file or even very much of it. It just has to store the block that
linux wants to write and report success to linux without waiting for the disk
to report success. Linux will then think the file is sync'd to disk and allow
postgres to continue with the next transaction without actually waiting for
the physical disk to spin around to the right place and the head to seek and
perform the write.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Matthew Nuzum
On Wed, 2004-06-02 at 17:39, Greg Stark wrote:
 Matthew Nuzum [EMAIL PROTECTED] writes:
 
  I have colinux running on a Fedora Core 1 image.  I have the rhdb 3 (or
  PostgreSQL RedHat Edition 3) on it running.  Here are tests with fsync on
  and off:
FSYNC OFF   FSYNC ON  RUN
  136.9142.0  124.5149.1  1
  122.1126.7  140.1169.7  2
  125.7148.7  147.4180.4  3
  103.3136.7  136.8166.3  4
  126.5146.1  152.3187.9  5
  114.4133.3  144.8176.7  6
  124.0146.5  143.3175.0  7
  121.7166.8  147.8180.5  8
  127.3151.8  146.7180.0  9
  124.6143.0  137.2167.5  10
  --
  122.7144.2  142.1173.3  AVG
  
  I hope those numbers' formatting come through all right.  
 
 No, they didn't. You used tabs? Are they four space tabs or 8 space tabs?
 I assume 4 space tabs, but then what is the meaning of the four columns?
 You have two columns for each fsync setting? One's under Windows and one's
 under Vmware? Which is which?
 
Sorry that wasn't clear.  The pgbench program puts out two numbers,
can't remember what they are, I think one number included the time to
make the connection.  Therefore, the first two columns represent the two
values presented from pgbench with FSYNC off.  The second two columns
are those same to figures but with FSYNC ON.  The 5th column is the
run.  I did 10 runs and included the output of all runs so that incase
anything significant could be gleaned from the details, the data would
be there.

The executive summary is this:
Tom was curious if colinux might be deceiving the applications that
expect the fsync to occur.  He suspected that pgbench run with and
without fsync enabled might reveal something.  Therefore:
FSYNC ON:  142.1
FSYNC OFF: 122.7

Having FSYNC off seems to yield faster results.

I'd like some input on a more demanding test though, because these tests
run so quickly I can't help but be suspicious of their accuracy.  When
there are two OSs involved, it seems like the momentary activity of a
background process could skew these results.

  It occurs to me that the fsync may be performed to the linux filesystem, but
  this filesystem is merely a file on the windows drive.  Would Windows cache
  this file?  It's 2GB in size, so if it did, it would only be able to cache
  part of it.
 
 Well VMWare certainly doesn't know that the linux process called fsync. For
 all it knows the Linux kernel just schedule the i/o because it felt it was
 time.
 
 So the question is how does VMWare alway handle i/o normally. Does it always
 handle i/o from the Guest OS synchronously or does it buffer it via the
 Host OS's i/o system. 
We probably will never know what the internal workings of VMWare are
like because it is a closed source program.  I'm not slighting them, I
have purchased a license of VMWare and use it for my software testing. 
However, colinux is an open source project and we can easily find out
how they handle this.  I have little interest in this as I use this
merely as a tool to speed up my application development and do not run
any critical services what-so-ever.

 
 I'm actually not sure which it does, it could be doing something strange. But
 does seem most likely that it lets Windows buffer the writes, or does so
 itself. It might also depend on whether you're using raw disks or a virtual
 disk file. Undoable disks would throw another wrench in the works entirely.
In these tests I'm using a virtual disk file.  This is a 2GB file on the
hard drive that linux sees as a disk partition.  Colinux does not
support undoable disks in the way that vmware does.  Their wiky site
does not mention anything tricky being done to force disk writes to
actually be written; the implication therefore is that it leaves the i/o
completely at the discretion of XP.  Also note that XP Pro and 2000 Pro
both offer different caching options for the user to choose so unless it
does something to actually force a write the answer is probably who
knows.

 
 Note that caching isn't really the question. It doesn't have to cache the
 entire 2GB file or even very much of it. It just has to store the block that
 linux wants to write and report success to linux without waiting for the disk
 to report success. Linux will then think the file is sync'd to disk and allow
 postgres to continue with the next transaction without actually waiting for
 the physical disk to spin around to the right place and the head to seek and
 perform the write.

That's interesting to know.  I wondered about that.

So, my summary is this:
If you develop applications in windows that run in linux and you need a
testing platform you may like colinux a lot because of the following:
 * It's purchase price is 0
 * It's seems to be capable of running any (or at least many)
distribution based on 2.4 kernel
 * It appears to run much faster than VMWare (maybe because it 

Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Tom Lane
Matthew Nuzum [EMAIL PROTECTED] writes:
 I'd like some input on a more demanding test though, because these tests
 run so quickly I can't help but be suspicious of their accuracy.

So increase the number of transactions tested (-t switch to pgbench).

Be aware also that you really want -s (database size scale factor) to
exceed -c (number of concurrent clients) for meaningful results.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL on VMWare vs Windows vs CoLinux

2004-06-02 Thread Tom Lane
Merlin Moncure [EMAIL PROTECTED] writes:
 ... Right now the win32 native does
 not sync() (but does fsync()).  So, the performance is somewhere between
 fsync = off and fsync = on (probably much closer to fsync = on).  It is
 reasonable to assume that the win32 port will outperform the unix
 versions at many tasks (at the expense of safety) until the new sync()
 code is put in.

... which was three days ago.  Why are we still speculating?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html