Re: [PERFORM] optimized counting of web statistics

2005-06-28 Thread Rudi Starcevic
Hi,

I do my batch processing daily using a Python script I've written. I
found that trying to do it with PL/pgSQL took more than 24 hours to
process 24 hours' worth of logs. I then used C# and in-memory hash
tables to drop the time to 2 hours, but I couldn't get Mono installed
on some of my older servers. Python proved the fastest and I can
process 24 hours' worth of logs in about 15 minutes. Common reports run
in under 1 second and custom reports run in under 15 seconds (usually).
  


When you say you do your batch processing in a Python script, do you mean
you are using 'plpython' inside PostgreSQL, or using Python to execute
SELECT statements and crunch the data 'outside' PostgreSQL?
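
If it's the latter, I imagine something very roughly like this - just a
sketch, with a hypothetical 'page_hits' table and log path, and psycopg2
assumed:

# rough sketch only - aggregate raw hits in memory, then do one batched insert
import psycopg2

counts = {}
with open('/var/log/apache/access.log') as logfile:
    for line in logfile:
        parts = line.split()
        if len(parts) < 7:
            continue
        page = parts[6]                      # requested URL in common log format
        counts[page] = counts.get(page, 0) + 1

conn = psycopg2.connect('dbname=stats')
cur = conn.cursor()
cur.executemany("INSERT INTO page_hits (page, hits) VALUES (%s, %s)",
                list(counts.items()))
conn.commit()
conn.close()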

Your reply is very interesting.

Thanks.
Regards,
Rudi.




Re: [PERFORM] slow queries, possibly disk io

2005-05-29 Thread Rudi Starcevic

Hi,

I had some disk I/O issues recently with NFS. I found the command 'iostat
-x 5' to be a great help when using Linux.


For example, here is the output when I do a 10 GB file transfer onto hdc:
Device:    rrqm/s   wrqm/s    r/s     w/s   rsec/s    wsec/s   rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
hdc          0.00   875.95   0.00   29.66     0.00   7244.89    0.00   3622.44    244.27      3.07  103.52   1.78   5.27


The last field shows the disk is 5.27% busy.

I have seen this value at 100%; adding more servers brought it back under 100%.
It seems that if you hit 100%, problems sort of cascade all over the
place. For example, Apache connections went right up and hit their max.


I am not sure how accurate the % is, but it has worked pretty well for me.

Perhaps run this command in another window while you run your SQL and see
what it shows.
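
If you want to keep an eye on it programmatically at the same time, here is a
rough sketch (assuming Linux and the sysstat 'iostat'; the 90% threshold is
arbitrary):

# rough sketch - watch iostat -x and flag devices that look saturated
import subprocess

proc = subprocess.Popen(['iostat', '-x', '5'],
                        stdout=subprocess.PIPE, universal_newlines=True)
for line in proc.stdout:
    fields = line.split()
    if not fields or fields[0] in ('Device:', 'avg-cpu:', 'Linux'):
        continue                              # skip headers and the banner line
    if fields[0][0].isdigit():
        continue                              # skip the CPU percentage line
    try:
        util = float(fields[-1])              # %util is the last column
    except ValueError:
        continue
    if util > 90.0:
        print('WARNING: %s is %.1f%% busy' % (fields[0], util))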


HTH.
Kind regards,
Rudi.



[PERFORM] NFS RAID 0/5

2005-03-28 Thread Rudi Starcevic
Hi,
I have a web app using PostgreSQL which indexes, searches and 
streams/downloads online movies.
I think I have a problem with NFS and RAID. It is not strictly 
PostgreSQL, but it is closely linked, and I know many people on this list 
are experienced with this technology. Apologies if it is off topic.
Sometimes it is hard not to be the Developer, Database and System 
Administrator all rolled into one.

I have a FreeBSD box with 1 TB of disk space in RAID 5, of which 800 GB is used.
This is mounted via NFS onto a Debian Linux box running Apache/PHP/PostgreSQL.
I have a script which loads the directory structure etc. into the database.
As users surf the site web pages are generated by selecting from the 
database as per a standard web app.
The server is on a 100 Mbit link and has reached up to 80 Mbit/s in the 
past without using NFS or RAID.

The problem is that when users start to stream/download the content, the load 
averages go through the roof - sometimes as high as 300.

I can mostly only see Apache processes running; 2000 is the max limit.
Even at just 200 Apache connections the load avg. is over 10.

Could it be that using RAID 5 and NFS is causing the high load avg. on 
the Linux web servers?
I have a machine with RAID 0, but it won't be ready for a day or so.

I will soon need to move the databases onto the NFS partition and am 
concerned it will increase my problem.

Any advice much appreciated.
Thank you.
Regards,
Rudi







[PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi,
I have a question on bulk checking and inserting into a table, and
how best to use an index for performance.
The data I have to work with is a monthly CD-ROM CSV data dump of
300,000 property owners from one area/shire.
So every CD has 300,000-odd lines, each of which fills a row in the 
'property' table.

Beginning with the first CD, each line should require one SELECT and
one INSERT, as it will be the first property with this address.
The SELECT uses fields like 'street' and 'suburb' to check for an 
existing property,
so I have built an index on those fields.

My question is: does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does, but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:
a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'
for each row?
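In code, the per-row pattern I mean is roughly this (just a sketch - the
column names and CSV layout are made up, psycopg2 assumed):

# rough sketch of the per-row check-then-insert loop
import csv
import psycopg2

conn = psycopg2.connect('dbname=property')
cur = conn.cursor()

with open('owners.csv') as f:
    for street, suburb, owner in csv.reader(f):
        # one SELECT to see if this address is already there ...
        cur.execute("SELECT 1 FROM property WHERE street = %s AND suburb = %s",
                    (street, suburb))
        # ... and one INSERT if it is not
        if cur.fetchone() is None:
            cur.execute("INSERT INTO property (street, suburb, owner) "
                        "VALUES (%s, %s, %s)",
                        (street, suburb, owner))

conn.commit()
conn.close()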
Would this mean that after 200,000 rows each INSERT will require
an index over hundreds of thousands of rows to be rebuilt?
So far I believe my only options are to use either an index
or a sequential scan and see which is faster.
A minute for your thoughts and/or suggestions would be great.
Thanks.
Regards,
Rudi.
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi Jim,
Thanks for your time.
 If the bulk load has the possibility of duplicating data
Yes, each row will require either:
a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE
I did think of using more than one table, i.e. a temp table.
As each month's worth of data is added I expect to see
a change from lots of INSERTs to lots of UPDATEs.
Perhaps when the UPDATEs become more dominant it would
be best to start using indexes.
While INSERTs are more prevalent perhaps a seq. scan is better.
I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.
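If I go the temp-table route, I imagine the monthly load would look roughly
like this (again just a sketch - hypothetical column names, psycopg2 assumed,
COPY doing the heavy lifting):

# rough sketch of the staging-table approach
import psycopg2

conn = psycopg2.connect('dbname=property')
cur = conn.cursor()

# load the whole CD into a staging table in one go
# (CSV mode needs a reasonably recent PostgreSQL)
cur.execute("CREATE TEMP TABLE staging (street text, suburb text, owner text)")
with open('owners.csv') as f:
    cur.copy_expert("COPY staging FROM STDIN WITH CSV", f)

# index it like the original, then merge in only the addresses we don't have yet
# (if the CD itself can contain duplicates, dedupe staging first)
cur.execute("CREATE INDEX staging_addr_idx ON staging (street, suburb)")
cur.execute("""
    INSERT INTO property (street, suburb, owner)
    SELECT s.street, s.suburb, s.owner
    FROM staging s
    LEFT JOIN property p ON p.street = s.street AND p.suburb = s.suburb
    WHERE p.street IS NULL
""")
conn.commit()
conn.close()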
Many thanks.
Jim J wrote:
If the bulk load has the possibility of duplicating data, then you need 
to change methods.  Try bulk loading into a temp table,  index it like 
the original, eliminate the dups and merge the tables.

It is also possible to do an insert from the temp table into the final 
table, like:

insert into original (col1, col2, ...)
select t.col1, t.col2, ...
from temp t
left join original o on t.street = o.street
where o.street is null;

Good Luck
Jim




--
Regards,
Rudi.
Internet Media Productions


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Rudi Starcevic
Hi,
In an attempt to throw the authorities off his trail, [EMAIL PROTECTED] (Rudi 
Starcevic) transmitted:
A minute for your thoughts and/or suggestions would be great.
Heh heh 
Could you give a more concrete example?  E.g. - the DDL for the
table(s), most particularly.
Thanks, I didn't add the DDL as I thought it might make my question too
long. I have the DDL at another office, so I'll pick up this email
thread when I get there in a couple of hours.
At first guess, I think you're worrying about a nonissue.  Each insert
will lead to a _modification_ of the various indices, which costs
_something_, but which is WAY less expensive than creating each index
from scratch.
Very interesting - modification rather than creation from scratch.
I will post another email later today.
Many thanks.
--
Regards,
Rudi.
Internet Media Productions


Re: [PERFORM] SELECT's take a long time compared to other DBMS

2003-09-03 Thread Rudi Starcevic
Hi,

Yes I Analyze also, but there was no need to because it was a fresh brand
new database.
Hmm ... sorry, I'm not sure then. I only use Linux with PG.
Even though it's 'brand new' you still need to run ANALYZE after loading 
your data, so the planner has statistics and will actually use your indexes.

I'll keep an eye on this thread - Good luck.

Regards
Rudi.




Re: [PERFORM] How to configure the postgresql.conf files

2003-07-13 Thread Rudi Starcevic
Hi Chris,

I suggest you read this tech document:

http://www.varlena.com/GeneralBits/

I think you'll find it's the best place to start.

Cheers
Rudi.

Chris_Wu wrote:

Hello all!
  I'm new to PostgreSQL; I have never used it before.
  I am having an issue with configuring the postgresql.conf file.
  The machine itself is a 2.66 GHz P4 with 2 GB of memory.
  Maybe you can tell me how to configure these parameters.
  shared_buffers=
  max_fsm_relations=
  max_fsm_pages=
  max_locks_per_transaction=
  wal_buffers=
  sort_mem=
  vacuum_mem=
  wal_files=
  wal_sync_method=
  wal_debug =
  commit_delay =
  commit_siblings =
  checkpoint_segments =
  checkpoint_timeout =
  fsync = true
  enable_seqscan =
  enable_indexscan =
  enable_tidscan =
  enable_sort =
  enable_nestloop =
  enable_mergejoin =
  enable_hashjoin =
  ksqo =
  effective_cache_size =
  random_page_cost =
  cpu_tuple_cost =
  cpu_index_tuple_cost =
  cpu_operator_cost =

  Would you mind sending me an example postgresql.conf?
  Thanks.
  Sincerely,

Chris.Wu




  






Re: [PERFORM] How to configure the postgresql.conf files

2003-07-13 Thread Rudi Starcevic
Chris,

Oops - it's changed!

Here are the links you need:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Cheers
Rudi.
