Re: [PERFORM] How to unique-ify HUGE table?

2008-12-23 Thread D'Arcy J.M. Cain
On Tue, 23 Dec 2008 12:25:48 -0500
Kynn Jones kyn...@gmail.com wrote:
 Hi everyone!
 I have a very large 2-column table (about 500M records) from which I want to
 remove duplicate records.
 
 I have tried many approaches, but they all take forever.
 
 The table's definition consists of two short TEXT columns.  It is a
 temporary table generated from a query:
 
 CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ;
 
 Initially I tried
 
  CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ;
 
 but after waiting for nearly an hour I aborted the query, and repeated it

Do you have an index on x and y?  Also, does this work better?

CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y;

What does EXPLAIN ANALYZE have to say?
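For reference, the two approaches being compared can be sketched like this (source_table stands in for the poster's unnamed source query):

```sql
-- Sort-based duplicate removal, as originally tried:
CREATE TEMP TABLE huge_table AS
    SELECT DISTINCT x, y FROM source_table;

-- The GROUP BY form; the planner can sometimes satisfy this with a
-- HashAggregate instead of a full sort, which may be far cheaper on
-- hundreds of millions of rows:
CREATE TEMP TABLE huge_table AS
    SELECT x, y FROM source_table GROUP BY x, y;

-- EXPLAIN ANALYZE runs the query and reports the actual plan and times:
EXPLAIN ANALYZE SELECT x, y FROM source_table GROUP BY x, y;
```

Whether a hash aggregate gets chosen depends on the planner's estimate of the number of distinct groups and on the memory available for sorting/hashing.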

-- 
D'Arcy J.M. Cain da...@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Please ignore ...

2008-05-01 Thread D'Arcy J.M. Cain
On Thu, 1 May 2008 02:55:10 -0400 (EDT)
Greg Smith [EMAIL PROTECTED] wrote:
 On Thu, 1 May 2008, D'Arcy J.M. Cain wrote:
 
  Whenever I see one of those I simply blackhole the server sending them.
 
 Ah, the ever popular vigilante spam method.  What if the message is coming 
 from, say, gmail.com, and it's getting routed so that you're not sure 
 which account is originating it?  Do you blackhole everybody on *that* 
 server just because there's one idiot?

Well, I actually do block gmail groups on another list that is
gatewayed to a newsgroup, due to the volume of spam that originates
from there.  In this case, though, my experience has been that it is
done by a service.  For example, I reject all email from
spamarrest.com.  There is nothing I want to see from them.

 This is the same problem on a smaller scale.  It's not clear which account 
 is responsible, and I believe I saw that there are other people using the 
 same ISP who also subscribe to the list.  That's why Marc is testing who 
 the guilty party is rather than unsubscribing everyone there.

Of course.  If someone is running it on a server independent of the ISP
that's a different story.  However, it is pretty hard to run that code
on most ISPs without the cooperation of the ISP.  That's why there are
companies like SpamArrest.  People who run their own server and are in
a position to do this themselves tend to also be smart enough to
understand why it is a bad idea.

On the other hand, this type of thing is no different from spam, and in
this day and age every ISP, no matter how big, has a responsibility to
deal with spammers on its own system; those that don't deserve to be
blocked just like any other spam-friendly system.

The fact that Marc has to run this test and does not immediately know
who the guilty party is suggests to me that they are using a service.  I
never saw the offending message myself so perhaps it is coming from
SpamArrest and I just rejected the email on my SMTP server.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Please ignore ...

2008-04-30 Thread D'Arcy J.M. Cain
On Thu, 01 May 2008 01:16:00 -0300
Marc G. Fournier [EMAIL PROTECTED] wrote:
 Someone on this list has one of those 'confirm your email' filters on their 

Argh!  Why do people think that it is OK to make their spam problem
everyone else's problem?  Whenever I see one of those I simply
blackhole the server sending them.

People, please, I know the spam you get isn't your fault but it isn't my
fault either.  You clean up your mailbox and I'll clean up mine.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 15:07:38 -0500
Tom Lane [EMAIL PROTECTED] wrote:
 Michael Stone [EMAIL PROTECTED] writes:
  Well, a native IPv6 type would also be nice; inet is ridiculously 
  bloated for both IPv4 *and* IPv6. 
 
 Nonsense.  3 bytes overhead on a 16-byte address is not ridiculously
 bloated, especially if you want a netmask with it.

Besides, there are many cases where you want to track both IPv4 and
IPv6 addresses for the same purpose, and requiring two different fields
would be less than ideal.
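A quick sketch of the point: a single inet column handles both address families, and family() tells them apart when needed (table name is invented):

```sql
-- One column stores both IPv4 and IPv6 addresses.
CREATE TABLE hosts (addr inet NOT NULL);
INSERT INTO hosts VALUES ('192.168.1.10'), ('2001:db8::1');

-- family() returns 4 or 6, so the two can still be distinguished.
SELECT addr, family(addr) FROM hosts;
```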

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Best way to index IP data?

2008-01-11 Thread D'Arcy J.M. Cain
On Fri, 11 Jan 2008 21:56:38 -0500 (EST)
Greg Smith [EMAIL PROTECTED] wrote:
 On Fri, 11 Jan 2008, Steve Atkins wrote:
 
  You may well need netmasks to configure your interface, but there's 
  absolutely no need for them to identify an IP endpoint, which is all you 
  need to identify the destination the packet is going to, and that is the 
  most common use of IP addresses.
 
 Technically you can't ever send a packet unless you know both the endpoint 
 and your local netmask.  As the sender, you're obligated to determine if 
 the destination is on your local LAN (in which case you send it there) or 
 if it goes to the gateway.  That's similar to a routing decision, but it's 
 not quite--if you don't have to look in a routing table, it's not actually 
 part of routing.

Not sure what your point is here.  Sure, you need the netmask, but not
for every IP address you send to, only for the IP/network that you are
on.  That's a grand total of one netmask per interface that you need to
know.  And you don't store it in your database.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Apache2 PostgreSQL http authentication

2007-10-07 Thread D'Arcy J.M. Cain
On Sun, 7 Oct 2007 09:14:43 -0400
Jeffrey Brower [EMAIL PROTECTED] wrote:
 As I say, from a performance point of view, I would really like to know if
 there is anything I can do to make sure that postgres is performing as
 quickly as possible under apache2 so that my http authentication is not
 impacted too significantly.

How often does the user information change?  Can you simply create
standard Apache password files from cron during non-busy hours?
Sometimes the lower tech solution works best.
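Such a cron job might look roughly like this (paths, database, and column names are hypothetical, and it assumes passwords are already stored in a crypt() form Apache understands):

```shell
# Regenerate an htpasswd-style file from the database during off-hours.
psql -A -t -F: -c "SELECT username, password FROM users" authdb \
    > /etc/apache2/htpasswd.new \
  && mv /etc/apache2/htpasswd.new /etc/apache2/htpasswd
```

The write-then-rename means Apache never sees a half-written file.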

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Difference between Vacuum and Vacuum full

2007-10-02 Thread D'Arcy J.M. Cain
On Tue, 2 Oct 2007 21:45:37 -0400
Radhika S [EMAIL PROTECTED] wrote:
 But this has brought me to the question of what exactly is the
 difference between vacuum and vacuum full. If both give back free
 space to the disk, then why have vacuum full?

Not quite.  VACUUM FULL returns space to the operating system.  VACUUM
only frees the space for reuse by the database.  In most cases a simple
VACUUM is all you need, since your growing database is eventually going
to ask for that space back anyway.
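In other words (my_table is illustrative):

```sql
-- Plain VACUUM: dead-row space becomes reusable by this table, but the
-- files on disk do not shrink.
VACUUM my_table;

-- VACUUM FULL: compacts the table and returns the freed space to the
-- operating system, at the cost of an exclusive lock while it runs.
VACUUM FULL my_table;
```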

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL Configuration Tool for Dummies

2007-06-19 Thread D'Arcy J.M. Cain
On Tue, 19 Jun 2007 12:58:26 -0400
Francisco Reyes [EMAIL PROTECTED] wrote:
 Campbell, Lance writes:
  3) I suggested JavaScript because most people that get started with 
  PostgreSQL will go to the web in order to find out about issues relating 
 
 Why not c?

Why not whatever, and install it on www.PostgreSQL.org?  Is there any
reason that this tool would need to be run on every installation?  Run
it on the site and it can always be up to date, and it can be written in
whatever language is easiest to maintain on the mother system.

I would also like to make a pitch for a JavaScript-free tool.  Just
collect all the pertinent information, work it out and display the
results in a second page.  Some people just don't like JavaScript and
turn it off even if their browser can run it.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How to avoid vacuuming a huge logging table

2007-02-21 Thread D'Arcy J.M. Cain
On Wed, 21 Feb 2007 21:58:33 -
Greg Sabino Mullane [EMAIL PROTECTED] wrote:
 SELECT 'vacuum verbose analyze 
 '||quote_ident(nspname)||'.'||quote_ident(relname)||';' 
   FROM pg_class c, pg_namespace n
   WHERE relkind = 'r'
   AND relnamespace = n.oid
   AND nspname = 'novac'
   ORDER BY 1;

I assume you meant AND nspname != 'novac'
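With that one-character fix, the generator emits VACUUM commands for every ordinary table *except* those in the 'novac' schema:

```sql
SELECT 'vacuum verbose analyze '
       || quote_ident(nspname) || '.' || quote_ident(relname) || ';'
  FROM pg_class c, pg_namespace n
 WHERE c.relkind = 'r'        -- ordinary tables only
   AND c.relnamespace = n.oid
   AND n.nspname != 'novac'   -- skip the no-vacuum schema
 ORDER BY 1;
```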

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Unsubscribe

2006-10-04 Thread D'Arcy J.M. Cain
On Wed, 04 Oct 2006 09:00:45 -0700
Joshua D. Drake [EMAIL PROTECTED] wrote:
 So if you want to shut me up, lets get the footer added.

Of course, that doesn't fix the problem 100%.  I am on lists that do
show that info in the footer and people still send unsubscribe messages
to the list.

By the way, mailman has a nice feature that sends messages that look
like admin requests (such as unsubscribe) to the admin.  That cuts down
on the noise quite a bit.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Commit slower on faster PC

2006-07-12 Thread D'Arcy J.M. Cain
On Wed, 12 Jul 2006 10:16:40 -0600
Koth, Christian (DWBI) [EMAIL PROTECTED] wrote:
 I have noticed a strange performance behaviour using a commit statement on 
 two different machines. On one of the machines the commit is many times 
 faster than on the other machine which has faster hardware. Server and client 
 are running always on the same machine.
 
 Server version (same on both machines): PostgreSQL 8.1.3. (same binaries as 
 well)
 
 PC1:
 
 Pentium 4 (2.8 GHz)
 1GB RAM
 IDE-HDD (approx. 50 MB/s rw), fs: ext3
 Mandrake Linux: Kernel 2.4.22
 
 
 PC2:
 
 Pentium 4 (3.0 GHz)
 2GB RAM
 SCSI-HDD (approx. 65 MB/s rw), fs: ext3
 Mandrake Linux: Kernel 2.4.32
 
 
 Both installations of the database have the same configuration, different 
 from default are only the following settings on both machines:
 
 shared_buffers = 2
 listen_addresses = '*'
 max_stack_depth = 4096
 
 
 pgbench gives me the following results:
 PC1:
 
 transaction type: TPC-B (sort of)
 scaling factor: 1
 number of clients: 1
 number of transactions per client: 10
 number of transactions actually processed: 10/10
 tps = 269.905533 (including connections establishing)
 tps = 293.625393 (excluding connections establishing)
 
 PC2:
 
 transaction type: TPC-B (sort of)
 scaling factor: 1
 number of clients: 1
 number of transactions per client: 10
 number of transactions actually processed: 10/10
 tps = 46.061935 (including connections establishing)
 tps = 46.519634 (excluding connections establishing)

I'm not sure 10 transactions is enough of a test.  You could just be
seeing the result of your IDE drive lying to you about actually writing
your data.  There may be other considerations, but I would start by
retesting with 10,000 or 100,000 transactions to overcome the drive's
write buffering.
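Something along these lines (option spelling matches pgbench of that era; -c is clients, -t is transactions per client, and the database name is made up):

```shell
# A run long enough that the drive's write cache cannot absorb it all.
pgbench -c 1 -t 100000 pgbench_db
```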

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Postgres fsync off (not needed) with NetApp

2006-06-14 Thread D'Arcy J.M. Cain
On Wed, 14 Jun 2006 14:48:04 -0700
Dan Gorman [EMAIL PROTECTED] wrote:
 If I have a pg database attached to a powervault (PV) with just an  
 off-the-shelf SCSI card I generally want fsync on to prevent data  
 corruption in case the PV should loose power.
 However, if I have it attached to a NetApp that ensures data writes
 via the NVRAM, can I safely turn fsync off to gain additional
 performance?

I wouldn't.  Remember, you still have to get the data to the NetApp.
You don't want things sitting in the computer's buffers when its power
goes down.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?

2006-05-30 Thread D'Arcy J.M. Cain
On Tue, 30 May 2006 17:54:00 -0500
Dave Dutcher [EMAIL PROTECTED] wrote:
 What I do when I'm feeling lazy is execute a delete statement and then
 an insert.  I only do it when I'm inserting/updating a very small number
 of rows, so I've never worried if its optimal for performance.  Besides
 I've heard that an update in postgres is similar in performance to a
 delete/insert.

Well, they are basically the same operation in PostgreSQL.  An update
adds a new row version and marks the old one dead.  A delete/insert
marks the row dead and adds a new one.  There may be some optimization
if the engine does both in one operation.
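The delete/insert idiom being discussed looks like this (table and values are illustrative; this long predates any built-in upsert):

```sql
BEGIN;
DELETE FROM settings WHERE name = 'timeout';
INSERT INTO settings (name, value) VALUES ('timeout', '30');
COMMIT;
```

Doing both inside one transaction keeps other sessions from ever seeing the row as missing.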

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-07 Thread D'Arcy J.M. Cain
On Fri, 7 Apr 2006 20:59:19 +0100
Gavin Hamill [EMAIL PROTECTED] wrote:
  I'd put the old 4 way Xeon back in production and do some serious
  testing of this pSeries machine.  IBM should be willing to help you, I
  hope.
 
 They probably would if this had been bought new - as it is, we have
 rented the machine for a month from a 2nd-user dealer to see if it's
 capable of taking the load. I'm now glad we did this. 

We also had problems with a high-end AIX system and we got no help from
IBM.  They expected you to put Oracle on it, and if you used anything
else you were on your own.  We had exactly the same issue.  We expected
to get an order of magnitude improvement and instead the app bogged
down.  It also got worse over time.  We had to reboot every night to get
anything out of it.  Needless to say, they got their system back.

 
  My guess is that this is an OS issue.  Maybe there are AIX tweaks that
  will get it up to the same or higher level of performance as your four
  way xeon.  Maybe there aren't.
 
 The pSeries isn't much older than our Xeon machine, and I expected the
 performance level to be exemplary out of the box.. we've enabled the
 64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc
 flags as recommended by Senica Cunningham on this very list..

That's Seneca.

We found that our money was better spent on multiple servers running
NetBSD with a home grown multi-master replication system.  Need more
power?  Just add more servers.

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql

2005-01-20 Thread D'Arcy J.M. Cain
On Mon, 17 Jan 2005 18:36:35 -0500
Dave Cramer [EMAIL PROTECTED] wrote:
 The *only* way to avoid this is to go to a 64 bit processor (opteron) 
 and then
 for greater performance use a linux distribution compiled for a 64bit 
 processor.

Or NetBSD (http://www.NetBSD.org/) which has been 64 bit clean since
1995 and has had the Opteron port integrated in its main tree (not as
patches to or a separate tree) since April 2003.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED]
http://www.NetBSD.org/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?

2005-01-11 Thread D'Arcy J.M. Cain
On 11 Jan 2005 04:25:04 GMT
Christopher Browne [EMAIL PROTECTED] wrote:
 Xeon sux pretty bad...
 
  Linux or FreeBSD or _?_
 
 The killer question won't be of what OS is faster, but rather of
 what OS better supports the fastest hardware you can get your hands
 on.  

Well, if multiple OSs work on the hardware you like, there is nothing
wrong with selecting the fastest among them of course.  As for Linux or
FreeBSD, you may also want to consider NetBSD.  It seems that with the
latest releases of both, NetBSD outperforms FreeBSD in at least one
benchmark.

http://www.feyrer.de/NetBSD/gmcgarry/

The benchmarks were run on a single processor but you can always run the
benchmark on whatever hardware you select - assuming that it runs both.

Isn't there also a PostgreSQL specific benchmark available?

-- 
D'Arcy J.M. Cain darcy@druid.net |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Thanks Chariot Solutions

2004-11-01 Thread D'Arcy J.M. Cain
On Sun, 31 Oct 2004 13:38:55 -0500 (EST)
[EMAIL PROTECTED] wrote:
 
 Many thanks to Chariot Solutions, http://chariotsolutions.com, for
 hosting Bruce Momjian giving one of his PostgreSQL seminars outside of
 Philadelphia, PA yesterday. There were about sixty folks there, one
 person driving from Toronto and another coming from California (!).

Seconded.  It was definitely worth the drive from Toronto.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30

2004-10-27 Thread D'Arcy J.M. Cain
On Wed, 13 Oct 2004 12:21:27 -0400 (EDT)
Aaron Mulder [EMAIL PROTECTED] wrote:
 All,
   My company (Chariot Solutions) is sponsoring a day of free
 PostgreSQL training by Bruce Momjian (one of the core PostgreSQL
 developers).  The day is split into 2 sessions (plus a Q&A session):

Is there anyone else from the Toronto area going down that would like to
share the driving?  I am planning to drive down Friday morning and drive
back Sunday.  I'm not looking for expense sharing.  I just don't want to
drive for eight hours straight.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED] |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Postgres on Netapp

2004-01-17 Thread D'Arcy J.M. Cain
On January 16, 2004 11:53 am, Larry Rosenman wrote:
 --On Monday, January 12, 2004 13:45:45 -0800 Shankar K [EMAIL PROTECTED]
 wrote:
  We are considering to use NetApp filer for a highly
  busy 24*7 postgres database and the reason we chose
 I run a (not very busy) PG cluster on a NetAPP.

I run a very busy PG installation on one.

 It seems to do just fine.

Ditto.

 The issue is the speed of the network connection.  In my case it's only
 FastEthernet (100BaseTX).  If it's very busy, you may need to look
 at GigE.

With the price of GigE adapters I wouldn't consider anything else.

I have a huge database that takes about an hour to copy.  The NetApp snapshot 
feature is very nice because I can get a moment in time image of the 
database.  Even though I can't run from the snapshot because it is read only 
(*) and PG needs to write to files just to open the database, I can copy it 
and get a runnable version of the DB.  If I copy directly from the original I 
can get many changes while copying and wind up with a copy that will not run.

(*): It would be nice if PG had a flag that allowed a database to be opened in 
read only mode without touching anything in the directory.
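The copy-from-snapshot step might look roughly like this (mount points are made up; NetApp exposes snapshots under a read-only .snapshot directory):

```shell
# Copy the frozen image, not the live data directory; the snapshot is a
# single moment in time, so the copy is internally consistent.
cp -Rp /mnt/netapp/.snapshot/nightly.0/pgdata /var/db/pgdata-copy
```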

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED]   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-06 Thread D'Arcy J.M. Cain
On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
 On Tuesday 06 January 2004 01:22, Rod Taylor wrote:
  Anyway, with Rules you can force this:
 
  ON INSERT UPDATE counter SET tablecount = tablecount + 1;
 
  ON DELETE UPDATE counter SET tablecount = tablecount - 1;

 That would generate a lot of dead tuples in the counter table. How about

 select relpages,reltuples from pg_class where relname=tablename;

 Assuming the stats are recent enough, it would be much faster and
 accurate..

Well, I did this:

cert=# select relpages,reltuples from pg_class where relname= 'certificate';
 relpages |  reltuples
----------+-------------
   399070 | 2.48587e+07
(1 row)

Casting seemed to help:

cert=# select relpages,reltuples::bigint from pg_class where relname= 
'certificate';
 relpages | reltuples
----------+-----------
   399070 |  24858736
(1 row)

But:

cert=# select count(*) from certificate;
[*Crunch* *Crunch* *Crunch*]
  count
----------
 19684668
(1 row)

Am I missing something?  Max certificate_id is 20569544 btw.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED]   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.



Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-06 Thread D'Arcy J.M. Cain
On January 6, 2004 07:20 am, Shridhar Daithankar wrote:
 On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote:
  On January 6, 2004 01:42 am, Shridhar Daithankar wrote:
  cert=# select relpages,reltuples::bigint from pg_class where relname=
  'certificate';
   relpages | reltuples
  ----------+-----------
 399070 |  24858736
  (1 row)
 
  But:
 
  cert=# select count(*) from certificate;
  [*Crunch* *Crunch* *Crunch*]
count
  ----------
   19684668
  (1 row)
 
  Am I missing something?  Max certificate_id is 20569544 btw.

 Do 'vacuum analyze certificate' and try..:-)

Kind of invalidates the part about being accurate then, don't it?  Besides, I 
vacuum that table every day (*) and we have reorganized the schema so that we 
never update it except in exceptional cases.  I would be less surprised if 
the result was less than the real count since we only insert into that table.

In any case, if I have to vacuum a 20,000,000 row table to get an accurate 
count then I may as well run count(*) on it.

(*): Actually I only analyze but I understand that that should be sufficient.

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED]   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] What's faster?

2003-12-27 Thread D'Arcy J.M. Cain
On December 26, 2003 07:11 pm, Keith Bottner wrote:
 I have a database where the vast majority of information that is related to
 a customer never changes. However, there is a single field (i.e. balance)
 that changes potentially tens to hundreds of times per day per customer
 (customers ranging in the 1000s to 1s). This information is not
 indexed. Because Postgres requires VACUUM ANALYZE more frequently on
 updated tables, should I break this single field out into its own table,
 and if so what kind of a speed up can I expect to achieve. I would be
 appreciative of any guidance offered.

We went through this recently.  One thing we found that may apply to you is 
how many fields in the client record have a foreign key constraint.  We find 
that tables with lots of FKeys are a lot more intensive on updates.  In our 
case it was another table, think of it as an order or header table with a 
balance, that has over 10 million records.  Sometimes we have 200,000 
transactions a day where we have to check the balance.  We eventually moved 
every field that could possibly be updated on a regular basis out to separate 
tables.  The improvement was dramatic.
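The restructuring described might be sketched as follows (all names are invented for illustration):

```sql
-- Static customer data stays in the wide table...
CREATE TABLE customer (
    id   serial PRIMARY KEY,
    name text NOT NULL
);

-- ...while the hot balance column moves to its own narrow table, so
-- each balance update rewrites a much smaller row and no longer
-- touches a row carrying many foreign-key constraints.
CREATE TABLE customer_balance (
    customer_id integer PRIMARY KEY REFERENCES customer(id),
    balance     numeric NOT NULL DEFAULT 0
);

UPDATE customer_balance
   SET balance = balance + 100.00
 WHERE customer_id = 42;
```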

-- 
D'Arcy J.M. Cain [EMAIL PROTECTED]   |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.
