Re: [PERFORM] How to unique-ify HUGE table?
On Tue, 23 Dec 2008 12:25:48 -0500 Kynn Jones kyn...@gmail.com wrote: Hi everyone! I have a very large 2-column table (about 500M records) from which I want to remove duplicate records. I have tried many approaches, but they all take forever. The table's definition consists of two short TEXT columns. It is a temporary table generated from a query: CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... ; Initially I tried CREATE TEMP TABLE huge_table AS SELECT DISTINCT x, y FROM ... ; but after waiting for nearly an hour I aborted the query, and repeated it Do you have an index on x and y? Also, does this work better? CREATE TEMP TABLE huge_table AS SELECT x, y FROM ... GROUP BY x, y; What does ANALYZE EXPLAIN have to say? -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Please ignore ...
On Thu, 1 May 2008 02:55:10 -0400 (EDT) Greg Smith [EMAIL PROTECTED] wrote: On Thu, 1 May 2008, D'Arcy J.M. Cain wrote: Whenever I see one of those I simply blackhole the server sending them. Ah, the ever popular vigilante spam method. What if the message is coming from, say, gmail.com, and it's getting routed so that you're not sure which account is originating it? Do you blackhole everybody on *that* server just because there's one idiot? Well, I actually do block gmail groups on another list that is gatewayed to a newsgroup due to the volume of spam that originates from there but in this case my experience has been that it is done by a service. For example, I reject all email from spamarrest.com. There is nothing I want to see from them. This is the same problem on a smaller scale. It's not clear which account is reponsible, and I believe I saw that there are other people using the same ISP who also subscribe to the list. That's why Marc is testing who the guilty party is rather than unsubscribing everyone there. Of course. If someone is running it on a server independent of the ISP that's a different story. However, it is pretty hard to run that code on most ISPs without the cooperation of the ISP. That's why there are companies like SpamArrest. People who run their own server and are in a position to do this themself tend to also be smart enough to understand why it is a bad idea. On the other hand, this type of thing is no different than spam and in this day and age every ISP, no matter how big, has a responsibility to deal with spammers on their own system and if they don't they deserve to be blocked just like any other spam-friendly system. The fact that Marc has to run this test and does not immediately know who the guilty party is suggests to me that they are using a service. I never saw the offending message myself so perhaps it is coming from SpamArrest and I just rejected the email on my SMTP server. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Please ignore ...
On Thu, 01 May 2008 01:16:00 -0300 Marc G. Fournier [EMAIL PROTECTED] wrote: Someone on this list has one of those 'confirm your email' filters on their Argh! Why do people think that it is OK to make their spam problem everyone else's problem? Whenever I see one of those I simply blackhole the server sending them. People, please, I know the spam you get isn't your fault but it isn't my fault either. You clean up your mailbox and I'll clean up mine. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best way to index IP data?
On Fri, 11 Jan 2008 15:07:38 -0500 Tom Lane [EMAIL PROTECTED] wrote: Michael Stone [EMAIL PROTECTED] writes: Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not ridiculously bloated, especially if you want a netmask with it. Besides, there are many cases where you want to track both ipv4 and ipv6 for the same purpose and requiring two different fields would be less than ideal. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Best way to index IP data?
On Fri, 11 Jan 2008 21:56:38 -0500 (EST) Greg Smith [EMAIL PROTECTED] wrote: On Fri, 11 Jan 2008, Steve Atkins wrote: You may well need netmasks to configure your interface, but there's absolutely no need for them to identify an IP endpoint, which is all you need to identify the destination the packet is going to, and that is the most common use of IP addresses. Technically you can't ever send a packet unless you know both the endpoint and your local netmask. As the sender, you're obligated to determine if the destination is on your local LAN (in which case you send it there) or if it goes to the gateway. That's similar to a routing decision, but it's not quite--if you don't have to look in a routing table, it's not actually part of routing. Not sure what your point is here. Sure, you need the netmask but not of every IP address you send to, only for the IP/network that you are on. That's a grand total of one netmask per interface that you need to know. And you don't store it in your database. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Apache2 PostgreSQL http authentication
On Sun, 7 Oct 2007 09:14:43 -0400 Jeffrey Brower [EMAIL PROTECTED] wrote: As I say, from a performance point of view, I would really like to know if there is anything I can do to make sure that postgres is performing as quickly as possible under apache2 so that my http authentication is not impacted too significantly. How often does the user information change? Can you simply create standard Apache password files from cron during non-busy hours? Sometimes the lower tech solution works best. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Difference between Vacuum and Vacuum full
On Tue, 2 Oct 2007 21:45:37 -0400 Radhika S [EMAIL PROTECTED] wrote: But this has bought me to the question of what exactly is the difference between vacuum and vacuum full. If both give back free space to the disk, then why have vacuum full. Not quite. VACUUM FULL returns space to the system. VACUUM only frees the space for use by the database. In most cases a simple VACUUM is all you need since you are going to just be asking for the space back anyway eventually as your database grows. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL Configuration Tool for Dummies
On Tue, 19 Jun 2007 12:58:26 -0400 Francisco Reyes [EMAIL PROTECTED] wrote: Campbell, Lance writes: 3) I suggested JavaScript because most people that get started with PostgreSQL will go to the web in order to find out about issues relating Why not c? Why not whatever and install it on www.PostgreSQL.org? Is there any reason that this tool would need to be run on every installation. Run it on the site and it can always be up to date and can be written in whatever language is easiest to maintain on the mother system. I would also like to make a pitch for a JavaScript-free tool. Just collect all the pertinent information, work it out and display the results in a second page. Some people just don't like JavaScript and turn it off even if we can run it in our browser. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] How to avoid vacuuming a huge logging table
On Wed, 21 Feb 2007 21:58:33 - Greg Sabino Mullane [EMAIL PROTECTED] wrote: SELECT 'vacuum verbose analyze '||quote_ident(nspname)||'.'||quote_ident(relname)||';' FROM pg_class c, pg_namespace n WHERE relkind = 'r' AND relnamespace = n.oid AND nspname = 'novac' ORDER BY 1; I assume you meant AND nspname != 'novac' -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Unsubscribe
On Wed, 04 Oct 2006 09:00:45 -0700 Joshua D. Drake [EMAIL PROTECTED] wrote: So if you want to shut me up, lets get the footer added. Of course, that doesn't fix the problem 100%. I am on lists that do show that info in the footer and people still send unsubscribe messages to the list. By the way, mailman has a nice feature that sends messages that look like admin requests (such as unsubscribe) to the admin. That cuts down on the noise quite a bit. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Commit slower on faster PC
On Wed, 12 Jul 2006 10:16:40 -0600 Koth, Christian (DWBI) [EMAIL PROTECTED] wrote: I have noticed a strange performance behaviour using a commit statement on two different machines. On one of the machines the commit is many times faster than on the other machine which has faster hardware. Server and client are running always on the same machine. Server version (same on both machines): PostgreSQL 8.1.3. (same binaries as well) PC1: Pentium 4 (2.8 GHz) 1GB RAM IDE-HDD (approx. 50 MB/s rw), fs: ext3 Mandrake Linux: Kernel 2.4.22 PC2: Pentium 4 (3.0 GHz) 2GB RAM SCSI-HDD (approx. 65 MB/s rw), fs: ext3 Mandrake Linux: Kernel 2.4.32 Both installations of the database have the same configuration, different from default are only the following settings on both machines: shared_buffers = 2 listen_addresses = '*' max_stack_depth = 4096 pgbench gives me the following results: PC1: transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1 number of transactions per client: 10 number of transactions actually processed: 10/10 tps = 269.905533 (including connections establishing) tps = 293.625393 (excluding connections establishing) PC2: transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 1 number of transactions per client: 10 number of transactions actually processed: 10/10 tps = 46.061935 (including connections establishing) tps = 46.519634 (excluding connections establishing) I'm not sure 10 transactions is enough of a test. You could just be seeing the result of your IDE drive lying to you about actually writing your data. There may be other considerations but I would start with checking with 10,000 or 100,000 transactions to overcome the driver buffering. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Postgres fsync off (not needed) with NetApp
On Wed, 14 Jun 2006 14:48:04 -0700 Dan Gorman [EMAIL PROTECTED] wrote: If I have a pg database attached to a powervault (PV) with just an off-the-shelf SCSI card I generally want fsync on to prevent data corruption in case the PV should loose power. However, if I have it attached to a NetApp that ensures data writes to via the NVRAM can I safely turn fsync off to gain additional performance? I wouldn't. Remember, you still have to get the data to the NetApp. You don't want things sitting in the computer's buffers when it's power goes down. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] INSERT OU UPDATE WITHOUT SELECT?
On Tue, 30 May 2006 17:54:00 -0500 Dave Dutcher [EMAIL PROTECTED] wrote: What I do when I'm feeling lazy is execute a delete statement and then an insert. I only do it when I'm inserting/updating a very small number of rows, so I've never worried if its optimal for performance. Besides I've heard that an update in postgres is similar in performance to a delete/insert. Well, they are basically the same operation in PostgreSQL. An update adds a row to the end and marks the old one dead. A delete/insert marks the row dead and adds one at the end. There may be some optimization if the engine does both in one operation. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.
On Fri, 7 Apr 2006 20:59:19 +0100 Gavin Hamill [EMAIL PROTECTED] wrote: I'd put the old 4 way Xeon back in production and do some serious testing of this pSeries machine. IBM should be willing to help you, I hope. They probably would if this had been bought new - as it is, we have rented the machine for a month from a 2nd-user dealer to see if it's capable of taking the load. I'm now glad we did this. We also had problems with a high end AIX system and we got no help from IBM. They expected you to put Oracle on and if you used anything else you were on your own. We had exactly the same issue. We expected to get an order of magnitude improvement and instead the app bogged down. It also got worse over time. We had to reboot every night to get anything out of it. Needless to say, they got their system back. My guess is that this is an OS issue. Maybe there are AIX tweaks that will get it up to the same or higher level of performance as your four way xeon. Maybe there aren't. The pSeries isn't much older than our Xeon machine, and I expected the performance level to be exemplary out of the box.. we've enabled the 64-bit kernel+userspace, and compiled pg for 64-bitness with the gcc flags as reccommended by Senica Cunningham on this very list.. That's Seneca. We found that our money was better spent on multiple servers running NetBSD with a home grown multi-master replication system. Need more power? Just add more servers. -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Increasing RAM for more than 4 Gb. using postgresql
On Mon, 17 Jan 2005 18:36:35 -0500 Dave Cramer [EMAIL PROTECTED] wrote: The *only* way to avoid this is to go to a 64 bit processor (opteron) and then for greater performance use a linux distribution compiled for a 64bit processor. Or NetBSD (http://www.NetBSD.org/) which has been 64 bit clean since 1995 and has had the Opteron port integrated in its main tree (not as patches to or a separate tree) since April 2003. -- D'Arcy J.M. Cain [EMAIL PROTECTED] http://www.NetBSD.org/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] which dual-CPU hardware/OS is fastest for PostgreSQL?
On 11 Jan 2005 04:25:04 GMT Christopher Browne [EMAIL PROTECTED] wrote: Xeon sux pretty bad... Linux or FreeBSD or _?_ The killer question won't be of what OS is faster, but rather of what OS better supports the fastest hardware you can get your hands on. Well, if multiple OSs work on the hardware you like, there is nothing wrong with selecting the fastest among them of course. As for Linux or FreeBSD, you may also want to consider NetBSD. It seems that with the latest releases of both, NetBSD outperforms FreeBSD in at least one benchmark. http://www.feyrer.de/NetBSD/gmcgarry/ The benchmarks were run on a single processor but you can always run the benchmark on whatever hardware you select - assuming that it runs both. Isn't there also a PostgreSQL specific benchmark available? -- D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Thanks Chariot Solutions
On Sun, 31 Oct 2004 13:38:55 -0500 (EST) [EMAIL PROTECTED] wrote: Many thanks to Chariot Solutions, http://chariotsolutions.com, for hosting Bruce Momjian giving one of his PostgreSQL seminars outside of Philadelphia, PA yesterday. There were about sixty folks there, one person driving from Toronto and another coming from California (!). Seconded. It was definitely worth the drive from Toronto. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30
On Wed, 13 Oct 2004 12:21:27 -0400 (EDT) Aaron Mulder [EMAIL PROTECTED] wrote: All, My company (Chariot Solutions) is sponsoring a day of free PostgreSQL training by Bruce Momjian (one of the core PostgreSQL developers). The day is split into 2 sessions (plus a QA session): Is there anyone else from the Toronto area going down that would like to share the driving? I am planning to drive down Friday morning and drive back Sunday. I'm not looking for expense sharing. I just don't want to drive for eight hours straight. -- D'Arcy J.M. Cain [EMAIL PROTECTED] | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Postgres on Netapp
On January 16, 2004 11:53 am, Larry Rosenman wrote: --On Monday, January 12, 2004 13:45:45 -0800 Shankar K [EMAIL PROTECTED] wrote: We are considering to use NetApp filer for a highly busy 24*7 postgres database and the reason we chose I run a (not very busy) PG cluster on a NetAPP. I run a very busy PG installation on one. It seems to do just fine. Ditto. The issue is the speed of the network connection. In my case it's only FastEthernet (100BaseTX). If it's very busy, you may need to look at GigE. With the price of GigE adapters I wouldn't consider anything else. I have a huge database that takes about an hour to copy. The netApp snapshot feature is very nice because I can get a moment in time image of the database. Even though I can't run from the snapshot because it is read only (*) and PG needs to write to files just to open the database, I can copy it and get a runnable version of the DB. If I copy directly from the original I can get many changes while copying and wind up with a copy that will not run. (*): It would be nice if PG had a flag that allowed a database to be opened in read only mode without touching anything in the directory. -- D'Arcy J.M. Cain [EMAIL PROTECTED]|vex}.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Select max(foo) and select count(*) optimization
On January 6, 2004 01:42 am, Shridhar Daithankar wrote: On Tuesday 06 January 2004 01:22, Rod Taylor wrote: Anyway, with Rules you can force this: ON INSERT UPDATE counter SET tablecount = tablecount + 1; ON DELETE UPDATE counter SET tablecount = tablecount - 1; That would generate lot of dead tuples in counter table. How about select relpages,reltuples from pg_class where relname=tablename; Assuming the stats are recent enough, it would be much faster and accurate.. Well, I did this: cert=# select relpages,reltuples from pg_class where relname= 'certificate'; relpages | reltuples --+- 399070 | 2.48587e+07 (1 row) Casting seemed to help: cert=# select relpages,reltuples::bigint from pg_class where relname= 'certificate'; relpages | reltuples --+--- 399070 | 24858736 (1 row) But: cert=# select count(*) from certificate; [*Crunch* *Crunch* *Crunch*] count -- 19684668 (1 row) Am I missing something? Max certificate_id is 20569544 btw. -- D'Arcy J.M. Cain [EMAIL PROTECTED]|vex}.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Select max(foo) and select count(*) optimization
On January 6, 2004 07:20 am, Shridhar Daithankar wrote: On Tuesday 06 January 2004 17:48, D'Arcy J.M. Cain wrote: On January 6, 2004 01:42 am, Shridhar Daithankar wrote: cert=# select relpages,reltuples::bigint from pg_class where relname= 'certificate'; relpages | reltuples --+--- 399070 | 24858736 (1 row) But: cert=# select count(*) from certificate; [*Crunch* *Crunch* *Crunch*] count -- 19684668 (1 row) Am I missing something? Max certificate_id is 20569544 btw. Do 'vacuum analyze certificate' and try..:-) Kind of invalidates the part about being accurate then, don't it? Besides, I vacuum that table every day (*) and we have reorganized the schema so that we never update it except in exceptional cases. I would be less surprised if the result was less than the real count since we only insert into that table. In any case, if I have to vacuum a 20,000,000 row table to get an accurate count then I may as well run count(*) on it. (*): Actually I only analyze but I understand that that should be sufficient. -- D'Arcy J.M. Cain [EMAIL PROTECTED]|vex}.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] What's faster?
On December 26, 2003 07:11 pm, Keith Bottner wrote: I have a database where the vast majority of information that is related to a customer never changes. However, there is a single field (i.e. balance) that changes potentially tens to hundreds of times per day per customer (customers ranging in the 1000s to 1s). This information is not indexed. Because Postgres requires VACUUM ANALYZE more frequently on updated tables, should I break this single field out into its own table, and if so what kind of a speed up can I expect to achieve. I would be appreciative of any guidance offered. We went through this recently. One thing we found that may apply to you is how many fields in the client record have a foreign key constraint. We find that tables with lots of FKeys are a lot more intensive on updates. In our case it was another table, think of it as an order or header table with a balance, that has over 10 million records. Sometimes we have 200,000 transactions a day where we have to check the balance. We eventually moved every field that could possibly be updated on a regular basis out to separate tables. The improvement was dramatic. -- D'Arcy J.M. Cain [EMAIL PROTECTED]|vex}.net | Democracy is three wolves http://www.druid.net/darcy/| and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster