[PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread kevin kempter
Hi List; I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below) I also have a table with 7,785 rows in it (see the describe of xsegment_dim below) I'm running the join shown below and it takes 10 hours and eventually runs out of disk space on a

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread kevin kempter
Also, I'm running version 8.3 on a centOS box with 2 dual core CPU's and 32Gig of ram On May 16, 2008, at 12:58 AM, kevin kempter wrote: Sorry I goofed on the query text Here's the correct query: select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id, f14.xreferrer_dim_id,

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Claus Guttesen
I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below) I also have a table with 7,785 rows in it (see the describe of xsegment_dim below) I'm running the join shown below and it takes 10 hours and eventually runs out of disk space on a 1.4TB file

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread kevin kempter
Sorry I goofed on the query text Here's the correct query: select f14.xpublisher_dim_id, f14.xtime_dim_id, f14.xlocation_dim_id, f14.xreferrer_dim_id, f14.xsite_dim_id, f14.xsystem_cfg_dim_id, f14.xaffiliate_dim_id, f14.customer_id, f14.pf_dts_id, f14.episode_id, f14.sessionid,

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Simon Riggs
On Fri, 2008-05-16 at 00:31 -0600, kevin kempter wrote: I'm running the join shown below and it takes 10 hours and eventually runs out of disk space on a 1.4TB file system Well, running in 10 hours doesn't mean there's a software problem, nor does running out of disk space. Please crunch

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Richard Huxton
kevin kempter wrote: Hi List; I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below) I also have a table with 7,785 rows in it (see the describe of xsegment_dim below) I'm running the join shown below and it takes 10 hours and eventually runs out of

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread kevin kempter
I'm expecting 9,961,914 rows returned. Each row in the big table should have a corresponding key in the smaller tale, I want to basically expand the big table column list by one, via adding the appropriate key from the smaller table for each row in the big table. It's not a cartesion

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Richard Huxton
kevin kempter wrote: I'm expecting 9,961,914 rows returned. Each row in the big table should have a corresponding key in the smaller tale, I want to basically expand the big table column list by one, via adding the appropriate key from the smaller table for each row in the big table. It's not

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread Richard Huxton
kevin kempter wrote: Hi List; I have a table with 9,961,914 rows in it (see the describe of bigtab_stats_fact_tmp14 below) I also have a table with 7,785 rows in it (see the describe of xsegment_dim below) Something else is puzzling me with this - you're joining over four fields. from

Re: [PERFORM] Join runs for 10 hours and then fills up 1.3TB of disk space

2008-05-16 Thread kevin kempter
On further investigation it turns out that I/we have a serious data issue in that my small table is full of 'UNKNOWN' tags so my query cannot associate the data correctly - thus I will end up with 2+ billion rows. Thanks everyone for your help On May 16, 2008, at 1:38 AM, Simon Riggs

Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-16 Thread Scott Marlowe
On Thu, May 15, 2008 at 9:38 AM, Joshua D. Drake [EMAIL PROTECTED] wrote: Guillaume Cottenceau wrote: Matthew Wakeling matthew 'at' flymine.org writes: It is still relevant, as with 5% margin, you can afford changing that to 0% with tune2fs, just the time for you to start PG and remove some

Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-16 Thread Joshua D. Drake
On Fri, 16 May 2008 11:07:17 -0600 Scott Marlowe [EMAIL PROTECTED] wrote: Sorry, but that's like saying that open heart surgery isn't a fix for clogged arteries because you should have been taking aspirin everyday and exercising. It might not be the best answer, but sometimes it's the only

[PERFORM] very slow left join

2008-05-16 Thread Ben
I've inherited an Oracle database that I'm porting to Postgres, and this has been going quite well until now. Unfortunately, I've found one view (a largish left join) that runs several orders of magnitude slower on Postgres than it did on Oracle. = select version();

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Greg Smith
On Thu, 15 May 2008, Alvaro Herrera wrote: Starting a transaction does not write anything to pg_clog. For Matt and others, some details here are in src/backend/access/transam/README: pg_clog records the commit status for each transaction that has been assigned an XID. Transactions and

Re: [PERFORM] very slow left join

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 11:56 AM, Ben [EMAIL PROTECTED] wrote: I've inherited an Oracle database that I'm porting to Postgres, and this has been going quite well until now. Unfortunately, I've found one view (a largish left join) that runs several orders of magnitude slower on Postgres than it

Re: [PERFORM] very slow left join

2008-05-16 Thread Ben
On Fri, 16 May 2008, Scott Marlowe wrote: Just for giggles, try running the query like so: set enable_nestloop = off; explain analyze ... and see what happens. I'm guessing that the nested loops are bad choices here. You guess correctly, sir! Doing so shaves 3 orders of magnitude off the

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Simon Riggs
On Fri, 2008-05-16 at 14:05 -0400, Greg Smith wrote: After reading the code and that documentation a bit, the part I'm still not sure about is whether the CLOG entry is created when the XID is assigned and then kept current as the state changes, or whether that isn't even in CLOG until the

Re: [PERFORM] very slow left join

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 12:21 PM, Ben [EMAIL PROTECTED] wrote: On Fri, 16 May 2008, Scott Marlowe wrote: Just for giggles, try running the query like so: set enable_nestloop = off; explain analyze ... and see what happens. I'm guessing that the nested loops are bad choices here. You

Re: [PERFORM] very slow left join

2008-05-16 Thread Ben
On Fri, 16 May 2008, Scott Marlowe wrote: Well, I'm guessing that you aren't in locale=C and that the text Correct, I am not. And my understanding is that by moving to the C locale, I would loose utf8 validation, so I don't want to go there. Though, it's news to me that I would get any kind

Re: [PERFORM] very slow left join

2008-05-16 Thread Craig Ringer
Ben wrote: On Fri, 16 May 2008, Scott Marlowe wrote: Well, I'm guessing that you aren't in locale=C and that the text Correct, I am not. And my understanding is that by moving to the C locale, I would loose utf8 validation, so I don't want to go there. Though, it's news to me that I would

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Alvaro Herrera
Greg Smith wrote: After reading the code and that documentation a bit, the part I'm still not sure about is whether the CLOG entry is created when the XID is assigned and then kept current as the state changes, or whether that isn't even in CLOG until the transaction is committed. It

Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Scott Marlowe
On Wed, May 14, 2008 at 9:33 AM, Rusty Conover [EMAIL PROTECTED] wrote: Returning to this problem this morning, I made some more insight. One way I did find that worked to control the loop (but doesn't yield the same results because its a left join) select wc_rule.id from wc_rule left join

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Alvaro Herrera
Alvaro Herrera wrote: pg_clog is allocated in pages of 8kB apiece(*). On allocation, pages are zeroed, which is the bit pattern for transaction in progress. So when a transaction starts, it only needs to ensure that the pg_clog page that corresponds to it is allocated, but it need not write

Re: [PERFORM] Please ignore ...

2008-05-16 Thread Alvaro Herrera
Marc G. Fournier wrote: Someone on this list has one of those 'confirm your email' filters on their mailbox, which is bouncing back messages ... this is an attempt to try and narrow down the address that is causing this ... So it seems you're still unable to determine the problematic

Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Rusty Conover
On May 16, 2008, at 2:35 PM, Scott Marlowe wrote: On Wed, May 14, 2008 at 9:33 AM, Rusty Conover [EMAIL PROTECTED] wrote: Returning to this problem this morning, I made some more insight. One way I did find that worked to control the loop (but doesn't yield the same results because its a

Re: [PERFORM] Regexps - never completing join.

2008-05-16 Thread Scott Marlowe
On Fri, May 16, 2008 at 3:37 PM, Rusty Conover [EMAIL PROTECTED] wrote: On May 16, 2008, at 2:35 PM, Scott Marlowe wrote: On Wed, May 14, 2008 at 9:33 AM, Rusty Conover [EMAIL PROTECTED] wrote: Returning to this problem this morning, I made some more insight. One way I did find that

Re: [PERFORM] I/O on select count(*)

2008-05-16 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes: Greg Smith wrote: After reading the code and that documentation a bit, the part I'm still not sure about is whether the CLOG entry is created when the XID is assigned and then kept current as the state changes, or whether that isn't even in CLOG