Re: [PERFORM] Simple machine-killing query!

2004-10-21 Thread Aaron Werman
Sounds like you need some way to match a subset of the data first, rather than try indices that are bigger than the data. Can you add operation indices, perhaps on the first 10 bytes of the keys in both tables or on an integer hash of all of the strings? If so you could join on the exact set differe
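
A minimal sketch of the suggestion above (table and column names are hypothetical; hashtext() is PostgreSQL's built-in text hashing function):

```sql
-- Index a short prefix, or a 32-bit hash, instead of the whole long string:
CREATE TABLE t1 (key text);
CREATE TABLE t2 (key text);
CREATE INDEX t1_key_prefix ON t1 (substr(key, 1, 10));
CREATE INDEX t1_key_hash   ON t1 (hashtext(key));

-- Join on the small indexed expression first, then confirm the exact match:
SELECT t1.key
FROM   t1 JOIN t2
       ON  hashtext(t1.key) = hashtext(t2.key)  -- cheap, index-assisted
       AND t1.key = t2.key;                     -- exact comparison
```

The hash comparison narrows the candidate rows with a small index; the exact string comparison then filters out hash collisions.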

Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

2004-10-21 Thread Aaron Werman
The intuitive thing would be to put pg into a file system. /Aaron On Thu, 21 Oct 2004 12:44:10 +0200, Leeuw van der, Tim <[EMAIL PROTECTED]> wrote: > Hi, > > I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of > hacking that doesn't touch PG code'. > > Hacking

Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30

2004-10-20 Thread Aaron Werman
I'm driving from Tenafly NJ and going to both sessions. If you're able to get to the George Washington Bridge (A train to 178th Street [Port Authority North] and a bus over the bridge), I can drive you down. I'm not sure right now about the return because I have confused plans to meet someone. /Aa

Re: [PERFORM] Performance vs Schemas

2004-10-19 Thread Aaron Werman
Right - if you split a table to a lot of more selective tables, it can often dramatically change the plan options (e.g. - in a single table, selectivity for a query may be 1% and require an expensive nested loop while in the more restrictive table it may match 14% of the data and do a cheaper scan)
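
A sketch of the selectivity effect described above, with illustrative names and figures:

```sql
-- Hypothetical: one combined table vs. a per-region split.
CREATE TABLE events    (region text, day date, payload text);
CREATE TABLE events_eu (day date, payload text);  -- holds rows WHERE region = 'EU'

-- In the combined table the predicate is highly selective (say 1% of rows),
-- which may push the planner toward an index scan / nested loop:
EXPLAIN SELECT * FROM events WHERE region = 'EU' AND day = '2004-10-01';

-- In the split table the same logical predicate matches a larger fraction
-- (say 14%) of a much smaller table, and a cheaper sequential scan can win:
EXPLAIN SELECT * FROM events_eu WHERE day = '2004-10-01';
```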

[PERFORM] mmap (was First set of OSDL Shared Mem scalability results, some wierdness ...

2004-10-14 Thread Aaron Werman
pg to my mind is unique in not trying to avoid OS buffering. Other dbmses spend a substantial effort to create a virtual OS (task management, I/O drivers, etc.) both in code and support. Choosing mmap seems such a limiting option - it adds OS dependency and limits kernel developer options (2G li

Re: [PERFORM] Normal case or bad query plan?

2004-10-12 Thread Aaron Werman
Makes sense. See DB2 8.2 info on their new implementation of cross column statistics. If this is common and you're willing to change code, you can fake that by adding an operation index on some hash function of both columns, and search for both columns and the hash. - Original Message - Fr
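
A sketch of the fake-cross-column-statistic trick (names are illustrative; hashtext() is PostgreSQL's built-in text hash):

```sql
-- Functional index on a hash of both columns together:
CREATE TABLE t (a text, b text);
CREATE INDEX t_ab_hash ON t (hashtext(a || '|' || b));

-- Query on both columns plus the combined hash, so the planner can use
-- the index's (single-expression) selectivity for the correlated pair:
SELECT *
FROM   t
WHERE  hashtext(a || '|' || b) = hashtext('foo' || '|' || 'bar')
  AND  a = 'foo'
  AND  b = 'bar';
```

The separator in the concatenation avoids treating ('ab','c') and ('a','bc') as the same key.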

Re: [PERFORM] Data warehousing requirements

2004-10-07 Thread Aaron Werman
- Original Message - From: "Gabriele Bartolini" <[EMAIL PROTECTED]> To: "Aaron Werman" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Thursday, October 07, 2004 1:07 PM Subject: Re: [PERFORM] Data warehousing requirements > At 13.30 07/10/2004

Re: [PERFORM] Data warehousing requirements

2004-10-07 Thread Aaron Werman
Consider how the fact table is going to be used, and review hacking it up based on usage. Fact tables should be fairly narrow, so if there are extra columns beyond keys and dimension keys consider breaking it into parallel tables (vertical partitioning). Horizontal partitioning is your friend; esp
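
A sketch of both partitioning styles mentioned above (schema is hypothetical; the horizontal split uses the inheritance-based approach of this era of PostgreSQL):

```sql
-- Vertical partitioning: keep the fact table narrow, move wide or
-- rarely-used columns into a parallel table sharing the same keys.
CREATE TABLE fact (
    date_key integer,
    cust_key integer,
    amount   numeric
);
CREATE TABLE fact_detail (
    date_key integer,
    cust_key integer,
    long_note text        -- joined in only when actually needed
);

-- Horizontal partitioning: one child table per month via inheritance.
CREATE TABLE fact_2004_09 () INHERITS (fact);
CREATE TABLE fact_2004_10 () INHERITS (fact);
```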

Re: [PERFORM] Caching of Queries

2004-10-01 Thread Aaron Werman
I'm not sure I understand your req fully. If the same request is repeatedly done with same parameters, you could implement a proxy web server with a croned script to purge stale pages. If there is substantially the same data being summarized, doing your own summary tables works; if accessed enough,
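
A sketch of the summary-table approach, assuming a hypothetical fact table and a cron job that rebuilds the summary on a schedule:

```sql
CREATE TABLE fact (date_key integer, amount numeric);

-- Rebuilt periodically by cron, standing in for a query cache:
CREATE TABLE daily_sales_summary AS
SELECT date_key, sum(amount) AS total, count(*) AS n
FROM   fact
GROUP  BY date_key;

-- Repeated requests read the small summary instead of re-aggregating:
SELECT total FROM daily_sales_summary WHERE date_key = 20041001;
```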

Re: [PERFORM] Slow update/insert process

2004-10-01 Thread Aaron Werman
Some quick notes:   - Using a side effect of a function to update the database feels bad to me - how long does the  SELECT into varQueryRecord md5(upc.keyp   function take / what does its explain look like? - There are a lot of non-indexed columns on that delta master table, such as key

Re: [PERFORM] Caching of Queries

2004-10-01 Thread Aaron Werman
or a big table loaded monthly.) - Original Message - From: "Jeff" <[EMAIL PROTECTED]> To: "Mitch Pirtle" <[EMAIL PROTECTED]> Cc: "Aaron Werman" <[EMAIL PROTECTED]>; "Scott Kirkwood" <[EMAIL PROTECTED]>; "Neil Conway" <[EMAI

Re: [PERFORM] Caching of Queries

2004-09-28 Thread Aaron Werman
- Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> To: "Aaron Werman" <[EMAIL PROTECTED]> Cc: "Iain" <[EMAIL PROTECTED]>; "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Tuesday, September 28, 20

Re: [PERFORM] Caching of Queries

2004-09-28 Thread Aaron Werman
- Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> To: "Iain" <[EMAIL PROTECTED]> Cc: "Jim C. Nasby" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Monday, September 27, 2004 11:17 PM Subject: Re: [PERFORM] Caching of Queries > "Iain" <[EMAIL PROTECTED]> writes: > > I can only te

Re: [PERFORM] Caching of Queries

2004-09-26 Thread Aaron Werman
There is a difference between MySQL and Oracle here. Oracle, to reduce parse/planner costs, hashes statements to see if it can match an existing optimizer plan. This is optional and there are a few flavors that range from a character to character match through parse tree matches through replacing

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread aaron werman
From: "Harald Lau (Sector-X)" <[EMAIL PROTECTED]> ... > From: "Mischa Sandberg" <[EMAIL PROTECTED]> > > > If your company is currently happy with MySQL, there probably are > > other (nontechnical) reasons to stick with it. I'm impressed that > > you'd consider reconsidering PG. > > I'd like to seco

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-14 Thread Aaron Werman
> Mark Cotner wrote: > > The time has come to reevaluate/rearchitect an > > application which I built about 3 years ago. There > > are no performance concerns with MySQL, but it would > > benefit greatly from stored procedures, views, etc. > From: "Mischa Sandberg" <[EMAIL PROTECTED]> > If your

Re: [PERFORM] linux distro for better pg performance

2004-05-04 Thread Aaron Werman
The comparison is actually dead on. If you have lots of write through / read behind cache, RAID 5 can run very quickly, until the write rate overwhelms the cache - at which point the 4 I/O per write / 2 per read stops it. This means that RAID 5 works, except when stressed, which is a bad paradigm.

Re: [PERFORM] Help with performance problems

2004-04-23 Thread Aaron Werman
Your second server has queuing (load averages are highish), only 2 processes running, and almost all cycles are idle. You need to track down your bottleneck. Have you looked at iostat/vmstat? I think it would be useful to post these, ideally both before and after full vacuum analyze. /Aaron

Re: [PERFORM] Looking for ideas on how to speed up warehouse loading

2004-04-23 Thread Aaron Werman
By definition, it is equivalent to: SELECT t1.id, t2.url FROM referral_temp t2 LEFT /*OUTER*/ JOIN d_referral t1 ON t2.url = t1.referral_raw_url union all SELECT null, url FROM referral_temp WHERE url is null ORDER BY 1; /Aaron - Original Message - From: "Joe Conway" <[EMAIL PROTECTED
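
Formatted for readability, the equivalence stated in the message is:

```sql
SELECT t1.id, t2.url
FROM   referral_temp t2
LEFT   OUTER JOIN d_referral t1 ON t2.url = t1.referral_raw_url
UNION ALL
SELECT null, url
FROM   referral_temp
WHERE  url IS NULL
ORDER  BY 1;
```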

Re: [PERFORM] Moving postgres to FC disks

2004-04-20 Thread Aaron Werman
I agree on not linking and adding non-SAN disk dependency to your DB. I'm trying to understand your FS reasoning. I have never seen XFS run faster than ReiserFS in any situation (or for that matter beat any FS in performance except JFS). XFS has some nifty very large file features, but we're

Re: [PERFORM] possible improvement between G4 and G5

2004-04-20 Thread Aaron Werman
There are a few things that you can do to help force yourself to be I/O bound. These include: - RAID 5 for write intensive applications, since multiple writes per synch write is good. (There is a special case for logging or other streaming sequential writes on RAID 5) - Data journaling file syste

Re: [PERFORM] good pc but bad performance,why?

2004-04-07 Thread Aaron Werman
fsync I'm thinking 50 inserts, if autocommitting is 50TPS = ~100 IO per second (50 WAL + checkpoint IO) = roughly the I/O rate of a single drive. Huang - Are you using a single drive for pg? If so, there is a safety problem with having both the data and the logs used for recovery on the same drive. If the drive
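
A sketch of the batching fix implied by the arithmetic above (table is hypothetical): with autocommit, each INSERT pays its own WAL fsync; wrapping the batch in one transaction shares a single commit flush.

```sql
CREATE TABLE t (v integer);

BEGIN;
INSERT INTO t (v) VALUES (1);
INSERT INTO t (v) VALUES (2);
-- ... remaining inserts of the batch ...
INSERT INTO t (v) VALUES (50);
COMMIT;  -- one WAL flush for all 50 rows instead of 50 separate fsyncs
```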

Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Aaron Werman
- Original Message - From: "Josh Berkus" <[EMAIL PROTECTED]> To: "Aaron Werman" <[EMAIL PROTECTED]>; "Qing Zhao" <[EMAIL PROTECTED]>; "Tom Lane" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Tuesday, April 06, 2

Re: [PERFORM] possible improvement between G4 and G5

2004-04-06 Thread Aaron Werman
- Original Message - From: "Tom Lane" <[EMAIL PROTECTED]> To: "Qing Zhao" <[EMAIL PROTECTED]> Cc: <[EMAIL PROTECTED]> Sent: Tuesday, April 06, 2004 1:47 AM Subject: Re: [PERFORM] possible improvement between G4 and G5 > Qing Zhao <[EMAIL PROTECTED]> writes: > > We have got a G5 64-bit p

Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-05 Thread Aaron Werman
CTED]> To: <[EMAIL PROTECTED]> Sent: Sunday, April 04, 2004 5:06 PM Subject: Re: [PERFORM] single index on more than two coulumns a bad thing? Hi Aaron, > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] Behalf Of > Aaron Werman > Sent: v

Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-03 Thread Aaron Werman
Almost any cross dbms migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version. /Aaron - Original Message - From: "Josh Berkus" <[EMAIL PROTECTED]>

Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-02 Thread Aaron Werman
another thing that I have all over the place is a hierarchy: index on grandfather_table(grandfather) index on father_table(grandfather, father) index on son_table(grandfather, father, son) almost all of my indices are composite. Are you thinking about composite indices with low cardinality leading
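
The hierarchy pattern from the message, written out as DDL (table and column names as given; types are assumed):

```sql
CREATE TABLE grandfather_table (grandfather integer);
CREATE TABLE father_table      (grandfather integer, father integer);
CREATE TABLE son_table         (grandfather integer, father integer, son integer);

CREATE INDEX gf_idx ON grandfather_table (grandfather);
CREATE INDEX f_idx  ON father_table      (grandfather, father);
CREATE INDEX s_idx  ON son_table         (grandfather, father, son);
```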

Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux

2004-03-23 Thread Aaron Werman
Are you talking about http://www.potentialtech.com/wmoran/postgresql.php#conclusion - Original Message - From: "Subbiah, Stalin" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; "Matt Clark" <[EMAIL PROTECTED]>; "Subbiah, Stalin" <[EMAIL PROTECTED]>; "'Andrew Sullivan'" <[EMAIL PROTECTED]>; <[

Re: [PERFORM] atrocious update performance

2004-03-16 Thread Aaron Werman
Quick observations: 1. We have an explanation for what's going on, based on the message being exactly 666 lines long :-) 2. I'm clueless on the output, but perhaps Tom can see something. A quick glance shows that the strace seemed to run 27 seconds, during which it did: count| call ---|---

Re: [PERFORM] atrocious update performance

2004-03-15 Thread Aaron Werman
The original point was about a very slow update of an entire table with a plan that looped, and over a dozen conditional indices - vs. a 'create as' in a CPU starved environment. I stand by my statement about observing the orders of magnitude difference. In theory I agree that the update should be

Re: [PERFORM] atrocious update performance

2004-03-15 Thread Aaron Werman
Bulk updates are generally dogs (not just in pg), so I avoid doing them by doing faster selects and inserts. You can create a new table using 'create table as' to produce your target results. This is real fast - avoiding the row iteration in insert, allowing the select optimizer to run and no index
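
A sketch of the 'create table as' pattern described above (schema and column names are hypothetical):

```sql
CREATE TABLE t (pk integer, needs_change boolean, old_value text, new_value text);

-- Build the target result in one pass: no per-row UPDATE overhead,
-- no index maintenance during the load, full use of the select optimizer.
CREATE TABLE t_new AS
SELECT pk,
       CASE WHEN needs_change THEN new_value ELSE old_value END AS val
FROM   t;

-- Then index the new table and swap names:
CREATE INDEX t_new_pk ON t_new (pk);
BEGIN;
ALTER TABLE t     RENAME TO t_old;
ALTER TABLE t_new RENAME TO t;
COMMIT;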

Re: [PERFORM] Scaling further up

2004-03-14 Thread Aaron Werman
Sorry about not chiming in before - I've been too swamped to think. I agree with most of the points, but a lot of these posts are interesting and seem to describe systems from an SA perspective to my DBA-centric view. - Original Message - From: "Marty Scholes" <[EMAIL PROTECTED]> To: <[EM