Re: [PERFORM] multi-line copy (was: Re: COPY Hacks)

2005-04-07 Thread Mischa Sandberg
Quoting Greg Sabino Mullane <[EMAIL PROTECTED]>: > > Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface, > > I accidentally strung together several \n-terminated input lines, > > and sent them to the server with a single "putline". > ... > > So. Is it a feechur? Worth stress-testing?

[PERFORM] Strange serialization problem

2005-04-13 Thread Mischa Sandberg
I have a performance problem; I'd like any suggestions on where to continue investigation. A set of insert-only processes seems to serialize itself. :-( The processes appear to be blocked on disk IO, and probably the table drive, rather than the pg_xlog drive. Each process is inserting a block

Re: [PERFORM] Slow copy with little CPU/disk usage

2005-04-19 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > A friend of mine has an application where he's copying in 4000 rows at a > > time into a table that has about 4M rows. Each row is 40-50 bytes. This > > is taking 25 seconds on a dual PIII-1GHz with 1G of R

Re: [PERFORM] Index bloat problem?

2005-04-21 Thread Mischa Sandberg
Quoting Bill Chandler <[EMAIL PROTECTED]>: > ... The normal activity is to delete 3-5% of the rows per day, > followed by a VACUUM ANALYZE. ... > However, on occasion, deleting 75% of rows is a > legitimate action for the client to take. > > In case nobody else has asked: is your max_fsm_page

Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-22 Thread Mischa Sandberg
Quoting Alvaro Herrera <[EMAIL PROTECTED]>: > One further question is: is this really a meaningful test? I mean, in > production are you going to query 30 rows regularly? And is the > system always going to be used by only one user? I guess the question > is if this big select is representa

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-26 Thread Mischa Sandberg
Quoting Andrew Dunstan <[EMAIL PROTECTED]>: > After some more experimentation, I'm wondering about some sort of > adaptive algorithm, a bit along the lines suggested by Marko Ristola, but limited to 2 rounds. > > The idea would be that we take a sample (either of fixed size, or > some sm

Re: [PERFORM] Suggestions for a data-warehouse migration routine

2005-04-28 Thread Mischa Sandberg
Quoting Richard Rowell <[EMAIL PROTECTED]>: > I've ported enough of my companies database to Postgres to make > warehousing on PG a real possibility. I thought I would toss my > data > migration architecture ideas out for the list to shoot apart.. > [...] Not much feedback required. Yes, droppi

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-04-28 Thread Mischa Sandberg
Quoting Josh Berkus : > > >Perhaps I can save you some time (yes, I have a degree in Math). If I > > >understand correctly, you're trying extrapolate from the correlation > > >between a tiny sample and a larger sample. Introducing the tiny sample > > >into any decision can only produce a less accu

Re: [PERFORM] COPY vs INSERT

2005-05-03 Thread Mischa Sandberg
> Steven Rosenstein <[EMAIL PROTECTED]> writes: > > My question is, are there any advantages, drawbacks, or outright > > restrictions to using multiple simultaneous COPY commands to load > data into > > the same table? Do you mean, multiple COPY commands (connections) being putline'd from the same

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-05-03 Thread Mischa Sandberg
Quoting Markus Schaber <[EMAIL PROTECTED]>: > Hi, Josh, > > Josh Berkus wrote: > > > Yes, actually. We need 3 different estimation methods: > > 1 for tables where we can sample a large % of pages (say, >= 0.1) > > 1 for tables where we sample a small % of pages but are "easily > estimated" > >

Re: [HACKERS] [PERFORM] Bad n_distinct estimation; hacks suggested?

2005-05-03 Thread Mischa Sandberg
Quoting Josh Berkus : > Mischa, > > > Okay, although given the track record of page-based sampling for > > n-distinct, it's a bit like looking for your keys under the > streetlight, > > rather than in the alley where you dropped them :-) > > Bad analogy, but funny. Bad analogy? Page-

Re: [PERFORM] COPY vs INSERT

2005-05-04 Thread Mischa Sandberg
Quoting David Roussel <[EMAIL PROTECTED]>: > > COPY invokes all the same logic as INSERT on the server side > > (rowexclusive locking, transaction log, updating indexes, rules). > > The difference is that all the rows are inserted as a single > > transaction. This reduces the number of fsync's on

Re: [PERFORM] COPY vs INSERT

2005-05-04 Thread Mischa Sandberg
Quoting Kris Jurka <[EMAIL PROTECTED]>: > On Wed, 4 May 2005, Mischa Sandberg wrote: > > > Copy makes better use of the TCP connection for transmission. COPY > uses > > the TCP connection like a one-way pipe. INSERT is like an RPC: the > > sender has

[PERFORM] Whence the Opterons?

2005-05-06 Thread Mischa Sandberg
After reading the comparisons between Opteron and Xeon processors for Linux, I'd like to add an Opteron box to our stable of Dells and Sparcs, for comparison. IBM, Sun and HP have their fairly pricey Opteron systems. The IT people are not swell about unsupported purchases off ebay. Anyone care to

Re: [PERFORM] Whence the Opterons?

2005-05-08 Thread Mischa Sandberg
Thanks to everyone for their pointers to suppliers of Opteron systems. The system I'm pricing is under a tighter budget than a production machine, because it will be for perftests. Our customers tend to run on Dells but occasionally run on (Sun) Opterons. ---(end of bro

Re: [PERFORM] Query tuning help

2005-05-09 Thread Mischa Sandberg
Quoting Russell Smith <[EMAIL PROTECTED]>: > On Mon, 9 May 2005 11:49 am, Dan Harris wrote: > > On May 8, 2005, at 6:51 PM, Russell Smith wrote: > [snip] > > select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat > > FROM em > > JOIN ea ON em.incidentid = ea.incidentid --- sligh

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > I'm not really familiar enough with hash indexes to know if this > would > work, but if the maximum bucket size was known you could use that to > determine a maximum range of buckets to look at. In some cases, that > range would include only one bucket

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting [EMAIL PROTECTED]: > > exploring the option of buying 10 cheapass > > machines for $300 each. At the moment, that $300 buys you, from > Dell, a > > 2.5Ghz Pentium 4 > > Buy cheaper ass Dells with an AMD 64 3000+. Beats the crap out of > the 2.5 > GHz Pentium, especially for PostgreSQL.

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > Well, in a hash-join right now you normally end up feeding at least > one > side of the join with a seqscan. Wouldn't it speed things up > considerably if you could look up hashes in the hash index instead? You might want to google on "grace hash" and

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Alex Stapleton <[EMAIL PROTECTED]>: > This is why I mention partitioning. It solves this issue by storing > different data sets on different machines under the same schema. > These seperate chunks of the table can then be replicated as well for > data redundancy and so on. MySQL are wor

Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-10 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>: > Mischa Sandberg <[EMAIL PROTECTED]> writes: > > The PG hash join is the simplest possible: build a hash table in > memory, and match an input stream against it. > > [ raised eyebrow... ] Apparently you've not read the

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: > > This is why I mention partitioning. It solves this issue by storing > > different data sets on different machines under the same schema. > > These seperate chunks of the table can then be replicated as well for > > data redundancy and so o

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: > > >>*laff* > >>Yeah, like they've been working on views for the last 5 years, and > >>still haven't released them :D :D :D > > > > ? > > http://dev.mysql.com/doc/mysql/en/create-view.html > > ...for MySQL 5.0.1+ ? > > Give me a call when i

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting "Jim C. Nasby" <[EMAIL PROTECTED]>: > To the best of my knowledge no such work has been done. There is a > project (who's name escapes me) that lets you run queries against a > remote postgresql server from a postgresql connection to a different > server, which could serve as the basis for

Re: [PERFORM] Prefetch

2005-05-11 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne <[EMAIL PROTECTED]>: > > Another trick you can use with large data sets like this when you > want > > results > > back in seconds is to have regularly updated tables that aggregate > the data > > along each column normally aggregated against the main data set. > >

Federated PG servers -- Was: Re: [PERFORM] [GENERAL] "Hash index" vs. "b-tree index" (PostgreSQL

2005-05-11 Thread Mischa Sandberg
Was curious why you pointed out SQL-MED as a SQL-standard approach to federated servers. Always thought of it as covering access to non-SQL data, the way the lo_* interface works; as opposed to meshing compatible (to say nothing of identical) SQL servers. Just checked Jim Melton's last word on that

Re: [PERFORM] Bad plan after vacuum analyze

2005-05-11 Thread Mischa Sandberg
Quoting Guillaume Smet <[EMAIL PROTECTED]>: > Hi, > > We have some performances problem on a particular query. ... I have to say it, this was the best laid-out set of details behind a problem I've ever seen on this list; I'm going to try live up to it, the next time I have a problem of my own.

Re: [PERFORM] PostgreSQL strugling during high load

2005-05-13 Thread Mischa Sandberg
Quoting Tom Lane <[EMAIL PROTECTED]>: > "Mindaugas Riauba" <[EMAIL PROTECTED]> writes: > > ... So contents of database changes very fast. Problem is that > when > > pg_autovacuum does vacuum those changes slows down too much. > > The "vacuum cost" parameters can be adjusted to make vacuums fired

Re: [PERFORM] Best hardware

2005-06-04 Thread Mischa Sandberg
Quoting Bernd Jagla <[EMAIL PROTECTED]>: > ... the speed of the head of the HD is actually > limitiing. Also, I only experimented with RAID5, and heard that > RAID10 will be good for reading but not writing. Au contraire. RAID5 is worse than RAID10 for writing, because it has the extra implicit r

Re: [PERFORM] Tuning

2007-02-05 Thread Mischa Sandberg
> -Original Message- > From: [EMAIL PROTECTED] [mailto:pgsql-performance- > [EMAIL PROTECTED] On Behalf Of John Parnefjord > Sent: Tuesday, January 30, 2007 2:05 AM > Subject: Re: [PERFORM] Tuning > EnterpriseDB advocates: 250 KB + 8.2 KB * shared_buffers + 14.2 kB * > max_connections up t

Re: [PERFORM] Selects query stats?

2006-05-23 Thread Mischa Sandberg
Dan Gorman wrote: All, I might be completely crazy here, but it seems every other database exposes select query stats. Postgres only exposes updates/deletes/inserts. Is there something I am missing here? Perhaps. You can EXPLAIN ANALYZE a SELECT, just like i/u/d -- but then you don't get th

Re: [PERFORM] Selects query stats?

2006-05-23 Thread Mischa Sandberg
Dan Gorman wrote: What I am looking for is that our DB is doing X selects a min. What specifically would you like to measure? Duration for specific queries? Queries in an app for which you have no source? There may be a way to get what you want by other means ... Details? I gather you cannot j

Re: [PERFORM] Selects query stats?

2006-05-23 Thread Mischa Sandberg
Alvaro Herrera wrote: Yeah, the problem seems underspecified. So, Dan, the question is, what are you trying to measure? This might be a statistic that management has always been given, for Oracle, and you need to produce the "same" number for PostgreSQL. If not, it's hard to figure out what a

Re: [PERFORM] vacuuming problems continued

2006-06-06 Thread Mischa Sandberg
Andrew Sullivan wrote: On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote: Hi, We just don't seem to be getting much benefit from autovacuum. Running a manual vacuum seems to still be doing a LOT, which suggests to me that I should either run a cron job and disable autovacuum, or just run a

Re: [PERFORM] vacuuming problems continued

2006-06-06 Thread Mischa Sandberg
Joshua D. Drake wrote: - in our env, clients occasionally hit max_connections. This is a known and (sort of) desired pushback on load. However, that sometimes knocks pg_autovacuum out. That is when you use: superuser_reserved_connections Blush. Good point. Though, when we hit max_connection

[PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg
Jim C. Nasby wrote: ... Actually, in 8.1.x I've seen some big wins from greatly increasing the amount of shared_buffers, even as high as 50% of memory, thanks to the changes made to the buffer management code. ... Anyone else run into a gotcha that one of our customers ran into? PG 7.4.8 runnin

Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg
Tom Lane wrote: Mischa Sandberg <[EMAIL PROTECTED]> writes: vmstat showed that it was swapping like crazy. Dropped shared_buffers back down again. Swapping stopped. Does Solaris have any call that allows locking a shmem segment in RAM? Yes, mlock(). But want to understand what'

Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg
Jim C. Nasby wrote: On Tue, Jun 13, 2006 at 03:21:34PM -0700, Mischa Sandberg wrote: Raised shared_buffers to 16000 (128MB). DB server dropped to a CRAWL. vmstat showed that it was swapping like crazy. Dropped shared_buffers back down again. Swapping stopped. What's sort_mem set

Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-13 Thread Mischa Sandberg
Jim C. Nasby wrote: On Tue, Jun 13, 2006 at 04:20:34PM -0700, Mischa Sandberg wrote: Jim C. Nasby wrote: What's sort_mem set to? I suspect you simply ran the machine out of memory. 8192 (8MB). No issue when shared_buffers was 2000; same apps always. So if all 50 backends were running a

Re: [PERFORM] Solaris shared_buffers anomaly?

2006-06-14 Thread Mischa Sandberg
Tom Lane wrote: Mischa Sandberg <[EMAIL PROTECTED]> writes: Tom Lane wrote: Does Solaris have any call that allows locking a shmem segment in RAM? Yes, mlock(). But want to understand what's going on before patching. Sure, but testing it with mlock() might help you understand w

Re: [PERFORM] Optimizer internals

2006-06-15 Thread Mischa Sandberg
Mark Lewis wrote: On Thu, 2006-06-15 at 14:05 -0400, John Vincent wrote: Now I've been told by our DBA that we should have been able to wholy satisfy that query via the indexes. DB2 can satisfy the query using only indexes because DB2 doesn't do MVCC. You can get pretty much the same effect

[PERFORM] Bad plan for join to aggregate of join.

2006-09-12 Thread Mischa Sandberg
PG 8.0.3 is choosing a bad plan between a query. I'm going to force the plan (by making one join into a function). I'd like to know if this is unexpected; in general, can PG see that a join on an grouped-by field can be pushed down into the query as an indexable filter? The query below joins a t

Re: [PERFORM] Bad plan for join to aggregate of join.

2006-09-12 Thread Mischa Sandberg
Tom Lane wrote: Mischa Sandberg <[EMAIL PROTECTED]> writes: can PG see that a join on an grouped-by field can be pushed down into the query as an indexable filter? No. The GROUP BY serves as a partial optimization fence. If you're concerned about the speed of this query, I recom

Re: [HACKERS] [PERFORM] Hints proposal

2006-10-20 Thread Mischa Sandberg
-Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bucky Jordan Sent: Thursday, October 12, 2006 2:19 PM To: josh@agliodbs.com; Jim C. Nasby Cc: pgsql-hackers@postgresql.org; pgsql-performance@postgresql.org Subject: Re: [HACKERS] [PERFORM] Hints proposal >

Re: [PERFORM] Major differences between oracle and postgres performance - what can I do ?

2004-06-29 Thread Mischa Sandberg
Don't know about Oracle, but select-distinct in MSSQL2K will indeed throw away duplicates, which chops the CPU time. Very easy to see in the graphic query plan, both in terms of CPU and the number of rows retrieved from a single-node or nested-loop subtree. Definitely a worthwhile optimization. "T

[PERFORM] Range query optimization

2004-06-29 Thread Mischa Sandberg
I'm trying to make a (qua-technical, qua-business) case for switching from MS SQL, and one of the types of query that really doesn't sit well with MS SQL2K is: -- All fields integers or equivalent. -- Table T(k, x: nonkey fields...) -- Table U(k, a, z: m)-- for each value of (k) a set of non-i

Re: [PERFORM] Query performance

2004-06-30 Thread Mischa Sandberg
Usually, when you post a request like this, you should provide something a little more concrete (the CREATE TABLE statement for that table, with Since you didn't, I'll posit something that sounds like what you're using, and take a stab at your problem.   TABLE Prices (     stock    VARCHAR(

Re: [PERFORM] Inverted-list databases (was: Working on huge RAM based datasets)

2004-07-13 Thread Mischa Sandberg
""Andy Ballingall"" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > On another thread, (not in this mailing list), someone mentioned that there > are a class of databases which, rather than caching bits of database file > (be it in the OS buffer cache or the postmaster workspace), co

Re: [PERFORM] my boss want to migrate to ORACLE

2004-08-05 Thread Mischa Sandberg
Regarding Raid5 at all, you might want to look at http://www.baarf.com ""Stephane Tessier"" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > I think with your help guys I'll do it! > > I'm working on it! > > I'll work on theses issues: > > we have space for more ram(we use 2 gigs on p

[PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-25 Thread Mischa Sandberg
Coming from the MSSQL world, I'm used to the first step in optimization to be, choose your clustered index and choose it well. I see that PG has a one-shot CLUSTER command, but doesn't support continuously-updated clustered indexes. What I infer from newsgroup browsing is, such an index is impossib

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-29 Thread Mischa Sandberg
Ummm ... not quite. In MSSQL/Sybase/Oracle, a clustered index maintains its space saturation as part of each update operation. High activity does indeed result in less-full pages (typically 60-80% full for tables with heavy deletions or rowsize changes). To bring the percentage back up, you run

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-29 Thread Mischa Sandberg
I think you've probably fingered the kicker of why PG doesn't have this kind of clustering already. Hence perhaps the need for other approaches to the issue (the disk-IO efficiency of reading groups of rows related by a common key) that other DB's (with in-place update) address with synchronous c

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-29 Thread Mischa Sandberg
This discussion is starting to sound like the split in HEAP memory management evolution, into garbage-collecting (e.g. Java) and non-garbage-collecting (e.g. C++). Reclamation by GC's these days has become seriously sophisticated. CLUSTER resembles the first generation of GC's, which were singl

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-29 Thread Mischa Sandberg
Sheer nitpick here... A B-tree is where the records (data) live at all levels of the tree; B+ tree is where the records are only at the leaf level. That's what Knuth calls them, anyway. Clustered indexes for all known dbs are true B+ trees. Nonclustered indexes could be B-trees (probably aren't), s

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-29 Thread Mischa Sandberg
J. Andrew Rogers wrote: On Thu, 2004-08-26 at 12:30, Magnus Hagander wrote: IIRC, SQL Server always creates clustered indexes by default for primary keys. That would surprise me actually. Yaz, it should. It doesn't ALWAYS create clustered (unique) index for primary keys, but clustered is the defau

Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-09-10 Thread Mischa Sandberg
Mischa Sandberg wrote: Coming from the MSSQL world, I'm used to the first step in optimization to be, choose your clustered index and choose it well. I see that PG has a one-shot CLUSTER command, but doesn't support continuously-updated clustered indexes. What I infer from newsgroup b

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-13 Thread Mischa Sandberg
Mark Cotner wrote: Hi all, I had a difficult time deciding which list to post this to, so please forgive me if this list doesn't perfectly match my questions. My decision will not solely be based on performance, but it is the primary concern. I would be very appreciative if you all could comment

Re: [PERFORM] Help with extracting large volumes of records across related

2004-09-13 Thread Mischa Sandberg
Damien Dougan wrote: Basically we have a number of tables, which are exposed as 2 public views (say PvA and PvB). For each row in PvA, there are a number of related rows in PvB (this number is arbitrary, which is one of the reasons why it cant be expressed as additional columns in PvA - so we reall

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables

2004-09-15 Thread Mischa Sandberg
Simon Riggs wrote: Jim C. Nasby On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote: PostgreSQL's functionality is in many ways similar to Oracle Partitioning. Loading up your data in many similar tables, then creating a view like: CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS SELECT

Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Mischa Sandberg
Googling 'upsert' (an Oraclism, I believe) will get you hits on Oracle and DB2's implementation of MERGE, which does what AMOUNTS to what is described below (one mass UPDATE...FROM, one mass INSERT...WHERE NOT EXISTS). No, you shouldn't iterate row-by-row through the temp table. Whenever possibl

[PERFORM] Tryint to match Solaris-Oracle performance with directio?

2004-09-18 Thread Mischa Sandberg
Our product (Sophos PureMessage) runs on a Postgres database. Some of our Solaris customers have Oracle licenses, and they've commented on the performance difference between Oracle and Postgresql on such boxes. In-house, we've noticed the 2:1 (sometimes 5:1) performance difference in inserting row

Re: [PERFORM] sequential scan on select distinct

2004-10-11 Thread Mischa Sandberg
Tom Lane wrote: Ole Langbehn <[EMAIL PROTECTED]> writes: What do you think about the idea of an "UniqueSort" which would do sort+unique in one pass ? This is what oracle does and it is quite fast with it... Hashing is at least as fast, if not faster. regards, tom lane I got good mileage in a

Re: [PERFORM] Normal case or bad query plan?

2004-10-19 Thread Mischa Sandberg
This may sound more elaborate than it's worth, but I don't know of a better way to avoid a table scan. You want to index on a computed value that is a common prefix of your FROM and TO fields. The next step is to search on a fixed SET of prefixes of different lengths. For example, some of your ran