[PERFORM] tsearch2/GIST performance factors?

2005-10-17 Thread Craig A. James
We are indexing about 5 million small documents using tsearch2/GIST. Each document contains 2 to 50 words. This is a write once, read many situation. Write performance is unimportant, and the database contents are static. (We build it offline.) We're having problems with inconsistent

Re: [PERFORM] tsearch2/GIST performance factors?

2005-10-18 Thread Craig A. James
Oleg wrote: Did you consider *decreasing* SIGLENINT ? Size of index will diminish and performance could be increased. I use in current project SIGLENINT=15 The default value for SIGLENINT actually didn't work at all. It was only by increasing it that I got any performance at all. An

Re: [PERFORM] Used Memory

2005-10-24 Thread Craig A. James
Kevin Grittner wrote: In addition to what Mark pointed out, there is the possibility that a query is running which is scanning a large table or otherwise bringing in a large number of pages from disk. That would first use up all available unused cache space, and then may start replacing some of

Re: [PERFORM] Used Memory

2005-10-24 Thread Craig A. James
Scott Marlowe wrote: What's needed is a way for the application developer to explicitly say, This object is frequently used, and I want it kept in memory. There's an interesting conversation happening on the linux kernel hackers mailing list right about now that applies:

Re: [PERFORM] Is There Any Way ....

2005-10-24 Thread Craig A. James
Jim C. Nasby jnasby ( at ) pervasive ( dot ) com wrote: Stefan Weiss wrote: ... IMO it would be useful to have a way to tell PG that some tables were needed frequently, and should be cached if possible. This would allow application developers to consider joins with these tables as cheap,

[PERFORM] Expensive function and the optimizer

2005-11-07 Thread Craig A. James
I have a function, call it myfunc(), that is REALLY expensive computationally. Think of it like, If you call this function, it's going to telephone the Microsoft Help line and wait in their support queue to get the answer. Ok, it's not that bad, but it's so bad that the optimizer should
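
One workaround often cited for this situation is to fence the cheap conditions inside a subquery with OFFSET 0, so that the expensive function is only applied to the rows that survive; a minimal sketch with hypothetical names (my_table, cheap_column, and a boolean-returning myfunc):

    SELECT *
    FROM  (SELECT *
           FROM   my_table
           WHERE  cheap_column = 42
           OFFSET 0) AS sub        -- OFFSET 0 keeps the subquery from being flattened
    WHERE  myfunc(sub.foo);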

[PERFORM] Perl DBD and an alarming problem

2005-11-16 Thread Craig A. James
I am mystified by the behavior of alarm in conjunction with Postgres/perl/DBD. Here is roughly what I'm doing: eval { local $SIG{ALRM} = sub { die("Timeout"); }; $time = gettimeofday; alarm 20; $sth = $dbh->prepare("a query that may take a long time..."); $sth->execute();

[PERFORM] Storage/Performance and splitting a table

2005-11-19 Thread Craig A. James
In a recent thread, several people pointed out that UPDATE = DELETE+INSERT. This got me to wondering. I have a table that, roughly, looks like this: create table doc ( id integer primary key, document text, keywords tsvector ); where keywords has a GIST index.
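
A minimal sketch of the kind of split being considered, with hypothetical names: the large document column lives in one table and the frequently updated tsvector in another, so keyword updates do not rewrite the document rows.

    CREATE TABLE doc (
        id       integer PRIMARY KEY,
        document text
    );
    CREATE TABLE doc_keywords (
        id       integer PRIMARY KEY REFERENCES doc(id),
        keywords tsvector
    );
    CREATE INDEX doc_keywords_gist ON doc_keywords USING gist (keywords);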

Re: [PERFORM] Perl DBD and an alarming problem

2005-11-19 Thread Craig A. James
When I set statement_timeout in the config file, it just didn't do anything - it never timed out (PG 8.0.3). ... but did you reload the server after you [changed statement_timeout]? Mystery solved. I have two servers; I was reconfiguring one and restarting the other. Duh. Thanks, Craig
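
For reference, the setting under discussion can also be applied per session instead of in postgresql.conf; the value is in milliseconds:

    SET statement_timeout = 20000;   -- abort any statement that runs longer than 20 seconds
    -- ... run the long-running query ...
    SET statement_timeout = 0;       -- 0 disables the timeout again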

[PERFORM] Hyperthreading slows processes?

2005-11-20 Thread Craig A. James
This article on ZDNet claims that hyperthreading can *hurt* performance, due to contention in the L1/L2 cache by a second process: http://news.zdnet.co.uk/0,39020330,39237341,00.htm Has anyone tested this on Postgres yet? (And based on a recent somewhat caustic thread about performance on

Re: [PERFORM] 15,000 tables

2005-12-01 Thread Craig A. James
So say I need 10,000 tables, but I can create tablespaces. Wouldn't that solve the performance problem caused by Linux's (or ext2/3's) problems with large directories? For example, if each user creates (say) 10 tables, and I have 1000 users, I could create 100 tablespaces, and assign groups
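
A sketch of the grouping idea with hypothetical names; each tablespace maps one group of users' tables onto its own directory:

    CREATE TABLESPACE group_01 LOCATION '/postgres_data/group_01';

    CREATE TABLE user_0001_orders (
        id      integer PRIMARY KEY,
        created timestamp
    ) TABLESPACE group_01;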

[PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
I asked a while back if there were any plans to allow developers to override the optimizer's plan and force certain plans, and received a fairly resounding No. The general feeling I get is that a lot of work has gone into the optimizer, and by God we're going to use it! I think this is just

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Tom, I see this over and over. Tweak the parameters to force a certain plan, because there's no formal way for a developer to say, I know the best plan. I think you've misunderstood those conversations entirely. The point is not to force the planner into a certain plan, it is to explore

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Christopher Kings-Lynne wrote: select * from my_table where row_num >= 5 and row_num < 10 and myfunc(foo, bar); You just create an index on myfunc(foo, bar) Thanks, but myfunc() takes parameters (shown here as foo, bar), one of which is not a column, it's external and

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Christopher Kings-Lynne wrote: select * from my_table where row_num >= 5 and row_num < 10 and myfunc(foo, bar); You just create an index on myfunc(foo, bar) only if myfunc(foo, bar) is immutable... And if it's not then the best any database can do is to index scan
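
For reference, the suggestion only applies when every argument is a column of the table and the function is declared IMMUTABLE; names below are placeholders:

    -- works only if foo and bar are both columns of my_table and myfunc() is IMMUTABLE
    CREATE INDEX my_table_myfunc_idx ON my_table (myfunc(foo, bar));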

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Christopher Kings-Lynne wrote: I don't necessarily disagree with your assertion that we need planner hints, but unless you or someone else is willing to submit a patch with the feature it's unlikely to ever be implemented... Now that's an answer I understand and appreciate. Open-source

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Mark Kirkwood wrote: I hear what you are saying, but to use this fine example - I don't know what the best plan is - these experiments are part of an investigation to find *if* there is a better plan, and if so, why Postgres is not finding it. There isn't a database in the world that is as smart

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Tom Lane wrote: This discussion has been had before (many times) ... see the -hackers archives for detailed arguments. The one that carries the most weight in my mind is that planner hints embedded in applications will not adapt to changing circumstances --- the plan that was best when you

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Christopher Kings-Lynne wrote: Can you paste explain analyze and your effective_cache_size, etc. settings. ... This seems like a case where PostgreSQL's current optimiser should easily know what to do if your config settings are correct and you've been running ANALYZE, so I'd like to see your
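
The kind of information being asked for can be gathered roughly like this (the query and settings are placeholders, reusing the earlier example):

    SHOW effective_cache_size;
    SHOW shared_buffers;
    EXPLAIN ANALYZE
    SELECT * FROM my_table
    WHERE  row_num >= 5 AND row_num < 10 AND myfunc(foo, bar);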

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Christopher Kings-Lynne wrote: So your main example bad query is possibly just a case of lack of analyze stats and wrong postgresql.conf config? And that's what causes you to shut down your database? Don't you want your problem FIXED? I'm trying to help by raising a question that I think is

Re: [PERFORM] Overriding the optimizer

2005-12-15 Thread Craig A. James
Kevin Brown wrote: Hints are dangerous, and I consider them a last resort. If you consider them a last resort, then why do you consider them to be a better alternative than a workaround such as turning off enable_seqscan, when all the other tradeoffs are considered? If I understand
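
The enable_seqscan workaround being weighed here is a per-session switch; a minimal sketch (the query is a placeholder):

    SET enable_seqscan = off;
    EXPLAIN ANALYZE
    SELECT * FROM my_table WHERE row_num >= 5 AND row_num < 10;
    RESET enable_seqscan;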

Re: [PERFORM] Overriding the optimizer

2005-12-16 Thread Craig A. James
Jaime Casanova wrote: The context is this - in a busy OLTP system, sometimes a query comes through that, for whatever reason (foolishness on my part as a developer, unexpected use by a user, imperfection of the optimizer, etc.), takes a really long time to run, usually because it table-scans one

Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Craig A. James
Alessandro Baretta [EMAIL PROTECTED] writes: I am aware that what I am dreaming of is already available through cursors, but in a web application, cursors are bad boys, and should be avoided. What I would like to be able to do is to plan a query and run the plan to retrieve a limited number of

Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Craig A. James
Alessandro Baretta wrote: I think you're trying to do something at the wrong layer of your architecture. This task normally goes in your middleware layer, not your database layer. I am developing my applications in Objective Caml, and I have written the middleware layer myself. I could

Re: [PERFORM] [GENERAL] Creation of tsearch2 index is very slow

2006-01-20 Thread Craig A. James
Tom Lane wrote: Well, we're trying to split an index page that's gotten full into two index pages, preferably with approximately equal numbers of items in each new page (this isn't a hard requirement though). ... If that's correct, what you really want is to divide the values so that the

Re: [PERFORM] Query optimization with X Y JOIN

2006-01-26 Thread Craig A. James
First, this isn't really the right place to ask -- this forum is about performance, not SQL syntax. Second, this isn't a question anyone can answer in a reasonable length of time. What you're asking for usually is taught in a class on relational database theory, which is typically a semester

Re: [PERFORM] [GENERAL] Creation of tsearch2 index is very

2006-01-26 Thread Craig A. James
Ron [EMAIL PROTECTED] writes: We have two problems here. The first is that the page splitting code for these indexes currently has O(N^2) performance. The second is that whatever solution we do use for this functionality, we still need good performance during searches that use the index. No,

Re: [PERFORM] Huge Data sets, simple queries

2006-01-29 Thread Craig A. James
Mike Biamonte wrote: Does anyone have any experience with extremely large data sets? I mean hundreds of millions of rows. The queries I need to run on my 200 million transactions are relatively simple: select month, count(distinct(cardnum)), count(*), sum(amount) from transactions group by

Re: [PERFORM] Storing Digital Video

2006-02-09 Thread Craig A. James
Nate Byrnes wrote: I must claim some ignorance, I come from the application world... but, from a data integrity perspective, it makes a whole lot of sense to store video, images, documents, whatever in the database rather than on the file system external to it. Personally, I would use LOB's,

Re: [PERFORM] Reliability recommendations

2006-02-15 Thread Craig A. James
Jeremy Haile wrote: We are a small company looking to put together the most cost effective solution for our production database environment. Currently in production Postgres 8.1 is running on this machine: Dell 2850 2 x 3.0 Ghz Xeon 800Mhz FSB 2MB Cache 4 GB DDR2 400 Mhz 2 x 73 GB 10K SCSI

Re: qsort again (was Re: [PERFORM] Strange Create Index

2006-02-16 Thread Craig A. James
Markus Schaber wrote: Ron wrote: ...and of course if you know enough about the data to be sorted so as to constrain it appropriately, one should use a non comparison based O(N) sorting algorithm rather than any of the general comparison based O(NlgN) methods. Sounds interesting, could you

Re: [PERFORM] Cost Issue - How do I force a Hash Join

2006-02-20 Thread Craig A. James
Virag Saksena [EMAIL PROTECTED] writes: The individual queries run in 50-300 ms. However the optimizer is choosing a nested loop to join them rather than a Hash join... I have what appears to be the identical problem. This is a straightforward query that should be fairly quick, but takes
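
A common way to test whether a hash join really would be faster is to disable nested loops for the session and compare plans; table and column names below are placeholders:

    SET enable_nestloop = off;
    EXPLAIN ANALYZE
    SELECT count(*) FROM table_a a JOIN table_b b ON b.a_id = a.id;
    RESET enable_nestloop;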

Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Craig A. James
Bruce Momjian wrote: Dell often says part X is included, but part X is not the exact same as part X sold by the original manufacturer. To hit a specific price point, Dell is willing to strip things out of commodity hardware, and often does so even when performance suffers. For many people, this

Re: [PERFORM] Reliability recommendations

2006-02-24 Thread Craig A. James
Joshua D. Drake wrote: I find this strains credibility, that this major manufacturer of PC's would do something deceptive that hurts performance, when it would be easily detected and widely reported. Can anyone cite a specific instances where this has happened? Such as, I bought Dell model

Re: [PERFORM] No vacuum for insert-only database?

2006-03-13 Thread Craig A. James
Alvaro Herrera wrote: If I only insert data into a table, never update or delete, then I should never have to vacuum it. Is that correct? You still need to vacuum eventually, to avoid transaction Id wraparound issues. But not as often. Thanks. Any suggestions for what not as often means?
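
One way to see how close a database is to transaction ID wraparound, and to push it back for an insert-only table (the table name is a placeholder):

    SELECT datname, age(datfrozenxid) FROM pg_database;
    VACUUM FREEZE my_insert_only_table;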

Re: [PERFORM] Migration study, step 1: bulk write performance optimization

2006-03-20 Thread Craig A. James
Mikael Carneholm wrote: I am responsible for an exciting project of evaluating migration of a medium/large application for a well-known Swedish car & truck manufacturer ... The goal right now is to find the set of parameters that gives as short bulk insert time as possible, minimizing downtime

[PERFORM] Poor performance o

2006-03-21 Thread Craig A. James
I'm reposting this -- I sent this out a month ago but never got a response, and hope someone can shed some light on this. Thanks, Craig -- This is a straightforward query that should be fairly quick, but takes about 30 minutes. It's a query across three tables, call

Re: [PERFORM] Poor performance o

2006-03-21 Thread Craig A. James
Tom Lane wrote: Craig A. James [EMAIL PROTECTED] writes: It looks to me like the problem is the use of nested loops when a hash join should be used, but I'm no expert at query planning. Given the sizes of the tables involved, you'd likely have to boost up work_mem before the planner would
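
A sketch of the work_mem experiment being suggested; on releases of that era the value is an integer in kilobytes, and the table names are placeholders:

    SET work_mem = 65536;    -- 64 MB, enough room for the hash table
    EXPLAIN ANALYZE
    SELECT count(*) FROM big_a a JOIN big_b b ON b.a_id = a.id;
    RESET work_mem;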

Re: [PERFORM] Decide between Postgresql and Mysql (help of

2006-03-29 Thread Craig A. James
Gorshkov wrote: /flame on if you were *that* worried about performance, you wouldn't be using PHP or *any* interpreted language /flame off sorry - couldn't resist it :-) I hope this was just a joke. You should be sure to clarify - there might be some newbie out there who thinks you are

Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James
I think this is an old question, but I want to know if it really is worthwhile to not create some foreign keys and deal with the referential integrity at the application level? Trust me : do it in the application and you'll enter a world of hurt. I'm doing it with some mysql apps,

Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James
Jim C. Nasby wrote: 1. You have only one application that modifies the data. (Otherwise, you have to duplicate the rules across many applications, leading to a code-maintenance nightmare). You forgot something: 1a: You know that there will never, ever, ever, ever, be any other application

Re: [PERFORM] FOREIGN KEYS vs PERFORMANCE

2006-04-12 Thread Craig A. James
Jim C. Nasby wrote: No, I don't agree with this. Too many people waste time designing for what if... scenarios that never happen. You don't want to be dumb and design something that locks out a foreseeable and likely future need, but referential integrity doesn't meet this criterion.

Re: [PERFORM] Memory and/or cache issues?

2006-05-05 Thread Craig A. James
mcelroy, tim wrote: Sorry, been up all night and maybe provided too much information or not the right information and only confused folks, tired I guess. When I say 'in use' I am referring to the 'used' column. Thanks all who have responded to this inquiry, I appreciate it. Here's free

[PERFORM] Postgres gets stuck

2006-05-09 Thread Craig A. James
I'm having a rare but deadly problem. On our web servers, a process occasionally gets stuck, and can't be unstuck. Once it's stuck, all Postgres activities cease. kill -9 is required to kill it -- signals 2 and 15 don't work, and /etc/init.d/postgresql stop fails. Here's what the process

Re: [PERFORM] Postgres gets stuck

2006-05-11 Thread Craig A. James
Chris wrote: This is a deadly bug, because our web site goes dead when this happens, ... Sounds like a deadlock issue. ... stats_command_string = true and restart postgresql. then you'll be able to: select * from pg_stat_activity; to see what queries postgres is running and that might give
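
The suggested check looks roughly like this on 8.1 (column names changed in later releases, e.g. procpid became pid):

    -- requires stats_command_string = true in postgresql.conf on the 8.x releases of that era
    SELECT procpid, usename, current_query, query_start
    FROM   pg_stat_activity
    ORDER  BY query_start;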

Re: [PERFORM] Postgres gets stuck

2006-05-11 Thread Craig A. James
Tom Lane wrote: My suspicion is that it's an incompatibility between malloc() libraries. On Linux there's only supposed to be one malloc, ie, glibc's version. On other platforms I'd be worried about threaded vs non-threaded libc (because the backend is not threaded), but not Linux. I guess I

Re: [PERFORM] Speed Up Offset and Limit Clause

2006-05-16 Thread Craig A. James
Christian Paul Cosinas wrote: I am creating an application that gets the value of a large table and write it to a file. Why I want to use offset and limit is for me to create a threaded application so that they will not get the same results. For example: Thread 1 : gets offset 0 limit 5000
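
An alternative that often comes up for splitting work among threads is to partition on an indexed key instead of using OFFSET, which avoids rescanning the skipped rows; id and big_table are placeholders:

    -- thread 1
    SELECT * FROM big_table WHERE id >= 0    AND id < 5000;
    -- thread 2
    SELECT * FROM big_table WHERE id >= 5000 AND id < 10000;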

[PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Craig A. James
Here's a corner case that might interest someone. It tripped up one of our programmers. We have a table with 10 million rows. The ID column is indexed, the table has been vacuum/analyzed. Compare these two queries: select * from tbl where id >= 1000 limit 1; select * from tbl where

Re: [PERFORM] Optimizer: limit not taken into account

2006-05-17 Thread Craig A. James
Tom Lane wrote: There is not anything in there that considers whether the table's physical order is so nonrandom that the search will take much longer than it would given uniform distribution. It might be possible to do something with the correlation statistic in simple cases ... In this

Re: [PERFORM] Query hanging/not finishing inconsistently

2006-05-22 Thread Craig A. James
Meetesh Karia wrote: Hi all, We've recently started having a problem where a query that normally executes in ~15ms starts to take upwards of 20s to complete. When the connection that ran query is returned to the connection pool, it appears as though a transaction is still in progress so the

Re: [PERFORM] Some queries starting to hang

2006-06-06 Thread Craig A. James
Simon Riggs wrote: Well, it's a big query. If it ought to take a second or two, and instead is taking an hour or two (1800 times the expected runtime), that might be close enough to never to exhaust Chris' patience. Besides, we don't know whether the 1800 might itself be an underestimate (too

Re: [PERFORM] Some queries starting to hang

2006-06-06 Thread Craig A. James
Tom Lane wrote: The idea I just had was: why do we need EXPLAIN ANALYZE to run to completion? In severe cases like this thread, we might be able to discover the root cause by a *partial* execution of the plan, as long as it was properly instrumented. That way, the OP might have been able to

[PERFORM] function not called if part of aggregate

2006-06-11 Thread Craig A. James
My application has a function, call it foo(), that requires initialization from a table of about 800 values. Rather than build these values into the C code, it seemed like a good idea to put them on a PG table and create a second function, call it foo_init(), which is called for each value,

Re: [PERFORM] function not called if part of aggregate

2006-06-13 Thread Craig A. James
Greg Stark wrote: However that's not enough to explain what you've shown. How about you show the actual query and actual plan you're working with? The plan you've shown can't result from the query you sent. Mea culpa, sort of. But ... in fact, the plan I sent *was* from the query I sent, with the

Re: [HACKERS] [PERFORM] Sun Donated a Sun Fire T2000 to the PostgreSQL

2006-06-22 Thread Craig A. James
Arjen van der Meijden wrote: First of all, this graph has no origin. Its a bit difficult to test with less than one cpu. Sure it does. I ran all the tests. They all took infinite time, and I got zero results. And my results are 100% accurate and reliable. It's perfectly valid data. :-)

[PERFORM] Sort order in sub-select

2006-06-29 Thread Craig A. James
Here is a question about SQL. I have a one-to-many pair of tables (call them P and C for parent and child). For each row of P, there are many rows in C with data, and I want to sort P on the min(c.data). The basic query is simple: select p_id, min(data) as m from c group by p_id order by

[PERFORM] explain analyze reports 20x more time than actual

2006-06-29 Thread Craig A. James
I have a query that needs to run faster, with the obvious solution being to add an index. But to confirm this, I ran explain analyze. When I run the actual query, it consistently takes 6-7 seconds by the wall clock. My application with a verbose mode enabled reports 6.6 seconds

[PERFORM] Sort order in sub-select

2006-07-02 Thread Craig A. James
Here is a subtle question about SQL. I have a one-to-many pair of tables (call them P and C for parent and child). For each row of P, there are many rows in C with data, and I want to sort P on the min(c.data). The basic query is simple: select p_id, min(data) as m from c group by p_id
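
A sketch of the full query being described, assuming P's key column is also named p_id:

    SELECT p.*, s.m
    FROM   p
    JOIN  (SELECT p_id, min(data) AS m
           FROM   c
           GROUP  BY p_id) AS s ON s.p_id = p.p_id
    ORDER  BY s.m;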

Re: [PERFORM] Is postgresql ca do the job for software deployed in

2006-07-03 Thread Craig A. James
Richard Broersma Jr wrote: Each table within the database is assigned an OID and is located inside the DB directory. So if there is a file-system limitation on the number of files within a given directory it would also be a limit to the number of tables that could be created for each

[PERFORM] need vacuum after insert/truncate/insert?

2006-07-06 Thread Craig A. James
If I insert a bunch of rows, then truncate, then insert a bunch more rows, do I need to vacuum? I've been assuming that TRUNCATE TABLE is a brute-force technique that more-or-less tosses the old table and starts fresh so that no vacuum is necessary. Second question: Same scenario as above,

[PERFORM] Kill a session

2006-07-11 Thread Craig A. James
There have been dozens, perhaps hundreds, of entries in the pg-admin, pg-general, and pg-performance lists regarding killing a session, but as far as I can tell, there is no Postgres solution. Did I miss something? This raises the question: Why doesn't Postgres have a kill session command

Re: [PERFORM] Kill a session

2006-07-12 Thread Craig A. James
Magnus Hagander wrote: This raises the question: Why doesn't Postgres have a kill session command that works? Oracle has it, and it's invaluable; there is no substitute. Various writers to these PG lists have raised the question repeatedly. Is it just a matter that nobody has had the time

[PERFORM] [Fwd: Delivery Status Notification (Failure)]

2006-07-12 Thread Craig A. James
I can't find an address to complain about the mailing list itself, so apologies but I'm posting directly to this list. Every time I post to this group, I get returned mails about OTHER subscribers' invalid accounts, like the one below. What's up? This seems to be a new phenomenon. Should

Re: [PERFORM] Kill a session

2006-07-13 Thread Craig A. James
Thanks for your reply, Mark: I'm back to my original question: How do you kill a runaway query without bringing down the whole database? Is there really no answer to this? ... if you actually want to disconnect a backend process then there is nothing to let you do this remotely. I recently
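
For reference, what was available at the time: pg_cancel_backend() stops a backend's current query from SQL (pg_terminate_backend(), which actually disconnects the backend, arrived in a later release); the pid below is a placeholder:

    SELECT procpid, current_query FROM pg_stat_activity;
    SELECT pg_cancel_backend(12345);   -- cancels the query; the connection stays open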

Re: [PERFORM] Kill a session

2006-07-13 Thread Craig A. James
Steinar H. Gunderson wrote: On Wed, Jul 12, 2006 at 08:43:18AM -0700, Craig A. James wrote: Then you killed the wrong backend... No queries run in postmaster. They all run in postgres backends. The postmaster does very little actual work, other than keeping track of everybody else. It turns out

Re: [PERFORM] Kill a session

2006-07-14 Thread Craig A. James
Tom Lane wrote: Craig A. James [EMAIL PROTECTED] writes: Bottom line is that I was expecting instant death with SIGTERM, but instead got an agonizing, drawn out -- but safe -- death of the query. What was the query exactly? Our expectation is that all or at least most queries should respond

Re: [PERFORM] increment Rows in an SQL Result Set postgresql

2006-07-15 Thread Craig A. James
Hassan, 1. I have a function that given two parameters produces an arbitrary id and text. However arbitrary the id and text are, they are in a certain order. i.e. it is imperative that whatever processing I do, the order is preserved. What type of function is this? Did you write it in C? An

Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-22 Thread Craig A. James
The real issue here is caching across successive queries, an effect that Postgres doesn't deal with very well at the moment. If you run these queries from a standing start (freshly booted machine) you'll likely find that the indexscan plan is indeed slower than the seqscan/hash plan, just like

Re: [PERFORM] Forcing using index instead of sequential scan?

2006-07-23 Thread Craig A. James
Michael Stone wrote: On Sat, Jul 22, 2006 at 10:26:53AM -0700, Craig A. James wrote: This causes massive file-system activity and flushes all files that the kernel has cached. If you run this between each Postgres test (let it run for a couple minutes), it gives you an apples-to-apples

[PERFORM] Moving a tablespace

2006-08-22 Thread Craig A. James
Suppose, hypothetically of course, someone lacked foresight, and put a tablespace somewhere with a dumb name, like /disk2, instead of using a symbolic link with a more descriptive name. And then /disk2 needs to be renamed, say to /postgres_data, and this (hypothetical) DBA realizes he has

Re: [PERFORM] Poor performance on seq scan

2006-09-12 Thread Craig A. James
tin tout    KB/t tps  MB/s    KB/t tps  MB/s  us ni sy in id
  1   14  128.00   1  0.10  128.00   1  0.10   5  0 94  1  0
  0   12  123.98 104 12.56  123.74 104 12.56   8  0 90  2  0
  0   12  125.66 128 15.75  125.26 128 15.68  10  0 85  6  0
  0   12  124.66 129 15.67  124.39 129 15.64  12  0

[PERFORM] RAID 0 not as fast as expected

2006-09-14 Thread Craig A. James
I'm experimenting with RAID, looking for an inexpensive way to boost performance. I bought 4 Seagate 7200.9 120 GB SATA drives and two SIIG dual-port SATA cards. (NB: I don't plan to run RAID 0 in production, probably RAID 10, so no need to comment on the failure rate of RAID 0.) I used this

Re: [PERFORM] RAID 0 not as fast as expected

2006-09-14 Thread Craig A. James
Alan Hodgson wrote: On Thursday 14 September 2006 11:05, Craig A. James [EMAIL PROTECTED] wrote: I'm experimenting with RAID, looking for an inexpensive way to boost performance. I bought 4 Seagate 7200.9 120 GB SATA drives and two SIIG dual-port SATA cards. (NB: I don't plan to run RAID 0

[PERFORM] Simple join optimized badly?

2006-10-07 Thread Craig A. James
I have two tables, SAMPLE and HITLIST that when joined, generate a monsterous sort. HITLIST_ROWS has about 48,000 rows SAMPLE has about 16 million rows The joined column is indexed in SAMPLE HITLIST_ROWS is a scratch table which is used a few times then discarded. HITLIST_ROWS has no
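
A step that often fixes this kind of misestimate on a freshly populated scratch table is to ANALYZE it before running the join; key_id is a placeholder for the joined column:

    ANALYZE hitlist_rows;
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM   sample s
    JOIN   hitlist_rows h ON h.key_id = s.key_id;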

Re: [PERFORM] Simple join optimized badly?

2006-10-08 Thread Craig A. James
... and add 100 other problems. Hints are used because the DBA thinks that they are smarter than the optimizer; 99% of the time, they are wrong. Just try manually optimizing a complex query, you'll see -- with three join types, several scan types, aggregates, bitmaps, internal and external

Re: [PERFORM] Simple join optimized badly?

2006-10-08 Thread Craig A. James
Bruce Momjian wrote: I can do 100! on my computer, but can't do it in my head. A poor example. 100! is a simple repetitive calculation, something computers are very good at. Optimizing an SQL query is very difficult, and a completely different class of problem. The fact is the PG team has

Re: [PERFORM] Simple join optimized badly?

2006-10-08 Thread Craig A. James
... this is an argument for cost-for-functions rather than hints, AFAICS. Perhaps you scanned past what I wrote a couple paragraphs farther down. I'm going to repeat it because it's the KEY POINT I'm trying to make: Craig James wrote: Now you might argue that function-cost needs to be added to the optimizer's

Re: [PERFORM] Simple join optimized badly?

2006-10-09 Thread Craig A. James
Brian Herlihy wrote: PG does support hints actually.. The only thing is, the hints are expressed in an obscure, ad-hoc and implementation dependant language. For example, the Don't use index X hint (the one I used) can be accessed by replacing your index with an index on values derived from

Re: [PERFORM] Hints proposal

2006-10-15 Thread Craig A. James
Josh Berkus wrote: I actually think the way to attack this issue is to discuss the kinds of errors the planner makes, and what tweaks we could do to correct them. Here's the ones I'm aware of: -- Incorrect selectivity of WHERE clause -- Incorrect selectivity of JOIN -- Wrong estimate of rows

Re: [PERFORM] Hints proposal

2006-10-15 Thread Craig A. James
So let's cut to the bone: If someone thinks a proposal is a bad idea, and they're volunteering their time on an open-source project, why would they implement the proposal? In all the heat and smoke, I believe there are two basic conclusions we all agree on. 1. Optimizer: a) A perfect

Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Craig A. James
These tables are particularly egregious examples of ignorant database design. You need to understand the relational model This email is a *particularly* egregious example of rudeness. You owe Mr. Staubo, and the Postgres community, an apology. There is absolutely no reason to insult

Re: [PERFORM] Best COPY Performance

2006-10-23 Thread Craig A. James
Jim C. Nasby wrote: http://stats.distributed.net used to use a perl script to do some transformations before loading data into the database. IIRC, when we switched to using C we saw 100x improvement in speed, so I suspect that if you want performance perl isn't the way to go. I think you can
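
The usual end point of such a pipeline is COPY, which is far faster than row-by-row INSERTs; the file path and table name below are placeholders:

    COPY my_table FROM '/tmp/transformed_data.tsv';   -- read server-side; default format is tab-delimited text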

Re: [PERFORM] Best COPY Performance

2006-10-24 Thread Craig A. James
Jim C. Nasby wrote: Well, given that perl is using an entire CPU, it sounds like you should start looking either at ways to remove some of the overhead from perl, or to split that perl into multiple processes. I use Perl for big database copies (usually with some processing/transformation

Re: [PERFORM] Best COPY Performance

2006-10-25 Thread Craig A. James
Jim C. Nasby wrote: Wait... so you're using perl to copy data between two tables? And using a cursor to boot? I can't think of any way that could be more inefficient... What's wrong with a plain old INSERT INTO ... SELECT? Or if you really need to break it into multiple transaction blocks, at
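
Jim's suggestion, spelled out with placeholder names (transform() stands in for whatever per-row processing is needed and would have to exist as a SQL-callable function):

    INSERT INTO dest_table (id, payload)
    SELECT id, transform(payload)
    FROM   source_table;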

Re: [PERFORM] Best COPY Performance

2006-10-25 Thread Craig A. James
Spiegelberg, Greg wrote: The data are on two different computers, and I do processing of the data as it passes through the application. Otherwise, the INSERT INTO ... SELECT is my first choice. Would dblink() help in any way? It might if perl wasn't so damned good at this. ;-) Craig

[PERFORM] Postgres server crash

2006-11-15 Thread Craig A. James
For the third time today, our server has crashed, or frozen, actually something in between. Normally there are about 30-50 connections because of mod_perl processes that keep connections open. After the crash, there are three processes remaining: # ps -ef | grep postgres postgres 23832

Re: [PERFORM] Postgres server crash

2006-11-16 Thread Craig A. James
Russell Smith wrote: For the third time today, our server has crashed... I would guess it's the linux OOM if you are running linux. You need to turn off killing of processes when you run out of memory. Are you getting close to running out of memory? Good suggestion, it was a memory leak

Re: [PERFORM] Postgres server crash

2006-11-16 Thread Craig A. James
By the way, in spite of my questions and concerns, I was *very* impressed by the recovery process. I know it might seem like old hat to you guys to watch the WAL in action, and I know on a theoretical level it's supposed to work, but watching it recover 150 separate databases, and find and

Re: [PERFORM] Postgres server crash

2006-11-18 Thread Craig A. James
Richard Troy wrote: I did that - spent about two f-ing hours looking for what I wanted. (Guess I entered poor choices for my searches. -frown- ) There are a LOT of articles that TALK ABOUT OOM, but precious few actually tell you what you can do about it. Trying to save you some time: On linux

Re: [PERFORM] Postgres server crash

2006-11-19 Thread Craig A. James
Michael Stone wrote: At one point someone complained about the ability to configure, e.g., IRIX to allow memory overcommit. I worked on some large IRIX installations where full memory accounting would have required on the order of 100s of gigabytes of swap, due to large shared memory

Re: [PERFORM] Postgres server crash

2006-11-19 Thread Craig A. James
You realize that it had to be turned on explicitly on IRIX, right? But don't let facts get in the way of a good rant... On the contrary, with Irix 4 and earlier it was the default, but it caused so many problems that SGI switched the default to OFF in IRIX 5. But because it had been available

Re: [PERFORM] When to vacuum a table?

2006-11-26 Thread Craig A. James
Rod Taylor wrote: Just a minor clarification here: INSERT does not create dead rows, only UPDATE and DELETE do. Thus, if you only insert rows, you do not need to vacuum (although you probably need to analyze). Is there no real-time garbage collection at all in Postgres? And if so, is this

Re: [PERFORM] Bad iostat numbers

2006-12-05 Thread Craig A. James
Alex Turner wrote: The problem I see with software raid is the issue of a battery backed unit: If the computer loses power, then the 'cache' which is held in system memory, goes away, and fubars your RAID. I'm not sure I see the difference. If data are cached, they're not written whether it

Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-11 Thread Craig A. James
Ron wrote: We are not going to get valuable contributions nor help people become more valuable to the community by flaming them into submission. Let's support getting definitive evidence. No matter who brings it to the table ;-) Thanks, Ron, for a voice of respect and reason. Since I

[PERFORM] glibc double-free error

2007-01-01 Thread Craig A. James
I'm using pg_dump/pg_restore to quickly copy databases between servers. But my server keeps crashing when I run pg_restore: glibc detected *** double free or corruption (!prev): 0x0a00b1a0 Postgres: 8.1.4 Linux: 2.6.12-1.1381_FC3 glibc: 2.3.6-0.fc3.1 Server: Dell CPU: Xeon 2.80GHz

Re: [PERFORM] glibc double-free error

2007-01-01 Thread Craig A. James
Tom Lane wrote: Craig A. James [EMAIL PROTECTED] writes: I'm using pg_dump/pg_restore to quickly copy databases between servers. But my server keeps crashing when I run pg_restore: glibc detected *** double free or corruption (!prev): 0x0a00b1a0 What can I do to help diagnose

[PERFORM] Trivial function query optimized badly

2007-01-03 Thread Craig A. James
Well, once again I'm hosed because there's no way to tell the optimizer the cost for a user-defined function. I know this issue has already been raised (by me!) several times, but I have to remind everyone about this. I frequently must rewrite my SQL to work around this problem. Here is the

Re: [PERFORM] Trivial function query optimized badly

2007-01-03 Thread Craig A. James
Adam Rich wrote: Craig, What version of postgres are you using? I just tested this on PG 8.1.2 and was unable to reproduce these results. I wrote a simple function that returns the same text passed to it, after sleeping for 1 second. I use it in a where clause, like your example below, and

Re: [PERFORM] Trivial function query optimized badly

2007-01-03 Thread Craig A. James
Tom Lane wrote: Craig A. James [EMAIL PROTECTED] writes: CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles' LANGUAGE 'C' STRICT IMMUTABLE; Umm ... this is a single-argument function. db=> explain analyze select version_id

Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-05 Thread Craig A. James
Guy Rouillier wrote: I've got back access to my test system. I ran another test run with the same input data set. This time I put pg_xlog on a different RAID volume (the unused one that I suspect is a software RAID), and I turned fsync=off in postgresql.conf. I left the rest of the
