Re: [PERFORM] Deleting bytea, autovacuum, and 8.2/8.4 differences

2010-03-15 Thread Matthew Wakeling
On Mon, 15 Mar 2010, Tom Lane wrote: For an example like this one, you have to keep in mind that the toast-table rows for the large bytea value have to be marked deleted, too. Also, since I/O happens in units of pages, the I/O volume to delete a tuple is just as much as the I/O to create it. (T

Re: [PERFORM] GiST index performance

2010-03-15 Thread Matthew Wakeling
On Thu, 25 Feb 2010, Bruce Momjian wrote: Was there every any conclusion on this issue? Not really. Comments inline: Matthew Wakeling wrote: Revisiting the thread a month back or so, I'm still investigating performance problems with GiST indexes in Postgres. Looking at

Re: [PERFORM] pg_dump far too slow

2010-03-18 Thread Matthew Wakeling
On Sun, 14 Mar 2010, David Newall wrote: nohup time pg_dump -f database.dmp -Z9 database I presumed pg_dump was CPU-bound because of gzip compression, but a test I ran makes that seem unlikely... There was some discussion about this a few months ago at http://archives.postgresql.org/pgsql-

Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes

2010-03-22 Thread Matthew Wakeling
On Fri, 19 Mar 2010, Stephen Frost wrote: ...it has to go to an external on-disk sort (see later on, and how to fix that). This was covered on this list a few months ago, in http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and http://archives.postgresql.org/pgsql-performa

Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling
On Sat, 20 Mar 2010, Yeb Havinga wrote: The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with

Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling
On Mon, 22 Mar 2010, Yeb Havinga wrote: Yes, that is certainly a factor. For example, the page size for bioseg which we use here is 130 entries, which is very excessive, and doesn't allow very deep trees. On the other hand, it means that a single disc seek performs quite a lot of work. Yeah,

Re: [PERFORM] memory question

2010-03-25 Thread Matthew Wakeling
On Wed, 24 Mar 2010, Campbell, Lance wrote: I have 24 Gig of memory on my server... Our server manager seems to think that I have way to much memory. He thinks that we only need 5 Gig. You organisation probably spent more money getting your server manager to investigate how much RAM you need

Re: [PERFORM] Optimizer showing wrong rows in plan

2010-03-29 Thread Matthew Wakeling
On Mon, 29 Mar 2010, Tadipathri Raghu wrote: As per the documentation, one page is 8kb, when i create a table with int as one column its 4 bytes. If i insert 2000 rows, it should be in one page only as its 8kb, but its extending vastly as expected. Example shown below, taking the previous example

Re: [PERFORM] Performance regarding LIKE searches

2010-03-30 Thread Matthew Wakeling
On Mon, 29 Mar 2010, randa...@bioinfo.wsu.edu wrote: WHERE ... lower(n.name) LIKE 'Scaffold:scaffold_163:1000..1199%' ... I'm sure you noticed that this is never going to return any rows? Matthew -- Me... a skeptic? I trust you have proof? -- Sent via pgsql-performance mailing list (pgsql-p

Re: [PERFORM] experiments in query optimization

2010-03-31 Thread Matthew Wakeling
On Tue, 30 Mar 2010, Faheem Mitha wrote: work_mem = 1 GB (see diag.{tex/pdf}). Sure, but define sane setting, please. I guess part of the point is that I'm trying to keep memory low You're trying to keep memory usage low, but you have work_mem set to 1GB? Matthew -- "Prove to thyself that

Re: [PERFORM] LIMIT causes planner to do Index Scan using a less optimal index

2010-04-07 Thread Matthew Wakeling
On Wed, 7 Apr 2010, sherry.ctr@faa.gov wrote: Please just let me know if Postgres can do this kind of index or not. create index idx1 on tb1(col1, col2) Then later we can find it is useful or useless. Have you tried it? Grzegorz Jaśkiewicz wrote: something like this: create index id

Re: [PERFORM] Planner not using column limit specified for one column for another column equal to first

2010-04-19 Thread Matthew Wakeling
On Sat, 17 Apr 2010, Віталій Тимчишин wrote: As of making planner more clever, may be it is possible to introduce division on "fast queries" and "long queries", so that if after fast planning cost is greater then some configurable threshold, advanced planning techniques (or settings) are used. As

Re: [PERFORM] merge join killing performance

2010-05-18 Thread Matthew Wakeling
On Tue, 18 May 2010, Scott Marlowe wrote: Aggregate (cost=902.41..902.42 rows=1 width=4) -> Merge Join (cost=869.97..902.40 rows=1 width=4) Merge Cond: (f.eid = ev.eid) -> Index Scan using files_eid_idx on files f (cost=0.00..157830.39 rows=3769434 width=8) Okay, that's we

Re: [PERFORM] merge join killing performance

2010-05-19 Thread Matthew Wakeling
On Wed, 19 May 2010, Scott Marlowe wrote: It's apparently estimating (wrongly) that the merge join won't have to scan very much of "files" before it can stop because it finds an eid value larger than any eid in the other table.  So the issue here is an inexact stats value for the max eid. I wan

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Wed, 19 May 2010, David Jarvis wrote: extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND That portion of the WHERE clause cannot use an index on m.taken. Postgres does not look inside functions (like extract) to see if something indexable is present. To get an index to work, you c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Thu, 20 May 2010, David Jarvis wrote: I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: Yes, because there is only one station_id selected. That's exactly what an index is for. Then c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
On Fri, 21 May 2010, Yeb Havinga wrote: For time based data I would for sure go for year based indexing. On the contrary, most of the queries seem to be over many years, but rather restricting on the time of year. Therefore, partitioning by month or some other per-year method would seem sensi

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-21 Thread Matthew Wakeling
Regarding the leap year problem, you might consider creating a modified day of year field, which always assumes that the year contains a leap day. Then a given number always resolves to a given date, regardless of year. If you then partition (or index) on that field, then you may get a benefit.

Re: [PERFORM] prepared query performs much worse than regular query

2010-05-21 Thread Matthew Wakeling
On Fri, 21 May 2010, Richard Yen wrote: Any ideas why the query planner chooses a different query plan when using prepared statements? This is a FAQ. Preparing a statement makes Postgres create a plan, without knowing the values that you will plug in, so it will not be as optimal as if the v

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-06-01 Thread Matthew Wakeling
On Sun, 23 May 2010, David Jarvis wrote: The measurement table indexes (on date and weather station) were not being used because the only given date ranges (e.g., 1900 - 2009) were causing the planner to do a full table scan, which is correct. I wonder if you might see some benefit from CLUSTER

Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling
On Fri, 28 May 2010, Merlin Moncure wrote: At best, if you are a ninja with the marginally documented backend api, you will create code that goes about as fast as your pl/pgsql function for 10 times the amount of input work, unless there are heavy amounts of 'other than sql' code in your function

Re: [PERFORM] PostgreSQL Function Language Performance: C vs PL/PGSQL

2010-06-01 Thread Matthew Wakeling
On Tue, 1 Jun 2010, Stephen Frost wrote: * Matthew Wakeling (matt...@flymine.org) wrote: The major case I found when writing pl/pgsql was when trying to build arrays row by row. AFAIK when I tried it, adding a row to an array caused the whole array to be copied, which put a bit of a damper on

Re: [PERFORM] Overusing 1 CPU

2010-06-02 Thread Matthew Wakeling
On Wed, 2 Jun 2010, Mozzi wrote: This box is basically adle @ the moment as it is still in testing yet top shows high usage on just 1 of the cores. First port of call: What process is using the CPU? Run top on a fairly wide terminal and use the "c" button to show the full command line. Matth

Re: [PERFORM] SELECT ignoring index even though ORDER BY and LIMIT present

2010-06-03 Thread Matthew Wakeling
On Wed, 2 Jun 2010, Jori Jovanovich wrote: (2) Making the query faster by making the string match LESS specific (odd, seems like it should be MORE) No, that's the way round it should be. The LIMIT changes it all. Consider if you have a huge table, and half of the entries match your WHERE claus

Re: [PERFORM] Weird XFS WAL problem

2010-06-03 Thread Matthew Wakeling
On Thu, 3 Jun 2010, Craig James wrote: Also, are barriers *on* on the RAID1 mount and off on the RAID10 one? It was the barriers. "barrier=1" isn't just a bad idea on ext4, it's a disaster. This worries me a little. Does your array have a battery-backed cache? If so, then it should be fast

Re: [PERFORM] slow query

2010-06-04 Thread Matthew Wakeling
On Thu, 3 Jun 2010, Anj Adu wrote: http://explain.depesz.com/s/kHa I'm interested in why the two partitions dev4_act_dy_fact and dev4_act_dy_fact_2010_05_t3 are treated so differently. I'm guessing that the former is the parent and the latter the child table? When accessing the parent table

Re: [PERFORM] Weird XFS WAL problem

2010-06-04 Thread Matthew Wakeling
On Thu, 3 Jun 2010, Greg Smith wrote: And it's also quite reasonable for a RAID controller to respond to that "flush the whole cache" call by flushing its cache. Remember that the RAID controller is presenting itself to the OS as a large disc, and hiding the individual discs from the OS. Why s

Re: [PERFORM] slow query performance

2010-06-11 Thread Matthew Wakeling
On Fri, 11 Jun 2010, Kenneth Marshall wrote: If you check the archives, you will see that this is not easy to do because of the effects of caching. Indeed. If you were to take the value at completely face value, a modern hard drive is capable of transferring sequential pages somewhere between

Re: [PERFORM] B-Heaps

2010-06-15 Thread Matthew Wakeling
On Mon, 14 Jun 2010, Eliot Gable wrote: Just curious if this would apply to PostgreSQL: http://queue.acm.org/detail.cfm?id=1814327 Absolutely, and I said in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php but applied to the Postgres B-tree indexes instead of heaps. It's a

Re: [PERFORM] Parallel queries for a web-application |performance testing

2010-06-17 Thread Matthew Wakeling
On Wed, 16 Jun 2010, Balkrishna Sharma wrote: Hello,I will have a web application having postgres 8.4+ as backend. At any given time, there will be max of 1000 parallel web-users interacting with the database (read/write)I wish to do performance testing of 1000 simultaneous read/write to the da

Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Matthew Wakeling
Dimitri Fontaine wrote: Well I guess I'd prefer a per-transaction setting Not possible, as many others have said. As soon as you make an unsafe transaction, all the other transactions have nothing to rely on. On Thu, 17 Jun 2010, Pierre C wrote: A per-table (or per-index) setting makes more

Re: [PERFORM] B-Heaps

2010-06-18 Thread Matthew Wakeling
On Fri, 18 Jun 2010, Robert Haas wrote: On Tue, Jun 15, 2010 at 8:23 AM, Matthew Wakeling wrote: Absolutely, and I said in http://archives.postgresql.org/pgsql-performance/2010-03/msg00272.php but applied to the Postgres B-tree indexes instead of heaps. This is an interesting idea. I would

Re: [PERFORM] raid10 write performance

2010-06-23 Thread Matthew Wakeling
On Wed, 23 Jun 2010, Ivan Voras wrote: On 06/23/10 14:00, Florian Weimer wrote: Barrier support on RAID10 seems to require some smallish amount of non-volatile storage which supports a high number of write operations per second, so a software-only solution might not be available. If I understa

Re: [PERFORM] WAL+Os on a single disk

2010-06-24 Thread Matthew Wakeling
On Wed, 23 Jun 2010, Scott Marlowe wrote: We have a 12 x 600G hot swappable disk system (raid 10) and 2 internal disk  ( 2x 146G) Does it make sense to put the WAL and OS on the internal disks So for us, the WAL and OS and logging on the same data set works well. Generally, it is recommended

Re: [PERFORM] Write performance

2010-06-24 Thread Matthew Wakeling
On Thu, 24 Jun 2010, Janning wrote: We have a 12 GB RAM machine with intel i7-975 and using 3 disks "Seagate Barracuda 7200.11, ST31500341AS (1.5 GB)" Those discs are 1.5TB, not 1.5GB. One disk for the system and WAL etc. and one SW RAID-0 with two disks for postgresql data. Our database is a

Re: [PERFORM] ideal storage configuration

2010-06-30 Thread Matthew Wakeling
On Tue, 29 Jun 2010, Samuel Gendler wrote: The copy statements execute in a small fraction of the minute in which they occur. I'm going to ask a silly question here. If the system is already coping quite well with the load, then why are you changing it? All old data gets removed by dropping

Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-02 Thread Matthew Wakeling
On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane wrote: I'm guessing from tea leaves, but the impression I got from Eliot's description is that he's using plpgsql functions as sort comparators. It's not surprising that that sucks performance-wise compared to having the equivalent logic in C/C++ functio

Re: [PERFORM] how to (temporarily) disable/minimize benefits of disk block cache or postgresql shared buffer

2010-07-05 Thread Matthew Wakeling
On Fri, Jul 2, 2010 at 8:30 AM, Craig Ringer wrote: Yeah, if you're in a weird virtualized environment like that you're likely to have problems... On Sat, 3 Jul 2010, Rajesh Kumar Mallah wrote: Thanks for thinking about it.I do not understand why u feel OpenVz is weird. at the most its not ver

Re: [PERFORM] Two different execution plan for the same request

2010-07-07 Thread Matthew Wakeling
On Wed, 7 Jul 2010, JOUANIN Nicolas (44) wrote: It seems to work fine (same execution plan and less duration) after :  - setting default_statistics_target to 100  - full vacuum with analyze Don't do VACUUM FULL. Matthew -- I suppose some of you have done a Continuous Maths course. Yes? Contin

Re: [PERFORM] Need help in performance tuning.

2010-07-09 Thread Matthew Wakeling
On Fri, 9 Jul 2010, Kevin Grittner wrote: Any thoughts on the "minimalist" solution I suggested a couple weeks ago?: http://archives.postgresql.org/pgsql-hackers/2010-06/msg01385.php http://archives.postgresql.org/pgsql-hackers/2010-06/msg01387.php So far, there has been no comment by anyone...

Re: [PERFORM] Need help in performance tuning.

2010-07-09 Thread Matthew Wakeling
On Fri, 9 Jul 2010, Kevin Grittner wrote: Interesting idea. As far as I can see, you are suggesting solving the too many connections problem by allowing lots of connections, but only allowing a certain number to do anything at a time? Right. I think in some situations, this arrangement would

Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.

2010-07-12 Thread Matthew Wakeling
On Sat, 10 Jul 2010, Tom Lane wrote: Doesn't pgpool do this? No, and in fact that's exactly why the proposed implementation isn't ever going to be in core: it's not possible to do it portably. I'm surprised. Doesn't apache httpd do this? Does it have to do a whole load of non-portable stuff?

Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling
On Sun, 25 Jul 2010, Yeb Havinga wrote: Graph of TPS at http://tinypic.com/r/b96aup/3 and latency at http://tinypic.com/r/x5e846/3 Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second! Matthew

Re: [PERFORM] Testing Sandforce SSD

2010-07-26 Thread Matthew Wakeling
On Mon, 26 Jul 2010, Greg Smith wrote: Matthew Wakeling wrote: Does your latency graph really have milliseconds as the y axis? If so, this device is really slow - some requests have a latency of more than a second! Have you tried that yourself? If you generate one of those with standard

Re: [PERFORM] Advice configuring ServeRAID 8k for performance

2010-08-06 Thread Matthew Wakeling
On Thu, 5 Aug 2010, Scott Marlowe wrote: RAID6 is basically RAID5 with a hot spare already built into the array. On Fri, 6 Aug 2010, Pierre C wrote: As others said, RAID6 is RAID5 + a hot spare. No. RAID6 is NOT RAID5 plus a hot spare. RAID5 uses a single parity datum (XOR) to ensure protec

[PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling
I'm trying to eke a little bit more performance out of an application, and I was wondering if there was a better way to do the following: I am trying to retrieve, for many sets of rows grouped on a couple of fields, the value of an ungrouped field where the row has the highest value in anoth

Re: [PERFORM] Sorted group by

2010-08-10 Thread Matthew Wakeling
On Tue, 10 Aug 2010, Thomas Kellerer wrote: No. It's built in (8.4) and it's called Windowing functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html http://www.postgresql.org/docs/8.4/static/functions-window.html SELECT group, last_value(value) over(ORDER BY number) FROM table

Re: [PERFORM] Sorted group by

2010-08-11 Thread Matthew Wakeling
Original query: explain analyse select * from tracker where objectid < 120; QUERY PLAN --- Index Scan using tracker_objectid on tracker (cost=0.00..915152.62 rows=3684504 width=33) (

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling
On Mon, 21 Apr 2008, Mark Mielke wrote: This surprises me - hash values are lossy, so it must still need to confirm against the real list of values, which at a minimum should require references to the rows to check against? Is PostgreSQL doing something beyond my imagination? :-) Not too far

Re: [PERFORM] Group by more efficient than distinct?

2008-04-22 Thread Matthew Wakeling
On Tue, 22 Apr 2008, Mark Mielke wrote: The poster I responded to said that the memory required for a hash join was relative to the number of distinct values, not the number of rows. They gave an example of millions of rows, but only a few distinct values. Above, you agree with me that it it wo

Re: [PERFORM] Optimizer's issue

2008-04-24 Thread Matthew Wakeling
On Thu, 24 Apr 2008, Vlad Arkhipov wrote: It was written below in my first post: "These queries are part of big query and optimizer put them on the leaf of query tree, so rows miscount causes a real problem. " actual rows count for the first query is 294, estimate - 11; for the second -- 283 and

Re: [PERFORM] Question about disk IO an index use and seeking advice

2008-04-24 Thread Matthew Wakeling
On Thu, 24 Apr 2008, Nikolas Everett wrote: The setup is kind of a beast. No kidding. When I run dstat I see only around 2M/sec and it is not consistent at all. Well, it is having to seek over the disc a little. Firstly, your table may not be wonderfully ordered for index scans, but goodne

Re: [PERFORM] two memory-consuming postgres processes

2008-05-03 Thread Matthew Wakeling
On Fri, 2 May 2008, PFC wrote: CREATE TABLE derived AS SELECT ... FROM ... (perform all your derived calculations here) Given what you have said (that you really want all the data in one table) it may be best to proceed like this: First, take your original table, create an index on the prima

Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers

2008-05-07 Thread Matthew Wakeling
On Tue, 6 May 2008, Craig James wrote: I/O Sched AVG Test1 Test2 --- - cfq705 695715 noop 758 769747 deadline 741 705775 anticipatory 494 477511 Interesting. That contrasts with some tests I

Re: [PERFORM] multiple joins + Order by + LIMIT query performance issue

2008-05-07 Thread Matthew Wakeling
On Tue, 6 May 2008, Tom Lane wrote: If a misestimate of this kind is bugging you enough that you're willing to change the query, I think you can fix it like this: select ... from foo order by x limit n; => select ... from (select ... from foo order by x) ss limit n; The subselec

Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, Philippe Amelant wrote: using mkfs.ext3 I can use "-T" to tune the filesytem mkfs.ext3 -T fs_type ... fs_type are in /etc/mke2fs.conf (on debian) If you look at that file, you'd see that tuning really doesn't change that much. In fact, the only thing it does change (if y

Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, [EMAIL PROTECTED] wrote: IIRC postgres likes to do 1M/file, which isn't very largeas far as the -T setting goes. ITYF it's actually 1GB/file. think twice about this. ext2/3 get slow when they fill up (they have fragmentation problems when free space gets too small), this

Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Matthew Wakeling
On Wed, 14 May 2008, Alvaro Herrera wrote: Hint bits are used to mark tuples as created and/or deleted by transactions that are know committed or aborted. To determine the visibility of a tuple without such bits set, you need to consult pg_clog and possibly pg_subtrans, so it is an expensive che

Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, Luke Lonergan wrote: BTW ­ we¹ve removed HINT bit checking in Greenplum DB and improved the visibility caching which was enough to provide performance at the same level as with the HINT bit optimization, but avoids this whole ³write the data, write it to the log also, then wr

Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, Heikki Linnakangas wrote: > Is it really safe to update the hint bits in place? If there is a > power cut in the middle of writing a block, is there a guarantee from > the disc that the block will never be garbled? Don't know, to be honest. We've never seen any reports of

Re: [PERFORM] I/O on select count(*)

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, Heikki Linnakangas wrote: There's not much point optimizing something that only helps with aborted transactions. That's fair enough, but this list method is likely to speed up index writes anyway. The general problem with any idea that involves keeping a list of changes

Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-15 Thread Matthew Wakeling
On Thu, 15 May 2008, Guillaume Cottenceau wrote: Also, IIRC when PG writes data up to a full filesystem, postmaster won't be able to then restart if the filesystem is still full (it needs some free disk space for its startup). Or maybe this has been fixed in recent versions? Ah, the "not enoug

Re: [PERFORM] I/O on select count(*)

2008-05-19 Thread Matthew Wakeling
Alvaro Herrera <[EMAIL PROTECTED]> writes: pg_clog is allocated in pages of 8kB apiece(*). On allocation, pages are zeroed, which is the bit pattern for "transaction in progress". So when a transaction starts, it only needs to ensure that the pg_clog page that corresponds to it is allocated, bu

Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Matthew Wakeling
On Thu, 22 May 2008, Tom Lane wrote: Do you have maintenance_work_mem set large enough that the index creation sort is done in-memory? 8.1 depends on the platform's qsort and a lot of them are kinda pessimal for input like this. Looking at the fact that other indexes on the same table are crea

Re: [PERFORM] Outer joins and equivalence

2008-05-28 Thread Matthew Wakeling
On Tue, 27 May 2008, Simon Riggs wrote: I do recognise that we would *not* be able to deduce this form of SQL A JOIN B ON (a.id = c.id) LEFT JOIN C ON (b.id = c.id) Surely that would not be valid SQL? Matthew -- Psychotics are consistently inconsistent. The essence of sanity is to be inconsi

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Matthew Wakeling
On Tue, 3 Jun 2008, andrew klassen wrote: Basically, I have a somewhat constant rate of inserts/updates that go into a work queue and then get passed to postgres. The cpu load is not that high, i.e. plenty of idle cpu. I am running an older version of freebsd and the iostat output is not very

Re: [PERFORM] backend pid changing

2008-06-04 Thread Matthew Wakeling
On Wed, 4 Jun 2008, Lewis Kapell wrote: The client sends its authorization information immediately before sending the data, and also with the data chunk. Well, I have no idea why the backend pid is changing, but here it looks like you have a classic concurrency problem caused by checking a var

Re: [PERFORM] insert/update tps slow with indices on table > 1M rows

2008-06-04 Thread Matthew Wakeling
On Wed, 4 Jun 2008, andrew klassen wrote: I am using multiple threads, but only one worker thread for insert/updated to this table. I don't mind trying to add multiple threads for this table, but my guess is it would not help because basically the overall tps rate is decreasing so dramatically.

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Howard Cole wrote: Alan Hodgson wrote: It's because everything is cached, in particular the relevant rows from the "email" table (accessing which took 22 of the original 27 seconds). Thanks Alan, I guessed that the caching was the difference, but I do not understand why t

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Howard Cole wrote: I think I may have answered my own question partially, the problem may be how I structure the query. Original statement: "Nested Loop (cost=4.40..65.08 rows=16 width=8)" " -> Function Scan on q (cost=0.00..0.01 rows=1 width=32)" " -> Bitmap Heap Sc

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Howard Cole wrote: They both appear to do a scan on the email table (Why?). The indexes don't contain copies of the row data. They only contain pointers to the rows in the table. So once the index has been consulted, Postgres still needs to look at the table to fetch the

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Howard Cole wrote: If I do a query that uses another index, then it uses the index only and does not scan the email table. Not true. It only looks a little bit like that from the explain output. However, if you look closely: Index Scan using email_email_directory_id_idx

Re: [PERFORM] Migration Articles.. ???

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, sathiya psql wrote: I have a database in postgres X.Y which has around 90 tables, and lot of data in it. In the next version of that product, i had some more tables, so how to migrate that,. there may be 150 tables., in that 90 tables, 70 may be the same, 20 got deleted, and

Re: [PERFORM] Which hardware ?

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Lionel wrote: I need to install a 8.3 database and was wondering which hardware would be sufficient to have good performances (less than 30s for² slowest select). It's almost impossible to predict what users will do via the webapplication that queries this database: almost

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-17 Thread Matthew Wakeling
On Tue, 17 Jun 2008, Alan Hodgson wrote: On Tuesday 17 June 2008, Howard Cole <[EMAIL PROTECTED]> wrote: Incidentally, how can I clear the cache in between queries? Stop PostgreSQL, unmount the filesystem it's on, remount it, restart PostgreSQL. Works under Linux. If it's on a filesystem you

Re: [PERFORM] Tsearch2 Initial Search Speed

2008-06-18 Thread Matthew Wakeling
On Wed, 18 Jun 2008, Howard Cole wrote: Out of interest, if I could create a multicolumn index with both the primary key and the fts key (I don't think I can create a multi-column index using GIST with both the email_id and the fts field), would this reduce access to the table due to the primar

Re: [PERFORM] Hardware suggestions for high performance 8.3

2008-06-25 Thread Matthew Wakeling
On Wed, 25 Jun 2008, Henrik wrote: What are your suggestions. What we are currently looking at is. Dual Quad Core Intel 8 - 12 GB RAM More RAM would be helpful. It's not that expensive, compared to the rest of your system. 10 disks total. 4 x 146 GB SAS disk in RAID 1+0 for database 6 x 7

Re: [PERFORM] Hardware suggestions for high performance 8.3

2008-06-25 Thread Matthew Wakeling
On Wed, 25 Jun 2008, Henrik wrote: Would you turn off fsync if you had a controller with BBU? =) No, certainly not. Fsync is what makes the data move from the volatile OS cache to the non-volatile disc system. It'll just be a lot quicker on a controller with a BBU cache, because it won't need

Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Matthew Wakeling
On Wed, 25 Jun 2008, Merlin Moncure wrote: Has anyone done some benchmarks between hardware RAID vs Linux MD software RAID? I have here: http://merlinmoncure.blogspot.com/2007/08/following-are-results-of-our-testing-of.html The upshot is I don't really see a difference in performance. The ma

Re: [PERFORM] Hardware vs Software RAID

2008-06-25 Thread Matthew Wakeling
On Wed, 25 Jun 2008, Greg Smith wrote: A firewire-attached log device is an extremely bad idea. Anyone have experience with IDE, SATA, or SAS-connected flash devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer rate of 100MB/s, and doesn't degrade much in p

Re: [PERFORM] Hardware vs Software RAID

2008-06-26 Thread Matthew Wakeling
On Thu, 26 Jun 2008, Vivek Khera wrote: Anyone have experience with IDE, SATA, or SAS-connected flash devices like the Samsung MCBQE32G5MPP-0VA? I mean, it seems lovely - 32GB, at a transfer rate of 100MB/s, and doesn't degrade much in performance when writing small random blocks. But what's it

Re: [PERFORM] Hardware vs Software RAID

2008-06-27 Thread Matthew Wakeling
On Thu, 26 Jun 2008, Merlin Moncure wrote: In addition there are many different types of flash (MLC/SLC) and the flash cells themselves can be organized in particular ways involving various trade-offs. Yeah, I wouldn't go for MLC, given it has a tenth the lifespan of SLC. The main issue is lo

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Matthew Wakeling
On Mon, 30 Jun 2008, Moritz Onken wrote: I created a new column in "item" where I store the shortened url which makes "=" comparisons possible. Good idea. Now create an index on that column. select count(1) from result where url in (select shorturl from item where shorturl = result.url); Wh

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Matthew Wakeling
On Mon, 30 Jun 2008, Moritz Onken wrote: SELECT COUNT(*) FROM (SELECT DISTINCT result.url FROM result, item WHERE item.shorturl = result.url) AS a I tried the this approach but it's slower than WHERE IN in my case. However there's a lot more scope for improving a query along these lines, li

Re: [PERFORM] Planner should use index on a LIKE 'foo%' query

2008-06-30 Thread Matthew Wakeling
On Mon, 30 Jun 2008, Moritz Onken wrote: select count(1) from result where url in (select shorturl from item where shorturl = result.url); I really don't see what your query tries to accomplish. Why would you want "url IN (... where .. = url)"? Wouldn't you want a different qualifier somehow?

Re: [PERFORM] how does pg handle concurrent queries and same queries

2008-07-28 Thread Matthew Wakeling
On Mon, 28 Jul 2008, Faludi Gábor wrote: EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid) AS elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5; QUERY PLAN ---

Re: [PERFORM] Database size Vs performance degradation

2008-07-30 Thread Matthew Wakeling
On Wed, 30 Jul 2008, Dave North wrote: Running on HP DL380 w/ 4GB RAM, dual 10K HDDs in RAID 0+1 Checking the stats, the DB size is around 7.5GB; Doesn't fit in RAM. ...after the load, the DB size was around 2.7GB Does fit in RAM. One observation I've made on the DB system is the disk

Re: [PERFORM] Database size Vs performance degradation

2008-07-30 Thread Matthew Wakeling
On Wed, 30 Jul 2008, Craig James wrote: You don't have to change the application. One of the great advantages of Postgres is that even table creation, dropping and renaming are transactional. So you can do the select / drop / rename as a transaction by an external app, and your main applicati

Re: [PERFORM] Database size Vs performance degradation

2008-08-01 Thread Matthew Wakeling
On Thu, 31 Jul 2008, Andrzej Zawadzki wrote: Maybe I'm wrong but if this "bulk insert and delete" process is cyclical then You don't need vacuum full. Released tuples will fill up again with fresh data next day - after regular vacuum. Yes, a regular manual vacuum will prevent the table from gr

Re: [PERFORM] Filesystem benchmarking for pg 8.3.3 server

2008-08-13 Thread Matthew Wakeling
On Tue, 12 Aug 2008, Ron Mayer wrote: Really old software (notably 2.4 linux kernels) didn't send cache synchronizing commands for SCSI nor either ATA; Surely not true. Write cache flushing has been a known problem in the computer science world for several tens of years. The difference is that

Re: [PERFORM] Optimizing a VIEW

2008-08-18 Thread Matthew Wakeling
On Fri, 15 Aug 2008, Madison Kelly wrote: Below I will post the VIEW and a sample of the query's EXPLAIN ANALYZE. Thanks for any tips/help/clue-stick-beating you may be able to share! This query looks incredibly expensive: SELECT ... FROM customer a, history.customer_da

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote: I have indexes on result.domain, domain_categories.domain, result.user, domain_categories.category. Clustered result on user and domain_categories on domain. "-> Materialize (cost=2118752.28..2270064.64 rows=12104989 width=8) (actual time=464

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote: "HashAggregate (cost=817397.78..817428.92 rows=2491 width=8) (actual time=42874.339..42878.419 rows=3361 loops=1)" " -> Merge Join (cost=748.47..674365.50 rows=19070970 width=8) (actual > time=15702.449..42829.388 rows=36308 loops=1)" "Merge

Re: [PERFORM] Slow query with a lot of data

2008-08-18 Thread Matthew Wakeling
On Mon, 18 Aug 2008, Moritz Onken wrote: Running the query for more than one user is indeed not much slower. That's what I need. I'm clustering the results table on domain right now. But why is this better than clustering it on "user"? The reason is the way that the merge join algorithm works.

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Matthew Wakeling
On Tue, 19 Aug 2008, Moritz Onken wrote: explain select a."user", b.category, sum(1.0/b.cat_count)::float from result a, domain_categories b where a."domain" = b."domain" group by a."user", b.category; Both results and domain_categories are clustered on domain and analyzed. Why is it still sor

Re: [PERFORM] Slow query with a lot of data

2008-08-19 Thread Matthew Wakeling
On Tue, 19 Aug 2008, Moritz Onken wrote: tablename| attname | n_distinct | correlation result | domain | 1642 | 1 Well, the important thing is the correlation, which is 1, indicating that Postgres knows that the table is clustered. So I have no

Re: [PERFORM] PostgreSQL+Hibernate Performance

2008-08-20 Thread Matthew Wakeling
On Wed, 20 Aug 2008, Kranti K K Parisa™ wrote: creating multiple indexes on same column will effect performance?  for example: index1 : column1, column2, column3 index2: column1 index3: column2, index4: column3 index5: column1,column2 The sole purpose of indexes is to affect performance. Howe

Re: [PERFORM] Large number of tables slow insert

2008-08-26 Thread Matthew Wakeling
On Sat, 23 Aug 2008, Loic Petit wrote: I use Postgresql 8.3.1-1 to store a lot of data coming from a large amount of sensors. In order to have good performances on querying by timestamp on each sensor, I partitionned my measures table for each sensor. Thus I create a lot of tables. As far as

Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Matthew Wakeling
On Wed, 27 Aug 2008, Jerry Champlin wrote: After it's a day old, there are no longer any updates or inserts and we can vacuum it at that point. A pattern that has worked very well for other people is to have two separate tables (or partitions). One contains today's data, and the other contain

Re: [PERFORM] select on 22 GB table causes "An I/O error occured while sending to the backend." exception

2008-08-28 Thread Matthew Wakeling
On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't f

  1   2   3   4   >