Re: [PERFORM] Advise about how to delete entries
Hi all,

> COPY FROM a file with all the IDs to delete into a temporary table, and do a
> joined delete to your main table (thus, only one query).

I already did this, but I have no idea how to do this join. Could you give me a hint ;-) ?

Thank you very much
--
Arnau
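For reference, a minimal sketch of such a joined delete; the table and column names are assumptions, not from the thread:

  -- Assumes ids_to_delete is the temporary table populated via COPY FROM,
  -- and that both id columns have matching data types (see TIP 9 below).
  DELETE FROM main_table
   WHERE id IN (SELECT id FROM ids_to_delete);

On later releases (8.1 and up) an explicit joined form is also available:

  DELETE FROM main_table USING ids_to_delete t
   WHERE main_table.id = t.id;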
[PERFORM] Query slow after VACUUM ANALYZE
Hi all,

I'm having a strange problem with a query which looks like this:

  SELECT id FROM orders WHERE id NOT IN (SELECT order_id FROM orders_items);

The id fields are varchars (32), both indexed. The number of rows in the tables is about 6.

Now, the really strange part: if I delete all data from orders_items, run VACUUM ANALYZE, then import all the data, the query finishes in about 3 seconds. Then I run VACUUM ANALYZE, and *after* the vacuum, the query takes about 30 minutes to run. The data is the same, this is the only query running, and the machine load is effectively none.

EXPLAINing the query before VACUUM ANALYZE shows this:

                                 QUERY PLAN
  -------------------------------------------------------------------------
   Seq Scan on orders  (cost=0.00..12184.14 rows=29526 width=33)
     Filter: (NOT (hashed subplan))
     SubPlan
       ->  Seq Scan on orders_items  (cost=0.00..0.00 rows=1 width=33)

After the vacuum, the plan is like this:

                                 QUERY PLAN
  -------------------------------------------------------------------------
   Seq Scan on fsi_orders  (cost=0.00..40141767.46 rows=29526 width=33)
     Filter: (NOT (subplan))
     SubPlan
       ->  Seq Scan on fsi_orders_items  (cost=0.00..1208.12 rows=60412 width=33)

Any ideas what I can do to make the query run in < 10 seconds?

Thanks,
Guðmundur.
Re: [PERFORM] Poor SQL performance
Place 'and date(r.site_timestamp) = h.first_order_date' after WHERE.

Best regards,
Alexander Kirpa
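A minimal sketch of what this suggests; only the quoted condition comes from the thread, and the table and column names around it are hypothetical:

  -- Move the date comparison into the WHERE clause so it is applied as a
  -- plain filter/join condition (hypothetical tables r and h for illustration).
  SELECT r.request_id
    FROM site_requests r
    JOIN order_history h ON h.customer_id = r.customer_id
   WHERE date(r.site_timestamp) = h.first_order_date;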
Re: [PERFORM] shared buffers
Chris,

Would you say that 3 pages is a good maximum for a Postgres install? We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have shared_buffers set at 12. I've moved it up and down (it was 16 when I got here) without any measurable performance difference.

The reason I ask is that I occasionally see large-ish queries take forever (like cancel-after-12-hours forever) and wondered if this could result from shared_buffers being too large.

Thanks for your (and anyone else's) help!
Martin Nickel

On Tue, 30 Aug 2005 10:08:21 +0800, Christopher Kings-Lynne wrote:
>> I forgot to say that it's a 12GB database...
>
> That's actually not that large.
>
>> Ok, I'll set shared buffers to 30.000 pages, but even so, shouldn't "meminfo"
>> and "top" show some shared pages?
>
> Yeah. The reason for not setting buffers so high is because PostgreSQL
> cannot efficiently manage huge shared buffers, so you're better off
> giving the RAM to Linux's disk cache.
>
> Chris
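A sketch of the kind of setting under discussion; the 30.000-page figure is the one from the quoted mail, and everything else is illustrative rather than a recommendation from this thread:

  # postgresql.conf (8.0-era, 8 kB pages)
  shared_buffers = 30000          # 30,000 pages ~= 234 MB; leave the rest of RAM to the OS cache
  effective_cache_size = 300000   # in 8 kB pages (~2.3 GB); hints to the planner how much the OS is likely caching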
[PERFORM] Postgresql Hardware - Recommendations
Hello,

My company has decided to migrate our Oracle database to postgresql8. We will acquire a new server for this, and would very much appreciate your advice.

NOTE: The applications accessing the database are developed and maintained externally, and unfortunately, the developers have not yet given us detailed information on their requirements. The only info I can give so far is that the database size is about 60GB, and that it will be frequently accessed by multiple users (about 100 will be connected during business hours). The applications accessing the database are mostly reporting tools.

I know that the performance question will ultimately boil down to "it depends what you want to do with it", but at the moment I'm very much interested in whether there are any general issues we should look out for. The questions we are asking ourselves now are:

1) Intel or AMD (or alternate platform)
   Are we better off with Xeons or Opterons? Should we consider the IBM OpenPower platform?

2) CPUs vs cache
   Would you rather have more CPUs or more cache? E.g.: 4x Xeon 1MB vs 2x Xeon 8MB

3) CPUs vs memory
   Would you rather have 4x CPUs and 8GB of memory, or 2x CPUs with 16GB of memory?

Thanks in advance for all your replies!

Best Regards,
Christian Kastner
Re: [PERFORM] Query slow after VACUUM ANALYZE
Hi again,

[..]

> QUERY PLAN
> -------------------------------------------------------------------------
>  Seq Scan on orders  (cost=0.00..12184.14 rows=29526 width=33)
>    Filter: (NOT (hashed subplan))
>    SubPlan
>      ->  Seq Scan on orders_items  (cost=0.00..0.00 rows=1 width=33)
>
> After the vacuum, the plan is like this:
>
> QUERY PLAN
> -------------------------------------------------------------------------
>  Seq Scan on fsi_orders  (cost=0.00..40141767.46 rows=29526 width=33)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Seq Scan on fsi_orders_items  (cost=0.00..1208.12 rows=60412 width=33)

This, of course, should be "orders", not "fsi_orders", and "orders_items", not "fsi_orders_items". Sorry for the confusion.

Additional info: I'm running PostgreSQL 7.4.8.

Thanks,
Guðmundur.
Re: [PERFORM] Query slow after VACUUM ANALYZE
[EMAIL PROTECTED] writes:
> Any ideas what I can do to make the query run in < 10 seconds?

Increase work_mem (or sort_mem in older releases). PG is dropping back from the hash plan because it thinks the hashtable won't fit in work_mem.

regards, tom lane
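A sketch of what that suggestion looks like; the value is illustrative, and on the poster's 7.4.8 the parameter is sort_mem rather than work_mem (both are in kB):

  -- 8.0 and later
  SET work_mem = 65536;   -- 64 MB, comfortably enough for a hash over ~60,000 rows here

  -- 7.4.x equivalent
  SET sort_mem = 65536;

  SELECT id FROM orders WHERE id NOT IN (SELECT order_id FROM orders_items);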
Re: [PERFORM] Need for speed 3
Ulrich,

Luke cc'd me on his reply, and you should definitely have a look at Bizgres Clickstream, even if the whole stack doesn't match your needs (though it sounds like it would). The clickstream-focused ETL and Bizgres enhancements could make your life a little easier.

Basically, the stack components that you might want to look at first are:

BizGres flavor of PostgreSQL - enhanced for business intelligence and data warehousing; the www.bizgres.com website can speak to this in more detail.

Clickstream Data Model - a pageview fact table surrounded by various dimensions, plus 2 core staging tables for the cleansed weblog data.

ETL Platform - contains a weblog sessionizer, cleanser and ETL transformations, which can handle 2-3 million hits without any trouble, with native support for the COPY command for even greater performance.

JasperReports - for pixel-perfect reporting.

Sorry for sounding like I'm in marketing or sales; I'm not. A couple of key features might interest you, considering your email. The weblog parsing component allows for relatively complex cleansing, so less data is written to the DB, which increases throughput. In addition, if you run every 5 minutes there is no need to truncate the day's data and reload; the ETL knows how to connect the new data with what came before. The COPY enhancement to PostgreSQL found in Bizgres makes a noticeable improvement when loading data.

The schema is basically:
Dimension tables: Session, Known Party (if cookies are logged), Page, IP Address, Date, Time, Referrer, Referrer Page.
Fact tables: Pageview, Hit Subset (not everyone wants all hits).
Staging tables: Hits (cleansed hits or just pageviews without surrogate keys), Session (session data gathered while parsing the log).

Regards,
Nick

-----Original Message-----
From: Luke Lonergan [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 01, 2005 9:38 AM
To: Ulrich Wisser; pgsql-performance@postgresql.org
Cc: Nicholas E. Wakefield; Barry Klawans; Daria Hutchinson
Subject: Re: [PERFORM] Need for speed 3

Ulrich,

On 9/1/05 6:25 AM, "Ulrich Wisser" <[EMAIL PROTECTED]> wrote:

> My application basically imports Apache log files into a Postgres
> database. Every row in the log file gets imported in one of three (raw
> data) tables. My columns are exactly as in the log file. The import is
> run approx. every five minutes. We import about two million rows a month.

Bizgres Clickstream does this job using an ETL (extract, transform and load) process to transform the weblogs into an optimized schema for reporting.

> After every import the data from the current day is deleted from the
> reporting table and recalculated from the raw data table.

This is something the optimized ETL in Bizgres Clickstream also does well.

> What do you think of this approach? Are there better ways to do it? Is
> there some literature you recommend reading?

I recommend the Bizgres Clickstream docs; you can get them from Bizgres CVS, and there will shortly be a live html link on the website.

Bizgres is free - it also improves COPY performance by almost 2x, among other enhancements.

- Luke
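A minimal sketch of the kind of bulk load being described; the staging table, columns and file path are assumptions, and only the use of COPY itself comes from the thread:

  -- Load a batch of cleansed, tab-delimited weblog records into a staging table
  COPY hits_staging (ip_address, request_ts, url, http_status, bytes_sent, referrer)
  FROM '/var/lib/pgsql/load/cleansed_hits.tsv';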
Re: [PERFORM] Observation about db response time
On Tue, 2005-08-30 at 08:13 -0500, Frank Wiles wrote:
> On Tue, 30 Aug 2005 18:35:30 +0530 "Akshay Mathur" <[EMAIL PROTECTED]> wrote:
>
> > Hello Friends,
> >
> > We were having a database in pgsql7.4.2. The database was responding
> > very slowly even after a full vacuum analyze (select count(*) from
> > some_table_having_18000_records was taking 18 sec).
> >
> > We took a backup of that db and restored it back. Now the same db on
> > the same PC is responding fast (the same query is taking 18 ms).
> >
> > But we can't do this as a solution for the slow response. Has anybody
> > faced a similar problem? Is this due to any internal problem of
> > pgsql? Is there any clue to speed up the database?
>
> This could be because you don't have max_fsm_pages and
> max_fsm_relations set up correctly, or are not doing full vacuums
> often enough.
>
> If your database deletes a ton of data as a matter of course then
> sometimes a full vacuum will not clear up as much space as it could.
>
> Try increasing those configuration values and doing vacuums more
> often.
>
> You should also explore upgrading to the latest 8.0, as you will
> no doubt see noticeable speed improvements.

This can also be caused by index bloat. VACUUM does not clear out the index. You must use REINDEX for that.

-jwb
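A sketch of the two remedies mentioned; the table name is the one from the quoted mail, and the FSM values are illustrative rather than recommendations from the thread:

  -- Rebuild bloated indexes on the affected table
  REINDEX TABLE some_table_having_18000_records;

  -- postgresql.conf: size the free space map so vacuum can track all dead space
  -- max_fsm_pages = 200000
  -- max_fsm_relations = 1000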
Re: [PERFORM] When to do a vacuum for highly active table
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-performance-[EMAIL PROTECTED] On Behalf Of Markus Benne
> Sent: Wednesday, August 31, 2005 12:14 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] When to do a vacuum for highly active table
>
> We have a highly active table that has virtually all
> entries updated every 5 minutes. Typical size of the
> table is 50,000 entries, and entries have grown fat.
>
> We are currently vacuuming hourly, and towards the end
> of the hour we are seeing degradation, when compared
> to the top of the hour.
>
> Vacuum is slowly killing our system, as it is starting
> to take up to 10 minutes, and load at the time of
> vacuum is 6+ on a Linux box. During the vacuum, the
> overall system goes unresponsive, then comes back
> once vacuum completes.

Play with the vacuum_cost_delay option. In our case it made a BIG difference (going from very heavy hitting to almost unnoticed vacuuming).

Hope it helps.
Rigmor Ukuhe

> If we run vacuum less frequently, degradation
> continues to the point that we can't keep up with the
> throughput, plus vacuum takes longer anyway.
>
> Becoming quite a pickle :-)
>
> We are thinking of splitting the table in two: the
> part that updates often, and the part that updates
> infrequently, as we suspect that record size impacts
> vacuum.
>
> Any ideas?
>
> Thanks,
> Mark
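A minimal sketch of the kind of tuning meant; these are the 8.0 cost-based vacuum delay settings, and the values are illustrative, not the ones Rigmor used:

  -- Session-level, before a manual VACUUM:
  SET vacuum_cost_delay = 10;    -- sleep 10 ms each time the cost limit is reached (default 0 = never nap)
  SET vacuum_cost_limit = 200;   -- accumulated cost that triggers the nap (200 is the default)
  VACUUM ANALYZE active_table;   -- "active_table" is a placeholder name

The same two parameters can also be set globally in postgresql.conf.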