Re: [PERFORM] Advise about how to delete entries

2005-09-05 Thread Arnau

Hi all,

>
> COPY FROM a file with all the ID's to delete, into a temporary 
> table, and do a joined delete to your main table (thus, only one query).



  I already did this, but I have no idea how to do this join. Could you 
give me a hint ;-) ?
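
Would it be something along these lines? (All names here are hypothetical.)

  CREATE TEMP TABLE ids_to_delete (id integer);       -- match your main table's id type
  COPY ids_to_delete FROM '/tmp/ids_to_delete.txt';   -- hypothetical file path

  DELETE FROM main_table
  WHERE id IN (SELECT id FROM ids_to_delete);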


Thank you very much
--
Arnau




[PERFORM] Query slow after VACUUM ANALYZE

2005-09-05 Thread gdh
Hi all

I'm having a strange problem with a query which looks like this:

SELECT id FROM orders WHERE id NOT IN (SELECT order_id FROM orders_items);

The id fields are varchar(32), both indexed. The number of rows in the
tables is about 6.

Now, the really strange part: if I delete all data from orders_items,
run VACUUM ANALYZE, then import all the data, the query finishes in about 3
seconds. Then I run VACUUM ANALYZE, and *after* the vacuum, the query takes
about 30 minutes to run. The data is the same and this is the only query
running, and the machine load is effectively none.

EXPLAINing the query before VACUUM ANALYZE shows this:

   QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on orders  (cost=0.00..12184.14 rows=29526 width=33)
   Filter: (NOT (hashed subplan))
   SubPlan
 ->  Seq Scan on orders_items  (cost=0.00..0.00 rows=1 width=33)

After the vacuum, the plan is like this:

   QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on fsi_orders  (cost=0.00..40141767.46 rows=29526 width=33)
   Filter: (NOT (subplan))
   SubPlan
   ->  Seq Scan on fsi_orders_items  (cost=0.00..1208.12 rows=60412 width=33)


Any ideas what I can do to make the query run in < 10 seconds?

Thanks,
Guðmundur.



Re: [PERFORM] Poor SQL performance

2005-09-05 Thread Alexander Kirpa
Place
'and date(r.site_timestamp) = h.first_order_date'
after WHERE
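
I.e. something along these lines (everything apart from that predicate is
hypothetical here, since the original query isn't quoted):

  SELECT h.customer_id, count(*)
  FROM   site_hits r
  JOIN   order_history h ON h.customer_id = r.customer_id
  WHERE  date(r.site_timestamp) = h.first_order_date
  GROUP  BY h.customer_id;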

Best regards,
 Alexander Kirpa




Re: [PERFORM] shared buffers

2005-09-05 Thread Martin Nickel
Chris,
Would you say that 3 pages is a good maximum for a Postgres install?
We're running 8.0.3 on 64-bit SUSE on a dual Opteron box with 4G and have
shared_buffers set at 12.  I've moved it up and down (it was 16
when I got here) without any measurable performance difference.

The reason I ask is because I occasionally see large-ish queries take
forever (like cancel-after-12-hours forever) and wondered if this could
result from shared_buffers being too large.

Thanks for your (and anyone else's) help!
Martin Nickel

On Tue, 30 Aug 2005 10:08:21 +0800, Christopher Kings-Lynne wrote:

>> I forgot to say that it's a 12GB database...
> 
> That's actually not that large.
> 
>> Ok, I'll set shared buffers to 30.000 pages but even so "meminfo" and 
>> "top" shouldn't show some shared pages?
> 
> Yeah. The reason for not setting buffers so high is because PostgreSQL 
> cannot efficiently manage huge shared buffers, so you're better off 
> giving the RAM to Linux's disk cache.
> 
> Chris
> 
> 




[PERFORM] Postgresql Hardware - Recommendations

2005-09-05 Thread Christian.Kastner
Hello,

My company has decided to migrate our Oracle database to PostgreSQL 8. We
will acquire a new server for this, and would very much appreciate your
advice.

NOTE: The applications accessing the database are developed and
maintained externally, and unfortunately, the developers have not yet
given us detailed information on their requirements. The only info I can
give so far is that the database size is about 60GB, and that it will be
frequently accessed by multiple users (about 100 will be connected
during business hours). The applications accessing the database are
mostly reporting tools.

I know that the performance question will ultimately boil down to "it
depends what you want to do with it", but at the moment I'm very much
interested if there are any general issues we should look out for.

The questions we are asking ourselves now are:

1) Intel or AMD (or an alternative platform)
Are we better off with Xeons or Opterons? Should we consider the IBM
OpenPower platform?

2) CPUs vs cache
Would you rather have more CPUs or more cache? Eg: 4x Xeon 1MB vs 2x
Xeon 8MB

3) CPUs vs Memory
Would you rather have 4x CPUs and 8GB of memory, or 2x CPUs with 16GB of
memory?

Thanks in advance for all your replies!

Best Regards,
Christian Kastner



Re: [PERFORM] Query slow after VACUUM ANALYZE

2005-09-05 Thread gdh
Hi again

[..]

>
>    QUERY PLAN
> ------------------------------------------------------------------------
>  Seq Scan on orders  (cost=0.00..12184.14 rows=29526 width=33)
>Filter: (NOT (hashed subplan))
>SubPlan
>  ->  Seq Scan on orders_items  (cost=0.00..0.00 rows=1 width=33)
>
> After the vacuum, the plan is like this:
>
>    QUERY PLAN
> ------------------------------------------------------------------------
>  Seq Scan on fsi_orders  (cost=0.00..40141767.46 rows=29526 width=33)
>Filter: (NOT (subplan))
>SubPlan
>    ->  Seq Scan on fsi_orders_items  (cost=0.00..1208.12 rows=60412 width=33)
>

This, of course, should be "orders", not "fsi_orders", and "orders_items",
not "fsi_orders_items". Sorry for the confusion.

Additional info: I'm running PostgreSQL 7.4.8.

Thanks,
Guðmundur.




Re: [PERFORM] Query slow after VACUUM ANALYZE

2005-09-05 Thread Tom Lane
[EMAIL PROTECTED] writes:
> Any ideas what I can do to make the query run in < 10 seconds?

Increase work_mem (or sort_mem in older releases).  PG is dropping
back from the hash plan because it thinks the hashtable won't fit
in work_mem.
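
For example, on 7.4 (where the setting is called sort_mem and is measured in
kilobytes) something along these lines should let the planner go back to the
hashed subplan; the 64MB value is only illustrative:

  SET sort_mem = 65536;   -- 64MB, for this session only
  EXPLAIN SELECT id FROM orders WHERE id NOT IN (SELECT order_id FROM orders_items);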

regards, tom lane



Re: [PERFORM] Need for speed 3

2005-09-05 Thread Nicholas E. Wakefield
Ulrich,

Luke cc'd me on his reply, and you should definitely have a look at
Bizgres Clickstream. Even if the whole stack doesn't match your needs
(though it sounds like it would), the clickstream-focused ETL and Bizgres
enhancements could make your life a little easier.

Basically, the stack components that you might want to look at first are:

- Bizgres flavor of PostgreSQL - enhanced for business intelligence and
  data warehousing; the www.bizgres.com website can speak to this in more
  detail.
- Clickstream Data Model - a pageview fact table surrounded by various
  dimensions and 2 core staging tables for the cleansed weblog data.
- ETL Platform - contains a weblog sessionizer, cleanser and ETL
  transformations, which can handle 2-3 million hits without any trouble,
  with native support for the COPY command for even greater performance.
- JasperReports - for pixel-perfect reporting.

Sorry for sounding like I'm in marketing or sales; I'm not.

A couple of key features might interest you, considering your email.
The weblog parsing component allows for relatively complex cleansing,
allowing less data to be written to the DB and therefore increasing
throughput. In addition, if you run it every 5 minutes there is no need
to truncate the day's data and reload; the ETL knows how to connect the
data to what came before. The COPY enhancement to PostgreSQL found in
Bizgres makes a noticeable improvement when loading data.
The schema is basically:

Dimension tables: Session, Known Party (if cookies are logged), Page, IP
Address, Date, Time, Referrer, Referrer Page.

Fact tables: Pageview, Hit Subset (not everyone wants all hits).

Staging tables: Hits (cleansed hits or just pageviews without surrogate
keys), Session (session data gathered while parsing the log).
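
If it helps to see the shape, a rough DDL sketch of the pageview fact table
might look like this (column names are purely illustrative, not Bizgres's
actual definitions):

  CREATE TABLE pageview_fact (
  session_id       integer NOT NULL,   -- points at the Session dimension
  known_party_id   integer,            -- only populated when cookies are logged
  page_id          integer NOT NULL,   -- Page dimension
  ip_address_id    integer NOT NULL,   -- IP Address dimension
  date_id          integer NOT NULL,   -- Date dimension
  time_id          integer NOT NULL,   -- Time dimension
  referrer_id      integer,            -- Referrer dimension
  referrer_page_id integer,            -- Referrer Page dimension
  view_ts          timestamp NOT NULL  -- raw timestamp of the pageview
  );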

Regards

Nick


-Original Message-
From: Luke Lonergan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 01, 2005 9:38 AM
To: Ulrich Wisser; pgsql-performance@postgresql.org
Cc: Nicholas E. Wakefield; Barry Klawans; Daria Hutchinson
Subject: Re: [PERFORM] Need for speed 3

Ulrich,

On 9/1/05 6:25 AM, "Ulrich Wisser" <[EMAIL PROTECTED]>
wrote:

> My application basically imports Apache log files into a Postgres 
> database. Every row in the log file gets imported in one of three (raw
> data) tables. My columns are exactly as in the log file. The import is
> run approx. every five minutes. We import about two million rows a
> month.

Bizgres Clickstream does this job using an ETL (extract transform and
load) process to transform the weblogs into an optimized schema for
reporting.
 
> After every import the data from the current day is deleted from the 
> reporting table and recalculated from the raw data table.

This is something the optimized ETL in Bizgres Clickstream also does
well.
 
> What do you think of this approach? Are there better ways to do it? Is
> there some literature you recommend reading?

I recommend the Bizgres Clickstream docs; you can get them from Bizgres
CVS, and there will shortly be a live HTML link on the website.

Bizgres is free - it also improves COPY performance by almost 2x, among
other enhancements.

- Luke 






Re: [PERFORM] Observation about db response time

2005-09-05 Thread Jeffrey W. Baker
On Tue, 2005-08-30 at 08:13 -0500, Frank Wiles wrote:
> On Tue, 30 Aug 2005 18:35:30 +0530
> "Akshay Mathur" <[EMAIL PROTECTED]> wrote:
> 
> > Hello Friends,
> >  
> > We have a database in pgsql 7.4.2. The database was responding
> > very slowly even after a full vacuum analyze (select count(*) from
> > some_table_having_18000_records was taking 18 sec).
> >  
> > We took a backup of that db and restored it back. Now the same db on
> > the same PC is responding fast (the same query is taking 18 ms).
> >  
> > But we can't keep doing that as a solution to the slow response. Has
> > anybody faced a similar problem? Is this due to any internal problem of
> > pgsql? Is there any way to speed up the database?
> 
>   This could be because you don't have max_fsm_pages and
>   max_fsm_relations setup correctly or are not doing full vacuums 
>   often enough. 
> 
>   If your database deletes a ton of data as a matter of course then
>   sometimes a full vacuum will not clear up as much space as it could.
> 
>   Try increasing those configuration values and doing vacuums more
>   often. 
> 
>   You should also explore upgrading to the latest 8.0, as you will
>   no doubt see noticeable speed improvements. 

This can also be caused by index bloat.  VACUUM does not clear out the
index.  You must use REINDEX for that.
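
For example (reusing the table name from the earlier mail purely as a
placeholder):

  REINDEX TABLE some_table_having_18000_records;

Keep in mind that REINDEX takes an exclusive lock on the table while it runs.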

-jwb



Re: [PERFORM] When to do a vacuum for highly active table

2005-09-05 Thread Rigmor Ukuhe

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:pgsql-performance-
> [EMAIL PROTECTED] On Behalf Of Markus Benne
> Sent: Wednesday, August 31, 2005 12:14 AM
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] When to do a vacuum for highly active table
> 
> We have a highly active table that has virtually all
> entries updated every 5 minutes.  Typical size of the
> table is 50,000 entries, and entries have grown fat.
> 
> We are currently vacuuming hourly, and towards the end
> of the hour we are seeing degradation, when compared
> to the top of the hour.
> 
> Vacuum is slowly killing our system, as it is starting
> to take up to 10 minutes, and load at the time of
> vacuum is 6+ on a Linux box.  During the vacuum, the
> overall system goes unresponsive, then comes back
> once vacuum completes.

Play with the vacuum_cost_delay option. In our case it made a BIG difference
(going from very heavy hitting to almost unnoticed vacuuming).
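
For example (8.0+; the values below are only a starting point, and the table
name is a placeholder):

  -- per-session, before running a manual VACUUM (or set the equivalents in postgresql.conf):
  SET vacuum_cost_delay = 10;   -- sleep 10 ms each time the cost limit is reached
  SET vacuum_cost_limit = 200;  -- the default; lower it to pause more often
  VACUUM ANALYZE my_busy_table;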

Hope it helps.

Rigmor Ukuhe

> 
> If we run vacuum less frequently, degradation
> continues to the point that we can't keep up with the
> throughput, plus vacuum takes longer anyway.
> 
> Becoming quite a pickle:-)
> 
> We are thinking of splitting the table in two: the
> part that updates often, and the part that updates
> infrequently, as we suspect that record size impacts
> vacuum.
> 
> Any ideas?
> 
> 
> Thanks,
> Mark
> 

