Re: [PERFORM] Postgres Benchmark Results

2007-05-21 Thread Rich

I assume the red curve is PostgreSQL.  As you add connections, MySQL always dies.

On 5/20/07, PFC <[EMAIL PROTECTED]> wrote:



I felt the world needed a new benchmark ;)
So : Forum style benchmark with simulation of many users posting and
viewing forums and topics on a PHP website.

http://home.peufeu.com/ftsbench/forum1.png

One of those curves is "a very popular open-source database which claims
to offer unparallelled speed".
The other one is of course Postgres 8.2.3 which by popular belief is
"full-featured but slow".

What is your guess ?




Re: [PERFORM] MVCC performance issue

2010-11-12 Thread Rich
In reading what you are describing, don't you think PG 9 goes a long way to
helping you out?

On Sat, Nov 13, 2010 at 12:53 AM, Craig Ringer
wrote:

> On 11/12/2010 02:25 AM, Kyriacos Kyriacou wrote:
>
>> The result is to have huge fragmentation on table space, unnecessary updates
>> in all affected indexes, unnecessary costly I/O operations, poor
>> performance on SELECT that retrieves big record sets (i.e. reports etc)
>> and slower updates.
>>
>
> Yep. It's all about trade-offs. For some workloads the in-table MVCC
> storage setup works pretty darn poorly, but for most it seems to work quite
> well.
>
> There are various other methods of implementing relational storage with
> ACID properties. You can exclude all other transactions while making a
> change to a table, ensuring that nobody else can see "old" or "new" rows so
> there's no need to keep them around. You can use an out-of-line redo log
> (a-la Oracle). Many other methods exist, too.
>
> They all have advantages and disadvantages for different workloads. It's
> far from trivial to mix multiple schemes within a single database, so mixing
> and matching schemes for different parts of your DB isn't generally
> practical.
>
>
>  1) When a raw UPDATE is performed, store all "new raw versions" either
>> in separate temporary table space
>>or in a reserved space at the end of each table (can be allocated
>> dynamically) etc
>>
>
> OK, so you want a redo log a-la Oracle?
>
>
>  2) Any SELECT queries within the same session will be again accessing
>> the new version of the row
>> 3) Any SELECT queries from other users will still be accessing the old
>> version
>>
>
> ... and incurring horrible random I/O penalties if the redo log doesn't fit
> in RAM. Again, a-la Oracle.
>
> Even read-only transactions have to hit the undo log if there's an update
> in progress, because rows they need may have been moved out to the undo log
> as they're updated in the main table storage.
>
> [snip description]
>
>
>  I understand that my suggestion seems to be too simplified and also that
>> there are many implementation details and difficulties that I am not
>> aware.
>>
>
> It sounds like you're describing Oracle-style MVCC, using redo logs.
>
>
> http://blogs.sybase.com/database/2009/04/mvcc-dispelling-some-oracle-fudunderstanding-the-cost/
>
> http://en.wikipedia.org/wiki/Multiversion_concurrency_control
>
> Oracle's MVCC approach has its own costs. Like Pg's, those costs increase
> with update/delete frequency. Instead of table bloat, Oracle suffers from
> redo log growth (or redo log size management issues). Instead of increased
> table scan costs from dead rows, Oracle suffers from random I/O costs as it
> looks up the out-of-line redo log for old rows. Instead of long-running
> writer transactions causing table bloat, Oracle can have problems with
> long-running reader transactions aborting when the redo log runs out of
> space.
>
> Personally, I don't know enough to know which is "better". I suspect
> they're just different, with different trade-offs. If redo logs allow you
>  to do without write-ahead logging, that'd be interesting - but then, the
> WAL is useful for all sorts of replication options, and the use of linear
> WALs means that write ordering in the tables doesn't need to be as strict,
> which has performance advantages.
>
> --
> Craig Ringer
>
>
>
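For readers who want to see how much of this in-table versioning overhead a
given table is paying, a quick check looks roughly like this (a sketch; the
statistics columns exist in 8.3 and later, and the table name is made up):

-- Dead-row counts and last vacuum times for one table
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'my_table';

-- Reclaim dead-row space and refresh planner statistics
VACUUM ANALYZE my_table;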


Re: [PERFORM] Query Performance SQL Server vs. Postgresql

2010-11-17 Thread Rich
I have to concur.  SQL Server is written specifically and only for Windows, and it
is optimized for Windows.  PostgreSQL is written for just about everything,
using common code, so there isn't much OS-specific optimization; it has to
work with whatever OS is running it.  Check out your config and
send it to us.  That would include the OS and hardware configs for both
machines.

On Wed, Nov 17, 2010 at 3:47 PM, Tomas Vondra  wrote:

> Dne 17.11.2010 05:47, Pavel Stehule napsal(a):
> > 2010/11/17 Humair Mohammed :
> >>
> >> There are no indexes on the tables either in SQL Server or Postgresql -
> I am
> >> comparing apples to apples here. I ran ANALYZE on the postgresql tables,
>
> Actually no, you're not comparing apples to apples. You've provided so
> little information that you may be comparing apples to cucumbers or
> maybe some strange animals.
>
> 1) info about the install
>
> What OS is this running on? I guess it's Windows in both cases, right?
>
> How much memory is there? What is the size of shared_buffers? The
> default PostgreSQL settings are very, very limited; you have to bump
> it to a much larger value.
>
> What are the other important settings (e.g. work_mem)?
>
> 2) info about the dataset
>
> How large are the tables? I don't mean number of rows, I mean number of
> blocks / occupied disk space. Run this query
>
> SELECT relname, relpages, reltuples, pg_size_pretty(pg_table_size(oid))
> FROM pg_class WHERE relname IN ('table1', 'table2');
>
> 3) info about the plan
>
> Please, provide EXPLAIN ANALYZE output, maybe with info about buffers,
> e.g. something like
>
> EXPLAIN (ANALYZE ON, BUFFERS ON) SELECT ...
>
> 4) no indexes ?
>
> Why have you decided not to use any indexes? If you want a decent
> performance, you will have to use indexes. Obviously there is some
> overhead associated with them, but it's premature optimization unless
> you prove the opposite.
>
> BTW I'm not a MSSQL expert, but it seems like it's building a bitmap
> index on the fly to synchronize a parallelized query - PostgreSQL does
> not support that.
>
> regards
> Tomas
>
>
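For reference, the kind of bump Tomas means looks something like this in
postgresql.conf (illustrative values only, assuming a machine with a few GB
of RAM; shared_buffers needs a restart to take effect):

shared_buffers = 512MB        # the stock default is far smaller
work_mem = 16MB               # per sort/hash operation, per backend
effective_cache_size = 2GB    # roughly what the OS is expected to cache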


Re: [PERFORM] Performance of PostgreSQL over NFS

2010-12-21 Thread Rich
I am wondering why anyone would do that.  Too much overhead, and it's not reliable
enough.

On Tue, Dec 21, 2010 at 2:28 PM, Mladen Gogala wrote:

> I was asked about performance of PostgreSQL on NetApp, the protocol should
> be NFSv3.  Has anybody tried it? The database in question is a DW type, a
> bunch of documents indexed by Sphinx. Does anyone have any information?
> --
>
>
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com The Leader in Integrated Media Intelligence
> Solutions
>
>
>
>


Re: [PERFORM] Making the most of memory?

2008-01-23 Thread Rich
Josh, what about the rest of your system?  What operating system?  What is your
hardware setup -- drives, RAID?  What indexes do you have set up for
these queries?  There are other things that could cause bad query
performance.

On Jan 22, 2008 11:11 PM, Joshua Fielek <[EMAIL PROTECTED]> wrote:
>
> Hey folks --
>
> For starters, I am fairly new to database tuning and I'm still learning
> the ropes. I understand the concepts but I'm still learning the real
> world impact of some of the configuration options for postgres.
>
> We have an application that has been having some issues with performance
> within postgres 8.1.9 and later 8.2.5. The upgrade to 8.2.5 gained us a
> nice little performance increase just off the improved query
> optimization, but we are still having other performance issues.
>
> The database itself is not that large -- a db_dump of the sql file as
> text is only about 110MB. I haven't checked the exact size of the actual
> data base, but the entire data directory is smaller than the available
> memory at about 385MB including logs and config files. This is a single
> database with a relatively small number of client connections (50 or so)
> making a fair number of smaller queries. This is not a massive data
> effort by any means at this time, but it will be growing.
>
> We have available currently ~4GB (8GB total) for Postgres. We will be
> moving to a server that will have about 24GB (32GB total) available for
> the database, with the current server becoming a hot backup, probably
> with slony or something similar to keep the databases in sync.
>
> I've been monitoring the memory usage of postgres on the current system
> and it seems like none of the threads ever allocate more than about
> 400MB total and about 80-90MB shared memory. It seems to me that since
> we have a very large chunk of memory relative to the database size we
> should be loading the entire database into memory. How can we be sure
> we're getting the most out of the memory we're allocating to postgres?
> What can we do to improve the memory usage, looking for performance
> first and foremost, on both the larger and smaller systems?
>
> Here's the salient config items for the 8GB system:
>
> max_connections = 200# realistically we expect 50-150 open
> shared_buffers = 38000
> sort_mem = 1048576
> work_mem = 32000
> maintenance_work_mem = 32000
> max_fsm_pages = 480001# probably too large for the max_fsm_*
> max_fsm_relations = 2# items; one Db with ~400 tables.
> effective_cache_size = 212016# ~2GB, could probably double this
>
>
> Thanks,
> J
> --
> Joshua J. Fielek
> Sr. Software Engineer
> Concursive Corporation
> 223 East City Hall Ave., Suite 212
> Norfolk, VA 23510
> Phone  : (757) 627-3002x6656
> Mobile : (757) 754-4462
> Fax: (757) 627-8773
> Email  : [EMAIL PROTECTED]
> http://www.concursive.com
>
>
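Two things worth noting about the posted settings: sort_mem is the pre-8.0
name for what is now work_mem (and 1048576 KB would be a full gigabyte per
sort operation if it took effect), and shared_buffers/effective_cache_size
look small for an 8 GB box.  A rough sketch of what I would try on 8.2, with
illustrative values only, assuming ~4 GB really is available to Postgres:

shared_buffers = 1GB             # was 38000 pages, roughly 300 MB
work_mem = 16MB                  # replaces the obsolete sort_mem line
maintenance_work_mem = 256MB
effective_cache_size = 3GB       # was 212016 pages, roughly 1.7 GB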



[PERFORM] What is the best way to storage music files in Postgresql

2008-03-15 Thread Rich
I am going to embark on building a music library using apache,
postgresql and php.  What is the best way to store the music files?
Which file type should I use?



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-17 Thread Rich
On Mon, Mar 17, 2008 at 2:01 PM, Peter Koczan <[EMAIL PROTECTED]> wrote:
> >  > I am going to embarkon building a music library using apache,
>  >  > postgresql and php.  What is the best way to store the music files?
>  >
>  >  Your options are either to use a BLOB within the database or to store
>  >  paths to normal files in the file system in the database. I suspect
>  >  using normal files will make backup and management a great deal easier
>  >  than using in-database BLOBs, so personally I'd do it that way.
>
>  I discussed something like this with some co-workers recently, and
>  here's what I had to say. Not all of these apply to the original
>  message, but they are things to consider when marrying a database to a
>  file storage system.
>
>  Storing the files in the database as BLOBs:
>  Pros:
>  - The files can always be seen by the database system as long as it's
>  up (there's no dependence on an external file system).
>  - There is one set of locking mechanisms, meaning that the file
>  operations can be atomic with the database operations.
>  - There is one set of permissions to deal with.
>  Cons:
>  - There is almost no way to access files outside of the database. If
>  the database goes down, you are screwed.
>  - If you don't make good use of tablespaces and put blobs on a
>  separate disk system, the disk could thrash going between data and
>  blobs, affecting performance.
>  - There are stricter limits for PostgreSQL blobs (1 GB size limits, I've 
> read).
>
>  Storing files externally, storing pathnames in the database:
>  Pros:
>  - You can access and manage files from outside the database and
>  possibly using different interfaces.
>  - There's a lot less to store directly in the database.
>  - You can use existing file-system permissions, mechanisms, and limits.
>  Cons:
>  - You are dealing with two storage systems and two different locking
>  systems which are unlikely to play nice with each other. Transactions
>  are not guaranteed to be atomic (e.g. a database rollback will not
>  rollback a file system operation, a commit will not guarantee that
>  data in a file will stay).
>  - The file system has to be seen by the database system and any remote
>  clients that wish to use your application, meaning that a networked FS
>  is likely to be used (depending on how many clients you have and how
>  you like to separate services), with all the fun that comes from
>  administering one of those. Note that this one in particular really
>  only applies to enterprise-level installations, not smaller
>  installations like the original poster's.
>  - If you don't put files on a separate disk-system or networked FS,
>  you can get poor performance from the disk thrashing between the
>  database and the files.
>
>  There are a couple main points:
>  1. The favorite answer in computing, "it depends", applies here. What
>  you decide depends on your storage system, your service and
>  installation policies, and how important fully atomic transactions are
>  to you.
>  2. If you want optimal performance out of either of these basic
>  models, you should make proper use of separate disk systems. I have no
>  idea which one is faster (it depends, I'm sure) nor do I have much of
>  an idea of how to benchmark this properly.
>
>  Peter
It seems to me that as such a database gets larger, it will become much harder to
manage with the two systems.  I am talking mostly about music, so each song
should not get too large.  I have read a lot on this list and on other
resources, and there seems to be a leaning toward 1+0 RAID for storage.  It
seems to be the most flexible when it comes to speed, redundancy and recovery
time.  I do want my database to be fully atomic.  I think that is important
as this database grows.  Are my assumptions wrong?
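If the files stay outside the database, the usual pattern is a small metadata
table that stores the path plus enough information to verify the file later.
A minimal sketch (table and column names are made up):

CREATE TABLE songs (
    song_id    serial PRIMARY KEY,
    title      text NOT NULL,
    artist     text,
    file_path  text NOT NULL UNIQUE,  -- location on the file system / networked FS
    file_size  bigint,                -- bytes, for sanity checks against the FS
    sha1_hex   char(40),              -- checksum to detect files changed outside the DB
    added_at   timestamptz NOT NULL DEFAULT now()
);

The size and checksum columns are the cheap way to notice the two systems
drifting apart, which is the main atomicity worry raised above.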



Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-17 Thread Rich
On Sun, Mar 16, 2008 at 2:25 AM, Craig Ringer
<[EMAIL PROTECTED]> wrote:
> Rich wrote:
>  > I am going to embarkon building a music library using apache,
>  > postgresql and php.  What is the best way to store the music files?
>
>  Your options are either to use a BLOB within the database or to store
>  paths to normal files in the file system in the database. I suspect
>  using normal files will make backup and management a great deal easier
>  than using in-database BLOBs, so personally I'd do it that way.
>
>  Storing the audio files in the database does make it easier to keep the
>  database and file system backups in sync, but I'm not really sure that's
>  worth the costs.
What costs are you speaking of?
>
>  I'm sure that what you're doing has been done many times before, though,
>  so even if you're not going to use one of the existing options you might
>  at least  want to have a look at how they work.
>
>
>  > Which file type should I use?
>
>  I'm not sure I understand this question. Are you asking which audio
>  compression codec and audio container file type (like "mp3", "aac", etc)
>   you should use? If so, this is really not the right place to ask that.
>
>  Do you mean which file /system/ ?
>
>  --
>  Craig Ringer
>
>



Re: [PERFORM] NOT EXISTS or LEFT JOIN which one is better?

2012-04-29 Thread Rich
Al, I have looked at this before, and I am not sure the effort is worth all
the thought about it.  Let your EXPLAIN output tell you which is better.  I read
this link a year ago.
http://stackoverflow.com/questions/227037/can-i-get-better-performance-using-a-join-or-using-exists

On Sun, Apr 29, 2012 at 5:27 AM, AI Rumman  wrote:

> I can write a query to solve my requirement in any of the followings :-
>
> 1.
> select *
> from a
> where NOT EXISTS
> (
> select 1
> from b
> where a.id = b.id)
> union all
> select *
> from b
>
>
> 2.
> select
> (
> case when b.id is not null then
>    b.id
> else
>    a.id
> end
> ) as id
> from a
> left join b
>   on a.id = b.id
>
> Any one please tell me which one is better?
>
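To illustrate the "let EXPLAIN tell you" approach on just the anti-join part of
the first query: run both forms and compare the plans.  On 8.4 and later,
NOT EXISTS of this shape is planned as an anti join, so the two frequently end
up with the same plan anyway (a sketch using the table names above):

EXPLAIN ANALYZE
SELECT a.*
FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);

EXPLAIN ANALYZE
SELECT a.*
FROM a
LEFT JOIN b ON b.id = a.id
WHERE b.id IS NULL;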



[PERFORM] Outer join query plans and performance

2005-10-25 Thread Rich Doughty

I tried on pgsql-general but got no reply. Re-posting here as it's
probably the best place to ask.

I'm having some significant performance problems with left join. Can
anyone give me any pointers as to why the following 2 query plans are so
different?


EXPLAIN SELECT *
FROM
tokens.ta_tokens  t  LEFT JOIN
tokens.ta_tokenhist   h1 ON t.token_id = h1.token_id LEFT JOIN
tokens.ta_tokenhist   h2 ON t.token_id = h2.token_id
WHERE
h1.histdate = 'now';


 Nested Loop Left Join  (cost=0.00..68778.43 rows=2215 width=1402)
   ->  Nested Loop  (cost=0.00..55505.62 rows=2215 width=714)
 ->  Index Scan using idx_tokenhist__histdate on ta_tokenhist h1  
(cost=0.00..22970.70 rows=5752 width=688)
   Index Cond: (histdate = '2005-10-24 13:28:38.411844'::timestamp 
without time zone)
 ->  Index Scan using ta_tokens_pkey on ta_tokens t  (cost=0.00..5.64 
rows=1 width=26)
   Index Cond: ((t.token_id)::integer = ("outer".token_id)::integer)
   ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist h2  
(cost=0.00..5.98 rows=1 width=688)
 Index Cond: (("outer".token_id)::integer = (h2.token_id)::integer)


Performance is fine for this one and the plan is pretty much as i'd
expect.

This is where i hit a problem.


EXPLAIN SELECT *
FROM
tokens.ta_tokens  t  LEFT JOIN
tokens.ta_tokenhist   h1 ON t.token_id = h1.token_id LEFT JOIN
tokens.ta_tokenhist   h2 ON t.token_id = h2.token_id
WHERE
h2.histdate = 'now';


 Hash Join  (cost=1249148.59..9000709.22 rows=2215 width=1402)
   Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
   ->  Hash Left Join  (cost=1225660.51..8181263.40 rows=4045106 width=714)
 Hash Cond: (("outer".token_id)::integer = ("inner".token_id)::integer)
 ->  Seq Scan on ta_tokens t  (cost=0.00..71828.06 rows=4045106 
width=26)
 ->  Hash  (cost=281243.21..281243.21 rows=10504921 width=688)
   ->  Seq Scan on ta_tokenhist h1  (cost=0.00..281243.21 
rows=10504921 width=688)
   ->  Hash  (cost=22970.70..22970.70 rows=5752 width=688)
 ->  Index Scan using idx_tokenhist__histdate on ta_tokenhist h2  
(cost=0.00..22970.70 rows=5752 width=688)
   Index Cond: (histdate = '2005-10-24 13:34:51.371905'::timestamp 
without time zone)


I would understand if h2 was joined on h1, but it isn't. It only joins
on t. Can anyone give any tips on improving the performance of the second
query (aside from changing the join order manually)?


select version();
   version
--
 PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 20050821 
(prerelease) (Debian 4.0.1-6)


Thanks

--

  - Rich Doughty



Re: [PERFORM] Outer join query plans and performance

2005-10-25 Thread Rich Doughty

Tom Lane wrote:

Rich Doughty <[EMAIL PROTECTED]> writes:


EXPLAIN SELECT *
FROM
tokens.ta_tokens  t  LEFT JOIN
tokens.ta_tokenhist   h1 ON t.token_id = h1.token_id LEFT JOIN
tokens.ta_tokenhist   h2 ON t.token_id = h2.token_id
WHERE
h1.histdate = 'now';




EXPLAIN SELECT *
FROM
tokens.ta_tokens  t  LEFT JOIN
tokens.ta_tokenhist   h1 ON t.token_id = h1.token_id LEFT JOIN
tokens.ta_tokenhist   h2 ON t.token_id = h2.token_id
WHERE
h2.histdate = 'now';



The reason these are different is that the second case constrains only
the last-to-be-joined table, so the full cartesian product of t and h1
has to be formed.  If this wasn't what you had in mind, you might be
able to rearrange the order of the LEFT JOINs, but bear in mind that
in general, changing outer-join ordering changes the results.  (This
is why the planner won't fix it for you.)


FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries
in approximately 3 seconds. postgres does the first in 6 seconds and the
second in a lot longer (eventually abandoned).
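Concretely, the rearrangement Tom suggests would look something like this for
the slow query.  It is equivalent here only because the WHERE clause discards
NULL-extended h2 rows, which makes that LEFT JOIN behave as an inner join
(output column order will differ):

EXPLAIN SELECT *
FROM
    tokens.ta_tokenhist   h2 JOIN
    tokens.ta_tokens      t  ON t.token_id = h2.token_id LEFT JOIN
    tokens.ta_tokenhist   h1 ON t.token_id = h1.token_id
WHERE
    h2.histdate = 'now';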


--

  - Rich Doughty



Re: [PERFORM] Outer join query plans and performance

2005-10-26 Thread Rich Doughty

Tom Lane wrote:

Rich Doughty <[EMAIL PROTECTED]> writes:


Tom Lane wrote:


The reason these are different is that the second case constrains only
the last-to-be-joined table, so the full cartesian product of t and h1
has to be formed.  If this wasn't what you had in mind, you might be
able to rearrange the order of the LEFT JOINs, but bear in mind that
in general, changing outer-join ordering changes the results.  (This
is why the planner won't fix it for you.)




FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these queries
in approximately 3 seconds.



Does mysql get the correct answer, though?  It's hard to see how they do
this fast unless they (a) are playing fast and loose with the semantics,
or (b) have very substantially more analysis logic for OUTER JOIN semantics
than we do.  Perhaps mysql 5.x is better about this sort of thing, but
for 4.x I'd definitely find theory (a) more plausible than (b).


i would assume so. i'll re-run my testcase later and verify the results of the
two side-by-side.


The cases that would be interesting are those where rearranging the
outer join order actually does change the correct answer --- it may not
in this particular case, I haven't thought hard about it.  It seems
fairly likely to me that they are rearranging the join order here, and
I'm just wondering whether they have the logic needed to verify that
such a transformation is correct.

regards, tom lane




--

  - Rich Doughty



[PERFORM] Strange query plan invloving a view

2005-11-16 Thread Rich Doughty
'sold'::text)
   ->  Index Scan using fkx_tokenhist__tokens on ta_tokenhist r  
(cost=0.00..17.96 rows=1 width=246)
 Index Cond: (("outer".token_id)::integer = (r.token_id)::integer)
 Filter: ((status)::text = 'redeemed'::text)

This query returns a lot quicker than the plan would suggest, as the
planner is over-estimating the number of rows where
((sarreport_id)::integer = 9). It thinks there are 53430 when in fact
there are only 7 (despite a vacuum and analyse).

Can anyone give me any suggestions? are the index stats the cause of
my problem, or is it the rewrite of the query?

Cheers


Version: PostgreSQL 8.0.3 on i486-pc-linux-gnu, compiled by GCC cc (GCC) 4.0.2 
20050821 (prerelease) (Debian 4.0.1-6)


--

  - Rich Doughty



Re: [PERFORM] Strange query plan invloving a view

2005-11-22 Thread Rich Doughty

Tom Lane wrote:

Rich Doughty <[EMAIL PROTECTED]> writes:


However, the following query (which i believe should be equivalent)




SELECT *
FROM
tokens.ta_tokenhist   h INNER JOIN
tokens.ta_tokens  t ON h.token_id = t.token_id LEFT JOIN
tokens.ta_tokenhist   i ON t.token_id = i.token_id AND
   i.status   = 'issued'   LEFT JOIN
tokens.ta_tokenhist   s ON t.token_id = s.token_id AND
   s.status   = 'sold' LEFT JOIN
tokens.ta_tokenhist   r ON t.token_id = r.token_id AND
   r.status   = 'redeemed'
WHERE
h.sarreport_id = 9
;



No, that's not equivalent at all, because the implicit parenthesization
is left-to-right; therefore you've injected the constraint to a few rows
of ta_tokenhist (and therefore only a few rows of ta_tokens) into the
bottom of the LEFT JOIN stack.  In the other case the constraint is at
the wrong end of the join stack, and so the full view output gets formed
before anything gets thrown away.

Some day the Postgres planner will probably be smart enough to rearrange
the join order despite the presence of outer joins ... but today is not
that day.


thanks for the reply.

Is there any way I can achieve what I need to by using views, or should I
just use a normal query? I'd prefer to use a view but I just can't get round
the performance hit.
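One thing worth trying (a sketch; the selected columns are made up and would
need to match what the report really needs): define the view with the
constrained table at the bottom of the join stack, as in the faster
hand-written query, and filter on sarreport_id from outside.  When the view is
flattened, that qual should land directly on the scan of h rather than on the
finished view output:

CREATE OR REPLACE VIEW token_history AS
SELECT h.sarreport_id, t.token_id,
       i.histdate AS issued_date,
       s.histdate AS sold_date,
       r.histdate AS redeemed_date
FROM tokens.ta_tokenhist h
JOIN tokens.ta_tokens         t ON h.token_id = t.token_id
LEFT JOIN tokens.ta_tokenhist i ON t.token_id = i.token_id AND i.status = 'issued'
LEFT JOIN tokens.ta_tokenhist s ON t.token_id = s.token_id AND s.status = 'sold'
LEFT JOIN tokens.ta_tokenhist r ON t.token_id = r.token_id AND r.status = 'redeemed';

SELECT * FROM token_history WHERE sarreport_id = 9;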

--

  - Rich Doughty



Re: [PERFORM] view of view

2005-12-08 Thread Rich Doughty

Keith Worthington wrote:

Hi All,

I am working on an application that uses PostgreSQL.  One of the 
functions of the application is to generate reports.  In order to keep 
the code in the application simple we create a view of the required data 
in the database and then simply execute a SELECT * FROM 
view_of_the_data;  All of the manipulation and most of the time even the 
ordering is handled in the view.


My question is how much if any performance degradation is there in 
creating a view of a view?


IOW if I have a view that ties together a couple of tables and 
manipulates some data what will perform better; a view that filters, 
manipulates, and orders the data from the first view or a view that 
performs all the necessary calculations on the original tables?


from personal experience, if the inner views contain outer joins performance
isn't that great.

--

  - Rich Doughty



[PERFORM] Optimizing timestamp queries? Inefficient Overlaps?

2006-12-17 Thread Adam Rich

I have a table similar to this:

CREATE TABLE event_resources (
event_resource_id serial NOT NULL,
event_id integer NOT NULL,
resource_id integer NOT NULL,
start_date timestamptz NOT NULL,
end_date timestamptz NOT NULL,
CONSTRAINT event_resources_pkey PRIMARY KEY (event_resource_id)
);

Where the same resource can be added to an event multiple times.  
Since the table spans a few years, any day queried should
return at most 0.1% of the table, and seems perfect for indexes.   So I
add these:

CREATE INDEX er_idx1 ON event_resources (start_date);
CREATE INDEX er_idx2 ON event_resources (end_date);

One query I need to perform is "All event resources that start or end
on a particular day".   The first thing that comes to mind is this:

select *
from event_resources er
where er.start_date::date = $1::date  or er.end_date::date = $1::date

This is very slow.  Pg chooses a sequential scan.   (I am running vacuum
and analyze)  Shouldn't Pg be able to use an index here?

I've tried creating function indexes using cast, but Pg returns this
error message:

ERROR: functions in index expression must be marked IMMUTABLE

Which I assume is related to timezones and daylight saving issues in
converting a timestamptz into a plain date.

This form strangely won't use an index either:

select *
from event_resources er
where (er.start_date, er.end_date) overlaps ($1::date, $1::date+1)

This is the only query form I've found that will use an index:

select *
from event_resources er
where (er.start_date >= $1::date and er.start_date < ($1::date+1))
or (er.end_date >= $1::date and er.end_date < ($1::date+1))

I know it's not exactly the same as the overlaps method, but since this
works
I would expect OVERLAPS to work as well.  I prefer overlaps because it's
clean
and simple, self documenting.

Another (similar) query I need to perform is "All event resources that
overlap a given
time range".  Seems tailor-made for OVERLAPS:

select *
from event_resources er
where (er.start_date, er.end_date) overlaps ($1::timestamptz,
$2::timestamptz)

Again. can't get this to use an index.  I have to use this again:

select *
from event_resources er
where (er.start_date >= $1::timestamptz and er.start_date <
$2::timestamptz)
or (er.end_date >= $1::timestamptz and er.end_date < $2::timestamptz)

What am I doing wrong?   This is Pg 8.1.2  on RHEL 4.
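One workaround that often helps (a sketch, and it assumes your data can be
interpreted in one fixed time zone): AT TIME ZONE with an explicit zone name
goes through timezone(text, timestamptz), which is marked immutable, so the
cast-to-date expression becomes indexable as long as the query repeats the
exact same expression.  Worth double-checking on 8.1 before relying on it:

CREATE INDEX er_start_utc_date ON event_resources (((start_date AT TIME ZONE 'UTC')::date));
CREATE INDEX er_end_utc_date   ON event_resources (((end_date   AT TIME ZONE 'UTC')::date));

SELECT *
FROM event_resources er
WHERE (er.start_date AT TIME ZONE 'UTC')::date = $1::date
   OR (er.end_date   AT TIME ZONE 'UTC')::date = $1::date;

The two predicates can then be answered with a bitmap OR over the two
expression indexes instead of a sequential scan.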

Re: [PERFORM] Postgresql Configutation and overflow

2006-12-28 Thread Adam Rich
What are your table sizes?  What are your queries like?  (Mostly read,
mostly write?)
Can you post the EXPLAIN ANALYZE output for some of the slow queries?

The three things that stand out for me are your disk configuration (RAID 5
is not ideal for databases; you really want RAID 1 or 1+0), the fact that
you have enable_seqscan set to off (I would leave that turned on), and
lastly that your effective_cache_size looks low.  Your OS is probably
caching more than 512 MB; mine is usually 1-2 GB and I don't have 12 GB
of RAM available.
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of fabrix
peñuelas
Sent: Thursday, December 28, 2006 7:58 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Postgresql Configutation and overflow


Good day,

I have been reading about the configuration of postgresql, but I have a
server that does not give me the performance it should. The tables are
indexed and vacuumed regularly; I monitor with top, ps and
pg_stat_activity, and when I checked it was slow without a heavy load
average.

Before, the server reached 2000 connections to postgresql (with
max_connections=3000 set in anticipation of future workflow).

I divided the load with another server for better performance, and now
it reaches 500 connections, but it is still overloaded.


My question is about how much memory I should configure in
shared_buffers and effective_cache_size.

Features:

- 4 processors: Intel Xeon dual-core 3.0 GHz
- 12 GB RAM
- 2 disks in RAID 1 for OS
- 4 disks in RAID 5 for DB
- OS: Slackware 11.0, Linux 2.6.17.7
- Postgres 8.1.4


On the internet I found this:

Tuning PostgreSQL for performance
2 Some basic parameters 
2.1 Shared buffers

#  Start at 4MB (512) for a workstation
# Medium size data set and 256-512MB available RAM: 16-32MB (2048-4096)
# Large dataset and lots of available RAM (1-4GB): 64-256MB (8192-32768)
==


My postgresql.conf configuration is:

#---

# FILE LOCATIONS
#---
 

# The default values of these variables are driven from the -D command
line
# switch or PGDATA environment variable, represented here as ConfigDir.

#data_directory = 'ConfigDir'# use data in another directory 
#hba_file = 'ConfigDir/pg_hba.conf'# host-based authentication file
#ident_file = 'ConfigDir/pg_ident.conf'# IDENT configuration file

# If external_pid_file is not explicitly set, no extra pid file is
written. 
#external_pid_file = '(none)'# write an extra pid file


#---

# CONNECTIONS AND AUTHENTICATION
#---
 

# - Connection Settings -

listen_addresses = '*'# what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all 
port = 5432
max_connections = 3000
# note: increasing max_connections costs ~400 bytes of shared memory per
# connection slot, plus lock space (see max_locks_per_transaction).  You
# might also need to raise shared_buffers to support more connections. 
#superuser_reserved_connections = 2
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777# octal
#bonjour_name = ''# defaults to the computer name 



#---

# RESOURCE USAGE (except WAL)
#---


# - Memory - 


shared_buffers = 81920# min 16 or max_connections*2, 8KB
each
temp_buffers = 5000# min 100, 8KB each
max_prepared_transactions = 1000# can be 0 or more

# note: increasing max_prepared_transactions costs ~600 bytes of shared
memory 

# per transaction slot, plus lock space (see max_locks_per_transaction).
work_mem = 10240# min 64, size in KB
maintenance_work_mem = 253952# min 1024, size in KB
max_stack_depth = 4096# min 100, size in KB 

# - Free Space Map -

#max_fsm_pages = 2# min max_fsm_relations*16, 6 bytes
each
#max_fsm_relations = 1000# min 100, ~70 bytes each

# - Kernel Resource Usage -

#max_files_per_process = 1000# min 25 
#preload_libraries = ''

# - Cost-Based Vacuum Delay -

#vacuum_cost_delay = 0# 0-1000 milliseconds
#vacuum_cost_page_hit = 1# 0-1 credits
#vacuum_cost_page_miss = 10# 0-1 credits 
#vacuum_cost_page_dirty = 20# 0-1 credits
#vacuum_cost_limit = 200# 0-1 credits

# - Background writer -

#bgwriter_delay = 200# 10-1 milliseconds between rounds
#bgwriter_lru_percent = 1.0# 0-100% of LRU buffers scanned/round
#bgwriter_lru_maxpages = 5# 0-1000 buffers max written/round
#bgwriter_all_percent = 0.333# 0

Re: [PERFORM] Trivial function query optimized badly

2007-01-03 Thread Adam Rich

Craig,
What version of postgres are you using?  I just tested this on PG 8.1.2
and was unable to reproduce these results.  I wrote a simple function
that returns the same text passed to it, after sleeping for 1 second.
I use it in a where clause, like your example below, and regardless of
the number of rows in the table, it still takes roughly 1 second,
indicating to me the function is only called once.

Is it possible that your function really isn't immutable? Would PG 
realize this and fall back to treating it as VOLATILE ?
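For reference, the probe function described above looks roughly like this (a
sketch; pg_sleep() only exists from 8.2 onward, so on 8.1 substitute any
suitably slow expression):

CREATE OR REPLACE FUNCTION slow_echo(t text) RETURNS text AS $$
BEGIN
    PERFORM pg_sleep(1);   -- stand-in for an expensive function body
    RETURN t;
END;
$$ LANGUAGE plpgsql IMMUTABLE;

-- With constant arguments and an IMMUTABLE function, the call should be folded
-- once at plan time, so wall-clock time stays near one second no matter how
-- many rows are scanned.  A VOLATILE function would be re-evaluated per row.
SELECT * FROM version WHERE isosmiles = slow_echo('Brc1ccc2nc(cn2c1)C(=O)O');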



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Craig A.
James
Sent: Wednesday, January 03, 2007 9:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Trivial function query optimized badly


Well, once again I'm hosed because there's no way to tell the optimizer
the cost for a user-defined function.  I know this issue has already
been raised (by me!) several times, but I have to remind everyone about
this.  I frequently must rewrite my SQL to work around this problem.

Here is the function definition:

  CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
  AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
  LANGUAGE 'C' STRICT IMMUTABLE;

Here is the bad optimization:

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O',
1);
   QUERY PLAN



 Seq Scan on version  (cost=0.00..23.41 rows=1 width=4) (actual
time=1434.281..1540.253 rows=1 loops=1)
   Filter: (isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O'::text, 1))
 Total runtime: 1540.347 ms
(3 rows)

I've had to break it up into two separate queries.  Ironically, for
large databases, Postgres does the right thing -- it computes the
function, then uses the index on the "isosmiles" column.  It's blazingly
fast and very satisfactory.  But for small databases, it apparently
decides to recompute the function once per row, making the query N times
slower (N = number of rows) than it should be!

In this instance, there are 1000 rows, and factor of 10^4 is a pretty
dramatic slowdown...  To make it work, I had to call the function
separately then use its result to do the select.


db=> explain analyze select cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);
 QUERY PLAN



 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=1.692..1.694
rows=1 loops=1)
 Total runtime: 1.720 ms
(2 rows)

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O';
 QUERY PLAN


-
 Index Scan using i_version_isosmiles on version  (cost=0.00..5.80
rows=1 width=4) (actual time=0.114..0.117 rows=1 loops=1)
   Index Cond: (isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'::text)
 Total runtime: 0.158 ms
(3 rows)

Craig




Re: [PERFORM] Slow Query on Postgres 8.2

2007-01-04 Thread Adam Rich
Dave,
Is it me or are the two examples you attached returning different row
counts?  
That means either the source data is different, or your queries are.
 
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave
Dutcher
Sent: Thursday, January 04, 2007 5:32 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Slow Query on Postgres 8.2



Hello,
 
I am looking at upgrading from 8.1.2 to 8.2.0, and I've found a query
which runs a lot slower.  Here is the query:
 
select type, currency_id, instrument_id, sum(amount) as total_amount
from om_transaction 
where 
strategy_id in
('BASKET1','BASKET2','BASKET3','BASKET4','BASKET5','BASKET6','BASKET7','
BASKET8','BASKET9','BASKET10','BASKET11')
and owner_trader_id in ('dave','sam','bob','tad',
'tim','harry','frank','bart','lisa','homer','marge','maggie','apu','milh
ouse','disco stu')
and cf_account_id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,29)
and as_of_date > '2006-12-04' and as_of_date <= '2006-12-05' 
group by type, currency_id, instrument_id;

I changed the values in the in statements to fake ones, but it still
takes over three seconds on 8.2, where 8.1 only takes 26 milliseconds.
When I increase the number of values in the IN clauses, the query
rapidly gets worse.  I tried increasing my stats target to 1000 and
analyzing, but that didn't help so I put that back to 10.  While the
query is running the CPU is at 100%.  Is there a more efficient way to
write a query like this?  I've attached the output from EXPLAIN ANALYZE
in a file because it is somewhat large.
 
Thanks,
 

Dave Dutcher
Telluride Asset Management
952.653.6411

 

 



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

I'm using 8.2 and using order by & limit is still faster than MAX()
even though MAX() now seems to rewrite to an almost identical plan
internally.

Count(*) still seems to use a full table scan rather than an index scan.

Using one of our tables, MySQL/Oracle/MS-SQL all return instantly while
PG takes longer than 700ms.  Luckily we can design around this issue.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Craig A.
James
Sent: Sunday, January 07, 2007 5:57 PM
To: Guy Rouillier; PostgreSQL Performance
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


Craig A. James wrote:
> The "idiom" to replace count() was 
> "select col from tbl order by col desc limit 1".  It worked miracles
for 
> my app.

Sorry, I meant to write, "the idiom to replace MAX()", not count()...
MAX() was the function that was killing me, 'tho count() also gave me
problems.

Craig



Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

Here's the queries and explains... Granted, it's not a huge difference
here,
but both timings are absolutely consistent.  Using max(), this runs
almost 
15 queries/sec and "limit 1" runs at almost 40 queries/sec.  

Is the difference in explain analyze expected behavior? (rows=168196 vs.
rows=1)
(The table is freshly analyzed)



select max(item_id) from receipt_items

Result  (cost=0.04..0.05 rows=1 width=0) (actual time=0.030..0.031
rows=1 loops=1)
InitPlan
->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.023..0.024
rows=1 loops=1)
->  Index Scan Backward using receipt_items_pkey on receipt_items
(cost=0.00..6883.71 rows=168196 width=4) (actual time=0.020..0.020
rows=1 loops=1)
Filter: (item_id IS NOT NULL)
Total runtime: 0.067 ms


select item_id 
from receipt_items
order by item_id desc
limit 1

Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.010..0.011 rows=1
loops=1)
->  Index Scan Backward using receipt_items_pkey on receipt_items
(cost=0.00..6883.71 rows=168196 width=4) (actual time=0.008..0.008
rows=1 loops=1)
Total runtime: 0.026 ms


A couple more similar examples from this table:



select max(create_date) from receipt_items

Result  (cost=0.05..0.06 rows=1 width=0) (actual time=0.032..0.032
rows=1 loops=1)
InitPlan
->  Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.025..0.026
rows=1 loops=1)
->  Index Scan Backward using test_idx_1 on receipt_items
(cost=0.00..7986.82 rows=168196 width=8) (actual time=0.022..0.022
rows=1 loops=1)
Filter: (create_date IS NOT NULL)
Total runtime: 0.069 ms


select create_date
from receipt_items
order by create_date desc
limit 1;

Limit  (cost=0.00..0.05 rows=1 width=8) (actual time=0.011..0.012 rows=1
loops=1)
->  Index Scan Backward using test_idx_1 on receipt_items
(cost=0.00..7986.82 rows=168196 width=8) (actual time=0.009..0.009
rows=1 loops=1)
Total runtime: 0.027 ms






-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 8:48 PM
To: Adam Rich
Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS 


"Adam Rich" <[EMAIL PROTECTED]> writes:
> I'm using 8.2 and using order by & limit is still faster than MAX()
> even though MAX() now seems to rewrite to an almost identical plan
> internally.

Care to quantify that?  AFAICT any difference is within measurement
noise, at least for the case of separately-issued SQL commands.

> Count(*) still seems to use a full table scan rather than an index
scan.

Yup.  Don't hold your breath for something different.  Postgres has made
design choices that make certain cases fast and others slow, and
count(*) is one case that has come out on the short end of the stick.
If that's your most important measure of performance, then indeed you
should select a different database that's made different tradeoffs.

regards, tom lane




Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-07 Thread Adam Rich

Here's another, more drastic example... Here the order by / limit
version
runs in less than 1/7000 the time of the MAX() version.


select max(item_id)
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual
time=816.382..816.383 rows=1 loops=1)
  ->  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual
time=155.177..675.870 rows=147383 loops=1)
Hash Cond: (ri.receipt_id = r.receipt_id)
->  Seq Scan on receipt_items ri  (cost=0.00..4097.56
rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
->  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual
time=155.146..155.146 rows=24571 loops=1)
  ->  Hash Join  (cost=506.84..2010.69 rows=24571 width=4)
(actual time=34.803..126.452 rows=24571 loops=1)
Hash Cond: (r.event_id = e.event_id)
->  Seq Scan on receipts r  (cost=0.00..663.58
rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
->  Hash  (cost=469.73..469.73 rows=14843 width=4)
(actual time=34.780..34.780 rows=14843 loops=1)
  ->  Seq Scan on events e  (cost=0.00..469.73
rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
Total runtime: 816.645 ms

select item_id
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
order by item_id desc limit 1


Limit  (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1
loops=1)
  ->  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual
time=0.044..0.044 rows=1 loops=1)
->  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8)
(actual time=0.032..0.032 rows=1 loops=1)
  ->  Index Scan Backward using receipt_items_pkey on
receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual
time=0.016..0.016 rows=1 loops=1)
  ->  Index Scan using receipts_pkey on receipts r
(cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1
loops=1)
Index Cond: (r.receipt_id = ri.receipt_id)
->  Index Scan using events_pkey on events e  (cost=0.00..0.04
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
  Index Cond: (e.event_id = r.event_id)
Total runtime: 0.112 ms





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
Drake
Sent: Sunday, January 07, 2007 9:10 PM
To: Adam Rich
Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
> I'm using 8.2 and using order by & limit is still faster than MAX()
> even though MAX() now seems to rewrite to an almost identical plan
> internally.


Gonna need you to back that up :) Can we get an explain analyze?


> Count(*) still seems to use a full table scan rather than an index
scan.
> 

There is a TODO out there to help this. Don't know if it will get done.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] group by will not use an index?

2007-01-09 Thread Adam Rich
That query looks strange to me (a group by without an aggregate).  See
if this is
any faster:
 
SELECT DISTINCT DATE(inserted) FROM Messages
 
I won't hold my breath though, I don't think there's any way around the
full table scan
in Postgres, because the index does not contain enough information about
transactional
state, so table access is always required (unlike virtually every other
type of db)
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of tsuraan
Sent: Tuesday, January 09, 2007 5:06 PM
To: pgsql-performance
Subject: [PERFORM] group by will not use an index?


I have a table of messages with paths and inserted dates (among other
things), like so:

CREATE TABLE Messages (
msgkey BIGSERIAL PRIMARY KEY,
path TEXT NOT NULL,
inserted TIMESTAMP WITHOUT TIME ZONE DEFAULT NOW()
);

I run a query to determine which days actually saw emails come in, like
so:

SELECT DATE(inserted) FROM Messages GROUP BY DATE(inserted);

That's obviously not very efficient, so I made an index: 

CREATE INDEX messages_date_inserted_ind ON Messages(DATE(inserted));

However, GROUP BY does not use this index:

=# explain analyze select date(inserted) from messages group by
date(inserted);
  QUERY PLAN


--
 HashAggregate  (cost=104773.10..104789.51 rows=1313 width=8) (actual
time=31269.476..31269.557 rows=44 loops=1)
   ->  Seq Scan on messages  (cost=0.00..101107.25 rows=1466340 width=8)
(actual time=23.923..25248.400 rows=1467036 loops=1)
 Total runtime: 31269.735 ms
(3 rows)


Is it possible to get pg to use an index in a group by?  I don't see why
it wouldn't be possible, but maybe I'm missing something. 

Using pg 8.1.4...
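One workaround on 8.1 (a sketch; it assumes you know a lower bound for the
dates, taken here as 2006-01-01): probe the expression index once per calendar
day instead of scanning the whole table.  With only a few years of data this is
a few thousand cheap index probes:

SELECT day
FROM (SELECT DATE '2006-01-01' + g AS day
      FROM generate_series(0, current_date - DATE '2006-01-01') AS g) AS days
WHERE EXISTS (SELECT 1 FROM Messages m WHERE DATE(m.inserted) = days.day);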




Re: [PERFORM] Partitioning

2007-01-11 Thread Adam Rich

Each partition can have its own disk, without using subpartitions.

CREATE TABLE th (id INT, name VARCHAR(30), adate DATE)
PARTITION BY LIST(YEAR(adate))
(
  PARTITION p1999 VALUES IN (1995, 1999, 2003)
DATA DIRECTORY = '/var/appdata/95/data'
INDEX DIRECTORY = '/var/appdata/95/idx',
  PARTITION p2000 VALUES IN (1996, 2000, 2004)
DATA DIRECTORY = '/var/appdata/96/data'
INDEX DIRECTORY = '/var/appdata/96/idx',
  PARTITION p2001 VALUES IN (1997, 2001, 2005)
DATA DIRECTORY = '/var/appdata/97/data'
INDEX DIRECTORY = '/var/appdata/97/idx',
  PARTITION p2002 VALUES IN (1998, 2002, 2006)
DATA DIRECTORY = '/var/appdata/98/data'
INDEX DIRECTORY = '/var/appdata/98/idx'
);

Subpartitions are just a way to break (parent) partitions up into 
smaller pieces.  Those of course can be moved to other disks 
just like the main partitions.
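For comparison, the PostgreSQL-side equivalent (8.1 and later) is inheritance
partitioning with one tablespace per disk; a rough sketch with made-up paths,
using one child per year range rather than MySQL's year lists:

CREATE TABLESPACE disk95 LOCATION '/var/appdata/95/pgdata';

CREATE TABLE th (id int, name varchar(30), adate date);

CREATE TABLE th_1995 (
    CHECK (adate >= DATE '1995-01-01' AND adate < DATE '1996-01-01')
) INHERITS (th) TABLESPACE disk95;

CREATE INDEX th_1995_adate_idx ON th_1995 (adate) TABLESPACE disk95;

-- ... one child table and index per partition/disk, plus
-- constraint_exclusion = on in postgresql.conf so the planner can skip
-- children whose CHECK constraints rule them out.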



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bernd
Helmle
Sent: Thursday, January 11, 2007 6:51 AM
To: Scott Marlowe
Cc: Jim C. Nasby; Jeremy Haile; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitioning





On Wed, 10 Jan 2007 15:30:16 -0600, Scott Marlowe
<[EMAIL PROTECTED]> wrote:

[...]

> 
> And I don't think the mysql partition supports tablespaces either.
> 

MySQL supports distributing partitions over multiple disks via the
SUBPARTITION clause [1].
I leave it to you, wether their syntax is cleaner, more powerful or
easier or ;)


Bernd

[1]
http://dev.mysql.com/doc/refman/5.1/en/partitioning-subpartitions.html



Re: [PERFORM] max() versus order/limit (WAS: High update activity, PostgreSQL vs BigDBMS)

2007-01-14 Thread Adam Rich

Did anybody get a chance to look at this?  Is it expected behavior?
Everyone seemed so incredulous, I hoped maybe this exposed a bug
that would be fixed in a near release.


-Original Message-
From: Adam Rich [mailto:[EMAIL PROTECTED] 
Sent: Sunday, January 07, 2007 11:53 PM
To: 'Joshua D. Drake'; 'Tom Lane'
Cc: 'Craig A. James'; 'PostgreSQL Performance'
Subject: RE: [PERFORM] High update activity, PostgreSQL vs BigDBMS



Here's another, more drastic example... Here the order by / limit
version
runs in less than 1/7000 the time of the MAX() version.


select max(item_id)
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id

Aggregate  (cost=10850.84..10850.85 rows=1 width=4) (actual
time=816.382..816.383 rows=1 loops=1)
  ->  Hash Join  (cost=2072.12..10503.30 rows=139019 width=4) (actual
time=155.177..675.870 rows=147383 loops=1)
Hash Cond: (ri.receipt_id = r.receipt_id)
->  Seq Scan on receipt_items ri  (cost=0.00..4097.56
rows=168196 width=8) (actual time=0.009..176.894 rows=168196 loops=1)
->  Hash  (cost=2010.69..2010.69 rows=24571 width=4) (actual
time=155.146..155.146 rows=24571 loops=1)
  ->  Hash Join  (cost=506.84..2010.69 rows=24571 width=4)
(actual time=34.803..126.452 rows=24571 loops=1)
Hash Cond: (r.event_id = e.event_id)
->  Seq Scan on receipts r  (cost=0.00..663.58
rows=29728 width=8) (actual time=0.006..30.870 rows=29728 loops=1)
->  Hash  (cost=469.73..469.73 rows=14843 width=4)
(actual time=34.780..34.780 rows=14843 loops=1)
  ->  Seq Scan on events e  (cost=0.00..469.73
rows=14843 width=4) (actual time=0.007..17.603 rows=14843 loops=1)
Total runtime: 816.645 ms

select item_id
from events e, receipts r, receipt_items ri
where e.event_id=r.event_id and r.receipt_id=ri.receipt_id
order by item_id desc limit 1


Limit  (cost=0.00..0.16 rows=1 width=4) (actual time=0.047..0.048 rows=1
loops=1)
  ->  Nested Loop  (cost=0.00..22131.43 rows=139019 width=4) (actual
time=0.044..0.044 rows=1 loops=1)
->  Nested Loop  (cost=0.00..12987.42 rows=168196 width=8)
(actual time=0.032..0.032 rows=1 loops=1)
  ->  Index Scan Backward using receipt_items_pkey on
receipt_items ri  (cost=0.00..6885.50 rows=168196 width=8) (actual
time=0.016..0.016 rows=1 loops=1)
  ->  Index Scan using receipts_pkey on receipts r
(cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1
loops=1)
Index Cond: (r.receipt_id = ri.receipt_id)
->  Index Scan using events_pkey on events e  (cost=0.00..0.04
rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=1)
  Index Cond: (e.event_id = r.event_id)
Total runtime: 0.112 ms





-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Joshua D.
Drake
Sent: Sunday, January 07, 2007 9:10 PM
To: Adam Rich
Cc: 'Craig A. James'; 'Guy Rouillier'; 'PostgreSQL Performance'
Subject: Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS


On Sun, 2007-01-07 at 20:26 -0600, Adam Rich wrote:
> I'm using 8.2 and using order by & limit is still faster than MAX()
> even though MAX() now seems to rewrite to an almost identical plan
> internally.


Gonna need you to back that up :) Can we get an explain analyze?


> Count(*) still seems to use a full table scan rather than an index
scan.
> 

There is a TODO out there to help this. Don't know if it will get done.

Joshua D. Drake

-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate






Re: [PERFORM] Configuration Advice

2007-01-17 Thread Adam Rich

Doesn't sound like you want postgres at all. Try mysql.



-Original Message-
From: "Steve" <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Sent: 1/17/2007 2:41 PM
Subject: [PERFORM] Configuration Advice

Hey there;

I've been lurking on this list awhile, and I've been working with postgres 
for a number of years so I'm not exactly new to this.  But I'm still 
having trouble getting a good balance of settings and I'd like to see what 
other people think.  We may also be willing to hire a contractor to help 
tackle this problem if anyone is interested.

I've got an application here that runs large (in terms of length -- the 
queries have a lot of conditions in them) queries that can potentially 
return millions of rows but on average probably return tens of thousands 
of rows.  It's read only for most of the day, and pretty much all the 
queries except one are really fast.

However, each night we load data from a legacy cobol system into the SQL 
system and then we summarize that data to make the reports faster.  This 
load process is intensely insert/update driven but also has a hefty 
amount of selects as well.  This load process is taking ever longer to 
complete.


SO ... our goal here is to make this load process take less time.  It 
seems the big part is building the big summary table; this big summary 
table is currently 9 million rows big.  Every night, we drop the table, 
re-create it, build the 9 million rows of data (we use COPY to put hte 
data in when it's prepared, not INSERT), and then build the indexes on it 
-- of which there are many.  Unfortunately this table gets queried 
in a lot of different ways and needs these indexes; also unfortunately, we 
have operator class indexes to support both ASC and DESC sorting on 
columns so these are for all intents and purposes duplicate but required 
under Postgres 8.1 (we've recently upgraded to Postgres 8.2, is this still 
a requirement?)

Building these indexes takes forever!  It's a long grind through inserts 
and then building the indexes takes a hefty amount of time too.  (about 9 
hours).  Now, the application is likely part at fault, and we're working 
to make it more efficient, but it has nothing to do with the index 
building time.  I'm wondering what we can do to make this better if 
anything; would it be better to leave the indexes on?  It doesn't seem to 
be.  Would it be better to use INSERTs instead of copies?  Doesn't seem to 
be.


Anyway -- ANYTHING we can do to make this go faster is appreciated :) 
Here's some vital statistics:

- Machine is a 16 GB, 4 actual CPU dual-core opteron system using SCSI 
discs.  The disc configuration seems to be a good one, it's the best of 
all the ones we've tested so far.

- The load process itself takes about 6 gigs of memory; the rest is free
for Postgres because this is basically all the machine does.

- If this were your machine and situation, how would you lay out the memory
settings?  What would you set the FSM to?  Would you leave the bgwriter on
or off?  We've already got fsync off because "data integrity" doesn't
matter -- this stuff is religiously backed up and we've got no problem
reinstalling it.  Besides, in order for this machine to go down, data
integrity of the DB is the least of the worries :)

Do wal_buffers/full_page_writes matter if fsync is off?  If so, what
settings?  What about checkpoints?

And finally, any ideas on planner constants?  Here's what I'm using:

seq_page_cost = 0.5 # measured on an arbitrary scale
random_page_cost = 1.0  # same scale as above
cpu_tuple_cost = 0.001  # same scale as above
cpu_index_tuple_cost = 0.0001   # same scale as above
cpu_operator_cost = 0.00025 # same scale as above
effective_cache_size = 679006

I really don't remember how I came up with that effective_cache_size
number...
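
(For reference: a bare effective_cache_size value is counted in 8 kB pages,
so 679006 corresponds to roughly 679006 x 8 kB ~= 5.2 GB of assumed OS cache.)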


Anyway... any advice would be appreciated :)


Steve



Re: [PERFORM] Configuration Advice

2007-01-17 Thread Adam Rich

Sorry if this came off sounding trollish.  All databases have their
strengths & weaknesses, and I feel the responsible thing to do is to
exploit those strengths where possible, rather than expend significant
time and effort coaxing one database to do something it wasn't designed
to do.  There's just no such thing as "one size fits all".

I have professional experience with MS-SQL, Oracle, MySQL, and Postgres,
and the scenario described sounds more suited to MySQL & MyISAM than
anything else:

1) No concerns for data loss (turning fsync & full_page_writes off)
since the data can be reloaded

2) No need for MVCC or transactions, since the database is read-only

3) No worries about lock contention

4) Complex queries that might take advantage of the MySQL "Query Cache"
since the base data never changes

5) Queries that might obtain data directly from indexes without having
to touch tables (again, no need for MVCC)

If loading the base data and creating the summary table is taking
a lot of time, using MySQL with MyISAM tables (and binary logging
disabled) should provide significant time savings, and it doesn't
sound like there are any concerns about the downsides.

Yes, PostgreSQL holds an edge over MySQL for heavy OLTP applications;
I use it for that and I love it.  But for the scenario the original
poster is asking about, MySQL/MyISAM is ideal.




-Original Message-
From: Bricklen Anderson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, January 17, 2007 3:29 PM
To: Adam Rich
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Configuration Advice


Adam Rich wrote:
> Doesn't sound like you want Postgres at all. Try MySQL.

Could you explain your reason for suggesting mysql? I'm simply curious 
why you would offer that as a solution.




Re: [PERFORM] explain analyze output for review (was: optimizing a geo_distance()...)

2007-02-06 Thread Adam Rich

If I'm reading this correctly, 89% of the query time is spent
doing an index scan of earth_coords_idx.  Scanning pets is only
taking 6% of the total time.
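
(Working that out from the plan quoted below: the earth_coords_idx bitmap
index scan shows roughly 0.951 ms per loop over 1727 loops, i.e. about
1.6 s of the 1.9 s total runtime; counting the enclosing BitmapAnd it is
closer to 1.7 s, or just under 90%.  The bitmap heap scan on pets accounts
for roughly 0.13 s, about 6-7%.)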




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mark
Stosberg
Sent: Tuesday, February 06, 2007 8:40 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] explain analyze output for review (was:
optimizing a geo_distance()...)


Bruno Wolff III wrote:
>
> Some people here may be able to tell you more if you show us explain
> analyze output.

Here is my explain analyze output. Some brief context of what's going
on. The goal is to find "Pets Near You".

We join the pets table to the shelters table to get a zipcode, and then
join the shelters table with "earth_distance" to get the coordinates of
the zipcode.  (Is there any significant penalty for using a varchar vs.
an int for a join?)

I've been investigating partial indexes for the pets table. It has about
300,000 rows, but only about 10 are "active", and those are the ones we
care about. Queries are also frequently made on males vs. females, dogs vs.
cats, or specific ages, and those specific cases seem like possible
candidates for partial indexes as well. I played with that approach some,
but had trouble coming up with anything that benchmarked faster.
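
(For the archives, the general shape of such a partial index, using the
column names visible in the plan below:

CREATE INDEX pets_available_idx ON pets (species_id, sex)
    WHERE pet_state = 'available';

The planner will only consider a partial index when the query's WHERE
clause implies the index predicate, so queries must still spell out
pet_state = 'available'.)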

If I'm reading the explain analyze output correctly myself, nearly all of
the time spent is related to the 'pets' table, but I can't see what to do
about it.

Help appreciated!

  Mark

Nested Loop  (cost=11.82..29.90 rows=1 width=0) (actual
time=37.601..1910.787 rows=628 loops=1)
   ->  Nested Loop  (cost=6.68..20.73 rows=1 width=24) (actual
time=35.525..166.547 rows=1727 loops=1)
 ->  Bitmap Heap Scan on pets  (cost=6.68..14.71 rows=1 width=4)
(actual time=35.427..125.594 rows=1727 loops=1)
   Recheck Cond: (((sex)::text = 'f'::text) AND (species_id
= 1))
   Filter: ((pet_state)::text = 'available'::text)
   ->  BitmapAnd  (cost=6.68..6.68 rows=2 width=0) (actual
time=33.398..33.398 rows=0 loops=1)
 ->  Bitmap Index Scan on pets_sex_idx
(cost=0.00..3.21 rows=347 width=0) (actual time=14.739..14.739
rows=35579 loops=1)
   Index Cond: ((sex)::text = 'f'::text)
 ->  Bitmap Index Scan on pet_species_id_idx
(cost=0.00..3.21 rows=347 width=0) (actual time=16.779..16.779
rows=48695 loops=1)
   Index Cond: (species_id = 1)
 ->  Index Scan using shelters_pkey on shelters
(cost=0.00..6.01 rows=1 width=28) (actual time=0.012..0.014 rows=1
loops=1727)
   Index Cond: ("outer".shelter_id = shelters.shelter_id)
   ->  Bitmap Heap Scan on earth_distance  (cost=5.14..9.15 rows=1
width=9) (actual time=0.984..0.984 rows=0 loops=1727)
 Recheck Cond: ((cube_enlarge(('(-2512840.11676572,
4646218.19036629, 3574817.21369166)'::cube)::cube,
160930.130863421::double precision, 3) @ earth_distance.earth_coords)
AND
(("outer".postal_code_for_joining)::text =
(earth_distance.zipcode)::text))
 ->  BitmapAnd  (cost=5.14..5.14 rows=1 width=0) (actual
time=0.978..0.978 rows=0 loops=1727)
   ->  Bitmap Index Scan on earth_coords_idx
(cost=0.00..2.15 rows=42 width=0) (actual time=0.951..0.951 rows=1223
loops=1727)
 Index Cond: (cube_enlarge(('(-2512840.11676572,
4646218.19036629, 3574817.21369166)'::cube)::cube,
160930.130863421::double precision, 3) @ earth_coords)
   ->  Bitmap Index Scan on earth_distance_zipcode_idx
(cost=0.00..2.74 rows=212 width=0) (actual time=0.015..0.015 rows=1
loops=1727)
 Index Cond:
(("outer".postal_code_for_joining)::text =
(earth_distance.zipcode)::text)
 Total runtime: 1913.099 ms





Re: [PERFORM] Do I need to rebuild php-pgsql for 8.2.3

2007-04-11 Thread Adam Rich
Here's what I do... 
 
1) Install postgresql-libs from the RHEL source
2) Install compat-postgresql-libs from postgresql.org (install, not
upgrade, use rpm -hiv); use force if necessary
3) Install postgresql-libs from postgresql.org (again, install, not
upgrade, use rpm -hiv); use force if necessary
 
If done correctly, you'll end up with all 3 client versions:
 
/usr/lib/libpq.so.3
/usr/lib/libpq.so.4
/usr/lib/libpq.so.5
 
 

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Michael
Dengler
Sent: Wednesday, April 11, 2007 12:25 PM
To: Devrim GÜNDÜZ
Cc: pgsql-performance; Guillaume Smet
Subject: Re: [PERFORM] Do I need to rebuild php-pgsql for 8.2.3


Hi,

Thanks for the info. One more thing... I am in rpm hell. When I try to
# rpm -Uvh postgresql-libs-8.2.3-1PGDG.i686.rpm
I get:
error: Failed dependencies:
libpq.so.3 is needed by (installed) perl-DBD-Pg-1.31-6.i386
libpq.so.3 is needed by (installed)
postgresql-python-7.4.13-2.RHEL4.1.i386
libpq.so.3 is needed by (installed) php-pgsql-4.3.9-3.15.i386
and when I try:
# rpm -ivh compat-postgresql-libs-3-3PGDG.i686.rpm 
I get:
error: Failed dependencies:
postgresql-libs < 8.0.2 conflicts with
compat-postgresql-libs-3-3PGDG.i686
gr...
Should I just force the upgrade (i.e. --nodeps)?

Thanks

Mike



On 4/10/07, Devrim GÜNDÜZ <[EMAIL PROTECTED]> wrote: 

Hi,

On Tue, 2007-04-10 at 22:55 +0200, Guillaume Smet wrote:
> See http://developer.postgresql.org/~devrim/rpms/compat/ and choose
> the correct package for your architecture. 

... or better, each RHEL4 directory on our FTP site has a compat package
(that directory is not up2date now).

Regards,
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support 
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/








[PERFORM] Java Out-of-memory errors on attempts to read tables with millions of rows

2003-07-14 Thread Rich Cullingford
Greetings,
We have several tables (in a PG 7.3.3 database on RH Linux 7.3) with 2M+ 
rows (each row 300-400 bytes in length) that we SELECT into a JDBC 
ResultSet for display to the user. We expected that the driver would not 
actually transmit data from the database until the application began 
issuing getXXX() calls. (IIRC, this is the way the Oracle driver works, 
and we had created a buffering mechanism to use it.) Instead, the driver 
appears to be attempting to create the whole rowset in Java memory 
before returning, and the application runs out of memory. (Java has been 
configured to use up to 1.5G on the machine this occurs on.)

Now the SELECT is preceded by a COUNT of the rows that the same query 
would return, so perhaps that's what's causing the problem. But the 
question is, is this the way a ResultSet is supposed to work? Are there 
any configuration options available that modify this behavior? Are there 
commercial implementations of PG JDBC that don't have this problem? 
(Shame on me, but I have to ask. :)
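
(A sketch of one common workaround, assuming the application can consume
the rows in batches; table and cursor names are placeholders:

BEGIN;
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;
FETCH FORWARD 1000 FROM big_cur;   -- repeat until no rows are returned
CLOSE big_cur;
COMMIT;

Whether the JDBC driver can be made to use this mechanism transparently
depends on the driver version in use.)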

Any help will be greatly appreciated!

    Rich Cullingford
[EMAIL PROTECTED]


[PERFORM] Top n queries and GROUP BY

2003-11-17 Thread Rich Cullingford
All,
This is a straight SQL question, maybe not appropriate for a performance 
list, but...

I have a simple stock holdings setup:

=> select * from t1;
 nam |co | num
-+---+--
 joe | ibm   |  600
 abe | ibm   | 1500
 joe | cisco | 1200
 abe | cisco |  800
 joe | novell|  500
 joe | microsoft |  200
What I would like to see is a "Top-n-holdings-by-name", e.g., for n=2:

 nam  |  co| num
--++-
 joe  | cisco  |  1200
 joe  | ibm|   600
 abe  | ibm|  1500
 abe  | cisco  |   800
I can get part of the way by using a LIMIT clause in a subquery, e.g.,

=> select 'abe', a.co, a.num from (select co, num from t1 where 
nam='abe' order by num desc limit 2) as a;
 ?column? |  co   | num
--+---+--
 abe  | ibm   | 1500
 abe  | cisco |  800

but I can't figure out a correlated subquery (or GROUP BY arrangement or
anything else) that will cycle through the names. I vaguely remember
that these kinds of queries are hard to do in standard SQL, but I was
hoping that PG, with its extensions...

  Thanks, Rich Cullingford
  [EMAIL PROTECTED]




Re: [PERFORM] Top n queries and GROUP BY

2003-11-17 Thread Rich Cullingford
Rich Cullingford wrote:
> [original message quoted above -- snipped]
> but I can't figure out a correlated subquery (or GROUP BY arrangement or
> anything else) that will cycle through the names.

I forgot about row subqueries; for n=3, for example:

=> SELECT * FROM t1
 WHERE (nam,co,num) IN
   (SELECT nam,co,num FROM t1 b
where b.nam=t1.nam
order by num desc limit 3)
 order by nam, num desc;
 nam |   co   | num
-++--
 abe | ibm| 1500
 abe | cisco  |  800
 joe | cisco  | 1200
 joe | ibm|  600
 joe | novell |  500
(5 rows)
Seems to work...
   Thanks all, Rich Cullingford
   [EMAIL PROTECTED]
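
(On later releases -- 8.4 and up -- window functions make the generic
"top n per group" pattern more direct.  A sketch against the same table:

SELECT nam, co, num
FROM (SELECT nam, co, num,
             row_number() OVER (PARTITION BY nam ORDER BY num DESC) AS rn
      FROM t1) s
WHERE rn <= 2
ORDER BY nam, num DESC;
)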