Re: [PERFORM] immutable functions vs. join for lookups ?

2005-04-18 Thread Merlin Moncure
> d) self-join with a function ;)
>   EXPLAIN ANALYZE SELECT * FROM (SELECT n, id2username(n) AS username
> FROM (SELECT DISTINCT n FROM aaa) AS values) AS v_lookup RIGHT JOIN
> aaa USING (n);

That's pretty clever.  
It sure seems like the server was not caching the results of the
function...maybe the server thought it was too small a table to bother?  

Merlin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-21 Thread Merlin Moncure
> John A Meinel <[EMAIL PROTECTED]> writes:
> > Joel Fradkin wrote:
> >> Postgres was on the second run
> >> Total query runtime: 17109 ms.
> >> Data retrieval runtime: 72188 ms.
> >> 331640 rows retrieved.
> 
> > How were you measuring "data retrieval time"?
> 
> I suspect he's using pgadmin.  We've seen reports before suggesting
that
> pgadmin can be amazingly slow, eg here
> http://archives.postgresql.org/pgsql-performance/2004-10/msg00427.php
> where the *actual* data retrieval time as shown by EXPLAIN ANALYZE
> was under three seconds, but pgadmin claimed the query runtime was 22
> sec and data retrieval runtime was 72 sec.

The problem is that pgAdmin takes your query results and puts them in a
grid.  The grid is not designed to be used that way for large
datasets.  The time complexity is not linear and really breaks down
around 10k-100k rows depending on various factors.  pgAdmin users just
have to get used to it and use limit or the filter feature at
appropriate times.

The MS SQL Enterprise Manager uses cursors, which have their own set of
nasty issues (no MVCC).

In fairness, unless you are running with the \a switch, psql adds a fair
amount of time to the query too.

Joel:
"Postgres was on the second run
Total query runtime: 17109 ms.
Data retrieval runtime: 72188 ms.
331640 rows retrieved."

The Data retrieval runtime is time spent by pgAdmin formatting, etc.
The query runtime is the actual timing figure you should be concerned
with (you are not comparing apples to apples).  I can send you a utility
I wrote in Delphi which adds only a few seconds of overhead for a 360k row
result set.  Or go into psql, use the \a switch, and run the query.

or: 
psql -A -c "select * from myview where x" > output.txt

it should finish the above in 16-17 sec plus the time to write out the
file.

Joel, I have a lot of experience with all three databases you are
evaluating and you are making a huge mistake switching to mysql.  you
can make a decent case for ms sql, but it's quite expensive at your
level of play as you know.

Merlin




---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-25 Thread Merlin Moncure
> In practice, we have watched Windows evolve in such a fashion with
> respect to multiuser support, and, in effect, it has never really
> gotten it.  Microsoft started by hacking something on top of MS-DOS,
> and by the time enough applications had enough dependencies on the way
> that worked, it has essentially become impossible for them to migrate
> properly to a multiuser model since applications are normally designed
> with the myopic "this is MY computer!" model of the world.

Completely false.  NT was a complete rewrite (1993ish) and was
inherently multi-user with even the GDI running as a user level process
(no longer however).  The NT kernel was scalable and portable, running
on the Alpha, MIPS, etc.

However, you do have a point with applications...many win32 developers
have a very bad habit of expecting their apps to install and run as
root.  However, this is generally not a problem with Microsoft stuff.
In short, the problem is really people, not the technology.

Merlin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Joel's Performance Issues WAS : Opteron vs Xeon

2005-04-25 Thread Merlin Moncure
> I am waiting to hear back from Josh on using cursors and trying to
flatten
> long running views.
> 
> I am a little disappointed I have not understood enough to get my
analyzer
> to use the proper plan, we had to set seqscan off to get the select
from
> response_line to work fast and I had to turn off merge joins to get
assoc
> list to work fast. Once I am up I can try to learn more about it, I am
so
> glad there are so many folks here willing to take time to educate us
> newb's.

I am not a big fan of tweaking the optimizer because you are robbing
Peter to pay Paul, so to speak.  pg 8.1 may come out with new optimizer
tweaks and you'll have to do it all over again. If the optimizer is not
'getting' your view, there are a few different approaches to fixing the
problem.

I am also not a big fan of de-normalizing your database.  Essentially
you are lighting a fuse that may blow up later.  Here are some general
approaches to planner optimization that can help out in tricky
situations.

1. Split up views.  Often overlooked but can provide good enhancements.
If your view is based on 3 or more tables, has left/right joins,
consider breaking it up into two or more views.  Views can be based on
views and it is easier to force the planner to pick good plans this way.
If you can find other uses for component views in other queries, so much
the better.

2. Materialize your view.  Use lazy materialization, i.e. you query the
view into a table at scheduled times.  Now we are trading disk space
and coherence for performance...this may not fit your requirements but
the nice thing about it is that it will help give us the 'ideal plan'
running time which we are shooting for.
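
A minimal sketch of the lazy materialization (the table/view names and
schedule are hypothetical; run the refresh from cron or similar):

create table my_view_mat as select * from my_view;

-- scheduled refresh:
begin;
truncate my_view_mat;
insert into my_view_mat select * from my_view;
commit;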

3. pl/pgsql.  Using combinations of loops, refcursors, and queries, you
can cut code that should give you comparable performance to the ideal
plan.  If you can do the actual work here as well (no data returned to
client), you get a tremendous win.  Also pl/pgsql works really well for
recursive sets and other things that are difficult to run in the context
of a single query.  Just be aware of the disadvantages:
a. not portable
b. maintenance overhead
c. requires a relatively high developer skill set

I will go out on a limb and say that mastering the above approaches can
provide the solution to virtually any performance problem within the
limits of your hardware and the problem complexity.

Based on your questions, it sounds to me like your #1 problem is your
developer skillset relative to your requirements.  However, this is
easily solvable...just keep attacking the problem and don't be afraid to
bring in outside help (which you've already done, that's a start!).

Merlin

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] ok you all win what is best opteron (I dont want a hosed system again)

2005-05-13 Thread Merlin Moncure
Joel wrote:
I have been following threads (in case you don't know I bought a 4 proc
Dell recently) and the Opteron seems the way to go.
I just called HP for a quote, but don't want to make any mistakes.
[snip]

At your level of play it's the DL585.
Have you checked out http://www.swt.com? 

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] could not send data to client:

2005-06-17 Thread Merlin Moncure
Justin wrote:
I have 6 Windows PC in a test environment accessing a very small Postgres DB on 
a 2003 Server.  The PC's access the database with a cobol app via ODBC.  3 of 
the PC's operate very efficiently and quickly.  3 of them do not.  The 3 that 
do not are all new Dell XP Pro with SP2.  They all produce the error in the log 
file as below:
 
2005-06-16 16:17:30 LOG:  could not send data to client: No connection could be 
made because the target machine actively refused it.
 
2005-06-16 16:17:30 LOG:  could not receive data from client: No connection 
could be made because the target machine actively refused it.
 
2005-06-16 16:17:30 LOG:  unexpected EOF on client connection

[...]

Have you tried another ODBC app (Excel, etc.) to connect to the database from 
those machines?

If so and it works,
1. what version odbc driver
2. what cobol compiler 
3. what technology to map cobol i/o to sql (Acu4GL for example)

This is probably more appropriate on pgsql-odbc and plain text is preferred for 
these mailing lists.

Merlin

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performance - moving from oracle to postgresql

2005-06-27 Thread Merlin Moncure
> There are some immediate questions from our engineers about
performance
> 
> "- Oracle has one particular performance enhancement that Postgres is
> missing.  If you do a select that returns 100,000 rows in a given
order,
> and all you want are rows 99101 to 99200, then Oracle can do that very
> efficiently.  With Postgres, it has to read the first 99200 rows and
> then discard the first 99100.  But...  If we really want to look at
> performance, then we ought to put together a set of benchmarks of some
> typical tasks."

I agree with Rod: you are correct but this is a very odd objection.  You
are declaring a set but are only interested in a tiny subset of it,
based on arbitrary criteria.  You can do this with cursors or with clever
querying (not without materializing the full set however), but why?  

Merlin


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] PERFORMANCE ISSUE ODBC x LIBPQ C++ Application

2005-06-27 Thread Merlin Moncure
> Hi !
> 
> My company is evaluating to compatibilizate our system (developed in
> C++) to PostgreSQL.
> 
> Our programmer made a lot of tests and he informed me that the
> performance using ODBC is very similar to using libpq, even with a big
> number of simultaneous connections/queries. Of course for us it is
> simpler to use ODBC because it will be easier to maintain as we already
> support a lot of other databases using ODBC (MySQL, DB2, etc).
> 
> Has someone already had this experience? What are the key benefits of
> using libpq instead of ODBC?
> 
> Our application has a heavy load and around 150 concurrent users.

The ODBC driver for postgresql implements its own protocol stack.
Unfortunately, it is still on protocol revision 2 (out of 3).  Also, IMO
libpq is a little better tested and more durable than the ODBC driver.
This naturally follows from the fact that libpq is more widely used and
more actively developed than the ODBC driver.

If you are heavily invested in C++ you can consider wrapping libpq
yourself if you want absolute maximum performance.  If you happen to be
developing on a Borland platform, give strong consideration to the Zeos
connection library, which is very well designed (it wraps libpq).

You might want to consider posting your question to the odbc list.

Merlin 


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [HACKERS] How two perform TPC-H test on postgresql-8.0.2

2005-06-27 Thread Merlin Moncure
[moved to pgsql-performance]
> > Currently I want to take a TPC-H test on postgresql-8.0.2. I
have
> > downloaded the DBGEN and QGEN from the homepage of TPC. But I
> encountered
> > many problems which forced me to request some help. 1. How to load
the
> data
> > from flat file generated by dbgen tool? To the best of my knowledge,
> there
> > is a SQL Loader in Oracle 2. How to simulate the concurrency
> > environment? Where can I download a client which connects to the DB
> > server through ODBC?
> 
> Get DBT3 from Sourceforge (search on "osdldbt").  This is OSDL's
TPCH-like
> test.
> 
> However, given your knowledge of PostgreSQL you're unlikely to get any
> kind of
> result you can use -- TPCH requires siginficant database tuning
knowledge.

I don't necessarily agree.  In fact, I remember reading the standards
for one of the TPC benchmarks and it said you were not supposed to
specifically tune for the test.  Any submission, including one with
stock settings, should be given consideration (and the .conf settings
should be submitted along with the benchmark results).  This can only
help to increase the body of knowledge on configuring the database.

Merlin

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
I need a fast way (sql only preferred) to solve the following problem:

I need the smallest integer that is greater than zero that is not in the
column of a table.  In other words, if an 'id' column has values
1,2,3,4,6 and 7, I need a query that returns the value of 5.

I've already worked out a query using generate_series (not scalable) and
pl/pgsql.  An SQL only solution would be preferred, am I missing
something obvious?

Merlin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
> Not so bad. Try something like this:
> 
> SELECT min(id+1) as id_new FROM table
> WHERE (id+1) NOT IN (SELECT id FROM table);
> 
> Now, this requires probably a sequential scan, but I'm not sure how
you
> can get around that.
> Maybe if you got trickier and did some ordering and limits. The above
> seems to give the right answer, though.

it does, but it is still faster than generate_series(), which requires
both a seqscan and a materialization of the function.
 
> I don't know how big you want to scale to.

big. :)

merlin

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
John Meinel wrote:
> See my follow up post, which enables an index scan. On my system with
> 90k rows, it takes no apparent time.
> (0.000ms)
> John
> =:->

Confirmed.  Hats off to you, the above is some really wicked querying.
IIRC I posted the same question several months ago with no response and
had given up on it.  I think your solution (smallest X1 not in X) is a
good candidate for general bits, so I'm passing this to varlena for
review :)

SELECT t1.id+1 as id_new FROM id_test t1
WHERE NOT EXISTS
(SELECT t2.id FROM id_test t2 WHERE t2.id = t1.id+1)
ORDER BY t1.id LIMIT 1;

Merlin

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
> Merlin Moncure wrote:
> 
> > I need a fast way (sql only preferred) to solve the following
problem:
> > I need the smallest integer that is greater than zero that is not in
the
> > column of a table.
> >
> > I've already worked out a query using generate_series (not scalable)
and
> > pl/pgsql.  An SQL only solution would be preferred, am I missing
> > something obvious?
 
> Probably not, but I thought about this "brute-force" approach... :-)
> This should work well provided that:
> 
> - you have a finite number of integers. Your column should have a
biggest
>integer value with a reasonable maximum like 100,000 or 1,000,000.
>#define YOUR_MAX 9
[...]
:-) The generate_series function does the same thing only a little bit
faster (although less portable).

generate_series(m,n) returns the set of integers from m to n with time
complexity n - m.  I use it for cases where I need to increment over
something, for example:

select now()::date + d from generate_series(0,355) as d;

returns days from today until 355 days from now.

Merlin

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
> On Tue, Jun 28, 2005 at 12:02:09 -0400,
>   Merlin Moncure <[EMAIL PROTECTED]> wrote:
> >
> > Confirmed.  Hats off to you, the above some really wicked querying.
> > IIRC I posted the same question several months ago with no response
and
> > had given up on it.  I think your solution (smallest X1 not in X) is
a
> > good candidate for general bits, so I'm passing this to varlena for
> > review :)
> >
> > SELECT t1.id+1 as id_new FROM id_test t1
> > WHERE NOT EXISTS
> > (SELECT t2.id FROM id_test t2 WHERE t2.id = t1.id+1)
> > ORDER BY t1.id LIMIT 1;
> 
> You need to rework this to check to see if row '1' is missing. The
> above returns the start of the first gap after the first row that
> isn't missing.

Correct.  

In fact, I left out a detail in my original request in that I had a
starting value (easily supplied with a where clause)...so what I was
really looking for was a query which started at a supplied value and
looped forwards looking for an empty slot.  John's supplied query is a
drop-in replacement for a plpgsql routine which does exactly this.

The main problem with the generate_series approach is that there is no
convenient way to determine a suitable upper bound.  Also, in some
corner cases of my problem domain the performance was not good.
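
For the record, a sketch of the starting-value variant of John's query
(using the id_test table from earlier; 100 stands in for the supplied
starting value, and the first branch covers the case where the starting
value itself is free):

select 100 as id_new
where not exists (select 1 from id_test where id = 100)
union all
select t1.id + 1
from id_test t1
where t1.id >= 100
  and not exists (select 1 from id_test t2 where t2.id = t1.id + 1)
order by 1
limit 1;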

Merlin



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] tricky query

2005-06-28 Thread Merlin Moncure
Cosimo wrote:
> I'm very interested in this "tricky query".
> Sorry John, but if I populate the `id_test' relation
> with only 4 tuples with id values (10, 11, 12, 13),
> the result of this query is:
> 
>cosimo=> create table id_test (id integer primary key);
>NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> 'id_test_pkey'
> for table 'id_test'
>CREATE TABLE
>cosimo=> insert into id_test values (10); -- and 11, 12, 13, 14
>INSERT 7457570 1
>INSERT 7457571 1
>INSERT 7457572 1
>INSERT 7457573 1
>INSERT 7457574 1
>cosimo=> SELECT t1.id+1 as id_new FROM id_test t1 WHERE NOT EXISTS
> (SELECT
> t2.id FROM id_test t2 WHERE t2.id = t1.id+1) ORDER BY t1.id LIMIT 1;
> id_new
>
> 15
>(1 row)
> 
> which if I understand correctly, is the wrong answer to the problem.
> At this point, I'm starting to think I need some sleep... :-)

Correct, in that John's query returns the first empty slot above an
existing filled slot (correct behavior in my case).  You could flip
things around a bit to get around this though.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] ODBC driver over network very slow

2005-06-30 Thread Merlin Moncure
> Milan Sekanina <[EMAIL PROTECTED]> writes:
> > We are running an application that uses psqlodbc driver on Windows
XP to
> > connect to a server and for some reason the download of data from
the
> > server is very slow. We have created a very simple test application
that
> > inserts a larger amount of data into the database and uses a simple
> > "SELECT * from test" to download it back. The INSERT of 10MB takes
about
> > 4 seconds, while the SELECT takes almost 5 minutes (with basically
> > nothing else running on both the network and the two computers). If
we
> > run the PostgreSQL server on the local machine so that the network
is
> > not used, both actions are very fast.
> 
> I seem to recall having seen similar reports not involving ODBC at
all.
> Try searching the mailing-list archives, but I think the cases we
solved
> involved getting rid of third-party add-ons to the Windows TCP stack.

IIRC there was a TCP related fix in the odbc driver related to
performance with large buffers.  I'd suggest trying a newer odbc driver
first.

Merlin 

dave page wrote ([odbc] 500 times slower)
> 
> My colleague spent some time digging into the following case and it
> looks like a Nagle algorithm and delayed ACKs related problem.
> In psqlodbc.h
> #define SOCK_BUFFER_SIZE  4096
> 
> I changed that value to 8192 and driver works fine for me.
> I am not sure why this change helps.

Err, no neither am I. Why do you think it's got something to do with
Nagle/delayed ACKs?

The only thing that instantly rings bells for me is that the max size of
a text field is 8190 bytes at present (which really should be increased,
if not removed altogether), which won't fit in the default buffer. But
then, I wouldn't expect to see the performance drop you describe with a
4096 byte buffer, only one much smaller.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Mount database on RAM disk?

2005-07-08 Thread Merlin Moncure
> Stuart,
> 
> > I'm putting together a road map on how our systems can scale as our
load
> > increases. As part of this, I need to look into setting up some fast
> > read only mirrors of our database. We should have more than enough
RAM
> > to fit everything into memory. I would like to find out if I could
> > expect better performance by mounting the database from a RAM disk,
or
> > if I would be better off keeping that RAM free and increasing the
> > effective_cache_size appropriately.
> 
> If you're accessing a dedicated, read-only system with a database
small
> enough to fit in RAM, it'll all be cached there anyway, at least on
Linux
> and BSD.   You won't be gaining anything by creating a ramdisk.


 
ditto windows.  

Reading files cached in memory is slower than reading straight from
memory, but not nearly enough to justify reserving memory for your own
use.  In other words, your O/S has years and years of engineering behind
how best to dole memory out to caching and various processes.  Why
second-guess it?

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance problems testing with Spamassassin 3.1.0 Bayes module.

2005-07-28 Thread Merlin Moncure
> I'm not sure how much this has been discussed on the list, but wasn't
> able to find anything relevant in the archives.
> 
> The new Spamassassin is due out pretty soon.  They are currently
testing
> 3.1.0pre4.  One of the things I hope to get out of this release is
bayes
> word stats moved to a real RDBMS.  They have separated the mysql
> BayesStore module from the PgSQL one so now postgres can use it's own
> queries.
> 
> I loaded all of this stuff up on a test server and am finding that the
> bayes put performance is really not good enough for any real amount of
> mail load.
> 
> The performance problems seems to be when the bayes module is
> inserting/updating.  This is now handled by the token_put procedure.

1. You need high performance client side timing (sub 1 millisecond).  On
win32 use QueryPerformanceCounter.

2. One by one, convert the queries inside your routine into dynamic
versions.  That is, use execute 'query string' (a sketch follows below).

3. Identify the problem.  Something somewhere is not using the index.
Because of the way the planner works you have to do this sometimes.
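
For point 2, here is a minimal sketch of what I mean (the table and
column names are hypothetical, not from the Bayes schema): rewriting a
static query as execute forces it to be planned with the actual
parameter value.

create or replace function get_token_id(p_token text) returns integer as $$
declare
    r record;
begin
    -- dynamic version; the static equivalent would be:
    --   select id into r from bayes_token where token = p_token;
    for r in execute 'select id from bayes_token where token = '
                     || quote_literal(p_token) loop
        return r.id;
    end loop;
    return null;  -- token not found
end;
$$ language plpgsql;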

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Finding bottleneck

2005-07-28 Thread Merlin Moncure
Kari Lavikka wrote:
> shared_buffers = 15000  
you can play around with this one but in my experience it doesn't make
much difference anymore (it used to).

> work_mem = 1536 # min 64, size in KB
this seems low.  are you sure you are not getting sorts swapped to disk?

> fsync = true# turns forced synchronization on or
off
does turning this off make a difference?  This would help narrow down
where the problem is.

> commit_delay = 8# range 0-10, in microseconds
hm! how did you arrive at this number?  try setting to zero and
comparing.

> stats_start_collector = true
> stats_command_string = true
with a high query load you may want to consider turning these off.  On
win32, I've had some problems with the stats collector under high load
conditions.  Not on unix, but it's something to look at.  Just turn off
stats for a while and see if it helps.
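
In postgresql.conf terms, the experiments above boil down to something
like the following (the work_mem figure is just an example; change one
thing at a time and compare):

work_mem = 16384                # up from 1536, watch for on-disk sorts
fsync = false                   # diagnosis only, never for production
commit_delay = 0                # compare against your current setting
stats_start_collector = false
stats_command_string = false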

good luck! your hardware should be more than adequate.

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Finding bottleneck

2005-08-08 Thread Merlin Moncure
> Kari Lavikka <[EMAIL PROTECTED]> writes:
> > samples  %symbol name
> > 13513390 16.0074  AtEOXact_CatCache
> 
> That seems quite odd --- I'm not used to seeing that function at the
top
> of a profile.  What is the workload being profiled, exactly?

He is running a commit_delay of 8.  Could that be playing a role?

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PG8 Tuning

2005-08-11 Thread Merlin Moncure
> Actually, it seems to me that with the addition of the WAL in
PostgreSQL
> and the subsequent decreased need to fsync the data files themselves
> (only during checkpoints?), that the only time a battery-backed write
> cache would make a really large performance difference would be on the
> drive(s) hosting the WAL.

It still helps.  In my experience a good BBU Raid controller is only
slightly slower than fsync=false.  Fear the checkpoint storm if you
don't have some write caching.  Beyond that I don't really care about
write delay.

Another thing to watch out for is that some sync modes (varying by
platform) can do more than one seek per sync.  This will absolutely kill
your commit performance on the WAL without write caching.
 
> So although it is in general good to have a dedicated spindle for the
> WAL, for many workloads it is in fact significantly better to have the
> WAL written to a battery-backed write cache.  The exception would be
for
> applications with fewer, larger transactions, in which case you could
> actually use the dedicated spindle.

Exactly.

 
> Hmmm, on second thought, now I think I understand the rationale behind
> having a non-zero commit delay setting-- the problem with putting

I don't trust commit_delay.  Get a good raid controller and make sure pg
is properly using it.  Now, if you can't (or won't) do some type of
write caching, BBU or not, your system has to be very carefully designed
to get any performance at all, especially with high transaction volumes.


Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] bitmap scan issues 8.1 devel

2005-08-17 Thread Merlin Moncure
Hello,
Doing some testing on the upcoming 8.1 devel and am having serious issues
with the new bitmap index scan feature.  It is easy to work around (just
disable it) but IMO the planner is using it when a regular index scan
should be strongly favored.  The performance of the bitmap scan in my
usage is actually quite a bit worse than a full sequential scan.

here is a query which does this:
explain analyze execute
data1_read_next_product_structure_file_0('012241', '', '', '002', 1);

Here is the 8.0/bitmap off plan:
Limit  (cost=0.00..45805.23 rows=5722 width=288)
       (actual time=0.070..0.072 rows=1 loops=1)
   ->  Index Scan using product_structure_file_pkey on product_structure_file
       (cost=0.00..45805.23 rows=5722 width=288)
       (actual time=0.063..0.063 rows=1 loops=1)
         Index Cond: ((ps_parent_code)::text >= ($1)::text)
         Filter: ((((ps_parent_code)::text > ($1)::text) OR (ps_group_code >= $2))
                 AND (((ps_parent_code)::text > ($1)::text) OR (ps_group_code > $2)
                      OR ((ps_section_code)::text >= ($3)::text))
                 AND (((ps_parent_code)::text > ($1)::text) OR (ps_group_code > $2)
                      OR ((ps_section_code)::text > ($3)::text)
                      OR ((ps_seq_no)::smallint > $4)))
 Total runtime: 0.185 ms

Here is the 8.1 with bitamp on:
Limit  (cost=3768.32..3782.63 rows=5722 width=288) (actual
time=2287.488..2287.490 rows=1 loops=1)
   ->  Sort  (cost=3768.32..3782.63 rows=5722 width=288) (actual
time=2287.480..2287.480 rows=1 loops=1)
 Sort Key: ps_parent_code, ps_group_code, ps_section_code,
ps_seq_no
 ->  Bitmap Heap Scan on product_structure_file
(cost=187.84..3411.20 rows=5722 width=288) (actual time=19.977..514.532
rows=47355 loops=1)
   Recheck Cond: ((ps_parent_code)::text >= ($1)::text)
   Filter: ps_parent_code)::text > ($1)::text) OR
(ps_group_code >= $2)) AND (((ps_parent_code)::text > ($1)::text) OR
(ps_group_code
> $2) OR ((ps_section_code)::text >= ($3)::text)) AND
(((ps_parent_code)::text > ($1)::text) OR (ps_group_code > $2) OR
((ps_section_code)::text > ($3
)::text) OR ((ps_seq_no)::smallint > $4)))
   ->  Bitmap Index Scan on product_structure_file_pkey
(cost=0.00..187.84 rows=18239 width=0) (actual time=19.059..19.059
rows=47356 loo
ps=1)
 Index Cond: ((ps_parent_code)::text >= ($1)::text)
 Total runtime: 2664.034 ms


Here is the prepared statement definition:
prepare data1_read_next_product_structure_file_0 (character varying,
character, character varying, int4, int4)
as select 1::int4, * from data1.product_structure_file
where ps_parent_code >= $1 and 
(ps_parent_code >  $1 or  ps_group_code >= $2) and 
(ps_parent_code >  $1 or  ps_group_code >  $2 or
ps_section_code >= $3) and 
(ps_parent_code >  $1 or  ps_group_code >  $2 or
ps_section_code >  $3 or  ps_seq_no >  $4) 
order by ps_parent_code, ps_group_code, ps_section_code,
ps_seq_no
limit $5

Aside: this is the long way of writing
select 1::int4, * from data1.product_structure_file where
(ps_parent_code, ps_group_code, ps_section_code, ps_seq_no) > ($1, $2,
$3, $4) limit $5

which is allowed in pg but returns the wrong answer.

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] limit number of concurrent callers to a stored proc?

2005-08-18 Thread Merlin Moncure
Christopher
> You could use a 1 column/1 row table perhaps.  Use some sort of
locking
> mechanism.
> 
> Also, check out contrib/userlock

Userlocks are definitely the way to go for this type of problem.  

They are really the only way to provide locking facilities that live
outside transactions.

You are provided with 48 bits of lock space in the form of an offset/block
in a 32 bit field plus a 16 bit field.  The 16 bit field could be the pid
of the locker and the 32 bit field the oid of the function.

Unfortunately, userlocks are not really easy to query via the pg_locks
view.  However this has been addressed for 8.1.  In 8.1, it will be
trivial to create a function which checks the number of lockers on the
function oid and acquires a lock only if there are fewer than a certain
number.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Finding bottleneck

2005-08-19 Thread Merlin Moncure
> Kari Lavikka <[EMAIL PROTECTED]> writes:
> > However, those configuration changes didn't have significant effect
to
> > oprofile results. AtEOXact_CatCache consumes even more cycles.
> 
> I believe I've fixed that for 8.1.

Relative to 8.0, I am seeing a dramatic, almost miraculous reduction in
CPU load times in 8.1devel.  This is for ISAM style access patterns over
the parse/bind interface.  (IOW one record at a time, 90% read, 10%
write).

Relative to commercial dedicated ISAM storage engines, pg holds up very
well except in cpu load, but 8.1 is a huge step towards addressing that.

So far, except for one minor (and completely understandable) issue with
bitmap scans, 8.1 has been a stellar performer.  Also great is the
expansion of the pg_locks view (which I didn't see mentioned in Bruce's
TODO list, just FYI).

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Finding bottleneck

2005-08-19 Thread Merlin Moncure
> Cool --- we've done a fair amount of work on squeezing out internal
> inefficiencies during this devel cycle, but it's always hard to
predict
> just how much anyone will notice in the real world.
> 
> Care to do some oprofile or gprof profiles to see where it's still
bad?
> 

Since the release of 8.0, we are a strictly windows shop :).  I tried
building pg with the -pg flag and got errors in some of the satellite
libraries.  I think this is solvable, though; at some point I'll spend
more time on it.  Anyways, just so you know the #s that I'm seeing, I've
run several benchmarks of various programs that access pg via our ISAM
bridge.  The results are as consistent as they are good.  These tests
are on the same box using the same .conf on the same freshly loaded
data.  The disk doesn't play a major role in these tests.  All data
access is through the ExecPrepared libpq C interface.  The benchmark is
run from a separate box on a LAN.

Bill of Materials Traversal ( ~ 62k records).

                 ISAM*    pg 8.0    pg 8.1 devel    delta 8.0->8.1
running time     63 sec   90 sec    71 sec          21%
cpu load         17%      45%       32%             29%
loadsecs**       10.71    40.5      22.72           44%
recs/sec         984      688       873
recs/loadsec     5882     1530      2728

*ISAM is an anonymous commercial ISAM library in an optimized server
architecture (pg smokes the non-optimized flat file version).
**Loadsecs being seconds of CPU at 100% load.  


IOW cpu load drop is around 44%.  Amazing!

Merlin



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Finding bottleneck

2005-08-22 Thread Merlin Moncure
> Bill of Materials Traversal ( ~ 62k records).
> 
>                  ISAM*    pg 8.0    pg 8.1 devel    delta 8.0->8.1
> running time     63 sec   90 sec    71 sec          21%
> cpu load         17%      45%       32%             29%
> loadsecs**       10.71    40.5      22.72           44%
> recs/sec         984      688       873
> recs/loadsec     5882     1530      2728
> 
> *ISAM is an anonymous commercial ISAM library in an optimized server
> architecture (pg smokes the non-optimized flat file version).
> **Loadsecs being seconds of CPU at 100% load.

One thing that might interest you is that the penalty in 8.1 for
stats_command_string=true in this type of access pattern is very high: I
was experimenting to see if the new cpu efficiency gave me enough of a
budget to start using this.  This more than doubled the cpu load to
around 70% with a runtime of 82 seconds.  This is actually worse than
8.0 :(.

This *might* be a somewhat win32 specific issue.  I've had issues with
the stats collector before.  Anyways, the feature is a frill so it's not
a big deal.

Merlin



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Finding bottleneck

2005-08-22 Thread Merlin Moncure
> That seems quite peculiar; AFAICS the pgstat code shouldn't be any
> slower than before.  At first I thought it might be because we'd
> increased PGSTAT_ACTIVITY_SIZE, but actually that happened before
> 8.0 release, so it shouldn't be a factor in this comparison.

Just FYI the last time I looked at stats was in the 8.0 beta period.
 
> Can anyone else confirm a larger penalty for stats_command_string in
> HEAD than in 8.0?  A self-contained test case would be nice too.

looking into it.

Merlin



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] performance drop on RAID5

2005-08-24 Thread Merlin Moncure
> Hello,
> i have a pg-8.0.3 running on Linux  kernel  2.6.8,  CPU  Sempron
2600+,
> 1Gb RAM on IDE HD ( which could be called a "heavy desktop" ),
measuring
> this performance with pgbench ( found on /contrib ) it gave me an
> average ( after several runs ) of 170 transactions per second;

170 tps is not plausible on a single platter IDE disk without using
write caching of some kind.  For a 7200 rpm drive any result much over
100 tps is a little suspicious (my 10k sata raptor can do about 120).
 
> for the sake of experimentation ( actually, i'm scared this IDE drive
> could fail at any time, hence i'm looking for an alternative, more
> "robust", machine ), i've installed on an aging Compaq Proliant server
(
> freshly compiled SMP kernel 2.6.12.5  with preemption ), dual Pentium
> III Xeon 500Mhz, 512Mb RAM, (older) SCSI-2 80pin drives, and
re-tested,
> when the database was on a single SCSI drive, pgbench gave me an
average
> of 90 transactions per second, but, and that scared me most, when the
> database was on a RAID-5 array ( four 9Gb disks, using linux software
> RAID mdadm and LVM2, with the default filesystem cluster size of 32Kb
),
> the performance dropped to about 55 transactions per second.

It is natural to see a slight to moderate drop in write performance moving
to RAID 5.  The only raid levels that are faster than a single disk
for writing are the ones with '0' in them, or setups with caching raid
controllers.  Even for 0+1, expect modest gains in tps vs. a single disk
if not using write caching.

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Some ideas for comment

2005-08-24 Thread Merlin Moncure
> Ok, there is always a lot of talk about tuning PostgreSQL on linux and
> how PostgreSQL uses the linux kernel cache to cache the tables and
> indexes.
[...]
> 
> 1.  Implement a partition type layout using views and rules - This
> will allow me to have one table in each view with the "active" data,
> and the inactive data stored by year in other tables.
> 
> So I would have the following (for each major table):
> 
> Table View as
> select * from active_table
> union all
> select * from table_2005
> union all
> select * from table_2004
> etc.

Linux does a pretty good job of deciding what to cache.  I don't think
this will help much.  You can always look at partial indexes too.

> 2.  I am also thinking of recommending we collapse all databases in a
> cluster into one "mega" database.  I can then use schema's and views
> to control database access and ensure that no customer can see another
> customers data.

hm. keep in mind views are tightly bound to the tables they are created
with (views can't 'float' over tables in different schemas).  pl/pgsql
functions can, though.  This is a more efficient use of server
resources, IMO, but not a windfall.
 
> This would mean that there are only one set of indexes being loaded
> into the cache.  While they would be larger, I think in combination
> with the partition from idea 1, we would be ahead of the ball game.
> Since there would only be one set of indexes, everyone would be
> sharing them so they should always be in memory.

I would strongly consider adding more memory :).
 
> I don't have real numbers to give you, but we know that our systems
> are hurting i/o wise and we are growing by about 2GB+ per week (net).
> We actually grow by about 5GB/week/server.  However, when I run my
> weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> end up getting about 3GB back.  Unfortunately, I do not have the i/o
> bandwidth to vacuum during the day as it causes major slowdowns on our
> system.  Each night, I do run a vacuum analyze across all db's to try
> and help.  I also have my fsm parameters set high (800 fsm pages,
> and 5000 fsm relations) to try and compensate.

Generally, you can reduce data turnover for the same workload by
normalizing your database.  IOW, try and make your database more
efficient in the way it stores data.

> Right now, we are still on 7.3.4.  However, these ideas would be
> implemented as part of an upgrade to 8.x (plus, we'll initialize the
> new clusters with a C locale).

yes, do this!

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Need for speed 2

2005-08-25 Thread Merlin Moncure
> Putting pg_xlog on the IDE drives gave about 10% performance
> improvement. Would faster disks give more performance?
> 
> What my application does:
> 
> Every five minutes a new logfile will be imported. Depending on the
> source of the request it will be imported in one of three "raw click"
> tables. (data from two months back, to be able to verify customer
> complains)
> For reporting I have a set of tables. These contain data from the last
> two years. My app deletes all entries from today and reinserts updated
> data calculated from the raw data tables.
> 
> The queries contain no joins only aggregates. I have several indexes
to
> speed different kinds of queries.
> 
> My problems occur when one users does a report that contains to much
old
> data. In that case all cache mechanisms will fail and disc io is the
> limiting factor.

It seems like you are pushing the limit of what your server can handle.
This means: 1. an expensive server upgrade, or
2. making the software more efficient.

Since you sound I/O bound, you can tackle 1. by a. adding more memory or
b. increasing i/o throughput.  

Unfortunately, you already have a pretty decent server (for x86), so a.
means a 64 bit platform and b. means more expensive hard drives.  The
archives are full of information about this...

Is your data well normalized?  You can do tricks like:
if a table has fields a,b,c,d,e,f where a is the primary key, and d,e,f
are not frequently queried or are often missing, move d,e,f to a separate
table.

Well normalized structures are always more cache efficient.  Do you have
lots of repeating and/or empty data values in your tables?

Make your indexes and data as small as possible to reduce pressure on
the cache; here are just a few tricks:
1. use int2/int4 instead of numeric
2. know when to use char and varchar 
3. use functional indexes to reduce index expression complexity.  This
can give extreme benefits if you can, for example, reduce a double field
index to a Boolean (see the sketch below).

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Limit + group + join

2005-08-26 Thread Merlin Moncure
Mark Kirkwood
> > The 'desc' seems to be the guy triggering the sort, e.g:
> 
> Oh; really an accident that I didn't notice myself, I was actually
going
> to
> remove all instances of "desc" in my simplification, but seems like I
> forgot.

If desc is the problem you can push the query into a subquery without
sorting and sort the result.  This is called an inline view.  Sometimes
you can pull a couple of tricks to force the view to materialize before
it is sorted.

aka 
select q.*
from
(
   some_complex_query   
) q order by ...;

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] difference in plan between 8.0 and 8.1?

2005-08-26 Thread Merlin Moncure
> Hello all,
> 
> I was hoping someone could explain the plan for a statement.
> 
> We have a table with a column of longs being used as an index.  The
> query plan in 8.0 was like this:
> 
> # explain select distinct timeseriesid from tbltimeseries where
> timeseriesid > 0 order by timeseriesid;

I had the same problem.  You probably already have seq scan turned off,
or the server would be using that.  You may have to turn bitmap scans off
or rework your query such that the server will use the index (between?).

Anyways, distinct is a code word for 'bad performance' :).  Consider
laying out your tables such that it is not necessary, for example by
setting up a table with an RI link.  Then you can do this in zero time.
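
A sketch of the RI link idea (the parent table name is hypothetical):
keep the distinct ids in a parent table enforced by a foreign key, then
read the parent table instead of running DISTINCT over the big one.

create table timeseries (timeseriesid bigint primary key);

alter table tbltimeseries
    add constraint tbltimeseries_timeseriesid_fkey
    foreign key (timeseriesid) references timeseries (timeseriesid);

-- replaces: select distinct timeseriesid from tbltimeseries ...
select timeseriesid from timeseries
where timeseriesid > 0
order by timeseriesid;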

Good luck!

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Sending a select to multiple servers.

2005-08-26 Thread Merlin Moncure
>  Does such a solution exist now. To me this appears to be in entirety
of
> what should constitute a database cluster. Only the search needs to be
> done on all the servers simultaneously at the low level. Once you get
the
> results, the writing can be determined by the upper level logic (which
can
> even be in a scripting language). But the search across many servers
has
> to be done using proper threading, and the re-sorting also needs to be
> done fast.

Well, the fastest way would be to write a libpq wrapper; personally I
would choose C++ for extreme performance.  The STL brings super fast
sorting to the table and will make dealing with ExecParams/ExecPrepared a
little bit easier.  To make it available from scripting languages you need
to write C wrappers for the interface functions and build them into a
shared library.

You could use any of a number of high level scripting languages but
performance will not be as good.  YMMV.

Another interesting take on this problem would be to use the dblink
contrib module.  Check that out and see if it can meet your needs.
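
A quick sketch of the dblink approach (the connection string, table, and
column definitions are made up); each call runs the query on a remote
server and returns the rows locally, where you can re-sort or merge them:

select *
from dblink('host=server1 dbname=search',
            'select doc_id, score from results where query = 42')
     as t(doc_id integer, score real)
order by score desc;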

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Limit + group + join

2005-08-29 Thread Merlin Moncure
Tobias wrote:
> Splendid :-) Unfortunately we will not be upgrading for some months
> still,
> but anyway I'm happy.  This provides yet another good argument for
> upgrading
> sooner.  I'm also happy to see such a perfect match:
> 
>  - A problem that can be reduced from beeing complex and
>production-specific, to simple and easily reproducible.
> 
>  - Enthusiastic people testing it and pinpointing even more precisely
what
>conditions will cause the condition
> 
>  - Programmers actually fixing the issue
> 
>  - Testers verifying that it was fixed
> 
> Long live postgresql! :-)

In the last three or so years since I've been really active with
postgresql, I've found two or three issues/bugs which I was able to
reproduce and reduce to a test case.  In all instances the fix was in
cvs literally within minutes.

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Need for speed 3

2005-09-01 Thread Merlin Moncure

Ulrich wrote:
> Hi again,
> 
> first I want to say ***THANK YOU*** for everyone who kindly shared
their
> thoughts on my hardware problems. I really appreciate it. I started to
> look for a new server and I am quite sure we'll get a serious hardware
> "update". As suggested by some people I would like now to look closer
at
> possible algorithmic improvements.
> 
> My application basically imports Apache log files into a Postgres
> database. Every row in the log file gets imported in one of three (raw
> data) tables. My columns are exactly as in the log file. The import is
> run approx. every five minutes. We import about two million rows a
month.
> 
> Between 30 and 50 users are using the reporting at the same time.
> 
> Because reporting became so slow, I did create a reporting table. In
> that table data is aggregated by dropping time (date is preserved),
ip,
> referer, user-agent. And although it breaks normalization some data
from
> a master table is copied, so no joins are needed anymore.
> 
> After every import the data from the current day is deleted from the
> reporting table and recalculated from the raw data table.
> 

Schemas would be helpful.  You may be able to tweak the import table a
bit and how it moves over to the data tables.

Just a thought: have you considered having apache log to a
process that immediately makes insert queries to postgresql? 

You could write a small C program which executes advanced query interface
calls to the server.

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Need for speed 3

2005-09-01 Thread Merlin Moncure
> Hi Merlin,
> > Just a thought: have you considered having apache logs write to a
> > process that immediately makes insert query(s) to postgresql?
> 
> Yes we have considered that, but dismissed the idea very soon. We need
> Apache to be as responsive as possible. It's a two server setup with
> load balancer and failover. Serving about ones thousand domains and
> counting. It needs to be as failsafe as possible and under no
> circumstances can any request be lost. (The click counting is core
> business and relates directly to our income.)
> That said it seemed quite safe to let Apache write logfiles. And
import
> them later. By that a database downtime wouldn't be mission critical.

hm.  well, it may be possible to do this in a fast and safe way, and I
understand your reservations here, but I'm going to spout off my opinion
anyways :).

If you are not doing this the following point is moot.  But take into
consideration that you could set a very low transaction timeout (like .25
seconds) and siphon log entries off to a text file if your database
server gets in trouble.  2 million hits a month is not very high even if
your traffic is bursty (there are approx 2.5 million seconds in a
month).

With direct-linked logging you get up-to-date stats at all times and spare
yourself the dump/load song and dance, which is always a headache :(.
Also, however you are doing your billing, it will be easier to manage it
if everything is extracted from pg and not some conglomeration of log
files, *if* you can put 100% faith in your database.  When it comes to
pg now, I'm a believer.

> > You could write small C program which executes advanced query
interface
> > call to the server.
> 
> How would that improve performance?

The functions I'm talking about are PQexecParams and PQexecPrepared.
The query string does not need to be encoded or decoded, and the calls
are very light on server resources and very low latency.  Using them you
could probably get 5000 inserts/sec on a cheap server, with low cpu load,
if you have some type of write caching in place.
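
PQexecPrepared exercises the same machinery as SQL-level PREPARE/EXECUTE,
which is an easy way to get a feel for it from psql first (the table and
column names here are hypothetical):

prepare log_hit (inet, text) as
    insert into raw_clicks (ip, url, hit_time) values ($1, $2, now());

execute log_hit('10.0.0.1', '/index.html');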

Merlin



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Massive performance issues

2005-09-01 Thread Merlin Moncure
> I'm having performance issues with a table consisting of 2,043,133
rows.
> The
> schema is:

> locality_1 has 16650 distinct values and locality_2 has 1156 distinct
> values.

Just so you know, I have a 2GHz p4 workstation with a table of similar
size (2M rows), several keys, and it can find and fetch 2k rows based on
a key with 20k unique values in about 60 ms (.06 seconds).

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Massive performance issues

2005-09-01 Thread Merlin Moncure
>   Table "public.address"
> Column|  Type  | Modifiers
> --++---
>  postcode_top | character varying(2)   | not null
>  postcode_middle  | character varying(4)   | not null
>  postcode_bottom  | character varying(7)   | not null


consider making above fields char(x) not varchar(x) for small but
important savings.

>  postcode | character varying(10)  | not null
>  property_type| character varying(15)  | not null
>  sale_type| character varying(10)  | not null
>  flat_extra   | character varying(100) | not null
>  number   | character varying(100) | not null
>  street   | character varying(100) | not null
>  locality_1   | character varying(100) | not null
>  locality_2   | character varying(100) | not null
>  city | character varying(100) | not null
>  county   | character varying(100) | not null
> Indexes:
> "address_city_index" btree (city)
> "address_county_index" btree (county)
> "address_locality_1_index" btree (locality_1)
> "address_locality_2_index" btree (locality_2)
> "address_pc_bottom_index" btree (postcode_bottom)
> "address_pc_middle_index" btree (postcode_middle)
> "address_pc_top_index" btree (postcode_top)
> "address_pc_top_middle_bottom_index" btree (postcode_top,
>  postcode_middle, postcode_bottom)
> "address_pc_top_middle_index" btree (postcode_top,
postcode_middle)
> "address_postcode_index" btree (postcode)
> "address_property_type_index" btree (property_type)
> "address_street_index" btree (street)
> "street_prefix" btree (lower("substring"((street)::text, 1, 1)))
> 
> Obviously, to me, this is a problem, I need these queries to be under
a
> second to complete. Is this unreasonable? What can I do to make this
"go
> faster"? I've considered normalising the table but I can't work out
> whether the slowness is in dereferencing the pointers from the index
> into the table or in scanning the index in the first place. And
> normalising the table is going to cause much pain when inserting
values
> and I'm not entirely sure if I see why normalising it should cause a
> massive performance improvement.

http://www.dbdebunk.com :)

> I need to get to the stage where I can run queries such as:
> select street, locality_1, locality_2, city from address
> where (city = 'Nottingham' or locality_2 = 'Nottingham'
>or locality_1 = 'Nottingham')
>   and upper(substring(street from 1 for 1)) = 'A'
> group by street, locality_1, locality_2, city
> order by street
> limit 20 offset 0
> 
> and have the results very quickly.
> 
> Any help most gratefully received (even if it's to say that I should
be
> posting to a different mailing list!).

this is the correct list.  Did you run vacuum/analyze, etc?
Please post vacuum analyze times.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Massive performance issues

2005-09-01 Thread Merlin Moncure
> -Original Message-
> From: Alvaro Herrera [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 01, 2005 3:34 PM
> To: Merlin Moncure
> Cc: Matthew Sackman; pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Massive performance issues
> 
> On Thu, Sep 01, 2005 at 02:04:54PM -0400, Merlin Moncure wrote:
> > >   Table "public.address"
> > > Column|  Type  | Modifiers
> > > --++---
> > >  postcode_top | character varying(2)   | not null
> > >  postcode_middle  | character varying(4)   | not null
> > >  postcode_bottom  | character varying(7)   | not null
> >
> > consider making above fields char(x) not varchar(x) for small but
> > important savings.
> 
> Huh, hang on -- AFAIK there's no saving at all by doing that.  Quite
the
> opposite really, because with char(x) you store the padding blanks,
> which are omitted with varchar(x), so less I/O (not necessarily a
> measurable amount, mind you, maybe even zero because of padding
issues.)

You are right; all this time I thought there was a 4 byte penalty for
storing the varchar type that char did not have :(.  So there is no reason
at all to use the char type?

Merlin


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Improving performance of a query

2005-09-06 Thread Merlin Moncure
Carlos wrote:
SELECT * FROM SSIRRA where 
(YEAR = 2004 and CUSTOMER = 04 and CODE = 00 and PART >= 00) or 
(YEAR = 2004 and CUSTOMER = 04 and CODE > 00) or 
(YEAR = 2004 and CUSTOMER > 04) or 
(YEAR > 2004)
[snip]

ah, the positional query.  You can always rewrite this query in the
following form:

(YEAR >= 2004) and
(YEAR = 2004 or CUSTOMER >= 04) and
(YEAR = 2004 or CUSTOMER = 04 or CODE >= 00) and
(YEAR = 2004 or CUSTOMER = 04 or CODE = 00 or PART > 00) 

This is better because it will index scan using 'year' (not customer or
part though).  The true answer is to lobby for/develop proper row
constructor support so you can just 

SELECT * FROM SSIRRA where (YEAR, CUSTOMER, CODE, PART) > (2004,
04, 00, 00)

this is designed to do what you are trying to do but currently doesn't
work quite right.

note: in all these queries, 'order by YEAR, CUSTOMER, CODE, PART' should
probably be on the query.

Other solution: use cursor/fetch or some type of materialized solution. 

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Improving performance of a query

2005-09-06 Thread Merlin Moncure
> > Carlos wrote:
> > SELECT * FROM SSIRRA where
> > (YEAR = 2004 and CUSTOMER = 04 and CODE = 00 and PART >= 00)
or
> > (YEAR = 2004 and CUSTOMER = 04 and CODE > 00) or
> > (YEAR = 2004 and CUSTOMER > 04) or
> > (YEAR > 2004)
> > [snip]
> >
> > ah, the positional query.  You can always rewrite this query in the
> > following form:
> >
> > (YEAR >= 2004) and
> > (YEAR = 2004 or CUSTOMER >= 04) and
> > (YEAR = 2004 or CUSTOMER = 04 or CODE >= 00) and
> > (YEAR = 2004 or CUSTOMER = 04 or CODE = 00 or PART > 00)
> 
> Unless I'm not seeing something, I don't think that's a correct
> reformulation in general. If customer < 4 and year > 2004 the original
> clause would return true but the reformulation would return false
since
> (year=2004 or customer >= 4) would be false.

You are correct, you also have to exchange '=' with '>' to exchange
'and' with 'or'.  

Correct answer is:
> > (YEAR >= 2004) and
> > (YEAR > 2004 or CUSTOMER >= 04) and
> > (YEAR > 2004 or CUSTOMER > 04 or CODE >= 00) and
> > (YEAR > 2004 or CUSTOMER > 04 or CODE > 00 or PART > 00)

It's easy to get tripped up here: the basic problem is how to get the
next record based on a multi-part key.  My ISAM bridge can write them
either way but the 'and' major form is always faster ;).
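
Putting the pieces together (my sketch; the limit value is arbitrary),
the corrected 'and' major form with the ordering and limit attached looks
like:

SELECT * FROM SSIRRA where
(YEAR >= 2004) and
(YEAR > 2004 or CUSTOMER >= 04) and
(YEAR > 2004 or CUSTOMER > 04 or CODE >= 00) and
(YEAR > 2004 or CUSTOMER > 04 or CODE > 00 or PART > 00)
order by YEAR, CUSTOMER, CODE, PART
limit 20;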

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] insert performance for win32

2005-09-06 Thread Merlin Moncure
> Hi,
> 
> I usually use PostgreSQL coupled with Linux, but I have to use Windows
for
> a
> perticular project.
> 
> So I wanted to do some tests to know if the performance will be
acceptable
> (I
> don't need PostgreSQL to be as fast with windows as with linux, but it
has
> to
> be usable...).

In my experience win32 is on par with linux generally, with a few gotchas on
either side.  Are your times with fsync=no? It's much harder to give an
apples-to-apples comparison with fsync=on for various reasons.

Are you running stats_command_string=on?  Try disabling and compare
results.
Is your loading app running locally or on the server?

I am very interested in discovering sources of high cpu load problems
on win32.  If you are still having problems could you get a gprof
profile together?  There is a recent thread on win32-hackers discussing
how to do this.

Merlin



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] insert performance for win32

2005-09-06 Thread Merlin Moncure
> > In my experience win32 is par with linux generally with a few gotchas on
> > either side.  Are your times with fsync=no? It's much harder to give
> > apples-apples comparison with fsync=on for various reasons.
> It is with fsync=off on windows, fsync=on on linux

well, inside a transaction this shouldn't have mattered anyways.
 
> > Are you running stats_command_string=on?  Try disabling and compare
> > results.
> Deactivated on windows, activated on linux
 
> > Is your loading app running locally or on the server?
> Yes

hm :(.  Well, you had me curious so I went ahead and re-ran your test case and 
profiled it (on windows).  I got similar results time wise.  It's interesting 
to note that the command I used to generate the test table before dumping 
w/inserts

insert into test select nextval('test_id_seq'), 'test' from 
generate_series(1,50) 

ran in just a few seconds.  

Well, I cut the #recs down to 50k and here is the profile trace:
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 10.78  0.62 0.6250001 0.00 0.00  yyparse
  5.39  0.93 0.31  5101422 0.00 0.00  AllocSetAlloc
  4.52  1.19 0.26   799970 0.00 0.00  base_yylex
  2.78  1.35 0.16   28 0.00 0.00  SearchCatCache
  2.43  1.49 0.14   554245 0.00 0.00  hash_search
  2.26  1.62 0.1349998 0.00 0.00  XLogInsert
  1.74  1.72 0.10   453363 0.00 0.00  LWLockAcquire
  1.74  1.82 0.10   299988 0.00 0.00  ScanKeywordLookup

This makes me wonder if we are looking in the wrong place.  Maybe the problem 
is coming from psql?  More results to follow.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] insert performance for win32

2005-09-06 Thread Merlin Moncure
> This makes me wonder if we are looking in the wrong place.  Maybe the
> problem is coming from psql?  More results to follow.

problem is not coming from psql.  

One thing I did notice is that in a 250k insert transaction the insert time
grows with #recs inserted.  Time to insert the first 50k recs is about 27
sec and the last 50k recs is 77 sec.  I also confirmed that the size of the
table is not playing a role here.

Marc, can you do select timeofday() every 50k recs from linux?  Also a
gprof trace from linux would be helpful.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] insert performance for win32

2005-09-07 Thread Merlin Moncure
> > One thing I did notice that in a 250k insert transaction the insert
time
> > grows with #recs inserted.  Time to insert first 50k recs is about
27
> > sec and last 50 k recs is 77 sec.  I also confimed that size of
table is
> > not playing a role here.
> >
> > Marc, can you do select timeofday() every 50k recs from linux?  Also
a
> > gprof trace from linux would be helpful.
> >
> 
> Here's the timeofday ... i'll do the gprof as soon as I can.
> Every 5 rows...
> 
Were those all in a single transaction?

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] insert performance for win32

2005-09-07 Thread Merlin Moncure
> On Tuesday 06 September 2005 19:11, Merlin Moncure wrote:
> Here's the timeofday ... i'll do the gprof as soon as I can.
> Every 5 rows...
> 
> Wed Sep 07 13:58:13.860378 2005 CEST
> Wed Sep 07 13:58:20.926983 2005 CEST
> Wed Sep 07 13:58:27.928385 2005 CEST
> Wed Sep 07 13:58:35.472813 2005 CEST
> Wed Sep 07 13:58:42.825709 2005 CEST
> Wed Sep 07 13:58:50.789486 2005 CEST
> Wed Sep 07 13:58:57.553869 2005 CEST
> Wed Sep 07 13:59:04.298136 2005 CEST
> Wed Sep 07 13:59:11.066059 2005 CEST
> Wed Sep 07 13:59:19.368694 2005 CEST

ok, I've been crunching profile graphs, and so far have only been able to
draw the following conclusions.

For bulk, 'in-transaction' insert:
1. win32 is slower than linux.  win32 time for each insert grows with #
inserts in xact, linux does not (or grows much slower).  Win32 starts
out about 3x slower and grows to 10x slower after 250k inserts.

2. ran a 50k profile vs. 250k profile.  Nothing jumps out as being
slower or faster: most time is spent in yyparse on either side.  From
this my preliminary conclusion is that there is something going on in
the win32 api which is not showing in the profile.

3. The mingw gprof cumulative seconds does not show measurable growth in
cpu time/insert in 50k/250k profile.

I'm now taking suggestions about where to look for performance problems
:(.
Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Low performance on Windows problem

2005-09-13 Thread Merlin Moncure
> Hello.
> 
> I would like to build a shared repository for Enterprise Architect
> (http://www.sparxsystems.com.au/ea.htm) using PostgreSQL. I have done
it
> before with Linux and FreeBSD servers and everything was working out
of
> the
> box. The repository is pretty simple database with less than 100
tables
> (the
> schema is at
> http://www.sparxsystems.com.au/downloads/corp/Postgres_Basemodel.sql).
> 
> The problem is that at the moment I have only a Windows XP "server" at
my
> disposal. I have installed PostgreSQL 8.0.3 for Windows and set the
> repository up. Unfortunately the performance is unacceptable: every
> operation with the model stored in the repository is by the order of
> magnitude slower than on the FreeBSD server with half as good
hardware.
> (BTW CPU load is nearly 0, network load is under 5%, the machine has
1GB
> RAM and the database size is 14MB.)
> 
> I have tried to:
> - tweak the postgresql.conf - no apparent change
> - kill all unnecessary services - no apparent change
> - install MySQL on the same machine to compare - it is as fast as
> PostgreSQL
>   on FreeBSD (= way faster than PG on the machine)
 
Can you give specific examples of cases that are not performing like you
expect?  If possible, give a few queries with explain analyze times and
all that.

Are you syncing your data?  Win32 fsync is about 1/3 as fast as linux
fsync, although that sync method was since renamed fsync_writethrough
for clarity.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Low performance on Windows problem

2005-09-13 Thread Merlin Moncure
> On Tue, Sep 13, 2005 at 07:58:20AM -0400, Merlin Moncure wrote:
> This command is executed while a model is loaded from the repository.
> 
> The table definition is:
> CREATE TABLE t_umlpattern (
>   PatternID INTEGER DEFAULT nextval('"patternid_seq"'::text) NOT
NULL
> PRIMARY KEY,
>   PatternCategory VARCHAR(100),
>   PatternName VARCHAR(150),
>   Style VARCHAR(250),
>   Notes TEXT,
>   PatternXML TEXT,
>   Version VARCHAR(50)
> );
> 
> It has just 23 rows but the PatternXML column is rather large. The
table
> dump has over 900 kB.
> 
> Now
> select * from t_umlpattern limit 2
> 
> takes 1500+ msec on the Windows machine and 60 on a comparable Linux
> machine. Both selects performed from remote PgAdmin.
> The same select performed localy on the windows machine takes 60 msec.
> 
> So I guess the problem is in the transfer of the bigger amount of data
> from
> the Windows server.
> 
> I put the dump at http://www.insula.cz/dali/misc/table.zip
> 
> Could anybody confirm the difference?
 
I loaded your dump and was able to select the entire table in trivial time
from both pgAdmin and the psql shell.  I suspect some type of tcp
problem here.  Can you confirm slow times on an unloaded server?

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Low performance on Windows problem

2005-09-13 Thread Merlin Moncure
> Did you run the select remotely on a Windows server?

yes.
 
> Yes the server load is practically 0. Note the difference between
local
> and
> remote execution of the command. I think you are right about the
network
> problem possibility. But it is bound to PostgreSQL. MySQL on the same
> machine (and same database content) had no problem.
> 
> So are there any known issues with PostgreSQL on Windows sending data
to
> remote hosts connected via ODBC?
> What should I do to find out more debug info?

1. turn on all your logging and make sure we looking at the right place
(planner stats, etc).
2. run explain analyze and compare timings (which returns only explain
output).
3. do a select max(patternxml) from test.t_umlpattern and observe the time.
4. do a select substr(patternxml, 1, 10) from test.t_umlpattern and
observe the time.
5. do select array_accum(q::text) from generate_series(1,1) q;

if array_accum errors out, do:

CREATE AGGREGATE public.array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);

and observe the time.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Low performance on Windows problem

2005-09-14 Thread Merlin Moncure
> in the 10 ms range. Definitely not 800 ms. The 8.1 has the same
problem.
> 
> Just for the record: the server PC is Dell Precision 330 with 3Com
3C920
> integrated network card. OS MS Windows Professional 2002 with service
pack
> 2. There is Symantec Antivirus installed - which I have (hopefully)
> completely disabled.

Try throwing in another network card and see if it helps.  Next step is
to try tweaking tcp settings
(http://support.microsoft.com/default.aspx?scid=kb;en-us;314053) and see
if that helps.  Beyond that, try playing the update driver game.  If you
are still having problems, try receiving bigger and bigger results to
see where problem occurs. 1-2k range suggests mtu problem, 4-8k range
suggests tcp receive window problem.

Beyond that, I'm stumped, uh, buy Opteron? :)

Merlin


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Battery Backed Cache for RAID

2005-09-14 Thread Merlin Moncure
> On Wed, Sep 14, 2005 at 11:28:43AM -0700, Jeffrey W. Baker wrote:
> > On Wed, 2005-09-14 at 11:25 -0700, Peter Darley wrote:
> > >   I'm getting a new server for our database, and I have a quick
> question
> > > about RAID controllers with a battery backed cache.  I understand
that
> the
> > > cache will allow the cache to be written out if the power fails to
the
> box,
> > > which allows it to report a write as committed safely when it's
not
> actually
> > > committed.
> >
> > Actually the cache will just hold its contents while the power is
out.
> > When the power is restored, the RAID controller will complete the
writes
> > to disk.  If the battery does not last through the outage, the data
is
> > lost.
> 
> Just curious: how long are the batteries supposed to last?
 
For the length of time it will take for you to get fired for not getting
the server running plus one hour :).

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] [ODBC] ODBC Driver on Windows 64 bit

2005-09-15 Thread Merlin Moncure
Well, pg being a multi-process architecture, on a 64 bit system you get the
advantages of extra memory for cache all day long.  I don’t think a 2GB
memory limit per backend is a wall people are hitting very often, even on
high end systems.  Performance wise, 32 vs. 64 bit is a tug of war between
extra registers & faster 64 bit ops on one side vs. smaller pointers and
better memory footprint on the other.

Note I am assuming Opteron here where the 32/64 performance is basically
the same.

Merlin


That being said, from what I’m hearing it may be moot because it’s probably
best to run postgres as 32 bit on a 64 bit operating system performance wise.

Oh? why's that then?

/D


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-22 Thread Merlin Moncure
> >I previously posted the following as a sequel to my SELECT DISTINCT
> >Performance Issue question. We would most appreciate any clue or
> >suggestions on how to overcome this show-stopping issue. We are using
> >8.0.3 on Windows.
> >
> >Is it a known limitation when using a view with SELECT ... LIMIT 1?
> >
> >Would the forthcoming performance enhancement with MAX help when used
> >within a view, as in:
> >
> >create or replace view VCurPlayer as select * from Player a
> >where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID
=
> >b.PlayerID);

Here is a trick I use sometimes with views, etc.  This may or may not be
effective to solve your problem but it's worth a shot.  Create one small
SQL function taking date, etc. and returning the values and define it
immutable.  Now, in a query, it is treated like a constant.

Another useful application for this feature is when you have nested
views (view 1 queries view 2) and you need to filter records based on
fields from view 2 which are not returned in view 1.  Impossible? 

in view 2 add clause where v2.f between  f_min() and f_max(), them being
immutable functions which can grab filter criteria based on inputs or
values from a table.
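
A minimal sketch of the idea (f_min/f_max, the criteria table, and the view
column names here are hypothetical):

create table filter_criteria(f_min int, f_max int);
insert into filter_criteria values (1, 100);

create function f_min() returns int as
$$ select f_min from filter_criteria limit 1; $$ language sql immutable;

create function f_max() returns int as
$$ select f_max from filter_criteria limit 1; $$ language sql immutable;

create view v2 as
    select * from some_table where f between f_min() and f_max();

Note the usual caveat: the functions are not really immutable; the trick
hinges on the planner treating them as constants.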

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-22 Thread Merlin Moncure
> >Here is a trick I use sometimes with views, etc.  This may or may not
be
> >effective to solve your problem but it's worth a shot.  Create one
small
> >SQL function taking date, etc. and returning the values and define it
> >immutable.  Now in-query it is treated like a constant.
> 
> We don't use functions as a rule, but I would be glad to give it a
try.
> I would most appreciate if you could define a sample function and
rewrite
> the VCurPlayer view above. Both PlayerID and AtDate are varchar
fields.

> esdt=> explain analyze select PlayerID,AtDate from Player a
>   where PlayerID='0' and AtDate = (select b.AtDate from Player b
>   where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc
LIMIT 1
try:

create function player_max_at_date (varchar) returns date as 
$$
select atdate from player where playerid = $1 order by playerid
desc, AtDate desc limit 1;
$$ language sql immutable;

create view v as select playerid, player_max_at_date(playerid) from
player;
select * from v where playerid = 'x'; --etc

note: this function is not really immutable.  Try with both 'immutable'
and 'stable'; if performance is the same, use stable.

You're welcome in advance, ;)
Merlin



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-23 Thread Merlin Moncure
> At 02:07 05/09/23, Merlin Moncure wrote:
> > > >Here is a trick I use sometimes with views, etc.  This may or may
not
> be
> > > >effective to solve your problem but it's worth a shot.  Create
one
> small
> > > >SQL function taking date, etc. and returning the values and
define it
> > > >immutable.  Now in-query it is treated like a constant.
> 
> esdt=> create or replace function player_max_atdate (varchar(32))
returns
> varchar(32) as $$
> esdt$>  select atdate from player where playerid = $1 order by
playerid
> desc, AtDate desc limit 1;
> esdt$> $$ language sql immutable;

Can you time just the execution of this function and compare vs. pure
SQL version?  If the times are different, can you do a exaplain analyze
of a prepared version of above?

prepare test(character varying) as select atdate from player where
playerid = $1 order by playerid desc, AtDate desc limit 1;

explain analyze execute test('0');

> CREATE FUNCTION
> esdt=> create or replace view VCurPlayer3 as select * from Player
where
> AtDate = player_max_atdate(PlayerID);
> CREATE VIEW

This is wrong, it should have been 
create or replace view VCurPlayer3 as select *,
player_max_atdate(PlayerID) as max_date  from Player;

I did a test on a table with 124k records and a two part key, ID & date.
esp=# select count(*) from parts_order_file;
 count
--------
 124158
(1 row)


esp=# select count(*) from parts_order_file where pr_dealer_no = '000500';
 count
-------
 27971
(1 row)

created same function, view v, etc.
esp=# explain analyze select * from v where pr_dealer_no = '000500' limit 1;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Limit  (cost=0.00..3.87 rows=1 width=10) (actual time=1.295..1.297 rows=1 loops=1)
   ->  Index Scan using parts_order_file_pr_dealer_no_key on parts_order_file  (cost=0.00..109369.15 rows=28226 width=10) (actual time=1.287..1.287 rows=1 loops=1)
         Index Cond: (pr_dealer_no = '000500'::bpchar)
 Total runtime: 1.413 ms
(4 rows)

Something is not jiving here.  However, if the server plan still does
not come out correct, try the following (p.s. why is function returning
varchar(32) and not date?):

create or replace function player_max_atdate (varchar(32)) returns date as
$$
DECLARE
    player_record record;
    return_date date;
BEGIN
    for player_record in execute
        'select atdate from player where playerid = \'' || $1 || '\'
         order by playerid desc, AtDate desc limit 1;' loop
        return_date := player_record.atdate;
    end loop;

    return return_date;
END;
$$ language plpgsql immutable;

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-05 Thread Merlin Moncure
[to K C:] sorry, was out on vacation all last week.  I was visualizing
the problem incorrectly anyways...

Jim wrote:
> That function is not immutable, it should be defined as stable.

That is 100% correct: however now and then I declare stable functions as
immutable in some cases because the planner treats them differently with
no side effects...this is a hack of course...see my earlier suggestion
to try both immutable and stable versions.  I can give a pretty good
example of when this can make a big difference.
 
> PostgreSQL doesn't pre-compile functions, at least not until 8.1 (and
> I'm not sure how much those are pre-compiled, though they are
> syntax-checked at creation). Do you get the same result time when you
> run it a second time? What time do you get from running just the
> function versus the SQL in the function?

plpgsql functions are at least partially compiled (sql functions afaik
are not), in that an internal state is generated following the first
execution.  This is the cause of all those infernal 'invalid table oid'
errors.
 
> Also, remember that every layer you add to the cake means more work
for
> the database. If speed is that highly critical you'll probably want to
> not wrap things in functions, and possibly not use views either.

The overhead of the function/view is totally inconsequential next to the
planner choosing a suboptimal plan.  The purpose of the function is to
coerce the planner into choosing the correct plan.

> Also, keep in mind that getting below 1ms doesn't automatically mean
> you'll be able to scale to 1000TPS. Things will definately change when
> you load the system down, so if performance is that critical you
should
> start testing with the system under load if you're not already.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Indexes on ramdisk

2005-10-05 Thread Merlin Moncure
> It's a quad opteron system. RAID controller is a 4 channel LSILogic
> Megaraid
> 320 connected to 10 15k 36.7G SCSI disks. The disks are configured in
5
> mirrored partitions. The pg_xlog is on one mirror and the data and
indexes
> are spread over the other 4 using tablespaces. These numbers from
> pg_stat_user_tables are from about 2 hours earlier today on this one
> table.
> 
> 
> idx_scan   20578690
> idx_tup_fetch  35866104841
> n_tup_ins1940081
> n_tup_upd   1604041
> n_tup_del1880424

Is your raid controller configured to buffer your writes?  How much RAM
are you packing?  Are you running 64 bit?

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Ultra-cheap NVRAM device

2005-10-05 Thread Merlin Moncure
Chris wrote:
> [EMAIL PROTECTED] (Dan Harris) writes:
> > On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote:
> >
> >> I thought this might be interesting, not the least due to the
> >> extremely low
> >> price ($150 + the price of regular DIMMs):
> >
> > Replying before my other post came through.. It looks like their
> > benchmarks are markedly improved since the last article I read on
> > this.  There may be more interest now..
> 
> It still needs a few more generations worth of improvement.
> 
> 1.  It's still limited to SATA speed
> 2.  It's not ECC smart

3. Another zero (or two) on the price tag :).  While it looks like a fun
toy to play with, for it to replace hard drives in server environments
they need to provide more emphasis and effort in assuring people their
drive is reliable.

If they really wanted it to be adopted in server environments, it would
have been packaged in a 3.5" drive, not a pci card, since that's what we
all hot swap (especially since it already uses a SATA interface).  They
would also have allowed use of 2 and 4GB DIMMs, and put in a small hard
drive that the memory paged to when powered off, and completely isolated
the power supply...hard to pack all that into $60.

That said, we are in the last days of the hard disk.  I think it is only
a matter of months before we see a sub-$1000 part with zero
latency in the 20-40 GB range.  Once that happens economies of scale
will kick in and hard drives will become basically a backup device.

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Status of Opteron vs Xeon

2005-10-07 Thread Merlin Moncure
> What's the current status of how much faster the Opteron is compared
to
> the
> Xeons?  I know the Opterons used to be close to 2x faster, but is that
> still
> the case?  I understand much work has been done to reduce the context
> switching storms on the Xeon architecture, is this correct?

Up until two days ago (Oct 5) Intel had no answer for AMD's dual
core offerings...unfortunately this has allowed AMD to charge top dollar
for dual core Opterons.  The Intel dual core solution on the P4 side
hasn't been very impressive particularly with regard to thermals.

My 90nm athlon 3000 at home runs very cool...if I underclock it a bit I
can actually turn off the cooling fan :).

IMO, right now it's AMD all the way, but if you are planning a big
purchase, it might be smart to wait a couple of months for the big price
realignment as Intel's dual xeons hit the retail channel.

Merlin


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] count(*) using index scan in "query often, update rarely" environment

2005-10-07 Thread Merlin Moncure
On 10/7/05, Cestmir Hybl <[EMAIL PROTECTED]> wrote:
Isn't it possible (and reasonable) for these environments to keep track
of whether there is a transaction in progress with update to given table
and if not, use an index scan (count(*) where) or cached value
(count(*)) to perform this kind of query?


The answer to the first question is subtle.  Basically, the PostgreSQL
engine is designed for high concurrency.  We are definitely on the right
side of the cost/benefit tradeoff here.  SQL server does not have MVCC
(or at least until 2005 appears) so they are on the other side of the
tradeoff.

You can of course serialize the access yourself by materializing the
count in a small table and use triggers or cleverly designed
transactions.  This is trickier than it might look however so check the
archives for a thorough treatment of the topic.
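
A minimal sketch of the trigger approach (table, trigger, and function
names are hypothetical; see the archives for the concurrency caveats):

create table mytable_count(n bigint not null);
insert into mytable_count values (0);

create function mytable_count_trig() returns trigger as
$$
begin
    if TG_OP = 'INSERT' then
        update mytable_count set n = n + 1;
        return new;
    elsif TG_OP = 'DELETE' then
        update mytable_count set n = n - 1;
        return old;
    end if;
    return null;
end;
$$ language plpgsql;

create trigger mytable_count_trig after insert or delete on mytable
    for each row execute procedure mytable_count_trig();

-- instead of select count(*) from mytable:
select n from mytable_count;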

One interesting thing is that making count(*) over large swaths of data
is frequently an indicator of a poorly normalized database.  Is it
possible to optimize the counting by laying out your data in a different
way?

Merlin



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-10-12 Thread Merlin Moncure
KC wrote:
> 
> So I guess it all comes back to the basic question:
> 
> For the query select distinct on (PlayerID) * from Player a where
> PlayerID='0' order by PlayerId Desc, AtDate Desc;
> can the optimizer recognise the fact the query is selecting by the
primary
> key (PlayerID,AtDate), so it can skip the remaining rows for that
> PlayerID,
> as if LIMIT 1 is implied?
> 
> Best regards, KC.

Hi KC, have you tried:
select * from player where playerid = '0' and atdate < 99
order by playerid desc, atdate desc limit 1;

??
Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Help tuning postgres

2005-10-12 Thread Merlin Moncure
> The disk used for the data is an external raid array, I don't know
much
> about that right now except I think is some relatively fast IDE stuff.
> In any case the operations should be cache friendly, we don't scan
over
> and over the big tables...

Maybe you are I/O bound.  Do you know if your RAID array is caching your
writes?  An easy way to check is to run with fsync off and look for obvious
performance differences.  Maybe playing with sync method could help
here.

Merlin


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Help tuning postgres

2005-10-12 Thread Merlin Moncure
> 
> Would it not be faster to do a dump/reload of the table than reindex
or
> is it about the same?
> 
reindex is probably faster, but that's not the point.  You can reindex a
running system whereas dump/restore requires downtime unless you work
everything into a transaction, which is a headache, and dangerous.

reindex locking is very granular, in that it only acquires an excl. lock
on one index at a time and while doing so reading is possible (writes
will wait).

in 8.1 we get a fire-and-forget 'reindex database xyz' which is about as
good as it gets without a dump/load or full vacuum.
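
For example (names hypothetical):

reindex table my_bloated_table;   -- one index at a time, reads still allowed
reindex database mydb;            -- the 8.1 fire-and-forget variant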

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Inefficient escape codes.

2005-10-19 Thread Merlin Moncure
Rodrigo wrote:
$$
As I understand it, the client needs to put the data into the server
using a textual-based command. This makes the 5MB data grow up-to 5x,
making it 25MB in the worst case. (Example: 0x01 -> \\001).

My question is:

1) Is there any way for me to send the binary field directly without
needing escape codes?
2) Will this mean that the client actually wastes my network bandwidth
converting binary data to text? Or does the client transparently manage
this?
$$ [snip]

I think the fastest, most efficient binary transfer of data to
PostgreSQL via C++ is an STL wrapper around libpq.  Previously I would not
have recommended libpqxx for this purpose although this may have changed
with the later releases.  As others have commented you most certainly
want to do this with the ExecParams/ExecPrepared interface.  If you want
to exclusively use libpqxx then you need to find out if it exposes/wraps
this function (IIRC libpqxx builds on top of libpq).

You can of course 'roll your own' libpq wrapper via STL pretty easily.
For example, here is how I am making my SQL calls from my COBOL apps:

typedef vector<string>      stmt_param_strings;
typedef vector<int>         stmt_param_lengths;
typedef vector<const char*> stmt_param_values;
typedef vector<int>         stmt_param_formats;

[...]

res = PQexecPrepared(   _connection,
                        stmt.c_str(),
                        num_params,
                        &param_values[0],
                        &param_lengths[0],
                        &param_formats[0],
                        result_format);

Executing data this way is a direct data injection to/from the server,
no parsing/unparsing, no plan generating, etc.  Also STL vectors play
very nicely with the libpq interface because it takes unterminated strings.


Merlin





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] cached plans in plpgsql

2005-10-20 Thread Merlin Moncure
Kuba wrote:

> is there an easy way to flush all cached query plans in pl/pgsql
> functions? I've long running sessions where data are changing and the
> plans become inaccurate after a while. I can imagine something like
> recreating all pl/pgsql functions. I can recreate them from sql source
> files but I'd prefer recreating them inside the database without
> accessing files outside. I can think only of one way - reconstructing
> function source code from pg_proc and EXECUTEing it. But it's not the
> cleanest solution (there isn't saved the actual source code anywhere
so
> there could be problems with quoting etc.). Can you see any other
> possibility? How do you solve this problem? [And yes, I don't want to
> re-connect...]

Start here:
http://archives.postgresql.org/pgsql-hackers/2005-09/msg00690.php

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] impact of stats_command_string

2005-10-25 Thread Merlin Moncure
> If I turn on stats_command_string, how much impact would it have on
> PostgreSQL server's performance during a period of massive data
> INSERTs?  I know that the answer to the question I'm asking will
> largely depend upon different factors so I would like to know in which
> situations it would be negligible or would have a signifcant impact.

First of all, we have to assume your writes are buffered in some way or
you are using transactions, or you will likely be i/o bound (or you have
a super fast disk setup).

Assuming that, I can tell you from experience on win32 that
stats_command_string can be fairly expensive for certain types of access
patterns.  What patterns?

1. If your ratio of queries to records involved is low.
2. If you are accessing data in a very quick way, for example via
prepared statements over a LAN
3. Your volume of queries is very high.

In these cases, the cost is high.  stats_command_string can add a
fractional millisecond ( ~.2  in my setup ) to statement latency and as
much as double cpu time in extreme cases...you are warned.  You may want
to turn it off before doing bulk loads or lengthy record iterations.

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] zero performance on query

2005-10-26 Thread Merlin Moncure
> look at this:
> select count(*) from fotos where archivo not in (select archivo from
> archivos)
> Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
> ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
>Filter: (NOT (subplan))
>SubPlan
>  ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
>->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344
> width=58)
> 
> I WILL DIE WAITING FOR QUERY RESPONSE !!!

Try:
select count(*) from fotos f where not exists (select archivo from archivos a
where a.archivo = f.archivo);

or:

select count(*) from
(
select archivo from fotos
except
select archivo from archivos
) q;

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] browsing table with 2 million records

2005-10-27 Thread Merlin Moncure
Christopher 
> > - Present a nifty date selector to choose the records from any
day,
> > hour,  minute, second
> > - show them, with "next day" and "previous day" buttons
> >
> > - It's more useful to the user (most likely he wants to know
what
> > happened on 01/05/2005 rather than view page 2857)
> > - It's faster (no more limit/offset ! just "date BETWEEN a AND
b",
> > indexed of course)
> > - no more new items pushing old ones to the next page while you
> browse
> > - you can pretend to your boss it's just like a paginated list
> 
> All very well and good, but now do it generically...

I've done it...
First of all I totally agree with PFC's rant regarding absolute
positioning while browsing datasets.  Among other things, it has serious
problems if you have multiple users updating your table.  Also it's kind of
silly to be doing this in a set based data paradigm.

The 'SQL' way to browse a dataset is by key.  If your key has multiple
parts or you are trying to sort on two or more fields, you are supposed
to use the row constructor:

select * from t where (x, y) > (xc, yc) order by x,y;

Unfortunately, this gives the wrong answer in postgresql :(.

The alternative is to use boolean logic.  Here is a log snippit from my
ISAM driver (in ISAM, you are *always* browsing datasets):

prepare system_read_next_menu_item_favorite_file_0 (character varying,
int4, int4, int4)
as select * from system.menu_item_favorite_file
where mif_user_id >= $1 and 
(mif_user_id >  $1 or  mif_menu_item_id >= $2) and 
(mif_user_id >  $1 or  mif_menu_item_id >  $2 or
mif_sequence_no >  $3) 
order by mif_user_id, mif_menu_item_id, mif_sequence_no
limit $4

This is a Boolean based 'get next record' in a 3 part key plus a
parameterized limit.  You can do this without using prepared statements
of course but with the prepared version you can at least do 

execute system_read_next_menu_item_favorite_file_0('abc', 1, 2, 1);

Merlin


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Best way to check for new data.

2005-10-31 Thread Merlin Moncure
There are a few ways to do this...thinking about it a bit, I would add a 
timestamp column to your log table (indexed) and keep a control table which 
keeps track of the last log print sweep operation.

The print operation would just do 
select * from log where logtime > (select lastlogtime());

The idea here is not to have to keep track of anything on the log table like a 
flag indicating print status, which will cause some bloat issues.  All you have 
to do is reindex once in a while.

lastlogtime() is a function which returns the last log time sweep from the 
control table.  we use a function declared immutable to force planner to treat 
as a constant (others might tell you to do different here).
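
A minimal sketch of the control table and function (names are hypothetical):

create table log_sweep(lastlogtime timestamp not null);
insert into log_sweep values (now());

create function lastlogtime() returns timestamp as
$$ select lastlogtime from log_sweep limit 1; $$ language sql immutable;

-- the print sweep:
select * from log where logtime > (select lastlogtime());
update log_sweep set lastlogtime = now();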

Merlin


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rodrigo Madera
Sent: Friday, October 28, 2005 5:39 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Best way to check for new data.

I have a table that holds entries as in a ficticious table Log(id integer, msg 
text).
 
Lets say then that I have the program log_tail that has as it´s sole purpose to 
print newly added data elements.
 
What is the best solution in terms of performace?
 
Thank you for your time,
Rodrigo
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 8.x index insert performance

2005-10-31 Thread Merlin Moncure
Kelly wrote:
> We are running some performance tests in which we are attempting to
> insert about 100,000,000 rows in a database at a sustained rate.
About
> 50M rows in, our performance drops dramatically.
> 
> This test is with data that we believe to be close to what we will
> encounter in production.  However in tests with purely generated,
> sequential data, we did not notice this slowdown.  I'm trying to
figure
> out what patterns in the "real" data may be causing us problems.
> 
> I have log,data and indexes on separate LUNs on an EMC SAN.  Prior to
> slowdown, each partition is writing at a consistent rate.  Index
> partition is reading at a much lower rate.  At the time of slowdown,
> index partition read rate increases, all write rates decrease.  CPU
> utilization drops.
> 
> The server is doing nothing aside from running the DB.  It is a dual
> opteron (dual core, looks like 4 cpus) with 4GB RAM.  shared_buffers =
> 32768.  fsync = off.  Postgres version is 8.1.b4.  OS is SuSE
Enterprise
> server 9.
> 
> My leading hypothesis is that one indexed column may be leading to our
> issue.  The column in question is a varchar(12) column which is
non-null
> in about 2% of the rows.  The value of this column is 5 characters
which
> are the same for every row, followed by a 7 character zero filled base
> 36 integer.  Thus, every value of this field will be exactly 12 bytes
> long, and will be substantially the same down to the last bytes.
> 
> Could this pattern be pessimal for a postgresql btree index?  I'm
> running a test now to see if I can verify, but my runs take quite a
long
> time...
> 
> If this sounds like an unlikely culprit how can I go about tracking
down
> the issue?

well, can you defer index generation until after loading the set (or use
COPY?)

if that index is causing the problem, you may want to consider setting
up a partial index to exclude null values.
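
For example, a minimal sketch (table/column names hypothetical):

create index mytable_mycol_idx on mytable (mycol) where mycol is not null;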

One interesting thing to do would be to run your inserting process until
slowdown happens, stop the process, and reindex the table and then
resume it, and see if this helps.

Merlin




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] 8.x index insert performance

2005-10-31 Thread Merlin Moncure
> > if that index is causing the problem, you may want to consider
setting
> > up partial index to exclude null values.
> 
> This is a single column index.  I assumed that null column values were
> not indexed.  Is my assumption incorrect?
> 
> -K
It turns out it is, or it certainly seems to be.  I didn't know that :).
So a partial index will probably not help for null exclusion...

would be interesting to see if you are getting swaps (check pgsql_tmp) when
performance breaks down.  That is an easy fix, bump work_mem.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 8.x index insert performance

2005-10-31 Thread Merlin Moncure
> On Mon, Oct 31, 2005 at 12:32:03PM -0500, Merlin Moncure wrote:
> > if that index is causing the problem, you may want to consider
setting
> > up partial index to exclude null values.
> 
> Hey all.
> 
> Pardon my ignorance. :-)
> 
> I've been trying to figure out whether null values are indexed or not
from
> the documentation. I was under the impression, that null values are
not
> stored in the index. Occassionally, though, I then see a suggestion
such
> as the above, that seems to indicate to me that null values *are*
stored
> in the index, allowing for the 'exclude null values' to have effect?
> 
> Which is it? :-)

I think I'm the ignorant one...do explain on any lookup on an indexed
field where the field value is null and you get a seqscan.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] 8.x index insert performance

2005-10-31 Thread Merlin Moncure
> [EMAIL PROTECTED] writes:
> > I've been trying to figure out whether null values are indexed or
not
> from
> > the documentation. I was under the impression, that null values are
not
> > stored in the index.
> 
> You're mistaken, at least with regard to btree indexes.

hmm. I tried several different ways to filter/extract null values from
an indexed key and got a seq scan every time.  The only way I could
query for/against null values was to convert to bool via function.

However I did a partial index excluding nulls on a really big table where
only 1% of the values are non-null, and the index size dropped as expected.

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] 8.x index insert performance

2005-10-31 Thread Merlin Moncure
> select * from sometable where somefield IS NULL won't work because IS is
> not a normally indexable operator.

Ah, I didn't know that.  So there is no real reason not to exclude null
values from all your indexes :).  Reading Tom's recent comments
everything is clear now.

Instead of using your two index approach I prefer to:
create function nullidx(anyelement) returns boolean as
$$ select $1 is null; $$ language sql immutable;

create index t_f_nullidx on t (nullidx(f)); -- etc
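
The query then has to reference the same expression for the index to be
considered, e.g. (a sketch; depending on version you may need the explicit
comparison):

select * from t where nullidx(f) = true;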

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] improvise callbacks in plpgsql

2005-11-01 Thread Merlin Moncure
hello performance minded administrators:

We have recently converted a number of routines that walk a bill of
materials (which is a nested structure) from the application side to the
server side via recursive plpgsql functions.  The performance is
absolutely fantastic but I have to maintain a specialized 'walker' for
each specific task that I have to do.  It would be very nice and elegant
if I could pass in the function for the walker to execute while it is
iterating through the bill of materials.  I have been beating my head
against the wall for the best way to do this so here I am shopping for
ideas.

A simplified idealized version of what I would like to do is 
begin
  select (callback_routine)(record_type)
end;

from within a plpgsql function.  I am borrowing the C syntax for a
function pointer here.  The problem I am running into is the only way to
do callbacks is via dynamic sql...however you can't use higher level types
such as the row/record type in dynamic sql (at least not efficiently).  I
could of course make a full dynamic sql call by expanding the record
type into a large parameter list but this is unwieldy and brittle.

Any thoughts?

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] solutions for new Postgresql application testing

2005-11-01 Thread Merlin Moncure
Geoffrey wrote:
> We are going live with a application in a few months that is a
complete
> rewrite of an existing application.  We are moving from an existing
> proprietary database to Postgresql.  We are looking for some
> insight/suggestions as to how folks test Postgresql in such a
situation.

Shouldn't you run your tests *before* rewriting your application? :).
You don't have to answer that.

> We're also trying to decide whether a single database with multiple
> schemas or multiple databases are the best solution.  We've done some
> research on this through the archives, and the answer seems to depend
on
> the database/application design.  Still, we welcome any generic ideas
on
> this issue as well.

I can help a little bit here.  Yes, this decision will be heavily
influenced by application design.  Let's assume you have to keep
multiple identical table sets (suppose you have multiple companies on
the same server for example).  Here are some general stipulations:

Reasons to use schemas:
* If you have a requirement where data must be queried from multiple
data stores at the same time, or between various data stores and a
shared area, this argues for schemas.  While it is possible to do this
without schemas via dblink, which is the postgresql inter-database rpc,
performance can be an issue and there is some overhead of setting it up.

* If you need to swap out data stores on the fly without reconnecting,
then this argues strongly in favor of schemas.  With schemas, you can
manipulate which datastore you are using by simply manipulating the
search_path (see the sketch after this list).  There is one big caveat to
this: your non dynamic pl/pgsql functions will stick to the tables they use
following the first time you run them, like suction cups.  Worse, your sql
functions will stick to the tables they refer to when compiled, making
non-immutable sql functions a no-no in a multi-schema environment.
However, there is a clever workaround to this by force recompiling your
pl/pgsql functions (search the recent archives on this list).

* Finally, since multiple schemas can share a common public area, this
means that if you have to deploy database features that apply to all of
your datastores, you can sometimes get away with sticking them in a
public area of the database...server side utility functions are an
example of this.
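
A minimal sketch of the search_path swap (schema/table names hypothetical):

create schema company_a;
create schema company_b;
create table company_a.orders(id int);  -- identical table set in each schema
create table company_b.orders(id int);

set search_path = company_a, public;
select * from orders;   -- reads company_a.orders

set search_path = company_b, public;
select * from orders;   -- now reads company_b.orders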

Reasons to use databases:
* Certain third party tools may have trouble with schemas.

* Manipulating the search path can be error prone and relatively
tedious.

* Databases are more fully separate.  I run multi-schema, and I make
heavy use of the userlock contrib module.  This means I have to take
special care not to have inter-schema overlap of my lock identifier.
There are other cases where this might bite you, for example if you
wanted one data store to respond to notifications but not another.
These are solvable problems, but can be a headache.

In short, there are pros and cons either way.  If it's any help, the
servers I administrate, which have *really complex* data interdependency
and isolation requirements, use schemas for the extra flexibility.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] improvise callbacks in plpgsql

2005-11-01 Thread Merlin Moncure
> The body of callit() need be little more than OidFunctionCall1()
> plus whatever error checking and security checking you want to
> include.

esp=# create table test(f text);
CREATE TABLE

esp=# create function test() returns void as 
$$ 
begin 
insert into test values ('called'); 
end; 
$$ language plpgsql;

esp=# create or replace function test2() returns void as
esp-# $$
esp$# declare
esp$# r record;
esp$# begin
esp$# select into r 'abc';
esp$# perform callit('test()'::regprocedure, r);
esp$# end;
esp$#
esp$# $$ language plpgsql;
CREATE FUNCTION

esp=# select test2();

esp=# select * from test;
   f

 called
(1 row)

one word...
w00t

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] insert performance for win32

2005-11-02 Thread Merlin Moncure
> I've done the tests with rc1. This is still as slow on windows ...
about
> 6-10
> times slower than linux (via IP socket). (depending on using prepared
> queries, etc...)
> 
> By the way, we've tried to insert into the windows database from a
linux
> psql
> client, via the network. In this configuration, inserting is only
about 2
> times slower than inserting locally (the linux client had a slower CPU
> 1700Mhz agains 3000).
> Could it be related to a problem in the windows psql client ?

[OK, I'm bringing this back on-list, and bringing it to QingQing's
attention, who I secretly hope is the right person to be looking at this
problem :)]

Just to recap, Marc and I have been looking at the performance disparity
between windows and linux for a single-transaction, statement-by-statement
insert on a very narrow table with no keys from a remote
client.  Marc's observations showed (and I verified) that windows is
much slower in this case than it should be.  I gprof'ed both the psql
client and the server during the insert and didn't see anything
seriously out of order...unfortunately QQ's latest win32 performance
tweaks haven't helped.

Marc's observation that switching to a linux client drops the time
drastically is really interesting!

Merlin



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] improvise callbacks in plpgsql

2005-11-02 Thread Merlin Moncure
> Would you be willing to write up an example of this? We often get
asked
> about support for WITH, so I bet there's other people who would be
very
> interested in what you've got.

Sure. In fact, I had already decided this to be the next topic on my
blog.  I'm assuming you are asking about tools to deal with recursive
sets in postgresql.  A plpgsql solution is extremely fast, tight, and
easy if you do it right...Tom's latest suggestions (I have to flesh this
out some more) provide the missing piece of the puzzle to make it really tight
from a classic programming perspective.  I don't miss the recursive
query syntax at all...IMO it's pretty much a hack anyways (to SQL).   
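
A minimal sketch of the kind of recursive walker I mean (the bom table and
its columns are hypothetical):

create table bom(parent_id int, child_id int, qty numeric);

create function walk_bom(_parent int) returns setof bom as
$$
declare
    item bom%rowtype;
    sub  bom%rowtype;
begin
    for item in select * from bom where parent_id = _parent loop
        return next item;
        -- recurse into the child's own components
        for sub in select * from walk_bom(item.child_id) loop
            return next sub;
        end loop;
    end loop;
    return;
end;
$$ language plpgsql;

-- select * from walk_bom(1);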

Merlin




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] improvise callbacks in plpgsql

2005-11-02 Thread Merlin Moncure
> Would you be willing to write up an example of this? We often get
asked
> about support for WITH, so I bet there's other people who would be
very
> interested in what you've got.
> 
You can see my blog on the subject here:
http://www.postgresql.org/docs/8.0/interactive/plpgsql.html#PLPGSQL-ADVANTAGES


It doesn't touch the callback issue.  I'm going to hit that at a later
date, a review would be helpful!

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] improvise callbacks in plpgsql

2005-11-02 Thread Merlin Moncure
oops. my blog is here: :-)
http://people.planetpostgresql.org/merlin/

> http://www.postgresql.org/docs/8.0/interactive/plpgsql.html#PLPGSQL-ADVANTAGES

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Sorted union

2005-11-03 Thread Merlin Moncure
>  selectwhen_stopped as when_happened,
>1 as order_hint
>  from  transaction t
>  where '2005-10-25 15:00:00' <= when_stopped
>and when_stopped <= '2005-10-26 10:00:00'
>  union all
>  selectwhen_stopped as when_happened,
>2 as order_hint
>  from  transaction t
>  where '2005-10-25 15:00:00' <= when_stopped
>and when_stopped <= '2005-10-26 10:00:00'
>  order by  when_happened, order_hint;

hmm, try pushing the union into a subquery...this is better style
because it's kind of ambiguous if the ordering will apply before/after
the union.

select q.when from
(
 select 1 as hint, start_time as when [...]
 union all
 select 2 as hint, end_time as when [...]
) q order by q.when, q.hint

question: why do you want to flatten the table...is it not easier to
work with as records?

Merlin
 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Function with table%ROWTYPE globbing

2005-11-03 Thread Merlin Moncure
> Postgresql 8.0.4 using plpgsql
> 
> The basic function is set up as:
> CREATE FUNCTION add_data(t_row mytable) RETURNS VOID AS $func$
> DECLARE
> newtable text;
> thesql text;
> BEGIN
> INSERT INTO newtable thename from mytable where lookup.id =
> t_row.id;
> thesql := 'INSERT INTO ' || newtable || VALUES (' || t_row.* ')';
> EXECUTE thesql;
> RETURN;
> END;
> $func$ LANGUAGE plpgsql VOLATILE;
> 
> SELECT add_data(t.*) FROM mytable t where 
> ERROR:  column "*" not found in data type mytable
> 
> Now I have tried to drop the * but then there is no concatenation
> function to join text to a table%ROWTYPE. So my question is how can I
> make this dynamic insert statement without listing out every
> t_row.colname? Or, alternatively, is there a better way to parse out
> each row of a table into subtables based on a column value?

I don't think it's possible.  Rowtypes, etc are not first class yet (on
the to-do list).  What I would do is pass the table name, where clause, etc into
the add_data function and rewrite as insert...select and do the whole
thing in one operation.
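
A minimal sketch of what I mean (names are hypothetical; the destination
lookup is left to the caller):

create or replace function add_data(dest text, where_clause text) returns void as
$$
begin
    -- one insert...select moves all matching rows; no per-row function calls
    execute 'insert into ' || dest
         || ' select * from mytable where ' || where_clause;
    return;
end;
$$ language plpgsql volatile;

-- select add_data('subtable_a', 'somecol = 1');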

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Sorted union

2005-11-03 Thread Merlin Moncure

> Merlin Moncure wrote:
> > hmm, try pushing the union into a subquery...this is better style
> > because it's kind of ambiguous if the ordering will apply
before/after
> > the union.
> 
> Seems to be a little slower. There's a new "subquery scan" step.

I figured.  However it's more correct, I'm not sure if the original
query is necessarily guaranteed to give the right answer (in terms of
ordering).  It might though.

> 
> > question: why do you want to flatten the table...is it not easier to
> > work with as records?
> 
> For most things, yes. But I'm making a bunch of different graphs from
> these data, and a few of them are much easier with events. The best
> example is my concurrency graph. Whenever there's a start event, it
goes
> up one. Whenever there's a stop event, it goes down one. It's
completely
> trivial once you have it separated into events.

well, if you don't mind attempting things that are not trivial, how
about trying: 

select t, (select count(*) from transaction
           where t between happened and when_stopped) from
(
    select ((generate_series(1,60) * scale)::text::interval) + '12:00 pm'::time as t
) q;

for example, to check concurrency at every second for a minute (starting
from 1 second) after 12:00 pm (scale is one in this case),

select t, (select count(*) from transaction
           where t between happened and when_stopped) from
(
    select (generate_series(1,60)::text::interval) + '12:00 pm'::time as t
) q;

this could be a win depending on how much data you pull into your
concurrency graph.  maybe not though.  

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] insert performance for win32

2005-11-03 Thread Merlin Moncure
> On Wed, 2 Nov 2005, Merlin Moncure wrote:
> If you put client/server on the same machine, then we don't know how
the
> CPU is splitted. Can you take a look at the approximate number by
> observing the task manager data while running?

ok, I generated a test case which was 250k inserts to a simple two column
table all in a single transaction.  Every 50k inserts, the time is recorded
via timeofday().

Running from remote, the time progression is:
First  50k: 20 sec
Second: 29 sec
[...]
final:  66 sec

so, clear upward progression of time/rec.  Initial time is 2.5k
inserts/sec which is decent but not great for such a narrow table.  CPU
time on server starts around 50% and drops in exact proportion to insert
performance.  My earlier gprof test also suggest there is no smoking gun
sucking down all the cpu time.

cpu time on the client is very volatile but with a clear increase over
time starting around 20 and ending perhaps 60. My client box is pretty
quick, 3ghz p4.

Running the script locally, from the server, cpu time is pegged at 100%
and stays there...the first 50k is 23 sec, degrading to almost three
minutes for the final 50k.

Merlin



 
> If communication code is the suspect, can we measure the difference if
we
> disable the redefinition of recv()/send() etc in port/win32.h (may
require
> change related code a little bit as well). In this way, the socket
will
> not be able to pickup signals, but let see if there is any performance
> difference first.
> 
> Regards,
> Qingqing
> 
> 
> >
> > [OK, I'm bringing this back on-list, and bringing it to QingQing's
> > attention, who I secretly hope is the right person to be looking at
this
> > problem :)]
> >
> P.s. You scared me ;-)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Sorted union

2005-11-03 Thread Merlin Moncure
> Wow. I hadn't known about generate_series, but there are a bunch of
> places I've needed it.

It's a wonder tool :).
 
> But I think there is something I can do: I can just do a query of the
> transaction table sorted by start time. My graph tool can keep a

Reading the previous paragraphs I was just about to suggest this.  This
is a much more elegant method...you are reaping the benefits of having
normalized your working set.  You were trying to denormalize it back to
what you were used to.  Yes, now you can drop your index and simplify
your queries...normalized data is always more 'natural'.

> Mind you, I still think PostgreSQL should be able to perform that
> sorted union fast. Maybe sometime I'll have enough free time to take
> my first plunge into looking at a database query planner.

I'm not so sure I agree: by using union you were basically pulling two
independent sets (even if they were from the same table) that needed to
be ordered.  There is zero chance of using the index here for ordering
because you are ordering a different set than the one being indexed.
Had I not been able to talk you out of de-normalizing your table, I was
going to suggest rigging up a materialized view and indexing that:

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
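
Since there is no built-in materialized view support, 'rigging up' means
maintaining the summary table by hand, roughly like the sketch below (only
transaction/happened/when_stopped come from this thread; the other names
are invented for illustration):

-- one row per start/stop event, kept in a real (and therefore indexable) table
create table transaction_events as
  select happened as t, 1 as delta from transaction
  union all
  select when_stopped as t, -1 as delta from transaction;

create index transaction_events_t_idx on transaction_events (t);

-- re-run (or drive from triggers) whenever the base data changes enough to matter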

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Sorted union

2005-11-03 Thread Merlin Moncure
> The ANSI/ISO specs are not at all ambiguous on this.  An
> ORDER BY is not allowed for the SELECT statements within
> a UNION.  It must come at the end and applied to the resulting
> UNION.

Interesting :/ 
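
For the archives, the distinction looks like this (made-up table names):

-- not allowed by the spec: ordering an individual arm of the union
-- select t from foo order by t union all select t from bar;

-- allowed: a single order by, applied to the union as a whole
select t from foo union all select t from bar order by 1;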

Merlin

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] insert performance for win32

2005-11-03 Thread Merlin Moncure
> Both win32 send/recv have pgwin32_poll_signals() in them.  This is
> glorified WaitForSingleObjectEx on global pgwin32_signal_event.  This is
> probably part of the problem.  Can we work some of the same magic you put
> into check interrupts macro?

Whoop! following a cvs update I see this is already nailed :) Back to
the drawing board...

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] insert performance for win32

2005-11-03 Thread Merlin Moncure
> > Sorry, I don't follow you here - what do you mean to do? Remove the
> > event completely so we can't wait on it?
> >
> 
> I'd like to use the win32 provided recv(), send() functions instead of
> redirect them to pgwin32_recv()/pgwin32_send(), just like libpq does. If
> we do this, we will lose some functionalities, but I'd like to see the
> performance difference first. -- do you think that will be any
> difference?

I personally strongly doubt this will make a difference.  Anyway, I
think we might be looking in the wrong place.  Here was my test:
1. drop/create a table with two fields (id int, f text), no keys
2. begin
3. insert 500k rows, recording the time every 50k; insert time grows
geometrically
4. commit

I am doing this via 
type dump.sql | psql -q mydb
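
(dump.sql is nothing exotic; schematically it is just this, repeated out
to 500k rows, with the table name here being a placeholder:)

begin;
insert into test values (1, 'abcdef');
insert into test values (2, 'abcdef');
-- ... 49,998 more inserts ...
select timeofday();  -- timing checkpoint after each 50k block
-- ... nine more blocks like the above ...
commit;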

I rearranged it:
every 50k rows, get the time but also restart the transaction.  I would
expect the times to reset.
Guess what...no change.  This was a shocker.  So I wrap dump.sql with
another file that is just 
\i dump.sql
\i dump.sql

and the time to insert 50k recs resets after the first dump...

Merlin 

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> You mean something like the attached?
not quite:  attached is a file to generate the test.
to do it:

psql yadda
\i timeit.sql
\t
\o dump.sql
select make_dump(5, false);
\q
cat dump.sql  | psql -q yadda

and see what pops out.  I had to do it that way because redirecting psql
output to the dump file caused psql to sit forever waiting on more input,
with high cpu load...
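
The timeit.sql attachment is not reproduced here; a rough guess at the
shape of such a generator (signature inferred from make_dump(5, false)
above, body invented) is:

create or replace function make_dump(nblocks int, opt bool)
returns setof text as $$
-- 'opt' mirrors the second argument of make_dump(5, false); its meaning
-- is not recorded in the thread, so this sketch ignores it.
declare
    i int;
    j int;
begin
    return next 'begin;';
    for i in 1 .. nblocks loop
        for j in 1 .. 50000 loop
            return next 'insert into test values (' || j::text || ', ''abcdef'');';
        end loop;
        return next 'select timeofday();';  -- timing checkpoint per 50k block
    end loop;
    return next 'commit;';
    return;
end;
$$ language plpgsql;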

Merlin


timeit.sql
Description: timeit.sql

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> > You mean something like the attached?

oh, btw I ran timeit.c and performance is flat and fairly fast.  I'm
pretty sure psql is the culprit here.

Merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Searching union views not using indices

2005-11-04 Thread Merlin Moncure
> Hello everyone.
> 
> We are facing a performance problem with views consisting of several
> unioned tables. The simplified schema is as follows:
> 
> CREATE TABLE foo (
>   foo_object_id   bigint,
>   link_id         bigint,
>   somedata        text,
>   PRIMARY KEY (foo_object_id) );

point 1:
well, you may want to consider:

create table foobar
(
  prefix      text,     -- foo/bar/etc
  object_id   bigint,
  link_id     bigint,
  primary key (prefix, object_id)
); -- add indexes as appropriate

and push the foo/bar-specific information to satellite tables which refer
back via a pkey/fkey link.  Now you get a very quick and easy link_id
query and no view is necessary.  You may also want to look at table
inheritance, but make sure you read all the disclaimers first.
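
A satellite table for foo, for instance, might look like this (illustrative
only; the check constraint plus two-column foreign key is just one way to
keep satellite rows tied to their prefix):

create table foo_detail
(
  prefix      text default 'foo' check (prefix = 'foo'),
  object_id   bigint,
  somedata    text,
  primary key (prefix, object_id),
  foreign key (prefix, object_id) references foobar (prefix, object_id)
);

-- the hot lookup now touches only the one narrow table:
-- select prefix, object_id from foobar where link_id = ?;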

point 2:
watch out for union: it implies a sort and a duplicate filter.  union all
is faster, although you may get duplicates.
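
i.e., for a view over the unioned tables (sketch, with 'bar' following the
original poster's naming):

-- union: implied sort plus duplicate removal across the branches
select link_id from foo union select link_id from bar;

-- union all: no sort, no dedup; cheaper when duplicates are impossible
-- or do not matter
select link_id from foo union all select link_id from bar;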

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> > not quite:  attached is a file to generate test.
> 
> > cat dump.sql  | psql -q yadda
> 
> Ah.  Does your psql have readline support?  if so, does adding -n to
> that command change anything?
> 

It doesn't, and it doesn't. :/  Ok, here's where it gets interesting. I
removed all the newlines from the test output (dump.sql) and got flat
times ;).  

Merlin


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> That's bizarre ... I'd have thought a very long line would be more
> likely to trigger internal performance problems than the original.
> 
> What happens if you read the file with "psql -f dump.sql" instead
> of cat/stdin?

non-flat.  Also ran it via \i and got non-flat times.

> BTW, I get flat times for your psql test case on Unix, again both with
> local and remote client.  So whatever is going on here, it's
> Windows-specific.

yeah.  I'm guessing the problem is in the mingw flex/bison (which I really,
really hope is not the case) or some other win32-specific block of code.
I'm snooping around there...

Merlin 

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> 
> "Merlin Moncure" <[EMAIL PROTECTED]> writes:
> > yeah.  I'm guessing problem is in the mingw flex/bison (which I
really,
> > really hope is not the case) or some other win32 specific block of
code.
> > I'm snooping around there...
> 
> Maybe I'm confused here, but I thought we had established that the
local
> and remote cases behave differently for you?  If so I'd suppose that
it
> must be a networking issue, and there's little point in looking inside
> psql.
> 
The local case is *worse*...presumably because psql is competing with
the server for cpu time...cpu load is pegged at 100%.  In the remote
case, I'm getting 50-60% cpu load, which is way too high.  The problem is
definitely in psql.

Merlin


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
ok, here is gprof output from newlines/no newlines 
[newlines]
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
19.03      0.67     0.67        1     0.67     3.20   MainLoop
17.61      1.29     0.62   500031     0.00     0.00   yylex
15.63      1.84     0.55  1500094     0.00     0.00   GetVariable
11.08      2.23     0.39   250018     0.00     0.00   SendQuery
 4.26      2.38     0.15   750051     0.00     0.00   GetVariableBool
 3.41      2.50     0.12   250024     0.00     0.00   SetVariable
 2.56      2.59     0.09   250015     0.00     0.00   gets_fromFile
 2.27      2.67     0.08   750044     0.00     0.00   yy_switch_to_buffer
 2.27      2.75     0.08   500031     0.00     0.00   psql_scan
 2.27      2.83     0.08                              pg_strcasecmp
 1.70      2.89     0.06  4250078     0.00     0.00   emit
 1.70      2.95     0.06   500031     0.00     0.00   VariableEquals
 1.70      3.01     0.06   250018     0.00     0.00   AcceptResult
 1.42      3.06     0.05   250018     0.00     0.00   ResetCancelConn

[no newlines]
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
23.01      0.26     0.26   250019     0.00     0.00   yylex
19.47      0.48     0.22   250018     0.00     0.00   SendQuery
11.50      0.61     0.13      170     0.00     0.00   GetVariable
 9.73      0.72     0.11   250042     0.00     0.00   pg_strdup
 9.73      0.83     0.11   250024     0.00     0.00   SetVariable
 6.19      0.90     0.07   500039     0.00     0.00   GetVariableBool
 5.31      0.96     0.06                              pg_strcasecmp
 4.42      1.01     0.05  4250078     0.00     0.00   emit
 2.65      1.04     0.03        1     0.03     1.01   MainLoop

ok, mingw gprof is claiming MainLoop is a culprit here, along with a
general efficiency penalty in several other places (twice as many calls
to yylex, 33% more to GetVariable, etc).  Just for fun I double-checked
the string length of the query input to SendQuery and everything is the
right length.

Same number of calls to SendQuery, but 2.5 times the call time in the
newlines case...anything jump out?

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
Nailed it.

The problem is in mainloop.c -> setup_cancel_handler.  Apparently you can
register multiple handlers and windows keeps track of them all, even if
they do the same thing.  Keeping track of so many system handles would
naturally slow the whole process down.  Commenting out that line, times
are flat as a pancake.  I am thinking keeping track of a global flag (so
the handler is only registered once) would be appropriate.


Merlin


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] insert performance for win32

2005-11-04 Thread Merlin Moncure
> "Merlin Moncure" <[EMAIL PROTECTED]> writes:
> > Nailed it.
> 
> > problem is in mainloop.c -> setup_cancel_handler.  Apparently you
can
> > have multiple handlers and windows keeps track of them all, even if
they
> > do the same thing.  Keeping track of so many system handles would
> > naturally slow the whole process down.
> 
> Yipes.  So we really want to do that only once.
> 
> AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler
> calls in front of the per-line loop inside MainLoop --- can anyone see
> a reason not to?

hm. MainLoop is re-entrant, right?  That means each \i would reset the
handler...what is the downside to keeping a global flag?


> I'm inclined to treat this as an outright bug, not just a minor
certainly...

> performance issue, because it implies that a sufficiently long psql
> script would probably crash a Windows machine.

actually, it's worse than that: it's more of a DoS on the whole system,
as windows will eventually stop granting handles, and there is a good
chance of side effects on other applications.

Merlin

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Some help on buffers and other performance tricks

2005-11-10 Thread Merlin Moncure
> The point Gentlemen, was that Good Architecture is King.  That's what I
> was trying to emphasize by calling proper DB architecture step 0.  All
> other things being equal (and they usually aren't, this sort of stuff is
> _very_ context dependent), the more of your critical schema that you can
> fit into RAM during normal operation the better.
> 
> ...and it all starts with proper DB design.  Otherwise, you are quite
> right in stating that you risk wasting time, effort, and HW.
> 
> Ron

+1!

I answer lots of questions on this list that are in the form of 'query x
is running too slow'.  Often, the first thing that pops into my mind is
'why are you running query x in the first place?'

The #1 indicator that something is not right is the 'distinct' clause.
Distinct (and its evil cousin, union) are often brought in to paper over
problems (usually unwanted duplicates) rather than fix them at the source.

The human brain is the best optimizer.  Even on old hardware the server
can handle a *lot* of data.  It's just about where we add
inefficiency...lousy database designs lead to lousy queries or (even
worse) extra application code.

Merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware/OS recommendations for large databases (5TB)

2005-11-15 Thread Merlin Moncure
> Hardware-wise I'd say dual core opterons. One dual-core-opteron
> performs better than two single-core at the same speed. Tyan makes
> some boards that have four sockets, thereby giving you 8 cpu's (if you
> need that many). Sun and HP also makes nice hardware although the Tyan
> board is more competetive priced.

just FYI: tyan makes an 8-socket motherboard (up to 16 cores!):
http://www.swt.com/vx50.html

It can be loaded with up to 128 GB of memory if all the sockets are filled
:).

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

