Re: [PERFORM] 121+ million record table perf problems

2007-05-18 Thread Craig James




I've got a table with ~121 million records in it.  Select count on it 
currently takes ~45 minutes, and an update to the table to set a value 
on one of the columns I finally killed after it ran 17 hours and had 
still not completed.  Queries into the table are butt slow, and


The update query that started this all I had to kill after 17hours.  
It should have updated all 121+ million records.  That brought my 
select count down to 19 minutes, but still a far cry from acceptable.


If you have a column that needs to be updated often for all rows, 
separate it into a different table, and create a view that joins it back 
to the main table so that your application still sees the old schema.


This will greatly speed your update since (in Postgres) an update is 
the same as a delete+insert.  By updating that one column, you're 
re-writing your entire 121 million rows.  If you separate it, you're 
only rewriting that one column.  Don't forget to vacuum/analyze and 
reindex when you're done.
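
Just to illustrate (these table and column names are invented, not from your schema), the split might look something like:

  create table main_data (          -- the wide, mostly-static table
      id    integer primary key,
      name  text,
      value float8
  );
  create table main_status (        -- narrow table holding only the hot column
      id     integer primary key references main_data(id),
      status integer
  );
  create view main_combined as      -- the application keeps seeing the old schema
      select d.id, d.name, d.value, s.status
      from main_data d join main_status s on (d.id = s.id);

  -- a full-table update now rewrites only the narrow main_status rows
  update main_status set status = 0;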


Better yet, if you can stand a short down time, you can drop indexes on 
that column, truncate, then do 121 million inserts, and finally 
reindex.  That will be MUCH faster.
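
Continuing the same made-up example, that reload would be roughly:

  drop index main_status_status_idx;   -- assuming an index on the hot column; name is hypothetical
  truncate main_status;
  insert into main_status (id, status)
      select id, 0 from main_data;
  create index main_status_status_idx on main_status (status);
  analyze main_status;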


Craig





[PERFORM] Auto-ANALYZE?

2007-05-23 Thread Craig James

Auto-vacuum has made Postgres a much more friendly system.  Is there some 
reason the planner can't also auto-ANALYZE in some situations?

Here's an example I ran into:

  create table my_tmp_table (...);
  insert into my_tmp_table (select some stuff from here and there);
  select ... from my_tmp_table join another_table on (...);

The last statement generated a horrible plan, because the planner had no idea 
what was in the temporary table (which only had about 100 rows in it).  Simply 
inserting an ANALYZE before the SELECT improved performance by a factor of 100 
or so.
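
In other words, the fix was just the one extra line:

  create table my_tmp_table (...);
  insert into my_tmp_table (select some stuff from here and there);
  analyze my_tmp_table;
  select ... from my_tmp_table join another_table on (...);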

There are several situations where you could automatically analyze the data.

1. Any time you have to do a full table scan, you might as well throw in an 
ANALYZE of the data you're scanning.  If I understand things, ANALYZE takes a 
random sample anyway, so a full table scan should be able to produce even 
better statistics than a normal ANALYZE.

2. If you have a table with NO statistics, the chances of generating a sensible 
plan are pretty random.  Since ANALYZE is quite fast, if the planner encounters 
no statistics, why not ANALYZE it on the spot?  (This might need to be a 
configurable feature, though.)

3. A user-configurable update threshold, such as, "When 75% of the rows have changed 
since the last ANALYZE, trigger an auto-analyze."  The user-configurable part would 
account for the fact that some tables' stats don't change much even after many updates, 
but others may need to be reanalyzed after a modest number of updates.

Auto-vacuum, combined with auto-analyze, would eliminate many of the problems that plague 
neophyte (and sometimes experienced) users of Postgres.  A substantial percentage of the 
questions to this list are answered with, "Have you ANALYZED?"

Craig



Re: [PERFORM] LIKE search and performance

2007-05-24 Thread Craig James

Mark Lewis wrote:


PG could scan the index looking for matches first and only load the
actual rows if it found a match, but that could only be a possible win
if there were very few matches, because the difference in cost between a
full index scan and a sequential scan would need to be greater than the
cost of randomly fetching all of the matching data rows from the table
to look up the visibility information.  


Just out of curiosity: Does Postgres store a duplicate of the data in the index, even for long strings?  I thought indexes only had to 
store the string up to the point where there was no ambiguity, for example, if I have "missing", "mississippi" and 
"misty", the index only needs "missin", "missis" and "mist" in the actual index.  This would make 
it impossible to use a full index scan for a LIKE query.

Craig



Re: [PERFORM] LIKE search and performance

2007-05-24 Thread Craig James

Alvaro Herrera wrote:
 Just out of curiosity: Does Postgres store a duplicate of the data in the 
index, even for long strings?  I thought indexes only had to store the 
string up to the point where there was no ambiguity, for example, if I have 
"missing", "mississippi" and "misty", the index only needs "missin", 
"missis" and "mist" in the actual index.


What would happen when you inserted a new tuple with just "miss"?  You
would need to expand all the other tuples in the index.


That's right.  This technique used by some index implementations is a tradeoff between 
size and update speed.  Most words in most natural languages can be distinguished by the 
first few characters.  The chances of having to modify more than a few surrounding nodes 
when you insert "miss" is small, so some implementations choose this method.  
Other implementations choose to store the full string.  I was just curious which method 
Postgres uses.

Craig




[PERFORM] ECC RAM really needed?

2007-05-25 Thread Craig James

We're thinking of building some new servers.  We bought some a while back that 
have ECC (error correcting) RAM, which is absurdly expensive compared to the 
same amount of non-ECC RAM.  Does anyone have any real-life data about the 
error rate of non-ECC RAM, and whether it matters or not?  In my long career, 
I've never once had a computer that corrupted memory, or at least I never knew 
if it did.  ECC sounds like a good idea, but is it solving a non-problem?

Thanks,
Craig



Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?

2007-05-29 Thread Craig James

On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
   The point I'm worried about is performance.  Do you think the performance 
would be better executing exactly the same queries, only adding an extra 
column (e.g. customer_id) to all the tables, than opening a connection to 
that one customer's DB and executing the query there?


There is no simple answer to this question; it depends too much on your data.  
In many cases, adding a customer_id to every table, and perhaps also 
per-customer views (per Jeff's suggestion), can work really well.
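
For what it's worth, the shared-schema version of that looks roughly like this (table and customer names are invented, just a sketch):

  create table orders (
      order_id    serial primary key,
      customer_id integer not null,
      item        text
  );
  create index orders_customer_id on orders (customer_id);

  -- one view per customer, so each application still sees only its own data
  create view orders_customer_42 as
      select order_id, item from orders where customer_id = 42;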

However, performance is not the only consideration, or even the main 
consideration.  We operate with about 150 separate databases.  In our cases, 
administration issues and software design outweighed performance issues.

For example, with separate databases, security is simpler, *and* it's easy to 
convince the customer that their data is protected.  Creating views only helps 
for read-only access.  When the customer wants to modify their data, how will 
you keep them from accessing and overwriting one another's data?  Even with 
views, can you convince the customer you've done it right?  With separate 
databases, you use the built-in security of Postgres, and don't have to 
duplicate it in your schema and apps.

With separate databases, it's really easy to discard a customer.  This can be 
particularly important for a big customer with millions of linked records.  In 
a database-for-everyone design, you'll have lots of foreign keys, indexes, etc. 
that make deleting a whole customer a REALLY big job.  Contrast that with just 
discarding a whole database, which typically takes a couple seconds.
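
To make that concrete (table names are hypothetical), discarding a customer is the difference between

  drop database customer_xyz;

and a long cascade of statements like

  delete from line_items where customer_id = 42;
  delete from orders     where customer_id = 42;
  delete from customers  where customer_id = 42;

with every delete rippling through the indexes and foreign keys of the monster schema.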

But even more important (to us) is the simplicity of the applications and management.  
It's far more than just an extra "... and customer = xyz" added to every 
query.  Throwing the customers together means every application has to understand 
security, and many operations that would be simple become horribly tangled.  Want to back 
up a customer's data?  You can't use pg_dump, you have to write your own dump app.  Want 
to restore a customer's data?  Same.  Want to do a big update?  Your whole database is 
affected and probably needs to be vacuum/analyzed.  On and on, at every turn, management 
and applications are more complex.

If you have hundreds of separate databases, it's also easy to scale: Just buy 
more servers, and move some of the databases.  With a single monster database, 
as load increases, you may hit the wall sooner or later.

Postgres is really good at maintaining many separate databases.  Why do it 
yourself?

There are indeed performance issues, but even that's not black and white.  
Depending on the specifics of your queries and the load on your servers, you 
may get better performance from a single monster database, or from hundreds of 
separate databases.

So, your question has no simple answer.  You should indeed evaluate the 
performance, but other issues may dominate your decision.

Craig





[PERFORM] Autodetect of software RAID1+0 fails

2007-06-01 Thread Craig James

Apologies for a somewhat off-topic question, but...

The Linux kernel doesn't properly detect my software RAID1+0 when I boot up.  
It detects the two RAID1 arrays, the partitions of which are marked properly.  
But it can't find the RAID0 on top of that, because there's no corresponding 
device to auto-detect.  The result is that it creates /dev/md0 and /dev/md1 and 
assembles the RAID1 devices on bootup, but /dev/md2 isn't created, so the RAID0 
can't be assembled at boot time.

Here's what it looks like:

$ cat /proc/mdstat 
Personalities : [raid0] [raid1] 
md2 : active raid0 md0[0] md1[1]

 234436224 blocks 64k chunks
 
md1 : active raid1 sde1[1] sdc1[2]

 117218176 blocks [2/2] [UU]
 
md0 : active raid1 sdd1[1] sdb1[0]

 117218176 blocks [2/2] [UU]

$ uname -r
2.6.12-1.1381_FC3

After a reboot, I always have to do this:

 mknod /dev/md2 b 9 2
 mdadm --assemble /dev/md2 /dev/md0 /dev/md1
 mount /dev/md2

What am I missing here?

Thanks,
Craig



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

[EMAIL PROTECTED] wrote:
various people (not database experts) are pushing to install Oracle 
cluster so that they can move all of these to one table with a 
customerID column.


They're blowing smoke if they think Oracle can do this.  One of my applications 
had this exact same problem -- table-per-customer versus 
big-table-for-everyone.  Oracle fell over dead, even with the best indexing 
possible, tuned by the experts, and using partitions keyed to the customerID.

We ended up breaking it up into table-per-customer because Oracle fell over 
dead when we had to do a big update on a customer's entire dataset.  All other 
operations were slowed by the additional index on the customer-ID, especially 
complex joins.  With a table-for-everyone, you're forced to create tricky 
partitioning or clustering, clever indexes, and even with that, big updates are 
problematic.  And once you do this, then you become heavily tied to one RDBMS 
and your applications are no longer portable, because clustering, indexing, 
partitioning and other DB tuning tricks are very specific to each RDBMS.

When we moved to Postgres, we never revisited this issue, because both Oracle 
and Postgres are able to handle thousands of tables well.  As I wrote in a 
previous message on a different topic, often the design of your application is 
more important than the performance.  In our case, the table-per-customer makes 
the applications simpler, and security is MUCH easier.

Oracle is simply not better than Postgres in this regard.  As far as I know, 
there is only one specific situation (discussed frequently here) where Oracle 
is faster: the count(), min() and max() functions, and I know significant 
progress has been made since I started using Postgres.  I have not found any 
other query where Oracle is significantly better, and I've found several where 
Postgres is the clear winner.

It's telling that Oracle's license contract prohibits you from publishing 
comparisons and benchmarks.  You have to wonder why.

Craig



Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

Scott Marlowe wrote:
OTOH, there are some things, like importing data, which are MUCH faster 
in pgsql than in the big database.


An excellent point, I forgot about this. The COPY command is the best thing 
since the invention of a shirt pocket.  We have a database-per-customer design, 
and one of the monstrous advantages of Postgres is that we can easily do 
backups.  A pg_dump, then scp to a backup server, and in just a minute or two 
we have a full backup.  For recovery, pg_restore is equally fast and amazing.  
Last time I checked, Oracle didn't have anything close to this.

Craig





Re: [PERFORM] Thousands of tables versus on table?

2007-06-06 Thread Craig James

Jonah H. Harris wrote:

On 6/6/07, Craig James [EMAIL PROTECTED] wrote:

They're blowing smoke if they think Oracle can do this.


Oracle could handle this fine.


Oracle fell over dead, even with the best indexing possible,
tuned by the experts, and using partitions keyed to the
customerID.


I don't think so, whoever tuned this likely didn't know what they were 
doing.


Wrong on both counts.

You didn't read my message.  I said that *BOTH* Oracle and Postgres performed 
well with table-per-customer.  I wasn't Oracle bashing.  In fact, I was doing 
the opposite: Someone's coworker claimed ORACLE was the miracle cure for all 
problems, and I was simply pointing out that there are no miracle cures.  (I 
prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used 
extensively.)

The technical question is simple: Table-per-customer or big-table-for-everyone.  The 
answer is, "it depends."  It depends on your application, your 
read-versus-write ratio, the table size, the design of your application software, and a 
dozen other factors.  There is no simple answer, but there are important technical 
insights which, I'm happy to report, various people contributed to this discussion.  
Perhaps you have some technical insight too, because it really is an important question.

The reason I assert (and stand by this) that "They're blowing smoke" when they 
claim Oracle has the magic cure, is because Oracle and Postgres are both relational 
databases, they write their data to disks, and they both have indexes with O(log(N)) 
retrieval/update times.  Oracle doesn't have a magical workaround to these facts, nor 
does Postgres.

Craig



Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-07 Thread Craig James

Tyrrill, Ed wrote:


I have a table, let's call it A, whose primary key, a_id, is referenced
in a second table, let's call it B.  For each unique A.a_id there are
generally many rows in B with the same a_id.  My problem is that I want
to delete a row in A when the last row in B that references it is
deleted.  Right now I just query for rows in A that aren't referenced by
B, and that worked great when the tables were small, but it takes over
an hour now that the tables have grown larger (over 200 million rows in
B and 14 million in A).  The delete has to do a sequential scan of both
tables since I'm looking for what's not in the indexes.

I was going to try creating a trigger after delete on B for each row to
check for more rows in B with the same a_id, and delete the row in A if
none found.  In general I will be deleting 10's of millions of rows from
B and 100's of thousands of rows from A on a daily basis.  What do you
think?  Does anyone have any other suggestions on different ways to
approach this?


Essentially what you're doing is taking the one-hour job and spreading it out in 
little chunks over thousands of queries.  If you have 10^7 rows in B and 10^5 
rows in A, then on average you have 100 references from B to A.  That means 
that 99% of the time, your trigger will scan B and find that there's nothing to 
do.  This could add a lot of overhead to your ordinary transactions, costing a 
lot more in the long run than just doing the once-a-day big cleanout.
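
For comparison, the once-a-day cleanout can stay a single statement, something along these lines (a sketch against the a/b schema you described, untested):

  delete from a
  where not exists (select 1 from b where b.a_id = a.a_id);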

You didn't send the specifics of the query you're using, along with an EXPLAIN 
ANALYZE of it in operation.  It may also be that your SQL is not optimal, and that 
somebody could suggest a more efficient query.

It's also possible that it's not the sequential scans that are the problem, but 
rather that it just takes a long time to delete 100,000 rows from table A 
because you have a lot of indexes. Or it could be a combination of performance 
problems.

You haven't given us enough information to really analyze your problem.  Send 
more details!

Craig



Re: [PERFORM] Best way to delete unreferenced rows?

2007-06-11 Thread Craig James

Tyrrill, Ed wrote:

QUERY PLAN



---
 Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8) (actual time=6503583.342..8220629.311 rows=93524 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".record_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..521525.10 rows=13799645 width=8) (actual time=15.955..357813.621 rows=13799645 loops=1)
   ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8) (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
         Sort Key: backup_location.record_id
         ->  Seq Scan on backup_location  (cost=0.00..3311212.04 rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308 loops=1)
 Total runtime: 8229178.269 ms
(8 rows)

I ran vacuum analyze after the last time any inserts, deletes, or
updates were done, and before I ran the query above.  I've attached my
postgresql.conf.  The machine has 4 GB of RAM.


I thought maybe someone with more expertise than me might answer this, but 
since they haven't I'll just make a comment.  It looks to me like the sort of 
214 million rows is what's killing you.  I suppose you could try to increase 
the sort memory, but that's a lot of memory.  It seems to me an index merge of 
a relation this large would be faster, but that's a topic for the experts.

On a theoretical level, the problem is that it's sorting the largest table.  
Perhaps you could re-cast the query so that it only has to sort the smaller 
table, something like

  select a.id from a where a.id not in (select distinct b.id from b)

where b is the smaller table.  There's still no guarantee that it won't do a sort on 
"a", though.  In fact one of the clever things about Postgres is that it can convert a query like 
the one above into a regular join, unless you do something like "select ... offset 0" which blocks 
the optimizer from doing the rearrangement.
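
For illustration, here's one way to write that recast with the fence in place (just a sketch in the same a/b shorthand, not tested against your schema): the DISTINCT goes into a sub-select with "offset 0", and the NOT IN becomes an outer join.

  select a.id
  from a
  left join (select distinct b.id from b offset 0) as bb
         on (a.id = bb.id)
  where bb.id is null;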

But I think the first approach is to try to tune for a better plan using your 
original query.

Craig



Re: [PERFORM] test / live environment, major performance difference

2007-06-11 Thread Craig James


On 2007-06-11 Christo Du Preez wrote:

I really hope someone can shed some light on my problem. I'm not sure
if this is a postgres or postgis issue.

Anyway, we have 2 development laptops and one live server; somehow I
managed to get the same query to perform very well on my laptop, but
on both the server and the other laptop it's really performing badly.


One simple possibility that bit me in the past: If you do pg_dump/pg_restore to 
create a copy of the database, you have to ANALYZE the newly-restored database. 
 I mistakenly assumed that pg_restore would do this, but you have to run 
ANALYZE explicitly after a restore.
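
That is, right after the restore completes, something as simple as:

  analyze;                           -- regenerate statistics for every table
  -- or, for one particularly important table (the name is just a placeholder):
  analyze verbose some_big_table;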

Craig




[PERFORM] Replication

2007-06-14 Thread Craig James

Looking for replication solutions, I find:

Slony-I
Seems good, single master only, master is a single point of failure,
no good failover system for electing a new master or having a failed
master rejoin the cluster.  Slave databases are mostly for safety or
for parallelizing queries for performance.  Suffers from O(N^2) 
communications (N = cluster size).


Slony-II
Seems brilliant, a solid theoretical foundation, at the forefront of
computer science.  But can't find project status -- when will it be
available?  Is it a pipe dream, or a nearly-ready reality?

PGReplication
Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
Seems pretty good, but web site is not current, there are releases in use
that are not on the web site, and also seems to always be a couple steps
behind the current release of Postgres.  Two single-points failure spots,
load balancer and the data replicator.

Is this a good summary of the status of replication?  Have I missed any 
important solutions or mischaracterized anything?

Thanks!
Craig




Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Thanks to all who replied and filled in the blanks.  The problem with the web 
is you never know if you've missed something.

Joshua D. Drake wrote:

Looking for replication solutions, I find...
Slony-II

Dead


Wow, I'm surprised.  Is it dead for lack of need, lack of resources, too 
complex, or all of the above?  It sounded like such a promising theoretical 
foundation.

Ben wrote:

Which replication problem are you trying to solve?


Most of our data is replicated offline using custom tools tailored to our loading pattern, but we 
have a small amount of global information, such as user signups, system configuration, 
advertisements, and such, that go into a single small (~5-10 MB) global database used 
by all servers.

We need nearly-real-time replication, and instant failover.  That is, it's far more 
important for the system to keep working than it is to lose a little data.  Transactional integrity 
is not important.  Actual hardware failures are rare, and if a user just happens to sign up, or 
save preferences, at the instant the global-database server goes down, it's not a 
tragedy.  But it's not OK for the entire web site to go down when the one global-database server 
fails.

Slony-I can keep several slave databases up to date, which is nice.  And I 
think I can combine it with a PGPool instance on each server, with the master 
as primary and few Slony-copies as secondary.  That way, if the master goes 
down, the PGPool servers all switch to their secondary Slony slaves, and 
read-only access can continue.  If the master crashes, users will be able to do 
most activities, but new users can't sign up, and existing users can't change 
their preferences, until either the master server comes back, or one of the 
slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools 
for Slony-I, and also, if the original master comes back online, it has no way to know 
that a new master has been elected.  So I'd have to write a bunch of SOAP services or 
something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself.  I'm 
reluctant to introduce such a major piece of technology into our entire system, 
when only one tiny part of it needs the replication service.

Thanks,
Craig



Re: [PERFORM] Replication

2007-06-14 Thread Craig James

Andreas Kostyrka wrote:

Slony provides near instantaneous failovers (in the single digit seconds
 range). You can script an automatic failover if the master server
becomes unreachable.


But Slony slaves are read-only, correct?  So the system isn't fully functional 
once the master goes down.


That leaves you the problem of restarting your app
(or making it reconnect) to the new master.


Don't you have to run a Slony app to convert one of the slaves into the master?


5-10MB data implies such a fast initial replication, that making the
server rejoin the cluster by setting it up from scratch is not an issue.


The problem is to PREVENT it from rejoining the cluster.  If you have some 
semi-automatic process that detects the dead server and converts a slave to the 
master, and in the mean time the dead server manages to reboot itself (or its 
network gets fixed, or whatever the problem was), then you have two masters 
sending out updates, and you're screwed.


The problem is, there don't seem to be any "vote a new master" type of
tools for Slony-I, and also, if the original master comes back online,
it has no way to know that a new master has been elected.  So I'd have
to write a bunch of SOAP services or something to do all of this.


You don't need SOAP services, and you do not need to elect a new master.
if dbX goes down, dbY takes over, you should be able to decide on a
static takeover pattern easily enough.


I can't see how that is true.  Any self-healing distributed system needs 
something like the following:

 - A distributed system of nodes that check each other's health
 - A way to detect that a node is down and to transmit that
   information across the nodes
 - An election mechanism that nominates a new master if the
   master fails
 - A way for a node coming online to determine if it is a master
   or a slave

Any solution less than this can cause corruption because you can have two nodes 
that both think they're master, or end up with no master and no process for 
electing a master.  As far as I can tell, Slony doesn't do any of this.  Is 
there a simpler solution?  I've never heard of one.


The point here is, that the servers need to react to a problem, but you
probably want to get the admin on duty to look at the situation as
quickly as possible anyway.


No, our requirement is no administrator interaction.  We need instant, 
automatic recovery from failure so that the system stays online.


Furthermore, you need to checkout pgpool, I seem to remember that it has
some bad habits in routing queries. (E.g. it wants to apply write
queries to all nodes, but slony makes the other nodes readonly.
Furthermore, anything inside a BEGIN is sent to the master node, which
is bad with some ORMs, that by default wrap any access into a transaction)


I should have been more clear about this.  I was planning to use PGPool in the 
PGPool-1 mode (not the new PGPool-2 features that allow replication).  So it 
would only be acting as a failover mechanism.  Slony would be used as the 
replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new 
single point of failure that could bring the whole system down.  If you're 
using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine in failover mode only.  
It would have the Slony master as the primary connection, and a Slony slave as 
the failover connection.  The applications would route all INSERT/UPDATE 
statements directly to the Slony master, and all SELECT statements to the 
PGPool on localhost.  When the master failed, all of the PGPool servers would 
automatically switch to one of the Slony slaves.

This way, the system would keep running on the Slony slaves (so it would be 
read-only), until a sysadmin could get the master Slony back online.  And when 
the master came online, the PGPool servers would automatically reconnect and 
write-access would be restored.

Does this make sense?

Craig



Re: [PERFORM] Replication

2007-06-18 Thread Craig James

Markus Schiltknecht wrote:
Not quite... there's still Postgres-R, see www.postgres-r.org  And I'm 
continuously working on it, despite not having updated the website for 
almost a year now...


I planned on releasing the next development snapshot together with 8.3, 
as that seems to be delayed, that seems realistic ;-)


Is Postgres-R the same thing as Slony-II?  There's a lot of info and news 
around about Slony-II, but your web page doesn't seem to mention it.

While researching replication solutions, I had a heck of a time sorting out the 
dead or outdated web pages (like the stuff on gborg) from the active projects.

Either way, it's great to know you're working on it.

Craig



Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3

2007-06-29 Thread Craig James

Dolafi, Tom wrote:
min(fmin) |   max(fmin)|avg(fmin)  
   1  |   55296469 |11423945 


min(fmax) |   max(fmax)|avg(fmax)
  18  |   3288 |11424491

There are 5,704,211 rows in the table.


When you're looking for weird index problems, it's more interesting to know if 
there are certain numbers that occur a LOT.  From your statistics above, each 
number occurs about 10 times in the table.  But do some particular numbers 
occur thousands, or even millions, of times?

Here is a query that will print a list of the highest-occurring values.  You 
might expect a few occurrences of 20, and maybe 30, but if you have thousands or 
millions of occurrences of certain numbers, then that can screw up an index.

  select fmax, c from
   (select fmax, count(fmax) as c from your_table group by fmax) as foo
  where c > 3 order by c desc;

Craig




[PERFORM] Join with lower/upper limits doesn't scale well

2007-07-02 Thread Craig James

I have the same schema in two different databases.  In smalldb, the two tables of 
interest have about 430,000 rows, in bigdb, the two tables each contain about 5.5 
million rows.  I'm processing the data, and for various reasons it works out well to process it in 
100,000 row chunks.  However, it turns out for the big schema, selecting 100,000 rows is the 
longest single step of the processing.

Below is the explain/analyze output of the query from each database.  Since 
both tables are indexed on the joined columns, I don't understand why the big 
table should be so much slower -- I hoped this would scale well, or at least 
O(log(N)), not O(N).

What's going on here?  I don't know if I'm reading this right, but it looks 
like the sort is taking all the time, but that doesn't make sense because in 
both cases it's sorting 100,000 rows.

Thanks,
Craig


bigdb=> explain analyze
bigdb->   select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= 10 AND r.row_num < 20
bigdb->   order by r.row_num;

 Sort  (cost=431000.85..431248.23 rows=98951 width=363) (actual time=46306.748..46417.448 rows=10 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2583.59..422790.68 rows=98951 width=363) (actual time=469.010..45752.131 rows=10 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..323448.30 rows=5472530 width=363) (actual time=11.243..33299.933 rows=5472532 loops=1)
         ->  Hash  (cost=2336.21..2336.21 rows=98951 width=8) (actual time=442.260..442.260 rows=10 loops=1)
               ->  Index Scan using i_chm_rownum_row_num on my_rownum r  (cost=0.00..2336.21 rows=98951 width=8) (actual time=47.551..278.736 rows=10 loops=1)
                     Index Cond: ((row_num >= 10) AND (row_num < 20))
 Total runtime: 46543.163 ms


smalldb=> explain analyze
smalldb->   select r.row_num, m.molkeys from my_rownum r
smalldb->   join my_molkeys m on (r.version_id = m.version_id)
smalldb->   where r.row_num >= 10 AND r.row_num < 20
smalldb->   order by r.row_num;

 Sort  (cost=43598.23..43853.38 rows=102059 width=295) (actual time=4097.180..4207.733 rows=10 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2665.09..35107.41 rows=102059 width=295) (actual time=411.635..3629.756 rows=10 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..23378.90 rows=459590 width=295) (actual time=8.563..2011.455 rows=459590 loops=1)
         ->  Hash  (cost=2409.95..2409.95 rows=102059 width=8) (actual time=402.867..402.867 rows=10 loops=1)
               ->  Index Scan using i_chm_rownum_row_num_8525 on my_rownum r  (cost=0.00..2409.95 rows=102059 width=8) (actual time=37.122..242.528 rows=10 loops=1)
                     Index Cond: ((row_num >= 10) AND (row_num < 20))
 Total runtime: 4333.501 ms



Table bigdb.my_rownum
  Column   |  Type   | Modifiers 
+-+---
version_id | integer | 
parent_id  | integer | 
row_num| integer | 
Indexes:

   i_chm_rownum_row_num UNIQUE, btree (row_num)
   i_chm_rownum_version_id UNIQUE, btree (version_id)
   i_chm_rownum_parent_id btree (parent_id)



Table bigdb.my_molkeys
  Column   |  Type   | Modifiers 
+-+---
version_id | integer | 
molkeys| text| 
Indexes:

   i_chm_molkeys_version_id UNIQUE, btree (version_id)



[PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
order by r.version_id;


select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and   r.version_id <  330
and   m.version_id >= 320
and   m.version_id <  330
order by r.version_id;

I discovered this while looking at the plans for the first query.  It seemed to be ignoring the fact that it 
could push the "between" condition along to the second table, since the condition and the join are 
on the same indexed columns.  So, I added a redundant condition, and bingo, it was a lot faster.  In the 
analysis shown below, the timings (about 1.0 and 1.5 seconds respectively) are for a hot database 
that's been queried a couple of times.  In real life on a cold database, the times are more like 
10 seconds and 21 seconds, so it's quite significant.

Thanks,
Craig



db=> explain analyze
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r
db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and   r.version_id <  330
db-> order by r.version_id;

 Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual time=1424.126..1476.048 rows=46947 loops=1)
   Sort Key: r.version_id
   ->  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual time=41.649..1186.331 rows=46947 loops=1)
         ->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1)
               Recheck Cond: ((version_id >= 320) AND (version_id < 330))
               ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 loops=1)
                     Index Cond: ((version_id >= 320) AND (version_id < 330))
         ->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  (cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947)
               Index Cond: ("outer".version_id = m.version_id)
 Total runtime: 1534.638 ms
(10 rows)


db=> explain analyze
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r
db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and r.version_id < 330
db-> and m.version_id >= 320
db-> and m.version_id < 330
db-> order by r.version_id;

 Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual time=985.383..1037.423 rows=46947 loops=1)
   Sort Key: r.version_id
   ->  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual time=502.875..805.402 rows=46947 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  (cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 loops=1)
               Index Cond: ((version_id >= 320) AND (version_id < 330))
         ->  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual time=502.813..502.813 rows=46947 loops=1)
               ->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1)
                     Recheck Cond: ((version_id >= 320) AND (version_id < 330))
                     ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 loops=1)
                           Index Cond: ((version_id >= 320) AND (version_id < 330))
 Total runtime: 1096.031 ms
(12 rows)



Re: [PERFORM] Equivalent queries produce different plans

2007-07-10 Thread Craig James

Sorry, I forgot to mention: This is 8.1.4, with a fairly ordinary configuration 
on a 4 GB system.

Craig




[PERFORM] pg_restore causes 100

2007-07-12 Thread Craig James

Here's an oddity.  I have 10 databases, each with about a dozen connections to Postgres 
(about 120 connections total), and at midnight they're all idle.  These are mod_perl 
programs (like a FastCGI -- they stay connected so they're ready for instant service).  
So using ps -ef and grep, we find one of the databases looks like this:

postgres 22708  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46915) idle
postgres 22709  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46916) idle
postgres 22710  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46917) idle
postgres 22711  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46918) idle
postgres 22712  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(46919) idle
postgres 22724  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42440) idle
postgres 22725  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42441) idle
postgres 22726  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42442) idle
postgres 22727  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42443) idle
postgres 22728  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42444) idle
postgres 22731  7619  0 Jul11 ?00:00:06 postgres: chemmega chemmega 
192.168.10.192(42447) idle

Now here's the weird thing.  I'm running a pg_restore of a database (on the order of 4GB 
compressed, maybe 34M rows of ordinary data, and 15M rows in one BLOB table that's 
typically 2K per blob).  When I do this, ALL of the postgres backends start working at 
about 1% CPU apiece.  This means that the 120 idle postgres backends are 
together using almost 100% of one CPU on top of the 100% CPU being used by pg_restore.  
See the output of top(1) below.

Is this normal?  All I can guess at is that something's going on in shared 
memory that every Postgres backend has to respond to.

Thanks,
Craig



Tasks: 305 total,   1 running, 304 sleeping,   0 stopped,   0 zombie
Cpu(s): 33.5% us,  1.5% sy,  0.0% ni, 57.8% id,  6.6% wa,  0.2% hi,  0.4% si
Mem:   4151456k total,  4011020k used,   140436k free,10096k buffers
Swap:  2104504k total,94136k used,  2010368k free,  3168596k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND 
6681 postgres  16   0  217m 188m 161m D 50.4  4.6   4:29.30 postmaster   
1577 root  10  -5 000 S  1.0  0.0 108:01.97 md0_raid1
8487 postgres  15   0  187m 8704 4996 S  1.0  0.2   0:06.56 postmaster   
8506 postgres  15   0  187m 8604 4892 S  1.0  0.2   0:06.37 postmaster   
8507 postgres  15   0  187m 8708 5004 S  1.0  0.2   0:06.42 postmaster   
8512 postgres  15   0  187m 8612 4904 S  1.0  0.2   0:06.65 postmaster   
8751 postgres  15   0  187m  10m 7520 S  1.0  0.3   0:07.95 postmaster   
8752 postgres  15   0  187m  10m 7492 S  1.0  0.3   0:07.84 postmaster   
14053 postgres  15   0  187m 8752 5044 S  1.0  0.2   0:06.53 postmaster   
16515 postgres  15   0  187m 8156 4452 S  1.0  0.2   0:06.33 postmaster   
25351 postgres  15   0  187m 9772 6064 S  1.0  0.2   0:06.75 postmaster   
25387 postgres  15   0  187m 8444 4752 S  1.0  0.2   0:06.45 postmaster

Re: [PERFORM] select on 1milion register = 6s

2007-07-28 Thread Craig James

Bruno Rodrigues Siqueira wrote:
Who can help me? My SELECT on a table with 1 million records, 
using an expression index, takes 6 seconds…


Run your query using 


  EXPLAIN ANALYZE SELECT ... your query ...

and then post the results to this newsgroup.  Nobody can help until they see 
the results of EXPLAIN ANALYZE.  Also, include all other relevant information, 
such as Postgres version, operating system, amount of memory, and any changes 
you have made to the Postgres configuration file.

Craig





Re: [PERFORM] Slow query with backwards index scan

2007-07-28 Thread Craig James

Tilmann Singer wrote:

* [EMAIL PROTECTED] [EMAIL PROTECTED] [20070728 21:05]:

Let's try putting the sort/limit in each piece of the UNION to speed them up 
separately.

SELECT * FROM (
 (SELECT * FROM large_table lt
 WHERE lt.user_id = 12345
 ORDER BY created_at DESC LIMIT 10) AS q1
 UNION
 (SELECT * FROM large_table lt
 WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
 ORDER BY created_at DESC LIMIT 10) AS q2
ORDER BY created_at DESC LIMIT 10;


It's not possible to use ORDER BY or LIMIT within unioned queries.

http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION


If I'm reading this documentation correctly, it *is* possible, as long as 
they're inside of a sub-select, as in this case.

Craig



[PERFORM] What to vacuum after deleting lots of tables

2007-09-10 Thread Craig James

If I delete a whole bunch of tables (like 10,000 tables), should I vacuum 
system tables, and if so, which ones?  (This system is still on 8.1.4 and isn't 
running autovacuum).

Thanks,
Craig



Re: [PERFORM] Tablespaces and NFS

2007-09-19 Thread Craig James

Carlos Moreno wrote:

Anyone has tried a setup combining tablespaces with NFS-mounted partitions?


There has been some discussion of this recently, you can find it in the 
archives (http://archives.postgresql.org/).  The word seems to be that NFS can 
lead to data corruption.

Craig






Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Craig James

Luiz K. Matsumura wrote:
It is connected at full 100Mb and it transfers things quickly to clients. 
It is running Apache and JBoss, the transfer rate is good, and I did scp to 
copy many files and it is as quick as the old server.


I have no idea how to continue researching this problem. Now I'm going 
to do some network tests.


Any chance this is your desktop machine, and you're also using it for audio?  
Microsoft built in a feature (!) that reduces network speed by 90% when music 
is playing:

 http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss

Craig



[PERFORM] Is ANALYZE transactional?

2007-11-05 Thread Craig James

If I do:

 begin;
 update some_table set foo = newvalue where a_bunch_of_rows_are_changed;
 analyze some_table;
 rollback;

does it roll back the statistics?  (I think the answer is yes, but I need to be 
sure.)
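
One way to check it empirically is to watch pg_class.reltuples, which ANALYZE updates (just a sketch, reusing the pseudo-statements above):

  select relname, reltuples from pg_class where relname = 'some_table';

  begin;
  update some_table set foo = newvalue where a_bunch_of_rows_are_changed;
  analyze some_table;
  rollback;

  -- if ANALYZE is transactional, this shows the same reltuples as before
  select relname, reltuples from pg_class where relname = 'some_table';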

Thanks,
Craig



Re: [PERFORM] Curious about dead rows.

2007-11-14 Thread Craig James

Alvaro Herrera wrote:

To recap:

- your app only does inserts
- there has been no rollback lately
- there are no updates
- there are no deletes

The only other source of dead rows I can think is triggers ... do you
have any?  (Not necessarily on this table -- perhaps triggers on other
tables can cause updates on this one).

Oh, rolled back COPY can cause dead rows too.



What about an unreliable network that causes a lot of disconnects?  Wouldn't the 
server process do a rollback?

Craig





Re: [PERFORM] Query only slow on first run

2007-11-28 Thread Craig James

tmp wrote:

what exactly is that
random_number column


A random float that is initialized when the row is created and never 
modified afterwards. The physical row ordering will clearly not match 
the random_number ordering. However, other queries use a row ordering 
by the primary key so I don't think it would make much sense to make the 
index on random_number a clustering index just in order to speed up this 
single query.



 and why are you desirous of ordering by it?


In order to simulate a random pick of K rows. See [1].


A trick that I used is to sample the random column once, and create a much 
smaller table of the first N rows, where N is the sample size you want, and use 
that.

If you need a different N samples each time, you can create a temporary table, 
put your random N rows into that, do an ANALYZE, and then join to this smaller 
table.  The overall performance can be MUCH faster even though you're creating 
and populating a whole table, than the plan that Postgres comes up with. This 
seems wrong-headed (why shouldn't Postgres be able to be as efficient on its 
own?), but it works.
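
A sketch of that second variation (big_table, another_table, the id column and the sample size of 1000 are all made up):

  create temporary table random_sample as
      select id, random_number
      from big_table
      order by random_number
      limit 1000;

  analyze random_sample;

  select t.*
  from random_sample s
  join another_table t on (t.id = s.id);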

Craig




[PERFORM] libgcc double-free, backend won't die

2007-12-10 Thread Craig James

This is driving me crazy.  I have some Postgres C function extensions in a 
shared library.  They've been working fine.  I upgraded to Fedora Core 6 and 
gcc4, and now every time psql(1) disconnects from the server, the serverlog 
gets this message:

  *** glibc detected *** postgres: mydb mydb [local] idle: double free or 
corruption! (!prev): 0x08bfcde8

and the backend process won't die.  Every single connection that executes one 
of my functions leaves an idle process, like this:

 $ ps -ef | grep postgres
 postgres 12938 12920  0  23:24 ?00:00:00 postgres: mydb mydb [local] idle

This error only happens on disconnect.  As long as I keep the connection open, I can 


Worse, these zombie Postgres processes won't die, which means I can't shut down and 
restart Postgres unless I kill -9 all of them, and I can't use this at all 
because I get zillions of these dead processes.

I've used valgrind on a test application that runs all of my functions outside 
of the Postgres environment, and not a single problem shows up even after hours 
of processing.  I tried setting MALLOC_CHECK_ to various values, so that I 
could trap the abort() call using gdb, but once MALLOC_CHECK_ is set, the 
double-free error never occurs.  (But malloc slows down too much this way.)

I even read through the documentation for C functions again, and carefully 
examined my code.  Nothing is amiss, some of the functions are quite simple yet 
still exhibit this problem.

Anyone seen this before?  It's driving me nuts.

 Postgres 8.1.4
 Linux kernel 2.6.22
 gcc 4.1.1

Thanks,
Craig



Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Alvaro Herrera wrote:

Craig James wrote:

This is driving me crazy.  I have some Postgres C function extensions
in a shared library.  They've been working fine.  I upgraded to Fedora
Core 6 and gcc4, and now every time psql(1) disconnects from the
server, the serverlog gets this message:

  *** glibc detected *** postgres: mydb mydb [local] idle: double free or 
corruption! (!prev): 0x08bfcde8


Do you have any Perl or Python functions or stuff like that?


There is one Perl function, but it is never invoked during this test.  I connect to 
Postgres, issue one "select myfunc()", and disconnect.


 Postgres 8.1.4


Please upgrade to 8.1.10 and try again.  If it still fails we will be
much more interested in tracking it down.


Good idea, but alas, no difference.  I get the same "double free or 
corruption!" message.  I compiled 8.1.10 from source and installed, then rebuilt all 
of my code from scratch and reinstalled the shared object.  Same message as before.

Here is my guess -- and this is just a guess.  My functions use a third-party 
library which, of necessity, uses malloc/free in the ordinary way.  I suspect 
that there's a bug in the Postgres palloc() code that's walking over memory 
that regular malloc() allocates.  The third-party library (OpenBabel) has been 
tested pretty thoroughly by me and others and has no memory corruption problems. 
 All malloc's are freed properly.  Does that seem like a possibility?

I can't figure out how to use ordinary tools like valgrind with a Postgres 
backend process to track this down.

Thanks,
Craig




Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Tom Lane wrote:

Craig James [EMAIL PROTECTED] writes:

This is driving me crazy.  I have some Postgres C function extensions in a 
shared library.  They've been working fine.  I upgraded to Fedora Core 6 and 
gcc4, and now every time psql(1) disconnects from the server, the serverlog 
gets this message:
   *** glibc detected *** postgres: mydb mydb [local] idle: double free or 
corruption! (!prev): 0x08bfcde8


Have you tried attaching to one of these processes with gdb to see where
it ends up?  Have you checked to see if the processes are becoming
multi-threaded?

regards, tom lane




# ps -ef | grep postgres
postgres 31362 1  0 06:53 ?00:00:00 /usr/local/pgsql/bin/postmaster 
-D /postgres/main
postgres 31364 31362  0 06:53 ?00:00:00 postgres: writer process 
postgres 31365 31362  0 06:53 ?00:00:00 postgres: stats buffer process   
postgres 31366 31365  0 06:53 ?00:00:00 postgres: stats collector process
postgres 31442 31362  0 06:54 ?00:00:00 postgres: craig_test craig_test [local] idle 
root 31518 31500  0 07:06 pts/600:00:00 grep postgres

# gdb -p 31442
GNU gdb Red Hat Linux (6.5-15.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.

[snip - a bunch of symbol table stuff]

0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x0082fb8e in __lll_mutex_lock_wait () from /lib/libc.so.6
#2  0x007bfce8 in _L_lock_14096 () from /lib/libc.so.6
#3  0x007befa4 in free () from /lib/libc.so.6
#4  0x00744f93 in _dl_map_object_deps () from /lib/ld-linux.so.2
#5  0x0074989d in dl_open_worker () from /lib/ld-linux.so.2
#6  0x00745c36 in _dl_catch_error () from /lib/ld-linux.so.2
#7  0x00749222 in _dl_open () from /lib/ld-linux.so.2
#8  0x00858712 in do_dlopen () from /lib/libc.so.6
#9  0x00745c36 in _dl_catch_error () from /lib/ld-linux.so.2
#10 0x008588c5 in __libc_dlopen_mode () from /lib/libc.so.6
#11 0x00836139 in init () from /lib/libc.so.6
#12 0x008362d3 in backtrace () from /lib/libc.so.6
#13 0x007b3e11 in __libc_message () from /lib/libc.so.6
#14 0x007bba96 in _int_free () from /lib/libc.so.6
#15 0x007befb0 in free () from /lib/libc.so.6
#16 0x001f943a in DeleteByteCode (node=0x890ff4) at chains.cpp:477
#17 0x00780859 in exit () from /lib/libc.so.6
#18 0x081a6064 in proc_exit ()
#19 0x081b5b9d in PostgresMain ()
#20 0x0818e34b in ServerLoop ()
#21 0x0818f1de in PostmasterMain ()
#22 0x08152369 in main ()
(gdb) 





Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Alvaro Herrera wrote:

Craig James wrote:

Alvaro Herrera wrote:

Craig James wrote:


Here is my guess -- and this is just a guess.  My functions use a
third-party library which, of necessity, uses malloc/free in the
ordinary way.  I suspect that there's a bug in the Postgres palloc()
code that's walking over memory that regular malloc() allocates.  The
third-party library (OpenBabel) has been tested pretty thoroughly by
me an others and has no memory corruption problems.  All malloc's are
freed properly.  Does that seem like a possibility?

Not really.  palloc uses malloc underneath.

But some Postgres code could be walking off the end of a malloc'ed
block, even if palloc() is allocating and deallocating correctly.
Which is why I was hoping to use valgrind to see what's going on.


I very much doubt it.  Since you've now shown that OpenBabel is
multithreaded, then that's a much more likely cause.


Can you elaborate?  Are multithreaded libraries not allowed to be linked to 
Postgres?

Thanks,
Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Alvaro Herrera wrote:

...Since you've now shown that OpenBabel is
multithreaded, then that's a much more likely cause.

Can you elaborate?  Are multithreaded libraries not allowed to be
linked to Postgres?


Absolutely not.


Ok, thanks, I'll work on recompiling OpenBabel without thread support.

Since I'm not a Postgres developer, perhaps one of the maintainers could update 
the Postgres manual.  In chapter 32.9.6, it says,

 To be precise, a shared library needs to be created.

This should be amended to say,

 To be precise, a non-threaded, shared library needs to be created.

Cheers,
Craig



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Tom Lane wrote:

Craig James [EMAIL PROTECTED] writes:

GNU gdb Red Hat Linux (6.5-15.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.



[snip - a bunch of symbol table stuff]


Please show that stuff you snipped --- it might have some relevant
information.  The stack trace looks a bit like a threading problem...


# gdb -p 31442
GNU gdb Red Hat Linux (6.5-15.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-redhat-linux-gnu.
Attaching to process 31442
Reading symbols from /usr/local/pgsql/bin/postgres...(no debugging symbols 
found)...done.
Using host libthread_db library /lib/libthread_db.so.1.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libreadline.so.5...(no debugging symbols 
found)...done.
Loaded symbols for /usr/lib/libreadline.so.5
Reading symbols from /lib/libtermcap.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libtermcap.so.2
Reading symbols from /lib/libcrypt.so.1...
(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libm.so.6...
(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols 
found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /usr/local/pgsql/lib/libchmoogle.so...done.
Loaded symbols for /usr/local/pgsql/lib/libchmoogle.so
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/jaguarformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/jaguarformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/libopenbabel.so.2...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/libopenbabel.so.2
Reading symbols from /usr/lib/libstdc++.so.6...done.
Loaded symbols for /usr/lib/libstdc++.so.6
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fastaformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fastaformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/cansmilesformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/cansmilesformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/APIInterface.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/APIInterface.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmodformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmodformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/molreportformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/molreportformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fhformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fhformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/chemkinformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/chemkinformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmcifformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmcifformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/thermoformat.so...done.
Loaded symbols for 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/thermoformat.so
Reading symbols from 
/usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/carformat.so...done.
Loaded symbols

Re: [PERFORM] libgcc double-free, backend won't die

2007-12-11 Thread Craig James

Tom Lane wrote:

Magnus Hagander [EMAIL PROTECTED] writes:

On Tue, Dec 11, 2007 at 07:50:17AM -0800, Craig James wrote:
Since I'm not a Postgres developer, perhaps one of the maintainers could 
update the Postgres manual.  In chapter 32.9.6, it says,


To be precise, a shared library needs to be created.

This should be amended to say,

To be precise, a non-threaded, shared library needs to be created.



Just before someone goes ahead and writes it (which is probably a good idea
in general), don't write it just like taht - because it's platform
dependent.


I can find no such text in our documentation at all, nor any reference
to OpenBabel.  I think Craig must be looking at someone else's
documentation.


http://www.postgresql.org/docs/8.1/static/xfunc-c.html#DFUNC

Craig

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] libgcc double-free, backend won't die

2007-12-16 Thread Craig James

Gregory Stark wrote:

Tom Lane [EMAIL PROTECTED] writes:


James Mansion [EMAIL PROTECTED] writes:
Is there any particular reason not to ensure that any low-level 
threading support in libc is enabled right

from the get-go, as a build-time option?

Yes.
1) It's of no value to us


Who is "us"?  Some of us would like to use the system for advanced scientific 
work, and scientific libraries are usually written in C++.


2) On many platforms there is a nonzero performance penalty


I'm surprised you say this, given that you're usually the voice of reason when it comes 
to rejecting hypothetical statements in favor of tested facts.  If building Postgres 
using thread-safe technology is really a performance burden, that could be easily 
verified.  "A nonzero performance penalty" -- what does that mean, a 0.0001% 
slowdown?  I find it hard to believe that the performance penalty of a thread-safe 
version would even be measurable.

If nobody has the time to do such a test, or other priorities take precedence, 
that's understandable.  But the results aren't in yet.


And the only reason to do that would be to work around one bug in one small
range of glibc versions. If you're going to use a multi-threaded library
(which isn't very common since it's hard to do safely for all those other
reasons) surely using a version of your OS without any thread related bugs is
a better idea.


You're jumping ahead.  This problem has not been accurately diagnosed yet.  It 
could be that the pthreads issue is completely misleading everyone, and in fact 
there is a genuine memory corruption going on here.  Or not.  We don't know 
yet.  I have made zero progress fixing this problem.

The "one small range of glibc versions" is a giveaway.  I've seen this problem in FC3, 5, 
and 6 (I went through this series of upgrades all in one week trying to fix this problem).  With 
each version, I recompiled Postgres and OpenBabel from scratch.  I'm going to try FC7 next since 
it's now the only official supported version, but I don't believe glibc is the problem.

Andrew Dalke, a regular contributor to the OpenBabel forum, suggests another 
problem: It could be a result of linking the wrong libraries together.  The 
gcc/ld system has a byzantine set of rules and hacks that (if I understand 
Andrew's posting) select different versions of the same library depending on 
what it thinks you might need.  It's possible that the wrong version of some 
system library is getting linked in.

Craig

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Multi-threading friendliness (was: libgcc double-free, backend won't die)

2007-12-17 Thread Craig James

Bruce Momjian wrote:

James Mansion wrote:

I think you have your head in the ground, but its your perogative.
*You* might not care, but anyone wanting to use thread-aware libraries
(and I'm *not* talking about threading in any Postgres code) will
certainly value it if they can do so with some stability.


I suggest you find out the cause of your problem and then we can do more
research.  Talking about us changing the Postgres behavior from the
report of one user who doesn't even have the full details isn't
productive.


I think you're confusing James Mansion with me (Craig James).  I'm the one with 
the unresolved problem.

James is suggesting, completely independently of whether or not there's a bug 
in my system, that a thread-friendly option for Postgres would be very useful.

Don't confuse thread-friendly with a threaded implementation of Postgres itself. 
 These are two separate questions.  Thread-friendly involves compile/link 
options that don't affect the Postgres source code at all.

Craig

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Making the most of memory?

2008-01-23 Thread Craig James

Guy Rouillier wrote:

Scott Marlowe wrote:

I assume you're talking about solid state drives?  They have their
uses, but for most use cases, having plenty of RAM in your server will
be a better way to spend your money.  For certain high throughput,
relatively small databases (i.e. transactional work) the SSD can be
quite useful.


Unless somebody has changes some physics recently, I'm not understanding 
the recent discussions of SSD in the general press.  Flash has a limited 
number of writes before it becomes unreliable.  On good quality consumer 
grade, that's about 300,000 writes, while on industrial grade it's about 
10 times that.  That's fine for mp3 players and cameras; even 
professional photographers probably won't rewrite the same spot on a 
flash card that many times in a lifetime.  But for database 
applications, 300,000 writes is trivial. 3 million will go a lot longer, 
but in non-archival applications, I imagine even that mark won't take 
but a year or two to surpass.


One trick they use is to remap the physical Flash RAM to different logical 
addresses.  Typical apps update a small percentage of the data frequently, and 
the rest of the data rarely or never.  By shuffling the physical Flash RAM 
around, the media lasts a lot longer than a simple analysis might indicate.

Craig

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] 8x2.5 or 6x3.5 disks

2008-01-29 Thread Craig James

Mike Smith wrote:
I've seen a few performance posts on using different hardware 
technologies to gain improvements. Most of those comments are on raid, 
interface and rotation speed.  One area that doesn't seem to have 
been mentioned is to run your disks empty.

...
On the outside of the disk you get a lot more data per seek than on the 
inside. Double whammy you get it faster.


Performance  can vary more than  100% between the outer and inner tracks 
of the disk.   So running a slower disk twice as big may give you more 
benefit than running a small capacity 15K disk full.  The slower disks 
are also generally more reliable and mostly much cheaper.

...
This is not very green as you need to buy more disks for the same amount 
of data and it's liable to upset your purchasing department who won't 
understand why you don't want to fill your disks up.


So presumably the empty-disk effect could also be achieved by partitioning, say 25% of 
the drive for the database, and 75% empty partition.  But in fact, you could use that 
low performance 75% for rarely-used or static data, such as the output from 
pg_dump, that is written during non-peak times.

Pretty cool.

Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Dell Perc/6

2008-02-12 Thread Craig James

Does anyone have performance info about the new Dell Perc/6 controllers?  I found a long 
discussion (Dell vs HP) about the Perc/5, but nothing about Perc/6.  What's 
under the covers?

Here is the (abbreviated) info from Dell on this machine:

PowerEdge 1950 IIIQuad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 
1333MHz FSB
Additional Processors Quad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 
1333MHz FSB
Memory8GB 667MHz (4x2GB), Dual Ranked DIMMs
Hard Drive Configuration  Integrated SAS/SATA RAID 5, PERC 6/i Integrated

Thanks,
Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] How to allocate 8 disks

2008-03-01 Thread Craig James

We're upgrading to a medium-sized server, a Dell PowerEdge 2950, dual-quad CPU's and 
8 GB memory.  This box can hold at most 8 disks (10K SCSI 2.5 146 GB drives) 
and has Dell's Perc 6/i RAID controller.

I'm thinking of this:

 6 disks  RAID 1+0  Postgres data
 1 disk   WAL
 1 disk   Linux

I've often seen RAID 1 recommended for the WAL.  Is that strictly for 
reliability, or is there a performance advantage to RAID 1 for the WAL?

It seems to me separating the OS and WAL on two disks is better than making a 
single RAID 1 and sharing it, from a performance point of view.

Thanks,
Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How to allocate 8 disks

2008-03-01 Thread Craig James

Joshua D. Drake wrote:

On Sat, 01 Mar 2008 10:06:54 -0800
Craig James [EMAIL PROTECTED] wrote:


We're upgrading to a medium-sized server, a Dell PowerEdge 2950,
dual-quad CPU's and 8 GB memory.  This box can hold at most 8 disks
(10K SCSI 2.5 146 GB drives) and has Dell's Perc 6/i RAID controller.

I'm thinking of this:

  6 disks  RAID 1+0  Postgres data
  1 disk   WAL
  1 disk   Linux

I've often seen RAID 1 recommended for the WAL.  Is that strictly for
reliability, or is there a performance advantage to RAID 1 for the
WAL?

It seems to me separating the OS and WAL on two disks is better than
making a single RAID 1 and sharing it, from a performance point of
view.


This scares me... You lose WAL you are a goner. Combine your OS and
WAL into a RAID 1.


Right, I do understand that, but reliability is not a top priority in this 
system.  The database will be replicated, and can be reproduced from the raw 
data.  It's not an accounting system, it finds scientific results.  That's not 
to say I *won't* take your advice, we may in fact combine the OS and WAL on one 
disk.  Reliability is a good thing, but I need to know all of the tradeoffs, so 
that I can weigh performance, reliability, and cost and make the right choice.

So my question still stands: From a strictly performance point of view, would 
it be better to separate the OS and the WAL onto two disks?  Is there any 
performance advantage to RAID 1?  My understanding is that RAID 1 can give 2x 
seek performance during read, but no advantage during write.  For the WAL, it 
seems to me that RAID 1 has no performance benefits, so separating the WAL and 
OS seems like a peformance advantage.

Another option would be:

 4 disks   RAID 1+0  Postgres data
 2 disks   RAID 1    WAL
 1 disk              Linux
 1 disk              spare

This would give us reliability, but I think the performance would be 
considerably worse, since the primary Postgres data would come from 4 disks 
instead of six.

I guess we could also consider:

 4 disks   RAID 1+0  Postgres data
 4 disks   RAID 1+0  WAL and Linux

Or even

 8 disks   RAID 1+0  Everything

This is a dedicated system and does nothing but Apache/Postgres, so the OS 
should get very little traffic.  But if that's the case, I guess you could 
argue that your suggestion of combining OS and WAL on a 2-disk RAID 1 would be 
the way to go, since the OS activity wouldn't affect the WAL very much.

I suppose the thing to do is get the system, and run bonnie on various 
configurations.  I've never run bonnie before -- can I get some useful results 
without a huge learning curve?

Thanks,
Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] How to allocate 8 disks

2008-03-03 Thread Craig James

Matthew wrote:

On Sat, 1 Mar 2008, Craig James wrote:
Right, I do understand that, but reliability is not a top priority in 
this system.  The database will be replicated, and can be reproduced 
from the raw data.


So what you're saying is:

1. Reliability is not important.
2. There's zero write traffic once the database is set up.


Well, I actually didn't say either of those things, but I appreciate the 
feedback.  RAID 0 is an interesting suggestion, but given our constraints, it's 
not an option.  Reliability is important, but not as important as, say, a 
banking system.

And as far as zero write traffic, I don't know where that came from.  It's a 
hitlist based system, where complex search results are saved for the user in 
tables, and the write traffic can be quite high.

If this is true, then RAID-0 is the way to go. I think Greg's options 
are good. Either:


2 discs RAID 1: OS
6 discs RAID 0: database + WAL

which is what we're using here (except with more discs), or:

8 discs RAID 10: everything


Right now, an 8-disk RAID 10 is looking like the best choice.  The Dell Perc 6i 
has configurations that include a battery-backed cache, so performance should 
be quite good.

However, if reliability *really* isn't an issue, and you can accept 
reinstalling the system if you lose a disc, then there's a third option:


8 discs RAID 0: Everything


I imagine the MTBF on a system like this would be < 1 year, which is out of the 
question, even with a backup system that can take over.  A failure completely 
wipes the system, OS and everything, so you're guaranteed that once or twice a 
year, you have to rebuild your system from the ground up.  I'd rather spend that 
time at the beach!

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your Subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

In the 3 years I've been using Postgres, the problem of count() performance has come up 
more times than I can recall, and each time the answer is, It's a sequential scan 
-- redesign your application.

My question is: What do the other databases do that Postgres can't do, and why 
not?

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?

On the one hand, I understand that Postgres has its architecture, and I 
understand the issue of row visibility, and so forth.  On the other hand, my 
database is just sitting there, nothing going on, no connections except me, 
and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that 
either Oracle or MySQL would answer in a fraction of a second.  It's hard for 
me to believe there isn't a better way.

This is a real problem.  Countless people (including me) have spent significant effort 
rewriting applications because of this performance flaw in Postgres.  Over and over, the 
response is, "You don't really need to do that ... change your application."  
Well, sure, it's always possible to change the application, but that misses the point.  
To most of us users, count() seems like it should be a trivial operation.  On other 
relational database systems, it is a trivial operation.

This is really a significant flaw on an otherwise excellent relational database 
system.

My rant for today...
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance


Re: [PERFORM] count * performance issue

2008-03-06 Thread Craig James

Tom Lane wrote:

Craig James [EMAIL PROTECTED] writes:

Count() on Oracle and MySQL is almost instantaneous, even for very large 
tables. So why can't Postgres do what they do?


AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.


My experience doesn't match this claim.  When I ported my application from 
Oracle to Postgres, this was the single biggest performance problem.  count() 
in Oracle was always very fast.  We're not talking about a 20% or 50% 
difference, we're talking about a small fraction of a second (Oracle) versus a 
minute (Postgres) -- something like two or three orders of magnitude.

It may be that Oracle has a way to detect when there's no transaction and use a 
faster method.  If so, this was a clever optimization -- in my experience, that 
represents the vast majority of the times you want to use count().  It's not 
very useful to count the rows of a table that many apps are actively modifying 
since the result may change the moment your transaction completes.  Most of the 
time when you use count(), it's because you're the only one modifying the 
table, so the count will be meaningful.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-12 Thread Craig James

I just received a new server and thought benchmarks would be interesting.  I think this 
looks pretty good, but maybe there are some suggestions about the configuration file.  
This is a web app, a mix of read/write, where writes tend to be insert into ... 
(select ...) where the resulting insert is on the order of 100 to 10K rows of two 
integers.  An external process also uses a LOT of CPU power along with each query.

Thanks,
Craig


Configuration:
 Dell 2950
 8 CPU (Intel 2GHz Xeon)
 8 GB memory
 Dell Perc 6i with battery-backed cache
 RAID 10 of 8x 146GB SAS 10K 2.5 disks

Everything (OS, WAL, and databases) is on the one RAID array.

Diffs from original configuration:

max_connections = 1000
shared_buffers = 400MB
work_mem = 256MB
max_fsm_pages = 100
max_fsm_relations = 5000
wal_buffers = 256kB
effective_cache_size = 4GB

Bonnie output (slightly reformatted)

--

Delete files in random order...done.
Version 1.03
                 ------Sequential Output------   --Sequential Input--   --Random--
                 -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--   --Seeks--
Size             K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP    /sec %CP
16G              64205  99 234252  38 112924  26  65275  98 293852  24   940.3   1

                 ------Sequential Create------    --------Random Create--------
                 -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
files             /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
16               12203  95  +++++ +++  19469  94  12297  95  +++++ +++  15578  82

www.xxx.com,16G,64205,99,234252,38,112924,26,65275,98,293852,24,940.3,1,16,12203,95,+,+++,19469,94,12297,95,+,+++,15578,82

--

$ pgbench -c 10 -t 1 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-13 Thread Craig James

Joshua D. Drake wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Wed, 12 Mar 2008 21:55:18 -0700
Craig James [EMAIL PROTECTED] wrote:



Diffs from original configuration:

max_connections = 1000
shared_buffers = 400MB
work_mem = 256MB
max_fsm_pages = 100
max_fsm_relations = 5000
wal_buffers = 256kB
effective_cache_size = 4GB


I didn't see which OS but I assume linux. I didn't see postgresql so I
assume 8.3.


Right on both counts.


wal_sync_method = open_sync
checkpoint_segments = 30
shared_buffers = 2000MB
asyncrhonous_commit = off (sp?)

Try again.


Nice improvement!  About 25% increase in TPS:

$ pgbench -c 10 -t 1 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 3423.636423 (including connections establishing)
tps = 3425.957521 (excluding connections establishing)

For reference, here are the results before your suggested changes:

$ pgbench -c 10 -t 1 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)

Thanks!
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is the best way to storage music files in Postgresql

2008-03-16 Thread Craig James

Rich wrote:

I am going to embarkon building a music library using apache,
postgresql and php.  What is the best way to store the music files?
Which file type should I use?


In Postgres, it's all just binary data.  It's entirely up to you which 
particular format you use: mp2, mp3, mp4, wmv, avi, whatever, it's all the same 
to Postgres.

A better question is: Should you store the binary data in Postgres itself, or keep it in 
files and only store the filenames?  The Postgres archives have several discussions of 
this topic, and the answer is, it depends.
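
For what it's worth, here are rough sketches of the two approaches (all names 
here are invented for illustration).  The first keeps the audio itself in a 
bytea column; the second keeps only a path to a file on disk:

 -- Option 1: store the audio in the database
 create table track_blob (
     track_id  serial primary key,
     title     varchar(256),
     audio     bytea          -- the raw mp3/ogg/whatever bytes
 );

 -- Option 2: store only a pointer to a file on the server
 create table track_file (
     track_id  serial primary key,
     title     varchar(256),
     path      text           -- e.g. '/music/albums/foo/01.mp3'
 );

Roughly speaking, the first keeps your backups self-contained (pg_dump gets 
everything), while the second keeps the database small and lets Apache serve 
the files directly.  Which matters more depends on your application.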

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-16 Thread Craig James

Dave Cramer wrote:


On 16-Mar-08, at 2:19 AM, Justin wrote:



I decided to reformat the raid 10 into ext2 to see if there was any 
real big difference in performance as some people have noted   here is 
the test results


please note the WAL files are still on the raid 0 set which is still 
in ext3 file system format.  these test where run with the fsync as 
before.   I made sure every thing was the same as with the first test.


This is opposite to the way I run things. I use ext2 on the WAL and ext3 
on the data. I'd also suggest RAID 10 on the WAL it is mostly write.


Just out of curiosity: Last time I did research, the word seemed to be that xfs 
was better than ext2 or ext3.  Is that not true?  Why use ext2/3 at all if xfs 
is faster for Postgres?

Criag

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-16 Thread Craig James

Craig James wrote:

Dave Cramer wrote:


On 16-Mar-08, at 2:19 AM, Justin wrote:



I decided to reformat the raid 10 into ext2 to see if there was any 
real big difference in performance as some people have noted   here 
is the test results


please note the WAL files are still on the raid 0 set which is still 
in ext3 file system format.  these test where run with the fsync as 
before.   I made sure every thing was the same as with the first test.


This is opposite to the way I run things. I use ext2 on the WAL and 
ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly write.


Just out of curiosity: Last time I did research, the word seemed to be 
that xfs was better than ext2 or ext3.  Is that not true?  Why use 
ext2/3 at all if xfs is faster for Postgres?


Criag


And let's see if I can write my own name ...

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10

2008-03-17 Thread Craig James

Justin wrote:

2000 tps ??? do you have fsync turned off ?

Dave



No its turned on.


Unless I'm seriously confused, something is wrong with these numbers.  That's 
the sort of performance you expect from a good-sized RAID 10 six-disk array.  
With a single 7200 rpm SATA disk and XFS, I get 640 tps.  There's no way you 
could 2000 tps from a single disk.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Anybody using the Dell Powervault MD3000 array?

2008-04-16 Thread Craig James

Gavin M. Roy wrote:
On Wed, Apr 16, 2008 at 4:39 PM, Joshua D. Drake [EMAIL PROTECTED] wrote:


On Wed, 16 Apr 2008 13:37:32 -0700
Jeffrey Baker [EMAIL PROTECTED] wrote:

I can second this. The MSA 70 is a great unit for the money.
 
  Thank you both.  The MSA 70 looks like an ordinary disk shelf.  What
  controllers do you use?  Or, do you just go with a software RAID?
 

P800, from HP.


In a Dell box I use a Perc 6/E with a SAS to Mini SAS cable.


There was a fairly long recent thread discussing the Dell Perc 6 controller 
starting here:

 http://archives.postgresql.org/pgsql-performance/2008-03/msg00264.php

and one relevant follow-up regarding the MD1000 box:

 http://archives.postgresql.org/pgsql-performance/2008-03/msg00280.php

(Unfortunately, the Postgres web archive does a terrible job formatting 
plain-old-text messages, it doesn't seem to know that it should wrap 
paragraphs, so some of these are pretty hard to read as web pages.)

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] two memory-consuming postgres processes

2008-05-02 Thread Craig James

On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote:

 I naively thought that if I have a 100,000,000 row table, of the form
(integer,integer,smallint,date), and add a real coumn to it, it will scroll
through the memory reasonably fast.


In Postgres, an update is the same as a delete/insert.  That means that 
changing the data in one column rewrites ALL of the columns for that row, and 
you end up with a table that's 50% dead space, which you then have to vacuum.

Sometimes if you have a volatile column that goes with several static 
columns, you're far better off to create a second table for the volatile data, duplicating the 
primary key in both tables.  In your case, it would mean the difference between 10^8 inserts of 
(int, float), very fast, compared to what you're doing now, which is 10^8 insert and 10^8 deletes 
of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow).

The down side of this design is that later on, it requires a join to fetch all 
the data for each key.
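
As a rough sketch (table and column names invented for illustration), the 
two-table layout looks something like this:

 -- the static columns stay in the wide table
 create table main_data (
     id  integer primary key,
     a   integer,
     b   smallint,
     d   date
 );

 -- the frequently-updated value gets its own narrow table
 create table main_score (
     id     integer primary key references main_data(id),
     score  real
 );

 -- updating the volatile value now rewrites only the narrow row
 update main_score set score = 0.5 where id = 42;

 -- fetching everything back requires the join mentioned above
 select m.id, m.a, m.b, m.d, s.score
   from main_data m join main_score s using (id);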

You do have a primary key on your data, right?  Or some sort of index?

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Backup causing poor performance - suggestions

2008-05-05 Thread Craig James

Campbell, Lance wrote:
We currently backup all of our database tables per schema using pg_dump 
every half hour.  We have been noticing that the database performance 
has been very poor during the backup process.  How can I improve the 
performance?


It sounds like the goal is to have frequent, near-real-time backups of your databases for 
recovery purposes.  Maybe instead of looking at pg_dump's performance, a better solution 
would be a replication system such as Slony, or a warm backup using Skype 
Tools.

Backing up the database every half hour puts a large load on the system during 
the dump, and means you are re-dumping the same data, 48 times per day.  If you 
use a replication solution, the backup process is continuous (spread out 
through the day), and you're not re-dumping static data; the only data that 
moves around is the new data.

I've used Slony with mixed success; depending on the complexity and size of your 
database, it can be quite effective.  I've heard very good reports about Skype Tools, 
which has both a Slony-like replicator (not as configurable as Slony, but easier to set 
up and use), plus an entirely separate set of scripts that simplifies warm 
standby using WAL logging.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What constitutes a complex query

2008-05-06 Thread Craig James

Justin wrote:
This falls under the stupid question and i'm just curious what other 
people think what makes a query complex?


There are two kinds:

1. Hard for Postgres to get the answer.

2. Hard for a person to comprehend.

Which do you mean?

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers

2008-05-06 Thread Craig James

Greg Smith wrote:

On Mon, 5 May 2008, Craig James wrote:


pgbench -i -s 20 -U test


That's way too low to expect you'll see a difference in I/O schedulers. 
A scale of 20 is giving you a 320MB database, you can fit the whole 
thing in RAM and almost all of it on your controller cache.  What's 
there to schedule?  You're just moving between buffers that are 
generally large enough to hold most of what they need.


Test repeated with:
autovacuum enabled
database destroyed and recreated between runs
pgbench -i -s 600 ...
pgbench -c 10 -t 5 -n ...

I/O Sched      AVG   Test1  Test2
------------   ---   -----  -----
cfq            705    695    715
noop           758    769    747
deadline       741    705    775
anticipatory   494    477    511

I only did two runs of each, which took about 24 minutes.  Like the first round of tests, the 
noise in the measurements (about 10%) exceeds the difference between 
scheduler-algorithm performance, except that anticipatory seems to be measurably slower.

So it still looks like cfq, noop and deadline are more or less equivalent when 
used with a battery-backed RAID.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers

2008-05-06 Thread Craig James

Greg Smith wrote:

On Tue, 6 May 2008, Craig James wrote:

I only did two runs of each, which took about 24 minutes.  Like the 
first round of tests, the noise in the measurements (about 10%) 
exceeds the difference between scheduler-algorithm performance, except 
that anticipatory seems to be measurably slower.


Those are much better results.  Any test that says anticipatory is 
anything other than useless for database system use with a good 
controller I presume is broken, so that's how I know you're in the right 
ballpark now but weren't before.


In order to actually get some useful data out of the noise that is 
pgbench, you need a lot more measurements of longer runs.  As 
perspective, the last time I did something in this area, in order to get 
enough data to get a clear picture I ran tests for 12 hours.  I'm hoping 
to repeat that soon with some more common hardware that gives useful 
results I can give out.


This data is good enough for what I'm doing.  There were reports from non-RAID 
users that the I/O scheduling could make as much as a 4x difference in 
performance (which makes sense for non-RAID), but these tests show me that 
three of the four I/O schedulers are within 10% of each other.  Since this 
matches my intuition of how battery-backed RAID will work, I'm satisfied.  If 
our servers get overloaded to the point where 10% matters, then I need a much 
more dramatic solution, like faster machines or more machines.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problem with 11 M records table

2008-05-13 Thread Craig James

idc danny wrote:

Hi everybody,

I'm fairly new to PostgreSQL and I have a problem with
a query:

SELECT * FROM LockerEvents LIMIT 10000 OFFSET 10990000

The table LockerEvents has 11 million records on it
and this query takes about 60 seconds to complete.


The OFFSET clause is almost always inefficient for anything but very small 
tables or small offsets.  In order for a relational database (not just 
Postgres) to figure out which row the offset lands on, it has to actually 
retrieve all of the preceding rows and discard them.  There is no magical way 
to go directly to the 11-millionth row.  Even on a trivial query such as yours 
with no WHERE clause, the only way to find the 11-millionth row 
is to scan the previous 10,999,999.

There are better (faster) ways to achieve this, but it depends on why you are 
doing this query.  That is, do you just want this one block of data, or are you 
scanning the whole database in 10,000-row blocks?
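
For example, if there is an indexed, roughly-sequential key on the table (I'm 
inventing the column name event_id here), one common trick is to remember the 
last key you displayed and start the next page from there, instead of using 
OFFSET:

 -- instead of a huge OFFSET, start from the last key already shown
 select *
   from LockerEvents
  where event_id > 12345678     -- last event_id from the previous page
  order by event_id
  limit 10000;

This walks the index directly to the starting point, so the cost stays the 
same no matter how deep into the table you go.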

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Problem with 11 M records table

2008-05-13 Thread Craig James

idc danny wrote:

Hi James,

Thank you for your response.

What I want to achieve is to give to the application
user 10k rows where the records are one after another
in the table, and the application has a paginating GUI
(First page, Previous page, Next page, Last
page - all links  Jump to page combobox) where
this particular query gets to run if the user clicks
on the Last page link.
The application receive the first 10k rows in under a
second when the user clicks on First page link and
receive the last 10k rows in about 60 seconds when he
clicks on Last page link.


You need a sequence that automatically assigns an ascending my_rownum to each 
row as it is added to the table, and an index on that my_rownum column.  Then you select 
your page by (for example)

 select * from my_table where my_rownum >= 100 and id < 110;

That will do what you want, with instant performance that's linear over your 
whole table.

If your table will have deletions, then you have to update the row numbering a lot, which will cause you 
terrible performance problems due to the nature of the UPDATE operation in Postgres.  If this is the case, 
then you should keep a separate table just for numbering the rows, which is joined to your main table when 
you want to retrieve a page of data.  When you delete data (which should be batched, since this 
will be expensive), then you truncate your rownum table, reset the sequence that generates your row numbers, 
then regenerate your row numbers with something like insert into my_rownum_table (select id, 
nextval('my_rownum_seq') from my_big_table).  To retrieve a page, just do select ... from 
my_table join my_rownum_table on (...), which will be really fast since you'll have indexes on both 
tables.

Note that this method requires that you have a primary key, or at least a 
unique column, on your main table, so that you have something to join with your 
row-number table.
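
Spelled out as a sketch (names invented, my_big_table as above):

 create sequence my_rownum_seq;

 create table my_rownum_table (
     id         integer not null,   -- my_big_table's primary key
     my_rownum  integer not null
 );
 create index my_rownum_table_id_idx on my_rownum_table (id);
 create index my_rownum_table_rownum_idx on my_rownum_table (my_rownum);

 -- regenerate the numbering (e.g. after a batch of deletes)
 truncate my_rownum_table;
 alter sequence my_rownum_seq restart with 1;
 insert into my_rownum_table
     select id, nextval('my_rownum_seq') from my_big_table;

 -- fetch one 10,000-row page
 select b.*
   from my_big_table b join my_rownum_table r on (b.id = r.id)
  where r.my_rownum >= 100001 and r.my_rownum < 110001;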

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which ext3 fs type should I use for postgresql

2008-05-15 Thread Craig James

Matthew Wakeling wrote:

Probably of more use are some of the other settings:

 -m reserved-blocks-percentage - this reserves a portion of the filesystem
that only root can write to. If root has no need for it, you can kill
this by setting it to zero. The default is for 5% of the disc to be
wasted.


This is not a good idea.  The 5% is NOT reserved for root's use, but rather is 
to prevent severe file fragmentation.  As the disk gets full, the remaining 
empty spaces tend to be small spaces scattered all over the disk, meaning that 
even for modest-sized files, the kernel can't allocate contiguous disk blocks.  
If you reduce this restriction to 0%, you are virtually guaranteed poor 
performance when you fill up your disk, since those files that are allocated 
last will be massively fragmented.

Worse, the fragmented files that you create remain fragmented even if you clean 
up to get back below the 95% mark.  If Postgres happened to insert a lot of 
data on a 99% full file system, those blocks could be spread all over the 
place, and they'd stay that way forever, even after you cleared some space.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Typecast bug?

2008-06-25 Thread Craig James

This seems like a bug to me, but it shows up as a performance problem.  Since 
the column being queried is an integer, the second query (see below) can't 
possibly match, yet Postgres uses a typecast, forcing a full table scan for a 
value that can't possibly be in the table.

The application could intercept these bogus queries, but that requires building 
schema-specific and postgres-specific knowledge into the application (i.e. What is 
the maximum legal integer for this column?).

Craig


explain analyze select version_id, parent_id from version where version_id = 
9;
QUERY PLAN  
--

Index Scan using version_pkey on version  (cost=0.00..9.89 rows=1 width=8) 
(actual time=0.054..0.054 rows=0 loops=1)
 Index Cond: (version_id = 9)
Total runtime: 0.130 ms
(3 rows)

emol_warehouse_1=> explain analyze select version_id, parent_id from version 
where version_id = 999999999999999999999999;
 QUERY PLAN   


Seq Scan on version  (cost=0.00..253431.77 rows=48393 width=8) (actual 
time=3135.530..3135.530 rows=0 loops=1)
 Filter: ((version_id)::numeric = 999999999999999999999999::numeric)
Total runtime: 3135.557 ms
(3 rows)


\d version
Table emol_warehouse_1.version
 Column   |  Type   | Modifiers 
+-+---

version_id | integer | not null
parent_id  | integer | not null
... more columns
Indexes:
  version_pkey PRIMARY KEY, btree (version_id)





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Typecast bug?

2008-06-26 Thread Craig James

Tom Lane wrote:

Craig James [EMAIL PROTECTED] writes:

This seems like a bug to me, but it shows up as a performance problem.



emol_warehouse_1=> explain analyze select version_id, parent_id from version 
where version_id = 999999999999999999999999;


If you actually *need* so many 9's here as to force it out of the range
of bigint, then why is your id column not declared numeric?

This seems to me to be about on par with complaining that intcol = 4.2e1
won't be indexed.  We have a numeric data type hierarchy, learn to
work with it ...


Your suggestion of "learn to work with it" doesn't fly.  A good design 
separates the database schema details from the application to the greatest extent 
possible.  What you're suggesting is that every application that queries against a 
Postgres database should know the exact range of every numeric data type of every indexed 
column in the schema, simply because Postgres can't recognize an out-of-range numeric 
value.

In this case, the optimizer could have instantly returned zero results with no 
further work, since the query was out of range for that column.

This seems like a pretty simple optimization to me, and it seems like a helpful 
suggestion to make to this forum.

BTW, this query came from throwing lots of junk at a web app in an effort to 
uncover exactly this sort of problem.  It's not a real query, but then, hackers 
don't use real queries.  The app checks that its input is a well-formed integer 
expression, but then assumes Postgres can deal with it from there.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Mailing list hacked by spammer?

2008-07-18 Thread Craig James

I've never gotten a single spam from the Postgres mailing list ... until today. 
 A Chinese company selling consumer products is using this list.  I have my 
filters set to automatically trust this list because it has been so reliable 
until now.  It would be really, really unfortunate if this list fell to the 
spammers.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Mailing list hacked by spammer?

2008-07-18 Thread Craig James

Glyn Astill wrote:

Most likely just a forged header or something, hardly hacked
though is it.


Yes, "hack" is the correct term.  The bad guys have hacked into the major email 
systems, including gmail, which was the origin of this spam:

 http://www.theregister.co.uk/2008/02/25/gmail_captcha_crack/


 I think you need to do some training:
http://www2.b3ta.com/bigquiz/hacker-or-spacker/


Sending a link to a web site that plays loud rap music is not a friendly way to 
make your point.

Craig





- Original Message 

From: Craig James [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Sent: Friday, 18 July, 2008 4:02:37 PM
Subject: [PERFORM] Mailing list hacked by spammer?

I've never gotten a single spam from the Postgres mailing list ... until today.  
A Chinese company selling consumer products is using this list.  I have my 
filters set to automatically trust this list because it has been so reliable 
until now.  It would be really, really unfortunate if this list fell to the 
spammers.


Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance








--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Perl/DBI vs Native

2008-07-21 Thread Craig James
Valentin Bogdanov wrote: 

I have ran quite a few tests comparing how long a query takes to
execute from Perl/DBI as compared to psql/pqlib. No matter how many
times I run the test the results were always the same.

I run a SELECT all on a fairly big table and enabled the
log_min_duration_statement option. With psql postgres consistently
logs half a second while the exact same query executed with Perl/DBI
takes again consistently 2 seconds.


The problem may be that your two tests are not equivalent.  When Perl executes 
a statement, it copies the *entire* result set back to the client before it 
returns the first row.  The following program might appear to just be fetching 
the first row:

 $sth = $dbh->prepare("select item from mytable");
 $sth->execute();
 $item = $sth->fetchrow_array();

But in fact, before Perl returns from the $sth->execute() statement, it has 
already run the query and copied all of the rows into a hidden, client-side cache.  
Each $sth->fetchrow_array() merely copies the data from the hidden cache into your 
local variable.

By contrast, psql executes the query, and starts returning the data a page at a 
time.  So it may appear to be much faster.

This also means that Perl has trouble with very large tables.  If the mytable 
in the above example is very large, say a hundred billion rows, you simply can't execute 
this statement in Perl.  It will try to copy 100 billion rows into memory before 
returning the first answer.
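
One way around that -- just a sketch, using the same hypothetical mytable -- is 
to declare a server-side cursor and fetch the rows in batches, so the whole 
result set never has to fit in the client:

 begin;
 declare item_cur cursor for select item from mytable;
 fetch 1000 from item_cur;   -- repeat until it returns no rows
 close item_cur;
 commit;

From DBI you just prepare/execute these as ordinary statements and loop over 
the FETCH.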

The reason for Perl's behind-the-scenes caching is because it allows multiple 
connections to a single database, and multiple statements on each database 
handle.  By executing each statement completely, it gives the appearance that 
multiple concurrent queries are supported.  The downside is that it can be a 
huge memory hog.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using PK value as a String

2008-08-11 Thread Craig James

Valentin Bogdanov wrote:

--- On Mon, 11/8/08, Gregory Stark [EMAIL PROTECTED] wrote:


From: Gregory Stark [EMAIL PROTECTED]
Subject: Re: [PERFORM] Using PK value as a String
To: Jay [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Date: Monday, 11 August, 2008, 10:30 AM
Jay [EMAIL PROTECTED] writes:


I have a table named table_Users:

CREATE TABLE table_Users (
   UserID    character(40)  NOT NULL default '',
   Username  varchar(256)   NOT NULL default '',
   Email     varchar(256)   NOT NULL default ''
   etc...
);


...

But the real question here is what's the better design.
If you use Username
you'll be cursing if you ever want to provide a
facility to allow people to
change their usernames. You may not want such a facility
now but one day...



I don't understand Gregory's suggestion about the design. I thought
using natural primary keys as opposed to surrogate ones is a better
design strategy, even when it comes to performance considerations
and even more so if there are complex relationships within the database.


No, exactly the opposite.  Data about users (such as name, email address, etc.) are 
rarely a good choice as a foreign key, and shouldn't be considered keys in 
most circumstances.  As Gregory points out, you're spreading the user's name across the 
database, effectively denormalizing it.

Instead, you should have a user record, with an arbitrary key, an integer or 
OID, that you use as the foreign key for all other tables.  That way, when the 
username changes, only one table will be affected.  And it's much more 
efficient to use an integer as the key than a long string.
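
A sketch of that design (names invented):

 create table users (
     user_id   serial primary key,        -- arbitrary surrogate key
     username  varchar(256) not null unique,
     email     varchar(256) not null
 );

 -- other tables reference the small integer key, never the username
 create table orders (
     order_id  serial primary key,
     user_id   integer not null references users(user_id),
     placed_on date
 );

 -- renaming a user touches exactly one row in one table
 update users set username = 'new_name' where user_id = 42;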

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

The OOM killer is a terrible idea for any serious database server.  I wrote a 
detailed technical paper on this almost 15 years ago when Silicon Graphics had 
this same feature, and Oracle and other critical server processes couldn't be 
made reliable.

The problem with overallocating memory as Linux does by default is that EVERY 
application, no matter how well designed and written, becomes unreliable: It can be 
killed because of some OTHER process.  You can be as clever as you like, and do all the 
QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL 
be unreliable if you run it on a Linux system that allows overcommitted memory.

IMHO, all Postgres servers should run with memory-overcommit disabled.  On 
Linux, that means  /proc/sys/vm/overcommit_memory=2.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

[EMAIL PROTECTED] wrote:

On Wed, 27 Aug 2008, Craig James wrote:
The OOM killer is a terrible idea for any serious database server.  I 
wrote a detailed technical paper on this almost 15 years ago when 
Silicon Graphics had this same feature, and Oracle and other critical 
server processes couldn't be made reliable.


The problem with overallocating memory as Linux does by default is 
that EVERY application, no matter how well designed and written, 
becomes unreliable: It can be killed because of some OTHER process.  
You can be as clever as you like, and do all the QA possible, and 
demonstrate that there isn't a single bug in Postgres, and it will 
STILL be unreliable if you run it on a Linux system that allows 
overcommitted memory.


IMHO, all Postgres servers should run with memory-overcommit 
disabled.  On Linux, that means  /proc/sys/vm/overcommit_memory=2.


it depends on how much stuff you allow others to run on the box. if you 
have no control of that then yes, the box is unreliable (but it's not 
just becouse of the OOM killer, it's becouse those other users can eat 
up all the other box resources as well CPU, network bandwidth, disk 
bandwidth, etc)


even with overcommit disabled, the only way you can be sure that a 
program will not fail is to make sure that it never needs to allocate 
memory. with overcommit off you could have one program that eats up 100% 
of your ram without failing (handling the error on memory allocation 
such that it doesn't crash), but which will cause _every_ other program 
on the system to fail, including any scripts (becouse every command 
executed will require forking and without overcommit that will require 
allocating the total memory that your shell has allocated so that it can 
run a trivial command (like ps or kill that you are trying to use to fix 
the problem)


if you have a box with unpredictable memory use, disabling overcommit 
will not make it reliable. it may make it less unreliable (the fact that 
the linux OOM killer will pick one of the worst possible processes to 
kill is a problem), but less unreliable is not the same as reliable.


The problem with any argument in favor of memory overcommit and OOM is that 
there is a MUCH better, and simpler, solution.  Buy a really big disk, say a 
terabyte, and allocate the whole thing as swap space.  Then do a decent job of 
configuring your kernel so that any reasonable process can allocate huge chunks 
of memory that it will never use, but can't use the whole terabyte.

Using real swap space instead of overallocated memory is a much better solution.

- It's cheap.
- There is no performance hit at all if you buy enough real memory
- If runaway processes start actually using memory, the system slows
 down, but server processes like Postgres *aren't killed*.
- When a runaway process starts everybody swapping, you can just
 find it and kill it.  Once it's dead, everything else goes back
 to normal.

It's hard to imagine a situation where any program or collection of programs 
would actually try to allocate more than a terabyte of memory and exceed the 
swap space on a single terabyte disk.  The cost is almost nothing, a few 
hundred dollars.

So turn off overcommit, and buy an extra disk if you actually need a lot of virtual 
memory.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

Matthew Wakeling wrote:

On Thu, 28 Aug 2008, Steve Atkins wrote:
Probably the best solution is to just tell the kernel somehow to 
never kill the postmaster.


Or configure adequate swap space?


Oh yes, that's very important. However, that gives the machine the 
opportunity to thrash.


No, that's where the whole argument for allowing overcommitted memory falls 
flat.

The entire argument for allowing overcommitted memory hinges on the fact that 
processes *won't use the memory*.  If they use it, then overcommitting causes 
problems everywhere, such as a Postmaster getting arbitrarily killed.

If a process *doesn't* use the memory, then there's no problem with thrashing, 
right?

So it never makes sense to enable overcommitted memory when Postgres, or any 
server, is running.

Allocating a big, fat terabyte swap disk is ALWAYS better than allowing 
overcommitted memory.  If your usage is such that overcommitted memory would 
never be used, then the swap disk will never be used either.  If your processes 
do use the memory, then your performance goes into the toilet, and you know 
it's time to buy more memory or a second server, but in the mean time your 
server processes at least keep running while you kill the rogue processes.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Craig James

-----Original Message-----
From: [EMAIL PROTECTED] 
* I think we will go for hardware-based RAID 1 with a good 
battery-backed-up controller. I have read that software RAID 
perform surprisingly good, but for a production site where 
hotplug replacement of dead disks is required, is software 
RAID still worth it?
... 

I haven't had any issues with software RAID (mdadm) and hot-swaps. It keeps
working in degraded mode and as soon as you replace the defective disk it
can reconstruct the array on the fly. Performance will suffer while at it
but the service keeps up.
The battery backup makes a very strong point for a hw controller. Still, I
have heard good things on combining a HW controller with JBODS leaving the
RAID affair to mdadm. In your scenario though with *lots* of random reads,
if I had to choose between a HW controller and 2 disks or software RAID with 4
or 6 disks, I would go for the disks. There are motherboards with 6 SATA
ports. For the money you will save on the controller you can afford 6 disks
in a RAID 10 setup.


This is good advice.  Hot-swapping seems cool, but how often will you actually 
use it? Maybe once every year?  With Software RAID, replacing a disk means 
shutdown, swap the hardware, and reboot, which is usually less than ten 
minutes, and you're back in business.  If that's the only thing that happens, 
you'll have 99.97% uptime on your server.

If you're on a limited budget, a software RAID 1+0 will be very cost effective 
and give good performance for lots of random reads.  Hardware RAID with a 
battery-backed cache helps with writes and hot swapping.  If your random-read 
performance needs outweigh these two factors, consider software RAID.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-29 Thread Craig James

James Mansion wrote:
I can't see how an OS can lie to processes about memory being allocated 
to them and not be ridiculed as a toy, but there you go.  I don't think 
Linux is the only perpetrator - doesn't AIX do this too?


This is a leftover from the days of massive physical modeling (chemistry, 
physics, astronomy, ...) programs written in FORTRAN. Since FORTRAN didn't have 
pointers, scientists would allocate massive three-dimensional arrays, and their 
code might only access a tiny fraction of the memory.  The operating-system 
vendors, particularly SGI, added features to the various flavors of UNIX, 
including the ability to overcommit memory, to support these FORTRAN programs, 
which at the time were some of the most important applications driving computer 
science and computer architectures of workstation-class computers.

When these workstation-class computers evolved enough to rival mainframes, 
companies started shifting apps like Oracle onto the cheaper workstation-class 
computers.  Unfortunately, the legacy of the days of these FORTRAN programs is 
still with us, and every few years we have to go through this discussion again.

Disable overcommitted memory.  There is NO REASON to use it on any modern 
server-class computer, and MANY REASONS WHY IT IS A BAD IDEA.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-09-15 Thread Craig James

Florian Weimer wrote:

* Craig James:

So it never makes sense to enable overcommitted memory when
Postgres, or any server, is running.


There are some run-time environments which allocate huge chunks of
memory on startup, without marking them as not yet in use.  SBCL is in
this category, and also the Hotspot VM (at least some extent).


I stand by my assertion: It never makes sense.  Do these applications allocate 
a terabyte of memory?  I doubt it.  Buy a terabyte swap disk and disable 
overcommitted memory.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-09-15 Thread Craig James

Florian Weimer wrote:

* Craig James:


There are some run-time environments which allocate huge chunks of
memory on startup, without marking them as not yet in use.  SBCL is in
this category, and also the Hotspot VM (at least some extent).

I stand by my assertion: It never makes sense.  Do these
applications allocate a terabyte of memory?  I doubt it.


SBCL sizes its allocated memory region based on the total amount of
RAM and swap space.  In this case, buying larger disks does not
help. 8-P


SBCL, as Steel Bank Common Lisp? Why would you run that on a server machine 
alongside Postgres? If I had to use SBCL and Postgres, I'd put SBCL on a 
separate machine all its own, so that it couldn't corrupt Postgres or other 
servers that had to be reliable.

Are you saying that if I bought a terabyte of swap disk, SBCL would allocate a 
terabyte of space?

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Sort causes system to freeze

2008-12-01 Thread Craig James

Maybe this is an obviously dumb thing to do, but it looked reasonable to me.  The problem is, the 
seemingly simple sort below causes a fairly powerful computer to completely freeze for 5-10 
minutes.  During the sort, you can't login, you can't use any shell sessions you already have open, 
the Apache server barely works, and even if you do "nice -20 top" before you start the 
sort, the top(1) command comes to a halt while the sort is proceeding!  As nearly as I can tell, 
the sort operation is causing a swap storm of some sort -- nothing else in my many years of 
UNIX/Linux experience can cause a nice -20 process to freeze.

The sort operation never finishes -- it's always killed by the system.  Once it 
dies, everything returns to normal.

This is 8.3.0.  (Yes, I'll upgrade soon.)  Is this a known bug, or do I have to 
rewrite this query somehow?  Maybe add indexes to all four columns being sorted?

Thanks!
Craig


=> explain select * from plus order by supplier_id, compound_id, units, price;
 QUERY PLAN   
---

Sort  (cost=5517200.48..5587870.73 rows=28268100 width=65)
  Sort Key: supplier_id, compound_id, units, price
   ->  Seq Scan on plus  (cost=0.00..859211.00 rows=28268100 width=65)

=> \d plus
        Table "emol_warehouse_1.plus"
   Column | Type  | Modifiers 
---+---+---
supplier_id   | integer   | 
supplier_name | text  | 
compound_id   | text  | 
amount| text  | 
units | text  | 
price | numeric(12,2) | 
currency  | text  | 
description   | text  | 
sku   | text  | 
Indexes:

   i_plus_compound_id btree (supplier_id, compound_id)
   i_plus_supplier_id btree (supplier_id)


max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 1000000
max_fsm_relations = 5000
synchronous_commit = off
#wal_sync_method = fdatasync
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID controller, 8 
disks as RAID10
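
For reference, the composite index I'm contemplating for the last idea above would be
something like this (a sketch only; I haven't verified that 8.3's planner would actually
use it for this sort):

  create index i_plus_sort_order on plus (supplier_id, compound_id, units, price);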

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help with 8.4 Performance Testing

2008-12-09 Thread Craig James

justin wrote:

Tom Lane wrote:

Hmm ... I wonder whether this means that the current work on
parallelizing I/O (the posix_fadvise patch in particular) is a dead
end.  Because what that is basically going to do is expend more CPU
to improve I/O efficiency.  If you believe this thesis then that's
not the road we want to go down.

regards, tom lane


What does the CPU/ Memory/Bus performance road map look like?

Is the IO performance for storage device for what ever it be, going to 
be on par with the above to cause this problem?


Flash memory will become just a fourth layer in the memory caching system 
(on-board CPU, high-speed secondary cache, main memory, and persistent memory). 
 The idea of external storage will probably disappear altogether -- computers 
will just have memory, and won't forget anything when you turn them off.  Since 
most computers are 64 bits these days, all data and programs will just hang out 
in memory at all times, and be directly addressable by the CPU.

The distinction between disk and memory arose from the fact that disks were large, slow 
devices relative to core memory and had to be connected by long wires, hence 
the need for I/O subsystems.  As flash memory becomes mainstream, I expect this 
distinction to disappear.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgresql 8.3 tps rate

2009-01-22 Thread Craig James

David Rees wrote:

On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani
ibrahim.harr...@gmail.com wrote:

Version 1.93d   --Sequential Output-- --Sequential Input- --Random-
Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver 300M   391  97  9619   1  8537   2   673  99 + +++  1196  16
Latency   211ms 388ms 325ms   27652us 722us6720ms
Version 1.93d   --Sequential Create-- Random Create
myserver-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
16  9004  25 + +++ + +++  8246  20 + +++ + +++
Latency   592ms 208us 102us 673ms 179us 100us


You should be testing bonnie with a file size that is at least double
the amount of memory in your machine - in this case, 4GB files, not
300MB files.


When I compare my  bonnie++ result with the one at
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm,
It seems that there is something wrong with the disks!?


Yes, your machine appears to be very slow.  You should be able to
write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+.
Random IO should be in the 200 tps range for a 7200rpm SATA RAID1.


Have you tried the really basic speed test?

 time (dd if=/dev/zero of=bigfile bs=8192 count=1000000; sync)

 time dd if=bigfile of=/dev/null bs=8192

Divide 8.2GB by the times reported.  On a single 10K SATA drive, I get about 
55MB/sec write and 61 MB/sec read.

If you can't get similar numbers, then something is wrong.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgresql 8.3 tps rate

2009-01-22 Thread Craig James

Ibrahim Harrani wrote:

Hi Craig,

Here is the result. It seems that disk write is terrible!.

r...@myserver /usr]#  time (dd if=/dev/zero of=bigfile bs=8192
count=1000000; sync)


1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 945.343806 secs (8665630 bytes/sec)

real15m46.206s
user0m0.368s
sys 0m15.560s


So it's nothing to do with Postgres.  I'm no expert at solving this sort of 
problem, but I'd start by looking for:

 - a rogue process that's using disk bandwidth (use vmstat when the system is 
idle)
 - system logs, maybe there are a zillion error messages
 - if you have a second disk, try its performance
 - if you don't have a second disk, buy one, install it, and try it
 - get another SATA controller and try that

Or do the reverse: Put the disk in a different computer (one that you've tested 
beforehand and verified is fast) and see if the problem follows the disk.  Same 
for the SATA card.

It could be your SATA controller, the disk, some strange hdparm setting ... who 
knows?

I ran into this once a LONG time ago with a kernel that didn't recognize the 
disk or driver or something, and disabled the DMA (direct-memory access) 
feature, which meant the CPU had to handle every single byte coming from the 
disk, which of course meant SLOW, plus you couldn't even type while the disk 
was busy.  A simple manual call to hdparm(1) to force DMA on fixed it.  Weird 
stuff like that can be very hard to find.

I also saw very low write speed once on a RAID device with a battery-backed 
cache, when the battery went dead.  The RAID controller went into its 
conservative mode, which for some reason was much slower than the disk's raw 
performance.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I have a fusion IO drive available for testing

2009-03-31 Thread Craig James

Dave Cramer wrote:
So I tried writing directly to the device, gets around 250MB/s, reads at 
around 500MB/s


The client is using redhat so xfs is not an option.


I'm using Red Hat and XFS, and have been for years. Why is XFS not an option 
with Red Hat?

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partition question for new server setup

2009-04-28 Thread Craig James

Whit Armstrong wrote:

I have the opportunity to set up a new postgres server for our
production database.  I've read several times in various postgres
lists about the importance of separating logs from the actual database
data to avoid disk contention.

Can someone suggest a typical partitioning scheme for a postgres server?

My initial thought was to create /var/lib/postgresql as a partition on
a separate set of disks.

However, I can see that the xlog files will be stored here as well:
http://www.postgresql.org/docs/8.3/interactive/storage-file-layout.html

Should the xlog files be stored on a separate partition to improve performance?

Any suggestions would be very helpful.  Or if there is a document that
lays out some best practices for server setup, that would be great.

The database usage will be read heavy (financial data) with batch
writes occurring overnight and occassionally during the day.

server information:
Dell PowerEdge 2970, 8 core Opteron 2384
6 1TB hard drives with a PERC 6i
64GB of ram


We're running a similar configuration: PowerEdge 8 core, PERC 6i, but we have 8 of 
the 2.5" 10K 384GB disks.

When I asked the same question on this forum, I was advised to just put all 8 disks into a single RAID 10, and forget about separating things.  The performance of a battery-backed PERC 6i (you did get a battery-backed cache, right?) with 8 disks is quite good. 


In order to separate the logs, OS and data, I'd have to split off at least two 
of the 8 disks, leaving only six for the RAID 10 array.  But then my xlogs 
would be on a single disk, which might not be safe.  A more robust approach 
would be to split off four of the disks, put the OS on a RAID 1, the xlog on a 
RAID 1, and the database data on a 4-disk RAID 10.  Now I've separated the 
data, but my primary partition has lost half its disks.

So, I took the advice, and just made one giant 8-disk RAID 10, and I'm very 
happy with it.  It has everything: Postgres, OS and logs.  But since the RAID 
array is 8 disks instead of 4, the net performance seems to be quite good.

But ... your mileage may vary.  My box has just one thing running on it: 
Postgres.  There is almost no other disk activity to interfere with the 
file-system caching.  If your server is going to have a bunch of other activity 
that generate a lot of non-Postgres disk activity, then this advice might not 
apply.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] partition question for new server setup

2009-04-28 Thread Craig James

Kenneth Marshall wrote:

Additionally are there any clear choices w/ regard to filesystem
types?  Our choices would be xfs, ext3, or ext4.

Well, there's a lot of people who use xfs and ext3.  XFS is generally
rated higher than ext3 both for performance and reliability.  However,
we run Centos 5 in production, and XFS isn't one of the blessed file
systems it comes with, so we're running ext3.  It's worked quite well
for us.



The other optimizations are using data=writeback when mounting the
ext3 filesystem for PostgreSQL and using the elevator=deadline for
the disk driver. I do not know how you specify that for Ubuntu.


After reading various articles, I thought that "noop" was the right choice 
when you're using a battery-backed RAID controller.  The RAID controller is going to 
cache all data and reschedule the writes anyway, so the kernel scheduler is irrelevant at 
best, and can slow things down.

On Ubuntu, it's

 echo noop > /sys/block/hdx/queue/scheduler

where hdx is replaced by the appropriate device.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] superlative missuse

2009-05-14 Thread Craig James

David Wilson wrote:

On Tue, May 12, 2009 at 5:53 PM, Angel Alvarez cl...@uah.es wrote:


we suffer a 'more optimal' superlative missuse

there is  not so 'more optimal' thing but a simple 'better' thing.

im not native english speaker but i think it still applies.

Well this a superlative list so all of you deserve a better optimal use.


As a native english speaker:

You are technically correct. However, "more optimal" has a
well-understood meaning as "closer to optimal", and as such is
appropriate and generally acceptable despite being technically
incorrect.


I disagree -- it's a glaring error.  "More optimized" or "better optimized" are perfectly good, and 
correct, phrases.  Why not use them?  Every time I read "more optimal", I am embarrassed for the person who 
is showing his/her ignorance of the basics of English grammar.  If I wrote, "It's more best", would you find 
that acceptable?


This is a postgres mailing list, not an english grammar mailing list...


Since you replied on the list, it's only appropriate to get at least one 
rebuttal.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] raid10 hard disk choice

2009-05-21 Thread Craig James

Matthew Wakeling wrote:

On Thu, 21 May 2009, Linos wrote:
i have to buy a new server and in the budget i have (small) i have 
to select one of this two options:


-4 sas 146gb 15k rpm raid10.
-8 sas 146gb 10k rpm raid10.


It depends what you are doing. I think in most situations, the second 
option is better, but there may be a few situations where the reverse is 
true.


Basically, the first option will only be faster if you are doing lots of 
seeking (small requests) in a single thread. As soon as you go 
multi-threaded or are looking at sequential scans, you're better off 
with more discs.


Since you have to share the disks with a file server, which might be heavily 
used, the 8-disk array will probably be better even if you're doing lots of 
seeking in a single thread.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hosted servers with good DB disk performance?

2009-05-26 Thread Craig James

Greg Smith wrote:
What I'd love to have is a way to rent a fairly serious piece of 
dedicated hardware, ideally with multiple (at least 4) hard drives in a 
RAID configuration and a battery-backed write cache.  The cache is 
negotiable. Linux would be preferred, FreeBSD or Solaris would also 
work; not Windows though (see good DB performance).


We tried this with poor results.  Most of the co-location and server-farm 
places are set up with generic systems that are optimized for 
small-to-medium-sized web sites.  They use MySQL and are surprised to hear 
there's an alternative open-source DB.  They claim to be able to create custom 
configurations, but it's a lie.

The problem is that they run on thin profit margins, and their techs are mostly 
ignorant, they just follow scripts.  If something goes wrong, or they make an 
error, you can't get anything through their thick heads.  And you can't go down 
there and fix it yourself.

For example, we told them EXACTLY how to set up our system, but they decided 
that automatic monthly RPM OS updates couldn't hurt.  So the first of the 
month, we in the morning to find that Linux had been updated to libraries that 
were incompatible with our own software, the system automatically rebooted and 
our web site was dead.  And many similar incidents.

We finally bought some nice Dell servers and found a co-location site that 
provides us all the infrastructure (reliable power, internet, cooling, 
security...), and we're in charge of the computers.  We've never looked back.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query plan issues - volatile tables

2009-06-04 Thread Craig James

Brian Herlihy wrote:

We have a problem with some of our query plans.  One of our
tables is quite volatile, but postgres always uses the last
statistics snapshot from the last time it was analyzed for query
planning.  Is there a way to tell postgres that it should not
trust the statistics for this table?  Basically we want it to
assume that there may be 0, 1 or 100,000 entries coming out from
a query on that table at any time, and that it should not make
any assumptions. 


I had a similar problem, and just changed my application to do an analyze 
either just before the query, or just after a major update to the table.  
Analyze is very fast, almost always orders of magnitude faster than the time 
lost to a poor query plan.
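
A minimal sketch of the idea (the table and column names here are just hypothetical
placeholders):

  -- run the ANALYZE right after the bulk change, just before the problem query
  analyze order_staging;
  select s.order_id, c.name
    from order_staging s
    join customers c on (c.id = s.customer_id);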

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Scalability in postgres

2009-06-05 Thread Craig James

Greg Smith wrote:
No amount of theoretical discussion advances that any until 
you're at least staring at a very specific locking problem you've 
already characterized extensively via profiling.  And even then, 
profiling trumps theory every time.


In theory, there is no difference between theory and practice.  In practice, 
there is a great deal of difference.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Craig James

Mathieu Nebra wrote:

Greg Stark wrote:

All the other comments are accurate, though it does seem like
something the database ought to be able to handle.

The other thing which hasn't been mentioned is that you have a lot of
indexes. Updates require maintaining all those indexes. Are all of
these indexes really necessary? Do you have routine queries which look
up users based on their flags? Or are all your oltp transactions for
specific userids in which case you probably just need the index on
userid.



We are using these indexes, but I can't be sure if we _really_ need them
or not.

I can go into detail. We have:

UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite

So basically, we toggle the boolean flag WrittenStatus when the user has
written in that topic. The same goes for IsFavorite.


Do those last two columns hold much data?  Another thing to consider is to 
split this into two tables:

 UserID - TopicID - LastReadAnswerID 


 UserID - TopicID - WrittenStatus - IsFavorite

As others have pointed out, an UPDATE in Postgres is a select/delete/insert, 
and if you're updating just the LastReadAnswerID all the time, you're wasting 
time deleting and re-inserting a lot of data that never change (assuming 
they're not trivially small columns).

This might also solve the problem of too many indexes -- the table that's 
updated frequently would only have an index on (UserID, TopicID), so the update 
only affects one index.

Then to minimize the impact on your app, create a view that looks like the 
original table for read-only apps.
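
Roughly, the split I have in mind looks like this (column names follow yours; the types
and the view definition are only a sketch, adjust to taste):

  create table topic_read_status (
      user_id             integer,
      topic_id            integer,
      last_read_answer_id integer,
      primary key (user_id, topic_id)
  );

  create table topic_flags (
      user_id        integer,
      topic_id       integer,
      written_status boolean,
      is_favorite    boolean,
      primary key (user_id, topic_id)
  );

  create view topic_status as
      select r.user_id, r.topic_id, r.last_read_answer_id,
             f.written_status, f.is_favorite
        from topic_read_status r
        join topic_flags f using (user_id, topic_id);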

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Sorting by an arbitrary criterion

2009-07-09 Thread Craig James

Suppose I have a large table with a small-cardinality CATEGORY column (say, 
categories 1..5).  I need to sort by an arbitrary (i.e. user-specified) mapping 
of CATEGORY, something like this:

 1 = 'z'
 2 = 'a'
 3 = 'b'
 4 = 'w'
 5 = 'h'

So when I get done, the sort order should be 2,3,5,4,1.

I could create a temporary table with the category-to-key mapping, but is there 
any way to do this in a single SQL statement?
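
For illustration, here's the effect I'm after expressed with a CASE expression
(hypothetical table name; I'm hoping there's something less clunky):

  select *
    from my_big_table
   order by case category
                when 2 then 1
                when 3 then 2
                when 5 then 3
                when 4 then 4
                when 1 then 5
            end;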

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Used computers?

2009-07-20 Thread Craig James

Apologies for a slightly off-topic question ... a friend is overseeing the demise 
of a company and has several computers that they need to get rid of.  She's an 
attorney and knows little about them except that they're IBM and cost $50K 
originally.  Where does one go to sell equipment like this, and/or get a rough 
idea of its worth?

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Per-database warm standby?

2009-08-14 Thread Craig James

8.4 has vastly improved the warm-standby features, but it looks to me like this 
is still an installation-wide backup, not a per-database backup.  That is, if 
you have (say) a couple hundred databases, and you only want warm-backup on one 
of them, you can't do it (except using other solutions like Slony).  Is that 
right?

Thanks,
Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Fabio La Farcioli wrote:

i am developing a web app for thousands of users (1,000-2,000).

Each user has 2 tables of work... so I will finally have 2,000 (users) x 2 
tables = 4,000 tables!


Does Postgres support such a large number of tables?
Will I have performance problems?


We have run databases with over 100,000 tables with no problems.

However, we found that it's not a good idea to have a table-per-user design.  
As you get more users, it is hard to maintain the database.  Most of the time 
there are only a few users active.

So, we create a single large archive table, identical to the per-user table except that 
it also has a user-id column.  When a user hasn't logged in for a few hours, a cron process copies 
their tables into the large archive table, and returns their personal tables to a pool 
of available tables.

When the user logs back in, a hidden part of the login process gets a table 
from the pool of available tables, assigns it to this user, and copies the 
user's  data from the archive into this personal table.  They are now ready to 
work. This whole process takes just a fraction of a second for most users.

We keep a pool of about 200 tables, which automatically will expand (create 
more tables) if needed, but we've never had more than 200 users active at one 
time.
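
In rough outline, the two steps look something like this (all names here are hypothetical
placeholders for the scheme described above, not our real schema):

  -- user 1234 has gone idle: fold their table into the archive and
  -- return the personal table to the pool
  insert into archive_table (user_id, item_id, note)
      select 1234, item_id, note from user_table_0042;
  truncate user_table_0042;

  -- user 1234 logs back in: assign a free table from the pool and
  -- repopulate it from the archive
  insert into user_table_0057 (item_id, note)
      select item_id, note from archive_table where user_id = 1234;
  delete from archive_table where user_id = 1234;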

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Number of tables

2009-08-20 Thread Craig James

Greg Stark wrote:

What you want is a multi-column primary key where userid is part of
the key. You don't want to have a separate table for each user unless
each user has their own unique set of columns.


Not always true.


When the user logs back in, a hidden part of the login process gets a table
from the pool of available tables, assigns it to this user, and copies the
user's  data from the archive into this personal table.  They are now ready
to work. This whole process takes just a fraction of a second for most
users.


And what does all this accomplish?


The primary difference is between

 delete from big_table where userid = xx

vesus

 truncate user_table

There are also significant differences in performance for large inserts, 
because a single-user table almost never needs indexes at all, whereas a big 
table for everyone has to have at least one user-id column that's indexed.

In our application, the per-user tables are "hitlists" -- scratch lists that 
are populated something like this:

  create table hitlist_xxx (
row_id integer,
sortorder integer default nextval('hitlist_seq_xxx')
  )


  truncate table hitlist_xxx;
  select setval('hitlist_seq_xxx', 1, false);
  insert into hitlist_xxx (row_id) (select some_id from ... where ... order by 
...);

Once the hitlist is populated, the user can page through it quickly with no 
further searching, e.g. using a web app.

We tested the performance using a single large table in Postgres, and it was 
not nearly what we needed.  These hitlists tend to be transitory, and the 
typical operation is to discard the entire list and create a new one.  
Sometimes the user will sort the entire list based on some criterion, which 
also requires a copy/delete/re-insert using a new order-by.

With both Oracle and Postgres, truncate is MUCH faster than delete, and the 
added index needed for a single large table only makes it worse.  With 
Postgres, the repeated large delete/insert makes for tables that need a lot of 
vacuuming and index bloat, further hurting performance.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Using Gprof with Postgresql

2009-09-08 Thread Craig James

Pierre Frédéric Caillaud wrote:
I just compiled it with gcc and produces the gmon.out file for every 
process; by the way I am running below script in order to produce 
readable .out files


  gprof .../pgsql/bin/postgres gmon.out > createtable2.out

is postgres the right executable?

regards
reydan


Off topic, but hace you tried oprofile ? It's excellent...


I find valgrind to be an excellent profiling tool.  It has the advantage that it runs on 
an unmodified executable (using a virtual machine).  You can compile postgres the regular 
way, start the system up, and then create a short shell script called 
"postgres" that you put in place of the original executable and that invokes 
valgrind on the original executable.  Then when postgres starts up your backend, you have 
just one valgrind process running, rather than the whole Postgres system.

Valgrind does 100% tracing of the program rather than statistical sampling, and 
since it runs in a pure virtual machine, it can detect almost all memory 
corruption and leaks.

The big disadvantage of valgrind is that it slows the process WAY down, like by 
a factor of 5-10 on CPU.  For a pure CPU process, it doesn't screw up your 
stats, but if a process is mixed CPU and I/O, the CPU will appear to dominate.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!

2009-09-24 Thread Craig James

Dave Dutcher wrote:

You need a COMMIT for every BEGIN.  If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is Idle in Transaction.  If you are beginning a transaction,
doing a select, and then not committing, then yes that is a bug.


The BEGIN can be hidden, though.  For example, if the application is written in 
Perl,

 $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});

will automatically start a transaction the first time you do anything.  Under 
the covers, the Perl DBI issues the BEGIN for you, and you have to do an 
explicit

 $dbh->commit();

to commit it.

Craig



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!

2009-09-25 Thread Craig James

Gerhard Wiesinger wrote:

Hello Craig,

Are you sure this is correct?

The test program (see below) with autocommit=0 counts up when an insert 
is done in another session and there is no commit done.


I think with each new select a new implicit transaction is done when no 
explicit BEGIN has been established.


Sorry, I should have been more specific.  A transaction starts when you do 
something that will alter data in the database, such as insert, update, alter 
table, create sequence, and so forth.  The Perl DBI won't start a transaction 
for a select.

But my basic point is still valid: Some languages like Perl can implicitly 
start a transaction, so if programmers aren't familiar with this behavior, they 
can accidentally create long-running transactions.

Craig



Can one confirm this behavior?

Thnx.

Ciao,
Gerhard

# Disable autocommit!
my $dbh = DBI->connect($con, $dbuser, $dbpass, {RaiseError => 1, 
AutoCommit => 0}) || die "Unable to access Database '$dbname' on host 
'$dbhost' as user '$dbuser'. Error returned was: " . $DBI::errstr . ".";


my $sth = $dbh->prepare('SELECT COUNT(*) FROM employee;');

for (;;)
{
  $sth->execute();
  my ($count) = $sth->fetchrow();
  print "count=$count\n";
  $sth->finish();
#  $dbh->commit;
  sleep(3);
}

$dbh->disconnect;

--
http://www.wiesinger.com/


On Thu, 24 Sep 2009, Craig James wrote:


Dave Dutcher wrote:

You need a COMMIT for every BEGIN.  If you just run a SELECT statement
without first beginning a transaction, then you should not end up with a
connection that is Idle in Transaction.  If you are beginning a 
transaction,

doing a select, and then not committing, then yes that is a bug.


The BEGIN can be hidden, though.  For example, if the application is 
written in Perl,


$dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});

will automatically start a transaction the first time you do 
anything.  Under the covers, the Perl DBI issues the BEGIN for you, 
and you have to do an explicit


$dbh->commit();

to commit it.

Craig



--
Sent via pgsql-performance mailing list 
(pgsql-performance@postgresql.org)

To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bad performance of SELECT ... where id IN (...)

2009-09-26 Thread Craig James

Xia Qingran wrote:

Hi,
I have a big performance problem in my SQL select query:


select * from event where user_id in
(500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401,
400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301,
300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201,
200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101,
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);


The above SELECT always spends 1200ms.


If your user_id is always in a narrow range like this, or even in any range 
that is a small fraction of the total, then add a range condition, like this:

select * from event where user_id <= 500 and user_id >= 0 and user_id in (...)

I did this exact same thing in my application and it worked well.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best suiting OS

2009-10-05 Thread Craig James

Scott Marlowe wrote:

Personally, I use Fedora, and my servers have been quite stable. One of our
main web servers running Fedora:


It's not that there can't be stable releases of FC, it's that it's not
the focus of that project.  So, if you get lucky, great!  I can't
imagine running a production DB on FC, with it's short supported life
span and focus on development and not stability.


I use Fedora, and it was a mistake. I am looking for a better solution.  Fedora 
has been very stable (uptime of 430 days on one server), BUT...

Realistically, the lifetime of a release is as low as SIX MONTHS.  We bought 
servers just as a FC release was coming out, and thought we'd be safe by going 
with the older, tested release.  But six months after that, the next FC release 
came out, and the version we'd installed fell off the support list.

It takes almost no time with Fedora to run into big problems.  Maybe there's a 
security release of ssh, you try to compile it, but it needs the latest gcc, 
but that's not available on your unsupported version of FC that you installed 
less than a year ago.

Or maybe you need a new version of PHP to pass audit with your credit-card 
processor, but again, your FC release isn't supported so you have to uninstall 
the FC PHP, get the source, and compile PHP from scratch ... on and on it goes.

Fedora is a very nice project, but it's not suitable for production database 
servers.

This discussion has been very helpful indeed, and we appreciate everyone's 
contributions.  I'm leaning towards a stable Debian release for our next 
upgrade, but there are several other well-reasoned suggestions here.

Craig


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] updating a row in a table with only one row

2009-10-06 Thread Craig James

Merlin Moncure wrote:

On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek f...@mageo.cz wrote:

Merlin Moncure wrote:

On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek f...@mageo.cz wrote:


 Could the problem be the HW RAID card? There's ServerRAID 8k with 256MB
 with write-back enabled. Could it be that its internal cache becomes
 full and all disk I/O operations are delayed until it writes all
 changes to hard drives?

that's possible...the red flag is going to be iowait. if your server
can't keep up with the sync demands for example, you will eventually
outrun the write cache and you can start to see slow queries.  With
your server though it would take in the hundreds of (write)
transactions per second to do that minimum.

 The problem is that the server is not loaded in any way. The iowait is
 0.62%, there's only 72 sectors written/s, but the maximum await that I
 saw was 28ms (!). Any attempts to reduce the time (I/O schedulers,
 disabling bgwriter, increasing number of checkpoints, decreasing shared
 buffers, disabling read cache on the card etc.) didn't help. After some
 3-5m there occurs a COMMIT which takes 100-10000x longer time than
 usual. Setting fsync to off temporarily improved the COMMIT times
 considerably but I fear to have this option off all the time.

 Is anybody else using the same RAID card? I suspect the problem lies
 somewhere between the aacraid module and the card. The aacraid module
 ignores setting of the 'cache' parameter to 3 -- this should completely
 disable the SYNCHRONIZE_CACHE command.


I think you're right.  One thing you can do is leave fsync on but
disable synchronous_commit.  This is compromise between fsync on/off
(data consistent following crash, but you may lose some transactions).

We need to know what iowait is at the precise moment you get the long
commit time.  Throw a top, give it short update interval (like .25
seconds), and watch.


top(1) has a batch mode (-b) that's useful for sending results to a file.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query plan for NOT IN

2009-10-07 Thread Craig James

Kevin Grittner wrote:

Which leaves the issue open -- a flexible way to flag the *reason* (or
*reasons*) for the absence of a value could be a nice enhancement, if
someone could invent a good implementation.  Of course, one could
always add a column to indicate the reason for a NULL; and perhaps
that would be as good as any scheme to attach reason flags to NULL. 
You'd just have to make sure the reason column was null capable for

those rows where there *was* a value, which would make the reason not
applicable


I'd argue that this is just a special case of a broader problem of metadata: data about the data.  
For example, I could have a temperature, 40 degrees, and an error bound, +/- 0.25 degrees.  Nobody 
would think twice about making these separate columns.  I don't see how this is any different from 
a person's middle initial of NULL, plus a separate column indicating "not known" versus 
"doesn't have one" if that distinction is important.  There are many examples like this, 
where a simple value in one column isn't sufficient, so another column contains metadata that 
qualifies or clarifies the information.  NULL is just one such case.
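
A trivial sketch of the kind of thing I mean (hypothetical table and columns):

  create table measurement (
      id          integer primary key,
      temperature numeric,     -- NULL if no reading was taken
      temp_error  numeric,     -- error bound, e.g. 0.25
      temp_status text         -- e.g. 'sensor failed' vs 'not measured'; NULL when a value exists
  );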

But, this should probably be on an SQL discussion board, not PG performance...

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

