Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-15 Thread Steve Wampler

Chris Browne wrote:

jgard...@jonathangardner.net writes:

My question is how can I configure the database to run as quickly as
possible if I don't care about data consistency or durability? That
is, the data is updated so often and it can be reproduced fairly
rapidly so that if there is a server crash or random particles from
space mess up memory we'd just restart the machine and move on.


For such a scenario, I'd suggest you:

- Set up a filesystem that is memory-backed.  On Linux, RamFS or TmpFS
  are reasonable options for this.

- The complication would be that your "restart the machine and move
  on" needs to consist of quite a few steps:

  - recreating the filesystem
  - fixing permissions as needed
  - running initdb to set up new PG instance
  - automating any needful fiddling with postgresql.conf, pg_hba.conf
  - starting up that PG instance
  - creating users, databases, schemas, ...


Doesn't PG now support putting both WAL and user table files onto
file systems other than the one holding the PG config files and PG
'admin' tables?  Wouldn't doing so simplify the above considerably
by allowing just the WAL and user tables on the memory-backed file
systems?  I wouldn't think the performance impact of leaving
the rest of the stuff on disk would be that large.

Or does losing WAL files mandate a new initdb?
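For reference, a minimal sketch of that split, assuming PostgreSQL 8.0+
tablespace support and a tmpfs mount at /mnt/pgram (the mount point and all
names here are hypothetical, not from the original thread):

   -- user tables go onto the memory-backed filesystem via a tablespace
   CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';
   CREATE TABLE cache_data (k text PRIMARY KEY, v text) TABLESPACE ramspace;
   -- WAL can similarly be pointed at memory-backed storage (e.g. by
   -- relocating pg_xlog before starting the cluster); whether losing
   -- that WAL then forces a full re-init is exactly the question above.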

--
Steve Wampler -- swamp...@noao.edu
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Running on an NFS Mounted Directory

2006-04-27 Thread Steve Wampler
On Thu, Apr 27, 2006 at 08:57:51AM -0400, Ketema Harris wrote:
 OK.  My thought process was that having non local storage as say a big raid
 5 san ( I am talking 5 TB with expansion capability up to 10 ) would allow
 me to have redundancy, expandability, and hopefully still retain decent
 performance from the db.  I also would hopefully then not have to do
 periodic backups from the db server to some other type of storage.  Is this
 not a good idea?  How bad of a performance hit are we talking about?  Also,
 in regards to the commit data integrity, as far as the db is concerned once
 the data is sent to the san or nas isn't it written?  The storage may have
 that write in cache, but from my reading and understanding of how these
 various storage devices work that is how they keep up performance.  I would
 expect my bottleneck if any to be the actual Ethernet transfer to the
 storage, and I am going to try and compensate for that with a full gigabit
 backbone.

Well, if you have to have both the best performance and remote attach
storage, I think you'll find that a fibre-channel SAN is still the king
of the hill.  4Gb FC switches are common now, though finding a 4Gb
HBA for your computer might be a trick.  2Gb HBAs are everywhere in
FC land.  That's a premium price solution, however, and I don't know
anything about how well PG would perform with a FC SAN.  We use our
SAN for bulk science data and leave the PGDB on a separate machine
with local disk.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Running on an NFS Mounted Directory

2006-04-26 Thread Steve Wampler
On Wed, Apr 26, 2006 at 10:06:58PM -0400, Ketema Harris wrote:
 I was wondering if there were any performance issues with having a data
 directory that was an nfs mounted drive?  Say like a SAN or NAS device? Has
 anyone done this before?
 
My understanding is that NFS is pretty poor in performance in general,
so I would expect it to be particularly bad for a DB.  You might run
some (non-DB) performance tests to get a feel for how bad it might be.
(Someone once told me that NFS topped out at around 12MB/s, but I don't
know if that's really true [they were trying to sell a competitive
networked filesystem]).

In any event, you're at least limited by ethernet speeds, if not more.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [pgsql-www] [PERFORM] Help speeding up delete

2005-11-17 Thread Steve Wampler
Christopher Kings-Lynne wrote:
 That most people don't know they should use the new one I understand
 though. But I don't see how this will help against that :-)
 
 It'll make it easy...

As the miscreant that caused this thread to get started, let me
*wholeheartedly* agree with Chris.  An easy way to run the upgrade
target's pg_dump against the upgradable source version would work
wonders.  (Instructions included, of course.)


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread Steve Wampler
Joshua D. Drake wrote:
 The reason you want the dual core cpus is that PostgreSQL can only
 execute 1 query per cpu at a time,...

Is that true?  I knew that PG only used one cpu per query, but how
does PG know how many CPUs there are to limit the number of queries?

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-16 Thread Steve Wampler
David Boreham wrote:
 Steve Wampler wrote:
  Joshua D. Drake wrote:
   The reason you want the dual core cpus is that PostgreSQL can only
   execute 1 query per cpu at a time,...

  Is that true?  I knew that PG only used one cpu per query, but how
  does PG know how many CPUs there are to limit the number of queries?

 He means only one query can be executing on each cpu at any particular
 instant.

Got it - the cpu is only acting on one query in any instant but may be
switching between many 'simultaneous' queries.  PG isn't really involved
in the decision.  That makes sense.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-16 Thread Steve Wampler
Arjen van der Meijden wrote:
 On 15-11-2005 15:18, Steve Wampler wrote:
 
 Magnus Hagander wrote:
 (This is after putting an index on the (id,name,value) tuple.)  That
 outer seq scan is still annoying, but maybe this will be fast enough.

 I've passed this on, along with the (strong) recommendation that they
 upgrade PG.
 
 
 Have you tried with an index on (name,value) and of course one on id ?

Yes, although not with a unique index on (name,value) [possible, but not
so on the just-id index].  Anyway, it turns out the latest incarnation
is 'fast enough' for the user's need, so she's not doing any more with
it until after an upgrade.


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-15 Thread Steve Wampler
Magnus Hagander wrote:
Because I think we need to.  The above would only delete rows 
that have name = 'obsid' and value = 'oid080505'.  We need to 
delete all rows that have the same ids as those rows.  
However, from what you note, I bet we could do:

   DELETE FROM tmp_table2 WHERE id IN
      (SELECT id FROM tmp_table2 WHERE name = 'obsid' and value = 'oid080505');

However, even that seems to have a much higher cost than I'd expect:

   lab.devel.configdb=# explain delete from tmp_table2 where id in
       (select id from tmp_table2 where name='obsid' and value = 'oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=0.00..65705177237.26 rows=769844 width=6)
     SubPlan
       ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
             ->  Seq Scan on tmp_table2  (cost=0.00..42674.32 rows=38 width=50)

   EXPLAIN
...
 
 Earlier pg versions have always been bad at dealing with IN subqueries.
 Try rewriting it as (with fixing any broken syntax, I'm not actually
 testing this :P)
 
 DELETE FROM tmp_table2 WHERE EXISTS 
  (SELECT * FROM tmp_table2 t2 WHERE t2.id=tmp_table2.id AND
 t2.name='obsid' AND t2.value='oid080505')

Thanks - that looks *significantly* better:

   lab.devel.configdb=# explain delete from tmp_table2 where exists
   (select 1 from tmp_table2 t2 where
   t2.id=tmp_table2.id and
   t2.name='obsid' and t2.value='oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=0.00..9297614.80 rows=769844 width=6)
     SubPlan
       ->  Index Scan using inv_index_2 on tmp_table2 t2
             (cost=0.00..6.02 rows=1 width=0)

   EXPLAIN

(This is after putting an index on the (id,name,value) tuple.)  That outer
seq scan is still annoying, but maybe this will be fast enough.

I've passed this on, along with the (strong) recommendation that they
upgrade PG.
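Worth noting for after that upgrade: from 8.1 on, the same delete can also be
written as a join with DELETE ... USING, which avoids the per-row subplan
entirely.  A hedged sketch against the same table:

   DELETE FROM tmp_table2
    USING tmp_table2 t2
    WHERE tmp_table2.id = t2.id
      AND t2.name = 'obsid'
      AND t2.value = 'oid080505';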

Thanks!!

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Performance PG 8.0 on dual opteron / 4GB / 3ware

2005-11-15 Thread Steve Wampler
Joost Kraaijeveld wrote:
 If I understand correctly (I have 4GB ram):
 
 [EMAIL PROTECTED]:~/tmp$ dd if=/dev/zero of=bigfile bs=8k count=1000000
 1000000+0 records in
 1000000+0 records out
 8192000000 bytes transferred in 304.085269 seconds (26939812 bytes/sec)
 
 Which looks suspicious: 26308 MB/sec???

Eh?  That looks more like ~25.7 MB/sec, assuming 1MB = 1024*1024 bytes.
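A quick check of that arithmetic, using the figure dd itself reported:

   -- bytes/sec as reported by dd, converted to MB/sec
   SELECT 26939812 / (1024.0 * 1024.0) AS mb_per_sec;   -- roughly 25.7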

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



[PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler

We've got an older system in production (PG 7.2.4).  Recently
one of the users has wanted to implement a selective delete,
but is finding that the time it appears to take exceeds her
patience factor by several orders of magnitude.  Here's
a synopsis of her report.  It appears that the WHERE
id IN ... is resulting in a seq scan that is causing
the problem, but we're not SQL expert enough to know
what to do about it.

Can someone point out what we're doing wrong, or how we
could get a (much) faster delete?  Thanks!

Report:

This command yields results in only a few seconds:

# SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
#   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

However, the following command does not seem to want to ever
complete (the person running this killed it after 1/2 hour).

# DELETE FROM tmp_table2 WHERE id IN
# (SELECT at.id FROM tmp_table2 at, tmp_table2 a
#   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');

==

The table has four columns. There are 6175 rows satisfying the condition
given, and the table itself has 1539688 entries.  Layout is:

lab.devel.configdb=# \d tmp_table2
              Table "tmp_table2"
 Column |           Type           | Modifiers
--------+--------------------------+-----------
 id     | character varying(64)    |
 name   | character varying(64)    |
 units  | character varying(32)    |
 value  | text                     |
 time   | timestamp with time zone |

==

lab.devel.configdb=# EXPLAIN DELETE FROM tmp_table2 WHERE id IN
lab.devel.configdb-#   (SELECT at.id FROM tmp_table2 at, tmp_table2 a
lab.devel.configdb(#     WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
NOTICE:  QUERY PLAN:

Seq Scan on tmp_table2  (cost=0.00..154893452082.10 rows=769844 width=6)
  SubPlan
    ->  Materialize  (cost=100600.52..100600.52 rows=296330 width=100)
          ->  Hash Join  (cost=42674.42..100600.52 rows=296330 width=100)
                ->  Seq Scan on tmp_table2 at  (cost=0.00..34975.88 rows=1539688 width=50)
                ->  Hash  (cost=42674.32..42674.32 rows=38 width=50)
                      ->  Seq Scan on tmp_table2 a  (cost=0.00..42674.32 rows=38 width=50)
EXPLAIN

lab.devel.configdb=# EXPLAIN (SELECT at.id FROM tmp_table2 at, tmp_table2 a
lab.devel.configdb(#    WHERE at.id=a.id AND a.name='obsid' AND a.value='oid080505');
NOTICE:  QUERY PLAN:

Hash Join  (cost=42674.42..100600.52 rows=296330 width=100)
  ->  Seq Scan on tmp_table2 at  (cost=0.00..34975.88 rows=1539688 width=50)
  ->  Hash  (cost=42674.32..42674.32 rows=38 width=50)
        ->  Seq Scan on tmp_table2 a  (cost=0.00..42674.32 rows=38 width=50)

EXPLAIN
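The replies (threaded above, since this digest runs newest-first) ended up
putting a composite index on the table -- the inv_index_2 visible in the
2005-11-15 EXPLAIN output -- which, going by the description there, was
roughly:

   CREATE INDEX inv_index_2 ON tmp_table2 (id, name, value);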

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Scott Lamb wrote:
 On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:
 
 # SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
 #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';
 
 
 Isn't this equivalent?
 
 select id from tmp_table2 where name = 'obsid' and value = 'oid080505';

Probably, the user based the above on a query designed to find
all rows with the same id as those rows that have a.name='obsid' and
a.value='oid080505'.  However, I think the above would work to locate
all the ids, which is all we need for the delete (see below)

 # DELETE FROM tmp_table2 WHERE id IN
 # (SELECT at.id FROM tmp_table2 at, tmp_table2 a
 #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505');
 
 
 and this?
 
 delete from tmp_table2 where name = 'obsid' and value = 'oid080505';
 
 Why are you doing a self-join using id, which I assume is a primary key?

Because I think we need to.  The above would only delete rows that have
name = 'obsid' and value = 'oid080505'.  We need to delete all rows that
have the same ids as those rows.  However, from what you note, I bet
we could do:

   DELETE FROM tmp_table2 WHERE id IN
      (SELECT id FROM tmp_table2 WHERE name = 'obsid' and value = 'oid080505');

However, even that seems to have a much higher cost than I'd expect:

   lab.devel.configdb=# explain delete from tmp_table2 where id in
(select id from tmp_table2 where name='obsid' and value = 'oid080505');
   NOTICE:  QUERY PLAN:

   Seq Scan on tmp_table2  (cost=0.00..65705177237.26 rows=769844 width=6)
     SubPlan
       ->  Materialize  (cost=42674.32..42674.32 rows=38 width=50)
             ->  Seq Scan on tmp_table2  (cost=0.00..42674.32 rows=38 width=50)

   EXPLAIN

And, sure enough, it is taking an extraordinarily long time to run (more than
10 minutes so far, compared to <10 seconds for the select).  Is this
really typical of deletes?  It appears (to me) to be the Seq Scan on tmp_table2
that is the killer here.  If we put an index on, would it help?  (The user
claims she tried that and its EXPLAIN cost went even higher, but I haven't
checked that...)

Thanks!
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Tom Lane wrote:
 Steve Wampler [EMAIL PROTECTED] writes:
 
We've got an older system in production (PG 7.2.4).  Recently
one of the users has wanted to implement a selective delete,
but is finding that the time it appears to take exceeds her
patience factor by several orders of magnitude.  Here's
a synopsis of her report.  It appears that the WHERE
id IN ... is resulting in a seq scan that is causing
the problem, but we're not SQL expert enough to know
what to do about it.
 
 
Can someone point out what we're doing wrong, or how we
could get a (much) faster delete?  Thanks!
 
 
 Update to 7.4 or later ;-)

I was afraid you'd say that :-)  I'm not officially involved in
this project anymore and was hoping for a fix that wouldn't drag
me back in.  The security issues aren't a concern because this
DB is *well* hidden from the outside world (it's part of a telescope
control system behind several firewalls with no outside access).
However, the data-loss-grade bugs issue *is* important.  We'll
try to do the upgrade as soon as we get some cloudy days to
actually do it!

Is the performance behavior that we're experiencing a known
problem with 7.2 that has been addressed in 7.4?  Or will the
upgrade fix other problems while leaving this one?

 Quite seriously, if you're still using 7.2.4 for production purposes
 you could justifiably be accused of negligence.  There are three or four
 data-loss-grade bugs fixed in the later 7.2.x releases, not to mention
 security holes; and that was before we abandoned support for 7.2.
 You *really* need to be thinking about an update.

Thanks!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help speeding up delete

2005-11-14 Thread Steve Wampler
Scott Lamb wrote:
 On Nov 14, 2005, at 3:52 PM, Steve Wampler wrote:
 
 Scott Lamb wrote:

 On Nov 14, 2005, at 2:07 PM, Steve Wampler wrote:

 # SELECT at.id FROM tmp_table2 at, tmp_tabl2e a
 #   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';



 Isn't this equivalent?

 select id from tmp_table2 where name = 'obsid' and value =  'oid080505';


 Probably, the user based the above on a query designed to find
 all rows with the same id as those rows that have a.name='obsid' and
 a.value='oid080505'.
 
 
 Well, this indirection is only significant if those two sets can 
 differ. If (A) you meant tmp_table2 when you wrote tmp_tabl2e, so 
 this is a self-join, and (B) there is a primary key on id, I don't 
 think that can ever happen.

I wasn't clear.  The original query was:

   SELECT at.* FROM tmp_table2 at, tmp_table2 a
   WHERE at.id=a.id and a.name='obsid' and a.value='oid080505';

which is significantly different than:

   SELECT * FROM tmp_table2 WHERE name='obsid' and value='oid080505';

The user had adapted that query for her needs, but it would have been
better to just use the query that you suggested (as the subselect in
the DELETE FROM...).  Unfortunately, that only improves performance
slightly - it is still way too slow on deletes.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] batch inserts are slow

2005-05-03 Thread Steve Wampler
Tim Terlegård wrote:

Just as on Oracle you would use SQL*Loader for this application, you
should use the COPY syntax for PostgreSQL.  You will find it a lot
faster.  I have used it by building the input files and executing
'psql' with a COPY command, and also by using it with a subprocess,
both are quite effective.
 
 
 I tried this now. Now it's down to 0.45 seconds. It feels a bit hacky to
 run /usr/bin/psql from java, but it sure works. Thanks for the hint!

There was a patch against 7.4 that provided direct JDBC access to
PostgreSQL's COPY.  (I have it installed here and *love* it - it
gives outstanding performance.)  However, it hasn't made it into an
official release yet.  I don't know why; perhaps there's
a problem yet to be solved with it ('works for me', though)?

Is this still on the board?  I won't upgrade past 7.4 without it.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: RE : RE: [PERFORM] Postgresql vs SQLserver for this application

2005-04-06 Thread Steve Wampler
Mohan, Ross wrote:
 I wish I had a Dell system and run case to show you Alex, but I don't...
 however...using Oracle's direct path feature, it's pretty straightforward. 
 
 We've done 110,000 rows per second into index-less tables on a big system
 (IBM Power5 chips, Hitachi SAN). ( Yes, I am sure: over 100K a second. 
 Sustained
 for almost 9 minutes. )
 
 Yes, this is an exception, but oracle directpath/InsertAppend/BulkLoad
 feature enabled us to migrate a 4 TB database...really quickly. 

How close to this is PG's COPY?  I get surprisingly good results using
COPY with jdbc on smallish systems (now if that patch would just make it
into the mainstream PG jdbc support!)  I think COPY has a bit more overhead
than what a Bulkload feature may have, but I suspect it's not that
much more.

 Now...if you ask me can this work without Power5 and Hitachi SAN?
 my answer is..you give me a top end Dell and SCSI III on 15K disks
 and I'll likely easily match it, yea.
 
 I'd love to see PG get into this range..i am a big fan of PG (just a
 rank newbie) but I gotta think the underlying code to do this has
 to be not-too-complex.

It may not be that far off if you can use COPY instead of INSERT.
But comparing Bulkload to INSERT is a bit apples-orangish.

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Reading recommendations

2005-03-31 Thread Steve Wampler
Stefan Weiss wrote:
 On 2005-03-31 15:19, [EMAIL PROTECTED] wrote:
 
Now *that* I want to see.  Ought to be at least as interesting
as the TCP/IP over carrier pigeon experiment - and more
challenging to boot!
 
 ..
 
Interestingly, we had a follow on contract to investigate routing
optimization using flooding techniques.  Oddly, it was commissioned by a
consortium of local car washes.  Work stopped when the park service sued us
for the cost of cleaning all the statuary, and the company went out of
business.  We were serving cornish game hens at our frequent dinner
parties for months.
 
 
 This method might have been safer (and it works great with Apaches):
 http://eagle.auc.ca/~dreid/

Aha - VOIPOBD as well as VOIPOBT!  What more can one want?

VOIPOCP, I suppose...


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
Mohan, Ross wrote:
 VOIP over BitTorrent?  

Now *that* I want to see.  Ought to be at least as interesting
as the TCP/IP over carrier pigeon experiment - and more
challenging to boot!


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
[EMAIL PROTECTED] wrote:

Mohan, Ross wrote:

VOIP over BitTorrent?

Now *that* I want to see.  Ought to be at least as interesting
as the TCP/IP over carrier pigeon experiment - and more
challenging to boot!

 
 
 It was very challenging.  I worked on the credit window sizing and
 retransmission timer estimation algorithms.  We took into account weather
 patterns, size and age of the bird, feeding times, and the average number
 of times a bird circles before determining magnetic north.  Interestingly,
 packet size had little effect in the final algorithms.
 
 I would love to share them with all of you, but they're classified.

Ah, but VOIPOBT requires many people all saying the same thing at the
same time.  The synchronization alone (since you need to distribute
these people adequately to avoid overloading a trunk line...) is probably
sufficiently hard to make it interesting.  Then there are the problems of
different accents, dialects, and languages ;)

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Steve Wampler
Hervé Piedvache wrote:
No ... as I have said ... how I'll manage a database getting a table of may be 
250 000 000 records ? I'll need incredible servers ... to get quick access or 
index reading ... no ?

So what we would like to get is a pool of small servers able to make one 
virtual server ... for that is called a Cluster ... no ?

I know they are not using PostgreSQL ... but how a company like Google do to 
get an incredible database in size and so quick access ?
Probably by carefully partitioning their data.  I can't imagine anything
being fast on a single table in the 250,000,000-tuple range.  Nor can I
really imagine any database that efficiently splits a single table
across multiple machines (or even inefficiently unless some internal
partitioning is being done).
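A minimal sketch of what "carefully partitioning" meant in PostgreSQL of that
era: inheritance plus CHECK constraints, with the application (or rules or
triggers) steering rows into the right child table.  All names below are
hypothetical:

   CREATE TABLE big_log (id bigint, created date, payload text);
   CREATE TABLE big_log_2005_01
       (CHECK (created >= DATE '2005-01-01' AND created < DATE '2005-02-01'))
       INHERITS (big_log);
   CREATE TABLE big_log_2005_02
       (CHECK (created >= DATE '2005-02-01' AND created < DATE '2005-03-01'))
       INHERITS (big_log);
   -- SELECTs against big_log see all children; loads go straight
   -- into the appropriate child table.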
So, you'll have to do some work at your end and not just hope that
a magic bullet is available.
Once you've got the data partitioned, the question becomes one of
how to enhance performance/scalability.  Have you considered RAIDb?
--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.


Re: [PERFORM] PostgreSQL clustering VS MySQL clustering

2005-01-20 Thread Steve Wampler
Hervé Piedvache wrote:
On Thursday, 20 January 2005 16:23, Dave Cramer wrote:
Google uses something called the google filesystem, look it up in
google. It is a distributed file system.

Yes that's another point I'm working on ... make a cluster of server using 
GFS ... and making PostgreSQL running with it ...
A few years ago I played around with GFS, but not for postgresql.
I don't think it's going to help - logically there's no difference
between putting PG on GFS and putting PG on NFS - in both cases
the filesystem doesn't provide any support for distributing the
task at hand - and a PG database server isn't written to be
distributed across hosts regardless of the distribution of the
data across filesystems.

--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.


Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Steve Wampler
On Mon, 2004-06-07 at 02:26, Kris Jurka wrote:
 On Sat, 5 Jun 2004, Steve Wampler wrote:
 
  
  [I want to use copy from JDBC]
  
 
 I made a patch to the driver to support COPY as a PG extension.
...
 http://archives.postgresql.org/pgsql-jdbc/2003-12/msg00186.php

Thanks Kris - that patch worked beautifully and bumped the
insert rate from ~1000 entries/second to ~9000 e/s in my
test code.

Here's hoping it makes it into 7.5.

I do have a little concern about what's happening in the
back end during the copy - I suspect the entire table is
locked, which may impact the performance when multiple
clients are saving entries into the table.  Anyone know
if that's how COPY works?  (For that matter, would that
also be true of a transaction consisting of a set of
inserts?)
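For what it's worth: COPY FROM takes the same table lock level as a plain
INSERT (RowExclusiveLock), so concurrent writers aren't blocked out.  A
hedged way to watch that from another session while a load is running (the
table name is hypothetical):

   SELECT l.mode, l.granted
     FROM pg_locks l
    WHERE l.relation = (SELECT oid FROM pg_class WHERE relname = 'log_messages');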

Thanks again!
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] [JDBC] Using a COPY...FROM through JDBC?

2004-06-07 Thread Steve Wampler
On Mon, 2004-06-07 at 10:40, Steve Wampler wrote:

 Thanks Kris - that patch worked beautifully and bumped the
 insert rate from ~1000 entries/second to ~9000 e/s in my
 test code.

As a followup - that 9000 e/s becomes ~21,000 e/s if I don't
have the java code also dump the message to standard output!

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



[PERFORM] Using a COPY...FROM through JDBC?

2004-06-05 Thread Steve Wampler

I've got a simple database (no indices, 6 columns) that I need
to write data into quickly through multiple simultaneous JDBC
connections in a distributed environment.  (This is going to be
a message logging service for software-generated messages.)

Using a PreparedStatement, I can get about 400/s inserted.  If I
(on the java side) buffer up the entries and dump them in large
transaction blocks I can push this up to about 1200/s.  I'd
like to go faster.  One approach that I think might be
promising would be to try using a COPY command instead of
an INSERT, but I don't have a file for input, I have a 
Java collection, so COPY isn't quite right.  Is there any way to
efficiently use COPY without having to create a file (remember
that the java apps are distributed on a LAN and aren't running
on the DB server)?  Is this a dead end because of the way
COPY is implemented to only use a file?
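As the replies above show, it isn't a dead end: the COPY protocol also has a
FROM STDIN form that streams rows over the client connection with no
server-side file at all, which is what the JDBC driver patch exposes.  A
hedged sketch of the SQL side, with a hypothetical table and column list:

   COPY log_messages (host, facility, level, code, body, ts) FROM STDIN;
   -- tab-separated rows are then streamed over the connection,
   -- terminated by a line containing only "\."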

Is there something else I can do?  Ultimately, this will end
up on a machine running 1+0 RAID, so I expect that will give
me some performance boost as well, but I'd like to push it
up as best I can with my current hardware setup.

Thanks for any advice!
-Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help tracking down problem with inserts slowing

2003-12-08 Thread Steve Wampler
On Sun, 2003-12-07 at 09:52, Tom Lane wrote:
 Steve Wampler [EMAIL PROTECTED] writes:
  Hmmm, I have a feeling that's not as obvious as I thought...  I can't
  identify the index (named 'id_index') in the output of vacuum verbose.
 
 In 7.2, the index reports look like
   Index %s: Pages %u; Tuples %.0f.
 and should appear in the part of the printout that deals with their
 owning table.

Thanks, Tom.  Are there any reasons why it would not appear?:
-
farm.devel.configdb=# vacuum verbose attributes_table;
NOTICE:  --Relation attributes_table--
NOTICE:  Pages 1389: Changed 0, Empty 0; Tup 111358: Vac 0, Keep 0,
UnUsed 51.
Total CPU 0.00s/0.02u sec elapsed 0.03 sec.
NOTICE:  --Relation pg_toast_1743942--
NOTICE:  Pages 0: Changed 0, Empty 0; Tup 0: Vac 0, Keep 0, UnUsed 0.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

farm.devel.configdb=# \d attributes_table
           Table "attributes_table"
 Column |           Type           |   Modifiers
--------+--------------------------+---------------
 id     | character varying(64)    | not null
 name   | character varying(64)    | not null
 units  | character varying(32)    |
 value  | text                     |
 time   | timestamp with time zone | default now()
Indexes: id_index
Primary key: attributes_table_pkey
Triggers: trigger_insert
---

The odd thing is that I could have sworn it appeared yesterday...
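As the 2003-12-07 messages further down note, the slowdown itself was
apparently cleared by simply rebuilding that index:

   REINDEX INDEX id_index;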

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help tracking down problem with inserts slowing down...

2003-12-07 Thread Steve Wampler
On Fri, Dec 05, 2003 at 09:54:52PM -0500, Robert Treat wrote:
...
 A vacuum verbose could give you a good indication if you need to reindex, 
 compare the # of pages in the index with the # in the table. 

Hmmm, I have a feeling that's not as obvious as I thought...  I can't
identify the index (named 'id_index') in the output of vacuum verbose.
The closest I can find is:

NOTICE:  --Relation pg_index--
NOTICE:  Pages 2: Changed 0, Empty 0; Tup 56: Vac 0, Keep 0, UnUsed 42.
Total CPU 0.00s/0.00u sec elapsed 0.00 sec.

Which probably isn't correct, right (the name doesn't seem to match)?

The table's entry is:

NOTICE:  --Relation attributes_table--
NOTICE:  Pages 639: Changed 0, Empty 0; Tup 52846: Vac 0, Keep 0, UnUsed 48.
Total CPU 0.00s/0.01u sec elapsed 0.01 sec.

Thanks!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] Help tracking down problem with inserts slowing

2003-12-07 Thread Steve Wampler
On Fri, 2003-12-05 at 16:38, Neil Conway wrote:

 
 (1) Can you confirm that the VACUUM FULL on site B actually
 removed all the tuples you intended it to remove? Concurrent
 transactions can limit the amount of data that VACUUM FULL is
 able to reclaim. If you run contrib/pgstattuple (or compare
 the database's disk consumption with the number of live rows
 in it), you should be able to tell.

Hmmm, I installed 7.2.3 from RPMs, but the contrib package seems
to be missing the pgstattuple library code.  (According to the
readme, I should do:

$ make
$ make install
$ psql -e -f /usr/local/pgsql/share/contrib/pgstattuple.sql test

but the first two lines don't make sense with the binary rpm
distribution and trying the last line as (for my world):

   psql -e -f /usr/share/pgsql/contrib/pgstattuple.sql farm.devel.configdb

yields:

  DROP FUNCTION pgstattuple(NAME);
  psql:/usr/share/pgsql/contrib/pgstattuple.sql:1: ERROR: 
RemoveFunction: function 'pgstattuple(name)' does not exist
  CREATE FUNCTION pgstattuple(NAME) RETURNS FLOAT8
  AS '$libdir/pgstattuple', 'pgstattuple'
  LANGUAGE 'c' WITH (isstrict);
  psql:/usr/share/pgsql/contrib/pgstattuple.sql:4: ERROR:  stat failed
on file '$libdir/pgstattuple': No such file or directory

I don't need this right now (a reindex seems to have fixed
our problem for now...), but it sounds like it would be useful
in the future.
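For the record, once the $libdir/pgstattuple shared object is actually
present (on RPM setups it should come from the separate postgresql-contrib
package -- an assumption worth checking), usage is just the function the
script above tries to define:

   SELECT pgstattuple('attributes_table');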

Thanks!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



[PERFORM] Help tracking down problem with inserts slowing down...

2003-12-05 Thread Steve Wampler

I need some help tracking down a sudden, massive slowdown
in inserts in one of our databases.

PG: 7.2.3  (RedHat 8.0)

Background.  We currently run nearly identical systems
at two sites: Site A is a 'lab' site used for development,
Site B is a production site.

The databases in question have identical structure:

  - A simple table with 4 columns, with a trigger function
    on inserts (which checks to see if the entry already
    exists and, if so, changes the insert into an update...)
  - A simple view with 4 columns into the above table.

All access is through jdbc (JDK 1.3.1, jdbc 7.1-1.3), and the
postgresql.conf files are identical.
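A hedged sketch of the kind of insert trigger described above (the function
name is made up and the syntax is the more modern PL/pgSQL form; the column
names follow the attributes_table layout shown further down in this digest):

   CREATE OR REPLACE FUNCTION insert_or_update() RETURNS trigger AS '
   BEGIN
       -- try to update an existing row first
       UPDATE attributes_table
          SET units = NEW.units, value = NEW.value, time = NEW.time
        WHERE id = NEW.id AND name = NEW.name;
       IF FOUND THEN
           RETURN NULL;     -- row already existed; suppress the insert
       END IF;
       RETURN NEW;          -- no existing row; let the insert proceed
   END;
   ' LANGUAGE 'plpgsql';

   CREATE TRIGGER trigger_insert BEFORE INSERT ON attributes_table
       FOR EACH ROW EXECUTE PROCEDURE insert_or_update();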

The two sites were performing at comparable speeds until
a few days ago, when we deleted several million records
from each database and then did a vacuum full; analyze
on both.  Now inserts at Site B are several orders of
magnitude slower than at Site A.  The odd thing is that
Site B's DB now has only 60,000 records while Site A's is
up around 3 million.  Inserts at A average 63ms, inserts
at B are now up at 4.5 seconds!

EXPLAIN doesn't show any difference between the two.

Can someone suggest ways to track this down?  I don't know
much about postgresql internals/configuration.

Thanks!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.



Re: [PERFORM] [SQL] Replacing a simple nested query?

2003-07-14 Thread Steve Wampler
On Sun, 2003-07-13 at 14:50, Steve Wampler wrote:
 I've got a simple nested query:
 
   select * from attributes where id in (select id from
  attributes where (name='obsid') and (value='oid00066'));
 
 that performs abysmally.  I've heard this described as the
 'classic WHERE IN' problem.
 
 Is there a better way to obtain the same results?  The inner
 select identifies a set of ids (2049 of them, to be exact)
 that are then used to locate records that have the same id
 (about 30-40K of those, including the aforementioned 2049).

For the record, Joe Conway and Hannu Krosing both provided
the same solution:

   select at.* from attributes_table at, attributes a
  where at.id = a.id and a.name='obsid' and a.value='oid00066';

which is several orders of infinity faster than my naive
approach above:
-
lab.devel.configdb=# explain analyze select * from
attributes_table where id in (select id from attributes
where (name='obsid') and (value='oid00066')) order by id;
NOTICE:  QUERY PLAN:

Index Scan using id_index on attributes_table
  (cost=0.00..8773703316.10 rows=241201 width=59)
  (actual time=136297.91..3418016.04 rows=32799 loops=1)
  SubPlan
    ->  Materialize  (cost=18187.48..18187.48 rows=15 width=25)
                     (actual time=0.01..1.68 rows=1979 loops=482402)
          ->  Index Scan using name_index on attributes_table
                (cost=0.00..18187.48 rows=15 width=25)
                (actual time=0.27..251.95 rows=2049 loops=1)
Total runtime: 3418035.38 msec
--
lab.devel.configdb=# explain analyze select at.* from
attributes_table at, attributes a
where at.id = a.id and a.name='obsid' and a.value='oid00066';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..18739.44 rows=217 width=84)
             (actual time=0.76..1220.65 rows=32799 loops=1)
  ->  Index Scan using name_index on attributes_table
        (cost=0.00..18187.48 rows=15 width=25)
        (actual time=0.47..507.31 rows=2049 loops=1)
  ->  Index Scan using id_index on attributes_table at
        (cost=0.00..35.80 rows=12 width=59)
        (actual time=0.11..0.31 rows=16 loops=2049)
Total runtime: 1235.42 msec
---

My thanks to both Joe and Hannu!
Steve
-- 
Steve Wampler -- [EMAIL PROTECTED]
Quantum materiae materietur marmota monax si marmota
monax materiam possit materiari?
