Re: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)

2005-09-29 Thread Ron Peacetree
>From: "Jeffrey W. Baker" <[EMAIL PROTECTED]>
>Sent: Sep 29, 2005 12:33 AM
>Subject: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)
>
>On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
>>>From: "Jeffrey W. Baker" <[EMAIL PROTECTED]>
>>>Perhaps I believe this because you can now buy as much sequential I/O
>>>as you want.  Random I/O is the only real savings.
>>>
>> 1= No, you can not "buy as much sequential IO as you want".  Even
>> with an infinite budget, there are physical and engineering limits.  Long
>> before you reach those limits, you will pay exponentially increasing costs
>> for linearly increasing performance gains.  So even if you _can_ buy a
>> certain level of sequential IO, it may not be the most efficient way to
>> spend money.
>
>This is just false.  You can buy sequential I/O for linear money up to
>and beyond your platform's main memory bandwidth.  Even 1GB/sec will
>severely tax memory bandwidth of mainstream platforms, and you can
>achieve this rate for a modest cost.  
>
I don't think you can prove this statement.
A= www.pricewatch.com lists 7200rpm 320GB SATA II HDs for ~$160.
ASTR according to www.storagereview.com is ~50MBps.  Average access
time is ~12-13ms.
Absolute TOTL 15Krpm 147GB U320 or FC HDs cost ~4x as much per GB,
yet only deliver ~80-90MBps ASTR and average access times of 
~5.5-6.0ms.
Your statement is clearly false in terms of atomic raw HD performance.

B= low end RAID controllers can be obtained for a few $100's.  But even
amongst them, a $600+ card does not perform 3-6x better than a
$100-$200 card.  When the low end HW is not enough, the next step in
price is to ~$10K+ (ie Xyratex), and the ones after that are to ~$100K+
(ie NetApps) and ~$1M+ (ie EMC, IBM, etc).  None of these ~10x steps
in price results in a ~10x increase in performance.
Your statement is clearly false in terms of HW based RAID performance.

C= A commodity AMD64 mainboard with a dual channel DDR PC3200
RAM subsystem has 6.4GBps of bandwidth.  These are as common
as weeds and almost as cheap:  www.pricewatch.com
Your statement about commodity systems main memory bandwidth
being "severely taxed at 1GBps" is clearly false.

D= Xyratex makes RAID HW for NetApps and EMC.  NONE of their
current HW can deliver 1GBps.  More like 600-700MBps.  Engenio and
Dot Hill have similar limitations on their current products.  No PCI or
PCI-X based HW could ever do more than ~800-850MBps since
that's the RW limit of those busses.  Next Gen products are likely to
2x those limits and cross the 1GBps barrier based on ~90MBps SAS
or FC HD's and PCI-Ex8 (2GBps max) and PCI-Ex16 (4GBps max).
Note that not even next gen or 2 gens from now RAID HW will be
able to match the memory bandwidth of the current commodity
memory subsystem mentioned in "C" above.
Your statement that one can achieve a HD IO rate that will tax RAM
bandwidth at modest cost is clearly false.
   
QED Your statement is false on all counts and in all respects.


>I have one array that can supply this rate and it has only 15 disks.  It
>would fit on my desk.  I think your dire talk about the limits of
>science and engineering may be a tad overblown.
>
Name it and post its BOM, configuration specs, price and ordering
information.  Then tell us what it's plugged into and all the same
details on _that_.

If all 15 HD's are being used for one RAID set, then you can't be
using RAID 10, which means any claims re: write performance in
particular should be closely examined.

A 15 volume RAID 5 made of the fastest 15Krpm U320 or FC HDs,
each with ~85.9MBps ASTR, could in theory do ~14*85.9=
~1.2GBps raw ASTR for at least reads, but no one I know of makes
commodity RAID HW that can keep up with this, nor can any one
PCI-X bus support it even if such commodity RAID HW did exist.

Hmmm.  SW RAID on at least a PCI-Ex8 bus might be able to do it if
we can multiplex enough 4Gbps FC lines (4Gbps= 400MBps => max
of 4 of the above HDs per line and 4 FC lines) with low enough latency
and have enough CPU driving it...Won't be easy nor cheap though. 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Comparative performance

2005-09-29 Thread PFC


It appears that PostgreSQL is two to three times slower than MySQL.  For  
example, some pages that have some 30,000 characters (when saved as  
HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with  
PostgreSQL.  I had read that the former was generally faster than the  
latter, particularly for simple web applications but I was hoping that  
Postgres' performance would not be that noticeably slower.


	From my experience, the postgres libraries in PHP are a piece of crap,  
and add a lot of overhead even from small queries.
	For instance, a simple query like "SELECT * FROM table WHERE  
primary_key_id=1234" can take the following time, on my laptop, with data  
in the filesystem cache of course :


EXPLAIN ANALYZE <0.1 ms
python + psycopg 2  0.1 ms (damn fast)
php + mysql 0.3 ms
php + postgres  1-2 ms (damn slow)

	So, if your pages are designed in The PHP Way (ie. a large number of  
small queries), I might suggest using a language with a decent postgres  
interface (python, among others), or rewriting your bunches of small  
queries as Stored Procedures or Joins, which will provide large speedups.  
Doing >50 queries on a page is always a bad idea, but it's tolerable in  
php-mysql, not in php-postgres.
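	As an illustration only (hypothetical table and column names, nothing from this thread), two round-trip lookups like the commented ones below can usually be collapsed into a single join:

-- two separate small queries (the pattern being discouraged above):
--   SELECT author_id FROM articles WHERE article_id = 42;
--   SELECT name FROM authors WHERE author_id = <result of the first query>;
-- one join that returns the same data in a single round trip:
SELECT a.article_id, a.title, au.name
  FROM articles a
  JOIN authors au ON au.author_id = a.author_id
 WHERE a.article_id = 42;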


	If it's only one large query that is slow, then there is a real problem,
as postgres is usually a lot smarter about query optimization.


	If you use the usual mysql techniques (like storing a page counter in a
row in a table, or storing sessions in a table), beware: these are no-nos
for postgres. These things should NOT be done with a database anyway; try
memcached, for instance.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Pailloncy Jean-Gerard
Your main example seems to focus on a large table where a key column has
constrained values.  This case is interesting in proportion to the
number of possible values.  If I have billions of rows, each having one
of only two values, I can think of a trivial and very fast method of
returning the table "sorted" by that key: make two sequential passes,
returning the first value on the first pass and the second value on the
second pass.  This will be faster than the method you propose.


1= No that was not my main example.  It was the simplest example used to
frame the later more complicated examples.  Please don't get hung up on it.


2= You are incorrect.  Since IO is the most expensive operation we can do,
any method that makes two passes through the data at top scanning speed
will take at least 2x as long as any method that only takes one such pass.

You do not get the point.
Once you have the sorted references to the tuples, you still need to
fetch the tuples themselves, check their visibility, etc., and return
them to the client.


So, if there are only 2 values in the column of a big table that is larger
than available RAM, two seq scans of the table without any sorting is the
fastest solution.

Cordialement,
Jean-Gérard Pailloncy


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Magnus Hagander wrote:

That actually depends a lot on *how* you use it. I've seen pg-on-windows
deployments that come within a few percent of the linux performance.
I've also seen those that are absolutely horrible compared.

One sure way to kill the performance is to do a lot of small
connections. Using persistent connection is even more important on
Windows than it is on Unix. It could easily explain a difference like
this.


I just tried using pg_pconnect() and I didn't notice any significant 
improvement.  What bothers me most is that with Postgres I tend to see jerky 
behavior on almost every page:  the upper 1/2 or 2/3 of the page is displayed 
first and you can see a blank bottom (or you can see a half-filled completion 
bar).  With MySQL each page is generally displayed in one swoop.


Joe


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Comparative performance

2005-09-29 Thread Steinar H. Gunderson
On Thu, Sep 29, 2005 at 08:16:11AM -0400, Joe wrote:
> I just tried using pg_pconnect() and I didn't notice any significant 
> improvement.

PHP persistent connections are not really persistent -- or so I've been told.

Anyhow, what was discussed here was pg_query, not pg_connect. You really want
to reduce the number of pg_query() calls in any case; you haven't told us how
many there are yet, but it sounds like there are a lot of them.

> What bothers me most is that with Postgres I tend to see jerky behavior on
> almost every page:  the upper 1/2 or 2/3 of the page is displayed first and
> you can see a blank bottom (or you can see a half-filled completion bar).
> With MySQL each page is generally displayed in one swoop.

This might just be your TCP/IP stack noticing that the rest of the page
isn't likely to come anytime soon and starting to send out what it has... or
something else. I wouldn't put too much weight on it; it's likely to go away as soon as
you fix the rest of the problem.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Comparative performance

2005-09-29 Thread Gavin Sherry
On Thu, 29 Sep 2005, Joe wrote:

> Magnus Hagander wrote:
> > That actually depends a lot on *how* you use it. I've seen pg-on-windows
> > deployments that come within a few percent of the linux performance.
> > I've also seen those that are absolutely horrible compared.
> >
> > One sure way to kill the performance is to do a lot of small
> > connections. Using persistent connection is even more important on
> > Windows than it is on Unix. It could easily explain a difference like
> > this.
>
> I just tried using pg_pconnect() and I didn't notice any significant
> improvement.  What bothers me most is that with Postgres I tend to see jerky
> behavior on almost every page:  the upper 1/2 or 2/3 of the page is displayed
> first and you can see a blank bottom (or you can see a half-filled completion
> bar).  With MySQL each page is generally displayed in one swoop.

Please post the table definitions, queries and explain analyze results so
we can tell you why the performance is poor.

Gavin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

PFC wrote:
From my experience, the postgres libraries in PHP are a piece of crap,
and add a lot of overhead even from small queries.
For instance, a simple query like "SELECT * FROM table WHERE
primary_key_id=1234" can take the following time, on my laptop, with data
in the filesystem cache of course :


EXPLAIN ANALYZE     <0.1 ms
python + psycopg 2   0.1 ms (damn fast)
php + mysql          0.3 ms
php + postgres       1-2 ms (damn slow)


As a Trac user I was considering moving to Python, so it's good to know that, 
but the rewrite is a longer term project.


So, if your pages are designed in The PHP Way (ie. a large number 
of  small queries), I might suggest using a language with a decent 
postgres  interface (python, among others), or rewriting your bunches of 
small  queries as Stored Procedures or Joins, which will provide large 
speedups.  Doing >50 queries on a page is always a bad idea, but it's 
tolerable in  php-mysql, not in php-postgres.


The pages do use a number of queries to collect all the data for display but 
nowhere near 50.  I'd say it's probably less than a dozen.  As an aside, one of 
my tasks (before the conversion) was to analyze the queries and see where they 
could be tweaked for performance, but with MySQL that was never a top priority.


The schema is fairly simple having two main tables: topic and entry (sort of 
like account and transaction in an accounting scenario).  There are two 
additional tables that perhaps could be merged into the entry table (and that 
would reduce the number of queries) but I do not want to make major changes to 
the schema (and the app) for the PostgreSQL conversion.


Joe


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Gavin Sherry wrote:

Please post the table definitions, queries and explain analyze results so
we can tell you why the performance is poor.


I did try to post that last night but apparently my reply didn't make it to the 
list.  Here it is again:


Matthew Nuzum wrote:

> This is the right list. Post detail and I'm sure you'll get some suggestions.


Thanks, Matthew (and Chris and Gavin).

The main table used in the query is defined as follows:

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  subtitle VARCHAR(128),
  subject_type SMALLINT,
  subject_id INTEGER REFERENCES topic,
  actor_type SMALLINT,
  actor_id INTEGER REFERENCES topic,
  actor VARCHAR(64),
  actor_role VARCHAR(64),
  rel_entry_id INTEGER,
  rel_entry VARCHAR(64),
  description VARCHAR(255),
  quote text,
  url VARCHAR(255),
  entry_date CHAR(10),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;
CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id, actor_type,
       actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle,
       description, url, quote AS main_quote, NULL AS rel_quote,
       substring(entry_date from 8) AS dom,
       substring(entry_date from 1) AS date_ymd,
       substring(entry_date from 1 for 7) AS date_ym,
       substring(entry_date from 1 for 4) AS date_y, created, updated
  FROM entry WHERE subject_id = 1079
UNION
SELECT entry_id, actor_type AS type, subject_type, subject_id, actor_type,
       actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle,
       description, url, quote AS main_quote, NULL AS rel_quote,
       substring(entry_date from 8) AS dom,
       substring(entry_date from 1) AS date_ymd,
       substring(entry_date from 1 for 7) AS date_ym,
       substring(entry_date from 1 for 4) AS date_y, created, updated
  FROM entry WHERE actor_id = 1079
 ORDER BY type, title, subtitle;


The output of EXPLAIN ANALYZE is:

 Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
   Sort Key: "type", title, subtitle
   ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
         ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
               Sort Key: entry_id, "type", subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym, date_y, created, updated
               ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual time=0.000..16.000 rows=59 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                           ->  Index Scan using entry_subject_id on entry  (cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                 Index Cond: (subject_id = 1079)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                           ->  Seq Scan on entry  (cost=0.00..133.94 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                 Filter: (actor_id = 1079)
 Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id 
instead of using the entry_actor_id index.  Note that actor_id has 928 non-null 
values (27%), whereas subject_id has 3089 non-null values (90%).
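A quick way to check whether the planner is skipping entry_actor_id only because it estimates the sequential scan as cheaper (a diagnostic sketch, not a fix; enable_seqscan is a standard planner setting):

-- re-run ANALYZE first in case the statistics are stale
ANALYZE entry;
-- temporarily discourage sequential scans, then compare the plans
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM entry WHERE actor_id = 1079;
SET enable_seqscan = on;  -- restore the default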


Note that the entry_date column was originally a MySQL date but it had partial 
dates, i.e., some days and months are set to zero.  Eventually I hope to define 
a PostgreSQL datatype for it and to simplify the substring retrievals.  However, 
I don't think the extra computational time should affect the overall runtime 
significantly.


Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and 
I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM).


Thanks for any feedback.

Joe


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Andreas Pflug

Joe wrote:




The pages do use a number of queries to collect all the data for display 
but nowhere near 50.  I'd say it's probably less than a dozen.  

The schema is fairly simple having two main tables: topic and entry 
(sort of like account and transaction in an accounting scenario).  There 
are two additional tables that perhaps could be merged into the entry 
table 


Hm, if you only have 4 tables, why do you need 12 queries?
To reduce queries, join them in the query; no need to merge them 
physically. If you have only two main tables, I'd bet you only need 1-2 
queries for the whole page.


Regards,
Andreas

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Delete query takes exorbitant amount of time

2005-09-29 Thread Gaetano Mendola
Mark Lewis wrote:
> I imported my test dataset
> and was almost immediately able to track down the cause of my
> performance problem.

Why don't you tell us what the problem was :-) ?

Regards
Gaetano Mendola






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] [HACKERS] Query in SQL statement

2005-09-29 Thread Christopher Kings-Lynne



CREATE SEQUENCE ai_id;
CREATE TABLE badusers (
  id int DEFAULT nextval('ai_id') NOT NULL,
  UserName varchar(30),
  Date  datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-',
  PRIMARY KEY (id),
  KEY UserName (UserName),
  KEY Date (Date)
);


Am always getting foll. Errors,

ERROR:  relation "ai_id" already exists
ERROR:  syntax error at or near "(" at character 240


You have just copied the Mysql code to Postgresql.  It will in no way 
work.  Your default for 'Date' is illegal in postgresql and hence it 
must allow NULLs.  There is no such thing as a 'datetime' type.  There 
is no such thing as 'Key'.  Also your mixed case identifiers won't be 
preserved.  You want:


CREATE TABLE badusers (
  id SERIAL PRIMARY KEY,
  UserName varchar(30),
  Date  timestamp,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-'
);

CREATE INDEX UserName_Idx ON badusers(Username);
CREATE INDEX Date_Idx ON badusers(Date);
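
If preserving the mixed-case names really matters, quoting the identifiers is the usual workaround (a sketch only, not part of the original advice):

-- quoted identifiers keep their case, at the price of having to quote them
-- in every later query
CREATE TABLE "BadUsers" (
  id SERIAL PRIMARY KEY,
  "UserName" varchar(30)
);
SELECT "UserName" FROM "BadUsers";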


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Monitoring Postgresql performance

2005-09-29 Thread Gaetano Mendola
Arnau wrote:
> Hi all,
> 
>   I have been "googling" a bit searching info about a way to monitor
> postgresql (CPU & Memory, num processes, ... ) and I haven't found
> anything relevant. I'm using munin to monitor others parameters of my
> servers and I'd like to include postgresql or have a similar tool. Any
> of you is using anything like that? all kind of hints are welcome :-)
> 
> Cheers!

We use Cricket + Nagios ( new Netsaint release ).


Regards
Gaetano Mendola



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Advice on RAID card

2005-09-29 Thread Michael Ben-Nes

I think the answer is simple.


If the question is low-end RAID card or software, go with software
and you'll get better performance.


If this is a high-end server I wouldn't think twice: HW RAID is a must,
not only because of the performance but because of the ease of use (hot swap
and such) and the battery.



Ron Peacetree wrote:


While I understand being economical, at some point one crosses the line
to being penny wise and pound foolish.

How much is the data on this server going to be worth?
How much will it cost you to recover or restore it (assuming that
is even possible if you lose it)?

If your data is worth nothing or the cost to recover or restore it is
negligible, then you don't need (nor should want) a DB server.  You'll
get higher performance at less cost via a number of other methods.

OTOH, if your data _does_ have value by any of the above metrics,
then it is worth it to pay attention to reliable, safe, fast, physical IO.

Battery backed HD caches of appropriate size are usually well worth
the $, as they pay for themselves (and then some) with the first data
loss they prevent.

RAID 5 means you are _always_ only 2 HDs from data loss, and 1 HD
from a serious performance hit.  Part of the trade-off with using SATA
HDs that cost 1/3-1/4 their U320 15Krpm brethren is that such
circumstances are +FAR+ more likely with SATA HDs.  


If you are not going to use RAID 10 because of cost issues, then
spend the $ to get the biggest battery backed cache you can afford
and justify as being cheaper than what the proper RAID 6 or RAID 10
setup would cost you.  Even if you are going to use SW RAID and the
controller will just be a JBOD controller.

On the general subject of costs...  


At this writing, SODIMM RAM costs ~$100 (US) per GB.  Standard
DIMMs cost ~$75 per GB unless you buy 4GB ones, in which case
they cost ~$100 per GB.

The "sweet spot" in SATA HD pricing is ~$160 for 320GB at 7200rpm
(don't buy the 36GB or 74GB WD Raptors, they are no longer worth
it).  If you are careful you can get SATA HD's with 16MB rather than
8MB buffers for that price.  Each such HD will give you ~50MB/s of
raw Average Sustained Transfer Rate.

Decent x86 compatible CPUs are available for ~$200-$400 apiece.
Rarely will a commodity HW DB server need a more high end CPU.

Some of the above numbers rate to either fall to 1/2 cost or 2x in value
for the dollar within the next 6-9 months, and all of them will within the
next 18 months.  And so will RAID controller costs.

Your salary will hopefully not degrade at that rate, and it is unlikely that
your value for the dollar will increase at that rate.  Nor is it likely that
data worth putting on a DB server will do so.

Figure out what your required performance and reliability for the next 18
months is going to be, and buy the stuff from the above list that will
sustain that.  No matter what.

Anything less rates _highly_ to end up costing you and your organization
more money within the next 18months than you will "save" in initial
acquisition cost.

Ron





-Original Message-
From: PFC <[EMAIL PROTECTED]>
Sent: Sep 24, 2005 12:27 PM
Subject: Re: [PERFORM] Advice on RAID card


 


It looks like a rebranded low end Adaptec 64MB PCI-X <-> SATA RAID card.
Looks like the 64MB buffer is not upgradable.
Looks like it's SATA, not SATA II
   



	Yeah, that's exactly what it is. I can get one for 150 Euro, the Areca is
at least 600. This is for a budget server, so while it would be nice to
have all the high-tech stuff, it's not the point. My question was rather:
is it one of the crap RAID5 cards which are actually SLOWER than plain IDE
disks, or is it decent, even though low-end (and cheap), and worth it
compared to software RAID5 ?


 


Assuming you are not building 1U boxes, get one of the full height
cards and order it with the maximum size buffer you can afford.
The cards take 1 SODIMM, so that will be a max of 1GB or 2GB
depending on whether 2GB SODIMMs are available to you yet.
   



	It's for a budget dev server which should have RAID5 for reliability, but  
not necessarily stellar performance (and price). I asked about this card  
because I can get one at a good price.


Thanks for taking the time to answer.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org
 



--
--
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Cel: 972-52-8555757
Fax: 972-4-6990098
http://www.canaan.net.il
--


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-29 Thread PFC



I just tried using pg_pconnect() and I didn't notice any significant  
improvement.  What bothers me most is that with Postgres I tend to see  
jerky behavior on almost every page:  the upper 1/2 or 2/3 of the page  
is displayed first and you can see a blank bottom (or you can see a  
half-filled completion bar).  With MySQL each page is generally  
displayed in one swoop.


	Persistent connections are useful when your page is fast and the  
connection time is an important part of your page time. It is mandatory if  
you want to serve more than 20-50 hits/s without causing unnecessary load  
on the database. This is not your case, which is why you don't notice any  
improvement...


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread PFC


Just to add a little anarchy in your nice debate...

Who really needs all the results of a sort on your terabyte table ?

	I guess not many people do a SELECT from such a table and want all the  
results. So, this leaves :

- Really wanting all the results, to fetch using a cursor,
- CLUSTER type things, where you really want everything in order,
	- Aggregates (Sort->GroupAggregate), which might really need to sort the  
whole table.
	- Complex queries where the whole dataset needs to be examined, in order  
to return a few values

- Joins (again, the whole table is probably not going to be selected)
- And the ones I forgot.

However,

Most likely you only want to SELECT N rows, in some ordering :
- the first N (ORDER BY x LIMIT N)
- last N (ORDER BY x DESC LIMIT N)
- WHERE x>value ORDER BY x LIMIT N
- WHERE x<value ORDER BY x DESC LIMIT N

	Or, you are doing a Merge JOIN against some other table ; in that case,
yes, you might need the whole sorted terabyte table, but most likely there
are WHERE clauses in the query that restrict the set, and thus, maybe we
can get some conditions or limit values on the column to sort.
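
	A small illustration of that point (hypothetical table and index names; the only claim is the ordinary planner behaviour for ORDER BY ... LIMIT with a matching index):

-- with an index on x, only the first N index entries have to be read,
-- so no full sort of the huge table takes place
CREATE INDEX bigtable_x_idx ON bigtable (x);
EXPLAIN SELECT * FROM bigtable ORDER BY x LIMIT 10;
-- without the LIMIT (or without a usable index) the whole result must be sorted
EXPLAIN SELECT * FROM bigtable ORDER BY x;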


	Also the new, optimized hash join, which is more memory efficient, might  
cover this case.


	Point is, sometimes, you only need part of the results of your sort. And  
the bigger the sort, the most likely it becomes that you only want part of  
the results. So, while we're in the fun hand-waving, new algorithm trying  
mode, why not consider this right from the start ? (I know I'm totally in  
hand-waving mode right now, so slap me if needed).


I'd say your new, fancy sort algorithm needs a few more input values :

- Range of values that must appear in the final result of the sort :
		none, minimum, maximum, both, or even a set of values from the other  
side of the join, hashed, or sorted.

- LIMIT information (first N, last N, none)
	- Enhanced Limit information (first/last N values of the second column to  
sort, for each value of the first column) (the infamous "top10 by  
category" query)

- etc.

	With this, the amount of data that needs to be kept in memory is  
dramatically reduced, from the whole table (even using your compressed  
keys, that's big) to something more manageable which will be closer to the  
size of the final result set which will be returned to the client, and  
avoid a lot of effort.


	So, this would not be useful in all cases, but when it applies, it would  
be really useful.


Regards !


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Comparative performance

2005-09-29 Thread PFC



  Total runtime: 16.000 ms


	Even though this query isn't that optimized, it's still only 16  
milliseconds.

Why does it take this long for PHP to get the results ?

	Can you try pg_query'ing this exact same query, FROM PHP, and timing it  
with getmicrotime() ?


	You can even do an EXPLAIN ANALYZE from pg_query and display the results  
in your webpage, to check how long the query takes on the server.


You can also try it on a Linux box.

This smells like a TCP communication problem.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Josh Berkus

Jeff, Ron,

First off, Jeff, please take it easy.  We're discussing 8.2 features at 
this point and there's no reason to get stressed out at Ron.  You can 
get plenty stressed out when 8.2 is near feature freeze.  ;-)



Regarding use cases for better sorts:

The biggest single area where I see PostgreSQL external sort sucking is 
 on index creation on large tables.   For example, for the free version of 
TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's 
hardware, but over 3 hours to create each index on that table.  This 
means that overall our load into TPCH takes 4 times as long to create 
the indexes as it did to bulk load the data.


Anyone restoring a large database from pg_dump is in the same situation. 
 Even worse, if you have to create a new index on a large table on a 
production database in use, because the I/O from the index creation 
swamps everything.


Following an index creation, we see that 95% of the time required is the 
external sort, which averages 2mb/s.  This is with separate drives for 
the WAL, the pg_tmp, the table and the index.  I've confirmed that 
increasing work_mem beyond a small minimum (around 128mb) had no benefit 
on the overall index creation speed.
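
(A side note, as a sketch only: in 8.x the sort done by CREATE INDEX is governed by maintenance_work_mem rather than work_mem, so that is the setting one would raise when timing index builds. The object names below are placeholders, and the value is given in kilobytes as 8.0/8.1 expect.)

-- raise the memory available to CREATE INDEX's sort for this session only
SET maintenance_work_mem = 524288;   -- 512MB, expressed in KB
CREATE INDEX lineitem_l_orderkey_idx ON lineitem (l_orderkey);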



--Josh Berkus



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Luke Lonergan
Josh,

On 9/29/05 9:54 AM, "Josh Berkus"  wrote:

> Following an index creation, we see that 95% of the time required is the
> external sort, which averages 2mb/s.  This is with separate drives for
> the WAL, the pg_tmp, the table and the index.  I've confirmed that
> increasing work_mem beyond a small minimum (around 128mb) had no benefit
> on the overall index creation speed.

Yep!  That about sums it up - regardless of taking 1 or 2 passes through
the heap being sorted, 1.5 - 2 MB/s is the wrong number.  This is not
necessarily an algorithmic problem, but an optimization problem with
Postgres that must be fixed before it can be competitive.

We read/write to/from disk at 240MB/s and so 2 passes would run at a net
rate of 120MB/s through the sort set if it were that efficient.

Anyone interested in tackling the real performance issue? (flame bait, but
for a worthy cause :-)

- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Jeffrey W. Baker
On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote:
> Josh,
> 
> On 9/29/05 9:54 AM, "Josh Berkus"  wrote:
> 
> > Following an index creation, we see that 95% of the time required is the
> > external sort, which averages 2mb/s.  This is with separate drives for
> > the WAL, the pg_tmp, the table and the index.  I've confirmed that
> > increasing work_mem beyond a small minimum (around 128mb) had no benefit
> > on the overall index creation speed.
> 
> Yep!  That about sums it up - regardless of taking 1 or 2 passes through
> the heap being sorted, 1.5 - 2 MB/s is the wrong number.

Yeah this is really bad ... approximately the speed of GNU sort.

Josh, do you happen to know how many passes are needed in the multiphase
merge on your 60GB table?

Looking through tuplesort.c, I have a couple of initial ideas.  Are we
allowed to fork here?  That would open up the possibility of using the
CPU and the I/O in parallel.  I see that tuplesort.c also suffers from
the kind of postgresql-wide disease of calling all the way up and down a
big stack of software for each tuple individually.  Perhaps it could be
changed to work on vectors.

I think the largest speedup will be to dump the multiphase merge and
merge all tapes in one pass, no matter how large M.  Currently M is
capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
the tape.  It could be done in a single pass heap merge with N*log(M)
comparisons, and, more importantly, far less input and output.

I would also recommend using an external processes to asynchronously
feed the tuples into the heap during the merge.

What's the timeframe for 8.2?

-jwb




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Josh Berkus
Jeff,

> Josh, do you happen to know how many passes are needed in the multiphase
> merge on your 60GB table?

No, any idea how to test that?

> I think the largest speedup will be to dump the multiphase merge and
> merge all tapes in one pass, no matter how large M.  Currently M is
> capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
> the tape.  It could be done in a single pass heap merge with N*log(M)
> comparisons, and, more importantly, far less input and output.

Yes, but the evidence suggests that we're actually not using the whole 1GB 
of RAM ... maybe using only 32MB of it which would mean over 200 passes 
(I'm not sure of the exact math).  Just fixing our algorithm so that it 
used all of the work_mem permitted might improve things tremendously.

> I would also recommend using an external processes to asynchronously
> feed the tuples into the heap during the merge.
>
> What's the timeframe for 8.2?

Too far out to tell yet.  Probably 9mo to 1 year, that's been our history.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] SQL Function performance

2005-09-29 Thread adnandursun
Hi All,
 
I have a SQL function like :
 
CREATE OR REPLACE FUNCTION
fn_get_yetkili_inisyer_listesi(int4, int4)
  RETURNS SETOF kod_adi_liste_type AS
$BODY$
 SELECT Y.KOD,Y.ADI
   FROM T_YER Y
  WHERE EXISTS (SELECT 1
FROM T_GUZER G
  WHERE (G.BIN_YER_KOD = $1 OR COALESCE($1,0)=0)
 AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,$2) = 1
 AND G.IN_YER_KOD = Y.KOD)
AND Y.IPTAL = 'H';
$BODY$
  LANGUAGE 'sql' VOLATILE;

When I run it like "SELECT * FROM
fn_get_yetkili_inisyer_listesi(1, 3474)", the planner result is
"Function Scan on fn_get_yetkili_inisyer_listesi  (cost=0.00..12.50
rows=1000 width=36) (1 row)" and it runs very slowly.
 
But when I run it like 

"SELECT Y.KOD,Y.ADI
 FROM T_YER Y
   WHERE EXISTS (SELECT 1
FROM T_GUZER G
  WHERE (G.BIN_YER_KOD
= 1 OR COALESCE(1,0)=0)
 AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,3474) = 1
 AND G.IN_YER_KOD = Y.KOD)
 AND Y.IPTAL = 'H';" 

planner result :

" 
 QUERY PLAN
 

-
 Seq Scan on t_yer y  (cost=0.00..3307.79 rows=58 width=14)
   Filter: (((iptal)::text = 'H'::text) AND (subplan))
   SubPlan
 ->  Index Scan using
t_guzer_ucret_giris_performans_idx on t_guzer g  (cost
=0.00..28.73 rows=1 width=0)
   Index Cond: ((bin_yer_kod = 1) AND (in_yer_kod =
$0))
   Filter: (fn_firma_isvisible(firma_no, 3474) = 1)
(6 rows)
"
  and it runs very fast.

Any idea ?
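
One way to check whether the difference comes from the function body being planned with generic $1/$2 parameters instead of the literal values is to compare against a prepared statement over the same query (a diagnostic sketch reusing the function body above):

PREPARE p_inisyer(int4, int4) AS
 SELECT Y.KOD,Y.ADI
   FROM T_YER Y
  WHERE EXISTS (SELECT 1
                  FROM T_GUZER G
                 WHERE (G.BIN_YER_KOD = $1 OR COALESCE($1,0)=0)
                   AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,$2) = 1
                   AND G.IN_YER_KOD = Y.KOD)
    AND Y.IPTAL = 'H';

EXPLAIN ANALYZE EXECUTE p_inisyer(1, 3474);
-- if this shows the same slow plan as the function call, the problem is the
-- generic parameterized plan rather than the SQL itself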

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Monitoring Postgresql performance

2005-09-29 Thread Tony Wasson
On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote:
> On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> >I have been "googling" a bit searching info about a way to monitor
> > postgresql (CPU & Memory, num processes, ... ) and I haven't found
> > anything relevant. I'm using munin to monitor others parameters of my
> > servers and I'd like to include postgresql or have a similar tool. Any
> > of you is using anything like that? all kind of hints are welcome :-)

We are also using cricket + nagios.

On each DB server: Setup snmpd and use snmpd.conf to set disk quotas
and mark processes that need to be running  (like
postmaster,syslog,sshd)

On the monitoring server(s): Use cricket for long term trends &
graphs. Use nagios for current status and alerting and some trending.
(Nagios has plugins over SNMP for load,cpu,memory,disk and processes)

Here's the nagios plugins I have hacked up over the past few months
and what they do. I'd imagine some could use better names. I can
provide these or package them up if anyone is interested.

check_pgconn.pl - Shows percentage of connections available. It uses
"SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It
can also alert when less than a certain number of connections are
available.
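
A rough SQL-level sketch of that connection check (standard catalog views; the percentage arithmetic is just an illustration):

-- compare current backends against max_connections
SELECT (SELECT count(*) FROM pg_stat_activity) AS connections_in_use,
       current_setting('max_connections')::int AS max_connections,
       round(100.0 * (SELECT count(*) FROM pg_stat_activity)
             / current_setting('max_connections')::int, 1) AS pct_used;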

check_pgqueries.pl - If you have query logging enabled this summarizes
the types of queries running (SELECT ,INSERT ,DELETE ,UPDATE ,ALTER
,CREATE ,TRUNCATE, VACUUM, COPY) and warns if any queries have been
running longer than 5 minutes (configurable).

check_pglocks.pl - Look for locks that block and for baselining lock activity.

check_pgtime.pl - Makes sure that postgresql's time is in sync with
the monitoring server.

check_pgqueries.pl - Whines if any queries are in the "waiting" state.
The script that runs on each DB server does "ps auxww | grep postgres
| grep -i "[W]aiting"" and exposes that through SNMP using the exec
functionality. Nagios then alerts if queries are being blocked.
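
A similar sketch for the long-running-query check, done directly in SQL rather than from the logs (assumes stats_command_string is enabled so current_query and query_start are populated):

SELECT procpid, usename, now() - query_start AS runtime, current_query
  FROM pg_stat_activity
 WHERE current_query NOT LIKE '<IDLE>%'
   AND now() - query_start > interval '5 minutes'
 ORDER BY runtime DESC;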

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

Andreas Pflug wrote:

Hm, if you only have 4 tables, why do you need 12 queries?
To reduce queries, join them in the query; no need to merge them 
physically. If you have only two main tables, I'd bet you only need 1-2 
queries for the whole page.


There are more than four tables and the queries are not functionally 
overlapping.  As an example, allow me to refer to the page 
www.freedomcircle.com/topic.php/Economists.


The top row of navigation buttons (Life, Liberty, etc.) is created from a query 
of the 'topic' table.  It could've been hard-coded as a PHP array, but with less 
flexibility.  The alphabetical links are from a SELECT DISTINCT substring from 
topic.  It could've been generated by a PHP for loop (originally implemented 
that way) but again with less flexibility.  The listing of economists is another 
SELECT from topic.  The subheadings (Articles, Books) come from a SELECT of an 
entry_type table --which currently has 70 rows-- and is read into a PHP array 
since we don't know what headings will be used in a given page.  The detail of 
the entries comes from that query that I posted earlier, but there are three 
additional queries that are used for specialized entry types (relationships 
between topics --e.g., Prof. Williams teaches at George Mason, events, and 
multi-author or multi-subject articles and books).  And there's yet another 
table for the specific book information.  Once the data is retrieved it's sorted 
internally with PHP, at the heading level, before display.


Maybe there is some way to merge all the queries (some already fairly complex) 
that fetch the data for the entries box but I believe it would be a monstrosity 
with over 100 lines of SQL.


Thanks,

Joe


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Joe

PFC wrote:
Even though this query isn't that optimized, it's still only 16  
milliseconds.

Why does it take this long for PHP to get the results ?

Can you try pg_query'ing this exact same query, FROM PHP, and timing 
it  with getmicrotime() ?


Thanks, that's what I was looking for.  It's microtime(), BTW.  It'll take me 
some time to instrument it, but that way I can pinpoint what is really slow.


You can even do an EXPLAIN ANALYZE from pg_query and display the 
results  in your webpage, to check how long the query takes on the server.


You can also try it on a Linux box.


My current host only supports MySQL.  I contacted hub.org to see if they could 
assist in this transition but I haven't heard back.



This smells like a TCP communication problem.


I'm puzzled by that remark.  How much does TCP get into the picture in a local 
Windows client/server environment?


Joe


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Monitoring Postgresql performance

2005-09-29 Thread Juraj Holtak
Hi,

impressive

But you forgot to include those scripts as an attachment or they got lost
somehow ;-)

could you post them (again)?

thanx,
Juraj

Am Donnerstag, den 29.09.2005, 13:02 -0700 schrieb Tony Wasson:
> On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote:
> > On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote:
> > > Hi all,
> > >
> > >I have been "googling" a bit searching info about a way to monitor
> > > postgresql (CPU & Memory, num processes, ... ) and I haven't found
> > > anything relevant. I'm using munin to monitor others parameters of my
> > > servers and I'd like to include postgresql or have a similar tool. Any
> > > of you is using anything like that? all kind of hints are welcome :-)
> 
> We are also using cricket + nagios.
> 
> On each DB server: Setup snmpd and use snmpd.conf to set disk quotas
> and mark processes that need to be running  (like
> postmaster,syslog,sshd)
> 
> On the monitoring server(s): Use cricket for long term trends &
> graphs. Use nagios for current status and alerting and some trending.
> (Nagios has plugins over SNMP for load,cpu,memory,disk and processes)
> 
> Here's the nagios plugins I have hacked up over the past few months
> and what they do. I'd imagine some could use better names. I can
> provide these or package them up if anyone is interested.
> 
> check_pgconn.pl - Shows percentage of connections available. It uses
> "SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It
> can also alert when less than a certain number of connections are
> available.
> 
> check_pgqueries.pl - If you have query logging enabled this summarizes
> the types of queries running (SELECT ,INSERT ,DELETE ,UPDATE ,ALTER
> ,CREATE ,TRUNCATE, VACUUM, COPY) and warns if any queries have been
> running longer than 5 minutes (configurable).
> 
> check_pglocks.pl - Look for locks that block and for baselining lock activity.
> 
> check_pgtime.pl - Makes sure that postgresql's time is in sync with
> the monitoring server.
> 
> check_pgqueries.pl - Whines if any queries are in the "waiting" state.
> The script that runs on each DB server does "ps auxww | grep postgres
> | grep -i "[W]aiting"" and exposes that through SNMP using the exec
> functionality. Nagios then alerts if queries are being blocked.
> 
> ---(end of broadcast)---
> TIP 1: if posting/reading through Usenet, please send an appropriate
>subscribe-nomail command to [EMAIL PROTECTED] so that your
>message can get through to the mailing list cleanly
> 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Monitoring Postgresql performance

2005-09-29 Thread Cosimo Streppone

Arnau wrote:


Hi all,

  I have been "googling" a bit searching info about a way to monitor 
postgresql (CPU & Memory, num processes, ... ) and I haven't found 
anything relevant. I'm using munin to monitor others parameters of my 
servers and I'd like to include postgresql or have a similar tool. Any 
of you is using anything like that? all kind of hints are welcome :-)


Probably, as you said, this is not so much relevant,
as it is something at *early* stages of usability :-)
but have you looked at pgtop?

The basic requirement is that you enable your postmaster
stats collector and query command strings.

Here is the first announcement email:
  http://archives.postgresql.org/pgsql-announce/2005-05/msg0.php

And its home on the CPAN:
  http://search.cpan.org/dist/pgtop/pgtop

--
Cosimo


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] How to Trigger An Automatic Vacuum on Selected Tables

2005-09-29 Thread Lane Van Ingen
I am running version 8.0.1 on Windows 2003. I have an application that
subjects PostgreSQL to sudden bursts of activity at times which cannot be
predicted. The bursts are significant enough to cause performance
degradation, which can be fixed by a 'vacuum analyze'.

I am aware of the existence and contents of tables like pg_class.

QUESTION: I would like to trigger a vacuum analyze process on a table
whenever it gets a large enough burst of activity to warrant it. Using the
data in pg_class (like the number of pages the system found the last time it
was vacuumed / analyzed), I would like to compare those statistics to
current size, and trigger a vacuum/analyze on a table if needed.

Does anyone know of any available tools, or an approach I could use, to
determine what the CURRENT SIZE is ?
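
A rough sketch of the comparison itself, using only pg_class (relpages and reltuples reflect the state at the last VACUUM/ANALYZE, so comparing them with a current count gives an idea of how much the table has grown; 'mytable' is a placeholder name):

SELECT c.relpages   AS pages_at_last_vacuum,
       c.reltuples  AS rows_at_last_vacuum,
       (SELECT count(*) FROM mytable) AS rows_now
  FROM pg_class c
 WHERE c.relname = 'mytable';
-- if rows_now greatly exceeds rows_at_last_vacuum, it is probably time for
-- VACUUM ANALYZE mytable;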



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Is There Any Way ....

2005-09-29 Thread Lane Van Ingen
... to do the following:
  (1) Make a table memory-resident only ?
  (2) Set up user variables in memory that are persistent across all
      sessions, for as long as the database is up and running ?
  (3) Assure that a disk-based table is always in memory (outside of keeping
      it in memory buffers as a result of frequent activity which would
      prevent LRU operations from taking it out) ?



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables

2005-09-29 Thread Dario
Autovacuum does exactly what I understood you want :-)

-Mensaje original-
De: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] nombre de Lane Van
Ingen
Enviado el: jueves, 29 de septiembre de 2005 20:06
Para: pgsql-performance@postgresql.org
Asunto: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables


I am running version 8.0.1 on Windows 2003. I have an application that
subjects PostgreSQL to sudden bursts of activity at times which cannot be
predicted. The bursts are significant enough to cause performance
degradation, which can be fixed by a 'vacuum analyze'.

I am aware of the existence and contents of tables like pg_class.

QUESTION: I would like to trigger a vacuum analyze process on a table
whenever it gets a large enough burst of activity to warrant it. Using the
data in pg_class (like the number of pages the system found the last time it
was vacuumed / analyzed), I would like to compare those statistics to
current size, and trigger a vacuum/analyze on a table if needed.

Does anyone know of any available tools, or an approach I could use, to
determine what the CURRENT SIZE is ?



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Is There Any Way ....

2005-09-29 Thread Dario
1) AFAIK, no. Just in case you are thinking "There should be a way coz I
know it will be used all the time", you must know that postgresql's philosophy
is "I'm smarter than you". If the table is used all the time, it will be in
memory; if not, it won't waste memory.
2) don't know.
3) see number 1). Of course, you could run into a pathological case where the
table is queried just before being taken out of memory. But that means the
table isn't queried all the time...

Greetings...



-Mensaje original-
De: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] nombre de Lane Van
Ingen
Enviado el: jueves, 29 de septiembre de 2005 20:21
Para: pgsql-performance@postgresql.org
Asunto: [PERFORM] Is There Any Way 


... to do the following:
  (1) Make a table memory-resident only ?
  (2) Set up user variables in memory that are persistent across all
      sessions, for as long as the database is up and running ?
  (3) Assure that a disk-based table is always in memory (outside of keeping
      it in memory buffers as a result of frequent activity which would
      prevent LRU operations from taking it out) ?



---(end of broadcast)---
TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Is There Any Way ....

2005-09-29 Thread mudfoot
Quoting Lane Van Ingen <[EMAIL PROTECTED]>:

> ... to do the following:
>   (1) Make a table memory-resident only ?

Put it on a RAM filesystem.  On Linux, shmfs.  On *BSD, mfs.  Solaris, tmpfs.

>   (2) Set up user variables in memory that are persistent across all
>       sessions, for as long as the database is up and running ?

This sounds like a client thing?  Dunno.

>   (3) Assure that a disk-based table is always in memory (outside of
>       keeping it in memory buffers as a result of frequent activity which
>       would prevent LRU operations from taking it out) ?
> 

Put on RAM fs (like question 1).

Basically, RAM filesystems are on RAM, meaning you need to have enough physical
memory to support them.  And of course their contents completely disappear
between reboots, so you'll need a way to populate them on bootup and make sure
that your updates go to a real nonvolatile storage medium (like disks).  And you
might get swapping on some types of memory filesystems--Solaris' tmpfs is carved
out of virtual memory, which means it will cause swapping if tmpfs contents plus
the rest of your applications exceed physical memory.

> 
> 
> ---(end of broadcast)---
> TIP 6: explain analyze is your friend
> 



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Is There Any Way ....

2005-09-29 Thread Steinar H. Gunderson
On Thu, Sep 29, 2005 at 07:21:08PM -0400, Lane Van Ingen wrote:
>   (1) Make a table memory-resident only ?

You might want to look into memcached, but it's impossible to say whether it
will fit your needs or not without more details.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Ron Peacetree
>From: Pailloncy Jean-Gerard <[EMAIL PROTECTED]>
>Sent: Sep 29, 2005 7:11 AM
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>>>Jeff Baker:
>>>Your main example seems to focus on a large table where a key  
>>>column has constrained values.  This case is interesting in
>>>proportion to the number of possible values.  If I have billions
>>>of rows, each having one of only two values, I can think of a
>>>trivial and very fast method of returning the table "sorted" by
>>>that key: make two sequential passes, returning the first value
>>>on the first pass and the second value on the second pass.
>>> This will be faster than the method you propose.
>>
>>Ron Peacetree:
>>1= No that was not my main example.  It was the simplest example  
>>used to frame the later more complicated examples.  Please don't
>>get hung up on it.
>>
>>2= You are incorrect.  Since IO is the most expensive operation we  
>>can do, any method that makes two passes through the data at top
>>scanning speed will take at least 2x as long as any method that only
>>takes one such pass.
>
>You do not get the point.
>Once you have the sorted references to the tuples, you still need to
>fetch the tuples themselves, check their visibility, etc., and return
>them to the client.
>
As PFC correctly points out elsewhere in this thread, =maybe= you
have to do all that.  The vast majority of the time people are not
going to want to look at a detailed record by record output of that
much data.

The most common usage is to calculate or summarize some quality
or quantity of the data and display that instead or to use the tuples
or some quality of the tuples found as an intermediate step in a
longer query process such as a join.

Sometimes there's a need to see _some_ of the detailed records; a
random sample or a region in a random part of the table or etc.
It's rare that there is a RW need to actually list every record in a
table of significant size.

On the rare occasions where one does have to return or display all
records in such large table, network IO and/or display IO speeds
are the primary performance bottleneck.  Not HD IO.

Nonetheless, if there _is_ such a need, there's nothing stopping us
from rearranging the records in RAM into sorted order in one pass
through RAM (using at most space for one extra record) after
constructing the cache conscious Btree index.  Then the sorted
records can be written to HD in RAM buffer sized chunks very
efficiently.  
Repeating this process until we have stepped through the entire
data set will take no more HD IO than one HD scan of the data
and leave us with a permanent result that can be reused for
multiple purposes.  If the sorted records are written in large
enough chunks, rereading them at any later time can be done
at maximum HD throughput.

In a total of two HD scans (one to read the original data, one
to write out the sorted data) we can make a permanent
rearrangement of the data.  We've essentially created a 
cluster index version of the data.
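
(For reference, a sketch of the closest existing SQL-level analogue, with placeholder names; CLUSTER physically rewrites the table in the order of the given index:)

-- rewrite bigtable in the physical order of bigtable_key_idx
CLUSTER bigtable_key_idx ON bigtable;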


>So, if there are only 2 values in the column of a big table that is larger
>than available RAM, two seq scans of the table without any sorting is the
>fastest solution.
>
If you only need to do this once, yes this wins.  OTOH, if you have
to do this sort even twice, my method is better.

regards,
Ron  

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Ron Peacetree
>From: Zeugswetter Andreas DAZ SD <[EMAIL PROTECTED]>
>Sent: Sep 29, 2005 9:28 AM
>Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
>>In my original example, a sequential scan of the 1TB of 2KB 
>>or 4KB records, => 250M or 500M records of data, being sorted 
>>on a binary value key will take ~1000x more time than reading 
>>in the ~1GB Btree I described that used a Key+RID (plus node 
>>pointers) representation of the data.
>
>Imho you seem to ignore the final step your algorithm needs of
>collecting the data rows. After you sorted the keys the collect
>step will effectively access the tuples in random order (given a 
>sufficiently large key range).
>
"Collecting" the data rows can be done for each RAM buffer full of
of data in one pass through RAM after we've built the Btree.  Then
if desired those data rows can be read out to HD in sorted order
in essentially one streaming burst.  This combination of index build
+ RAM buffer rearrangement + write results to HD can be repeated
as often as needed until we end up with an overall Btree index and
a set of sorted sublists on HD.  Overall HD IO for the process is only
two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be
done at full HD streaming speed and whatever we've decided to
save to HD can be reused later if we desire.

Hope this helps,
Ron

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Luke Lonergan
Jeff,

On 9/29/05 10:44 AM, "Jeffrey W. Baker" <[EMAIL PROTECTED]> wrote:

> On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote:
> Looking through tuplesort.c, I have a couple of initial ideas.  Are we
> allowed to fork here?  That would open up the possibility of using the
> CPU and the I/O in parallel.  I see that tuplesort.c also suffers from
> the kind of postgresql-wide disease of calling all the way up and down a
> big stack of software for each tuple individually.  Perhaps it could be
> changed to work on vectors.

Yes!
 
> I think the largest speedup will be to dump the multiphase merge and
> merge all tapes in one pass, no matter how large M.  Currently M is
> capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over
> the tape.  It could be done in a single pass heap merge with N*log(M)
> comparisons, and, more importantly, far less input and output.

Yes again, see above.
 
> I would also recommend using an external processes to asynchronously
> feed the tuples into the heap during the merge.

Simon Riggs is working on this idea a bit - it's slightly less interesting to
us because we already have a multiprocessing executor.  Our problem is that
4 x slow is still far too slow.

> What's the timeframe for 8.2?

Let's test it out in Bizgres!
 
- Luke



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-29 Thread Ron Peacetree
>From: Josh Berkus 
>Sent: Sep 29, 2005 12:54 PM
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
>The biggest single area where I see PostgreSQL external
>sort sucking is on index creation on large tables.   For
>example, for free version of TPCH, it takes only 1.5 hours to
>load a 60GB Lineitem table on OSDL's hardware, but over 3
>hours to create each index on that table.  This means that
>over all our load into TPCH takes 4 times as long to create 
>the indexes as it did to bulk load the data.
>
Hmmm.
60GB/5400secs= 11MBps.  That's ssllooww.  So the first
problem is evidently our physical layout and/or HD IO layer
sucks.

Creating the table and then creating the indexes on the table
is going to require more physical IO than if we created the
table and the indexes concurrently in chunks and then
combined the indexes on the chunks into the overall indexes
for the whole table, so there's a potential speed-up.

The method I've been talking about is basically a recipe for
creating indexes as fast as possible with as few IO operations,
HD or RAM, as possible and nearly no random ones, so it
could help as well.

OTOH, HD IO rate is the fundamental performance metric.
As long as our HD IO rate is pessimal, so will the performance
of everything else be.   Why can't we load a table at closer to
the peak IO rate of the HDs?   


>Anyone restoring a large database from pg_dump is in the
>same situation.  Even worse, if you have to create a new
>index on a large table on a production database in use,
>because the I/O from the index creation swamps everything.
>
Fix for this in the works ;-)


>Following an index creation, we see that 95% of the time
>required is the external sort, which averages 2mb/s.
>
Assuming decent HD HW, this is HORRIBLE.

What kind of instrumenting and profiling has been done on
the code involved?


>This is with seperate drives for the WAL, the pg_tmp, the table
>and the index.  I've confirmed that increasing work_mem 
>beyond a small minimum (around 128mb) had no benefit on
>the overall index creation speed.
>
No surprise.  The process is severely limited by the abysmally
slow HD IO.

Ron

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Comparative performance

2005-09-29 Thread Magnus Hagander
> > This smells like a TCP communication problem.
> 
> I'm puzzled by that remark.  How much does TCP get into the 
> picture in a local Windows client/server environment?

Windows has no Unix Domain Sockets (no surprise there), so TCP
connections over the loopback interface are used to connect to the
server.

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster