Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Fri, 8 Aug 2003, mixo wrote:

 I have just installed redhat linux 9 which ships with Pg
 7.3.2. Pg has to be setup so that data inserts (blobs) should
 be able to handle at least 8M at a time.

Nothing has to be done to tune postgresql to handle this, 8 Meg blobs are 
no problem as far as I know.

 The machine has
 two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
 a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
 What would be the recommended setup for good performance
 considering that the db will have about 15 users for
 9 hours in a day, and about 10 or so users throughout the day
 who won't be consistently using the db.

Seeing as you have only one hard drive, how you arrange things on it 
doesn't really make a big difference.   If you can get another drive and 
mirror your data partition that will help speed up selects as well as 
provide some redundancy should one drive fail.

How many queries per second are you looking at handling?  If it's 1 or 
less, you probably don't have much to worry about with this setup.  We run 
dual PIII-750s at work with 1.5 Gig ram, and while we're going to upgrade 
the servers (they're currently handling apache/php/postgresql and ldap) 
we'll keep the dual PIII-750 machines as the database boxes with nothing 
else on them.  Postgresql is quite snappy on such hardware.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Shridhar Daithankar
On 11 Aug 2003 at 23:42, Ron Johnson wrote:

 On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:
   Well, yeah.  But given the Linux propensity for introducing major
   features in minor releases (and thereby introducing all the
   attendant bugs), I'd think twice about using _any_ Linux feature
   until it's been through a major version (e.g. things introduced in
   2.4.x won't really be stable until 2.6.x) -- and even there one is
   taking a risk[1].
  
  Dudes, seriously - switch to FreeBSD :P
 
 But, like, we want a *good* OS... 8-0

Joking aside, since postgresql is pretty much reliant on the file system for 
basic file functionality, I guess it's time to test Linux 2.6 and compare it.

And don't forget, for large databases, there is still XFS out there which is 
probably the ruler at upper end..

Bye
 Shridhar

--
Unfair animal names: -- tsetse fly -- bullhead -- booby -- duck-billed
platypus -- sapsucker -- Clarence
-- Gary Larson


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] How can I Improve performance in Solaris?

2003-08-14 Thread scott.marlowe
On Wed, 13 Aug 2003, ingrid martinez wrote:

 Flows table looks like this
 
 Table flows
 Column|   Type   | Modifiers
 --+--+---
  flidload | bigint   | not null
  firsttime| bigint   |
  fldestpeeraddress| character varying(30)|
  fldesttransaddress   | bigint   |
  fldesttranstype  | smallint |
  fldfromoctets| bigint   |
  fldscodepoint| smallint |
  fldtooctets  | bigint   |
  flfrompdus   | bigint   |
  flid | text |
  flidrule | bigint   |
  flsourcepeeraddress  | character varying(30)|
  flsourcetransaddress | bigint   |
  flsourcetranstype| smallint |
  fltime   | timestamp with time zone |
  fltopdus | bigint   |
  lasttime | bigint   |
  sourceinterface  | smallint |
  destinterface| smallint |
  sourceasn| smallint |
  destasn  | smallint |
 Primary key: flows_pkey

Which columns are in the pkey?


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Josh Berkus
Jeff,

 Informix, etc. have spent a lot of time and money working on it.
 They also have the advantage of having many paid fulltime
 developers who are doing this for a job, not as a weekend hobby
 (Compared to the what? 2-3 full time PG developers).

I think 4-6 full-time, actually, plus about 200 part-time contributors.  Which 
adds up to a bloody *lot* of code if you monitor pgsql-patches between 
versions.  The only development advantage the commercials have over us is the 
ability to engage in large projects (e.g. replication, raw filesystems, etc.) 
that are difficult for a distributed network of people.

 The other advantage (which I hinted to above) with raw disks is being able
 to optimize queries to take advantage of it.  Informix is multithreaded
 and it will spawn off multiple readers to do say, a seq scan (and merge
 the results at the end).

I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm  we'd 
need to see some tests demonstrating that this approach was still a technical 
advantage given the improvements in RAID  and FS technology since Informix 
was designed.

As I have said elsewhere, Informix is probably a poor database to emulate 
since they are effectively an old dead-end fork of the Ingres/Postgres code, 
and have already been mined for most of the improvements they made.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Browne
Martha Stewart called it a Good Thing [EMAIL PROTECTED] (Gregory S. Williamson) wrote:
 FWIW, Informix can be run using a cooked (Unix) file for storing
 data or it uses raw disk space and bypasses the ordinary (high
 level) UNIX controllers and does its own reads/writes. About 10
 times faster and safer. Of course, it may have taken a lot of
 programmer time to make that solid. But the performance gains are
 significant.

Are you _certain_ that's still true?  Have you a metric that shows
Informix being 10x faster on a modern system?  That would be quite
surprising...

It may have been true on '80s style UFS implementations, but a couple
of decades have passed, and pretty much any Unix system has new
selections of filesystems that probably aren't so much slower.

It could conceivably be an interesting idea to implement a
block-oriented filesystem where the granularity of files was 8K (or
some such number :-)).

Oracle seems to have done something vaguely like this...
http://otn.oracle.com/tech/linux/open_source.html

But long and short is that the guys implementing OSes have been
putting a LOT of effort into making the potential performance gains of
using raw partitions less and less.
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/sap.html
(eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
  ; will pick up previous value to var set!-ed,
  ; the unassigned object.
-- from BBN-CL's cl-parser.scm

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Yaroslav Mazurak
		Hi, All!

Richard Huxton wrote:

On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:

Richard Huxton wrote:

On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:

PG's memory use can be split into four areas (note - I'm not a developer so 
this could be wrong).
1. Shared memory - vital so that different connections can communicate with 
each other. Shouldn't be too large, otherwise PG spends too long managing its 
shared memory rather than working on your queries.
2. Sort memory - If you have to sort results during a query it will use up to 
the amount you define in sort_mem and then use disk if it needs any more. 
This is for each sort.
3. Results memory - If you're returning 8000 rows then PG will assemble these 
and send them to the client which also needs space to store the 8000 rows.
4. Working memory - to actually run the queries - stack and heap space to keep 
track of its calculations etc.
	Hence, total free RAM - shared_buffers - k * sort_mem - 
effective_cache_size == (results memory + working memory)?

For the moment, I'd leave the settings roughly where they are while we look at 
the query, then once that's out of the way we can fine-tune the settings.
	OK.

Additionally, with the user-defined function beginchar (SUBSTR(..., 1, 1)),
used for indexing, showcalc executes in about 16 seconds. With the function
SUBSTR the same showcalc executes in 6 seconds.

Fair enough - substr should be fairly efficient.
	Is the cost of a user-defined SQL function call in PostgreSQL high?

OK - bear in mind that these suggestions are made without the benefit of the 
explain analyse:

1. You could try splitting out the various tags of your mask into different 
fields - that will instantly eliminate all the substr() calls and might make 
a difference. If you want to keep the mask for display purposes, we could 
build a trigger to keep it in sync with the separate flags.
	This will be next step. :)

2. Use a calculations table and build your results step by step. So - 
calculate all the simple accounts, then calculate the ones that contain the 
simple accounts.
	I give the user SQL plus a few helper functions, therefore a single step
is required for building the results.

3. You could keep a separate account_contains table that might look like:
  acc_id | contains
  A001   | A001
  A002   | A002
  A003   | A003
  A003   | A001
  A004   | A004
  A004   | A003
  A004   | A001

So here A001/A002 are simple accounts but A003 contains A001 too. A004 
contains A003 and A001. The table can be kept up to date automatically using 
some triggers.
This should make it simple to pick up all the accounts contained within the 
target account and might mean you can eliminate the recursion.
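
For illustration only, a lookup against such a table could replace the recursive
call with a single aggregate; the join column on showcomp is assumed here and
may not match the real schema:

SELECT contains
FROM account_contains
WHERE acc_id = 'A004';   -- returns A004, A003 and A001 for the example above

SELECT SUM(sc.koef)
FROM showcomp AS sc
WHERE sc.acc_id IN
  (SELECT contains FROM account_contains WHERE acc_id = 'A004');
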
	Thanks, that sounds not so bad, but I suspect that this method won't
improve performance significantly.
	I'm thinking about another secondary table for showcomp (compshow :)) with
showings compiled into account numbers and characteristics. After
inserting or updating a showing, it will be recompiled, by an explicit
function call or a trigger, into atomic account numbers and
characteristics.

Post the EXPLAIN ANALYSE first - maybe someone smarter than me will have an 
idea.
	First result - simple showing 'B00202' (without recursion).
	Second result - complex showing 'B00204' with recursion (1 level depth).
	Showing 'B00202' contains 85 accounts, 'B00203' - 108 accounts, and 
'B00204' = 'B00202' - 'B00203'.
	Query text:

EXPLAIN ANALYZE SELECT COALESCE(
	(SELECT sc.koef * 100
		FROM showing AS s NATURAL JOIN showcomp AS sc
		WHERE s.kod = 'B00202'
			AND NOT SUBSTR(acc_mask, 1, 1) = '['
			AND SUBSTR(acc_mask, 1, 4) = '6010'
			AND SUBSTR(acc_mask, 5, 1) = SUBSTR('20', 1, 1)),
	(SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2, 
LENGTH(acc_mask) - 2), '20', '6010', 100), 0))
		FROM showing AS s NATURAL JOIN showcomp AS sc
		WHERE s.kod = 'B00202'
			AND SUBSTR(acc_mask, 1, 1) = '['),
	0) AS showing;

EXPLAIN ANALYZE SELECT COALESCE(
	(SELECT sc.koef * 100
		FROM showing AS s NATURAL JOIN showcomp AS sc
		WHERE s.kod = 'B00204'
			AND NOT SUBSTR(acc_mask, 1, 1) = '['
			AND SUBSTR(acc_mask, 1, 4) = '6010'
			AND SUBSTR(acc_mask, 5, 1) = SUBSTR('20', 1, 1)),
	(SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2, 
LENGTH(acc_mask) - 2), '20', '6010', 100), 0))
		FROM showing AS s NATURAL JOIN showcomp AS sc
		WHERE s.kod = 'B00204'
			AND SUBSTR(acc_mask, 1, 1) = '['),
	0) AS showing;

   QUERY PLAN 

--
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=704.39..704.39 
rows=1 loops=1)
   InitPlan
 -  Hash Join  (cost=5.22..449.63 rows=1 width=19) (actual 
time=167.28..352.90 rows=1 loops=1)
   Hash Cond: (outer.id_show = inner.id_show)
   -  Seq Scan on showcomp sc  (cost=0.00..444.40 rows=1 
width=15) (actual time=23.29..350.17 

[PERFORM] query/table design help

2003-08-14 Thread Ara Anjargolian
I have a table
permissions
with the fields (party_id integer, permission varchar, key_name varchar,
key_value integer)
for which I need to a query to see if a person has permission to carry out a
particular action.
The query looks like:
SELECT 1
FROM permissions
WHERE   party_id in (4, 7, 11, 26)
AND
permission = 'permission8'
AND
((key_name = 'keyname8' AND key_value = 223) OR
 (key_name = 'keyname1' AND key_value = 123) OR
 (key_name = 'keyname5' AND key_value = 212) OR
 (key_name = 'keyname7' AND key_value = 523) OR
 (key_name = 'keyname0' AND key_value = 123) OR
 (key_name = 'keyname10' AND key_value = 400));

would a permissions(party_id, permission) index work best here?
or should I index all 4 columns?
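
For illustration, the two candidate indexes look like this (the index names are
made up; only EXPLAIN ANALYZE against real data will show which one the planner
actually prefers):

CREATE INDEX permissions_party_perm_idx
    ON permissions (party_id, permission);

CREATE INDEX permissions_party_perm_key_idx
    ON permissions (party_id, permission, key_name, key_value);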

Also,
Another alternative is to combine the key_name and key_value fields into a
varchar field key (e.g. 'keyname8=223'), in which case the equivalent query
would just check 1 field 6 times instead of having 6 AND statements.

I expect the table to have about 1 million rows at the most, and I need this
query to run as fast as possible since it will be run many, many times.
So, from a design standpoint, what is the best way to go: should I have two
fields, key_name and key_value, or just one field key? And how should I index
this table best?  I guess the fundamental question here is, is it faster to
check a varchar(60) field for equality, or to check an integer and then a
varchar(30)?  Or does having one varchar field replace an integer and a
varchar field allow for some nice optimization not practical otherwise (i.e.
a 3-column index)?

I'd greatly appreciate any insight into this matter.
-Ara Anjargolian





---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] How to force PostgreeSQL to work faster?

2003-08-14 Thread Paul Thomas
On 11/08/2003 09:59 Serge Dorofeev wrote:
Hi PostgreSQL team,

My PostgreSQL is installed as part of CYGWIN (Windows XP).
I have compared the performance of PostgreSQL to MS SQL (I used a little Java
program with a number of inserts into a table).
MS SQL is 12 times faster :-(
These are very strange results.
Guys who developed this server: what can you tell about this - what
customizations are needed to increase productivity?
How can I force PostgreSQL to work faster?
                            Speed (inserts/sec)   Elapsed time (ms)
MS SQL (Average):                   295                39 869
  testInsert 5000                   263                18 977
                                    255                19 619
                                    306                16 334
  testInsert 1                      315                31 716
                                    324                30 905
                                    319                31 325
  testInsert 2                      241                82 919
                                    313                63 922
                                    317                63 101
PostgreSQL (Average):                24               520 160
  testInsert 5000                    26               191 434
                                     26               191 264
                                     26               192 295
  testInsert 1                       22               463 669
                                     25               393 510
                                     24               409 528
  testInsert 2                       24               834 911
                                     17             1 184 613
                                     24               820 218
MS SQL is faster (times):            12                    13
You don't give any details about your test code or how the databases are 
configured so I'm guessing that your inserts use an autocommitting 
connection. For PostgreSQL, this causes each insert to be run inside a 
transaction and the transaction is then immediately written to disk. My 
guess is that MS SQL behaves differently and doesn't immediately write to 
disk (faster maybe but could cause data corruption). Try modifying your 
program to have connection.setAutoCommit(false) and do a 
connection.commit() after say every 100 inserts.
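
For illustration, the same idea in plain SQL (test_table and its columns are
made-up names); a JDBC client gets the same effect with setAutoCommit(false)
plus a commit() after every batch, as described above:

BEGIN;
INSERT INTO test_table (id, val) VALUES (1, 'a');
INSERT INTO test_table (id, val) VALUES (2, 'b');
-- ... roughly 100 inserts per batch ...
COMMIT;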

HTH

--
Paul Thomas
+--+-+
| Thomas Micro Systems Limited | Software Solutions for the Smaller 
Business |
| Computer Consultants | 
http://www.thomas-micro-systems-ltd.co.uk   |
+--+-+

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Perfomance Tuning

2003-08-14 Thread mixo
I have just installed redhat linux 9 which ships with Pg
7.3.2. Pg has to be setup so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recommended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who won't be consistently using the db.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] ext3 block size

2003-08-14 Thread Wilson A. Galafassi Jr.



hello.
my database size is 5GB. what is the recommended block size?
thanks
wilson



Re: [PERFORM] Some vacuum tuning help

2003-08-14 Thread Tom Lane
Matthew T. O'Connor [EMAIL PROTECTED] writes:
 ... My real question
 was: can I make large changes to a contrib module in a point release,
 meaning, 7.4.0 will have what is in beta, but 7.4.1 would have a much
 improved version.  Does that sound possible?

For core code, the answer would be a big NYET.  We do not do feature
additions in point releases, only bug fixes.  While contrib code is more
under the author's control than the core committee's control, I'd still
say that you'd be making a big mistake to not follow that basic
guideline.  People expect release x.y.z+1 to be the same as x.y.z except
for bug fixes.  Introducing any new bugs into x.y.z+1 would cause a
large loss in your credibility.

(speaking as one who's introduced new bugs into a point-release
recently, and is still embarrassed about it, even though the intent
was only to fix older bugs...)

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Ron,

 If the databases are on different machines, maybe the postgres.conf
 or pg_hba.conf files are different, and the buffer counts is affect-
 ing the optimizer?

The databases are on the same machine, using the same postmaster.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


[PERFORM] Some vacuum tuning help

2003-08-14 Thread Jeff
I've been trying to search through the archives, but it hasn't been
successful.

We recently upgraded from pg7.0.2 to 7.3.4 and things were happy. I'm
trying to fine tune things to get it running a bit better and I'm trying
to figure out how vacuum output correlates to tuning parameters.

Here's the most recent vacuum for the active table.  It gets a few
hundred updates/inserts a minute constantly throughout the day.

INFO:  Pages 27781: Changed 0, Empty 0; Tup 2451648: Vac 0, Keep 0, UnUsed
1003361.
Total CPU 2.18s/0.61u sec elapsed 2.78 sec.

I see unused is quite high. This morning I bumped max_fsm_pages to 50.
If I'm thinking right you want unused and max_fsm to be closish, right?
(Yesterday it was down around.. oh.. 600k?)

I'm thinking vacuum full's may be in order. Which stinks because I was
hoping to do away with the db essentially down for 10 minutes (includes
all the db's on that machine) while it vacuum'd.
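
For reference, one way to keep an eye on this (the table name is only a
placeholder):

VACUUM VERBOSE some_big_table;
-- The INFO lines report the UnUsed tuple count, as in the output quoted above.
-- If the free space map (max_fsm_pages in postgresql.conf) can't keep up with
-- it, a one-off VACUUM FULL some_big_table; compacts the table, at the cost of
-- an exclusive lock while it runs.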

The upside is: it is performing great.  During the vacuum analyze I do get
a few multi-second pauses while something occurs. I figured it was a
checkpoint, so I bumped checkpoint_timeout to 30 seconds and wal_buffers
to 128. (I'm just guessing on wal_buffers).

Machine is weenucks 2.2.17 on a dual p3 800, 2gb ram, 18gb drive (mirrored).
If you guys need other info (shared_buffers, etc) I'll be happy to furnish
them, but the issue isn't query slowness.. just want to get this thing
oiled.

thanks

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Moving postgresql.conf tunables into 2003...

2003-08-14 Thread Tom Lane
Sean Chittenden [EMAIL PROTECTED] writes:
 If you CLUSTER on an index and then ANALYSE, you get a correlation of
 1.0 (== optimum) for the first column of the index.

 Correlating of what to what?  Of data to nearby data?  Of data to
 related data (ie, multi-column index?)? Of related data to pages on
 disk?  Not 100% sure in what context you're using the word
 correlation...

The correlation is between index order and heap order --- that is, are
the tuples in the table physically in the same order as the index?
The better the correlation, the fewer heap-page reads it will take to do
an index scan.

Note it is possible to measure correlation without regard to whether
there actually is any index; ANALYZE is simply looking to see whether
the values appear in increasing order according to the datatype's
default sort operator.
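
As a rough sketch of what that looks like in practice (some_table and some_idx
are placeholder names; CLUSTER is shown in its 7.3-era syntax):

CLUSTER some_idx ON some_table;   -- rewrite the heap in index order
ANALYZE some_table;               -- recompute statistics, including correlation

SELECT attname, correlation       -- 1.0 means heap order matches the sort order
FROM pg_stats
WHERE tablename = 'some_table';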

One problem we have is extrapolating from the single-column correlation
stats computed by ANALYZE to appropriate info for multi-column indexes.
It might be that the only reasonable fix for this is for ANALYZE to
compute multi-column stats too when multi-column indexes are present.
People are used to the assumption that you don't need to re-ANALYZE
after creating a new index, but maybe we'll have to give that up.

 But that value will degrade after time and at what rate?  Does ANALYZE
 maintain that value so that it's kept accurate?

You keep it up to date by ANALYZE-ing at suitable intervals.  It's no
different from any other statistic.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSql under Linux

2003-08-14 Thread Neil Conway
On Wed, Aug 06, 2003 at 03:03:41PM -0300, Wilson A. Galafassi Jr. wrote:
 I'm installing Postgresql under linux for better performance and i want to know what 
 the best configuration is.

 1. What is the best linux distribution for better performance?

The Linux distribution itself isn't that important, IMHO. Spend some time
selecting the right filesystem (check the archives for threads on this
topic), the right kernel (and perhaps compiling your own from scratch),
perhaps some kernel tuning (I/O scheduler, etc.), and so forth.

 2. Do any compilation options exist for better performance on this machine?

Not compilation options, but there are plenty of configuration settings
you should be tweaking to ensure good performance. You can find a list
of configuration options here:

http://www.postgresql.org/docs/7.3/static/runtime-config.html
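
As a quick way to experiment before editing postgresql.conf, several of these
can be changed per-session; the values below are only placeholders:

SHOW effective_cache_size;
SET effective_cache_size = 15000;   -- planner hint, in 8k disk pages
SET sort_mem = 8192;                -- per-sort memory, in kilobytes
-- shared_buffers can only be changed in postgresql.conf and requires a restart.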

-Neil


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] about performance of postgreSQL

2003-08-14 Thread Chris Travers
Hi Xin;

PostgreSQL is configured to run on virtually anything out of the box.
The reason for this is that, like Oracle, the database manager will not
start if it cannot allocate sufficient resources. We take the approach
of ensuring that it will start so you can tune it.

I would recommend trying to take a close look at many of the posts on
the Performance list (searching the archives) and paying attention to
things such as effective_cache_size and shared_buffers. If these don't
answer your questions, ask this list again.

Best Wishes,
Chris Travers


xin fu wrote:

 Dear master:
 I have been learning postgreSQL for several days, and now I have met some problems.
 when I use a TPCC(Transaction Processing Performance Council) test
 program to test the performance of postgreSQL, postgreSQL works very
 slowly; it needs almost 1 minute to finish a transaction, and the CPU
 usage is almost 100%,
 the test environment is :
 OS: redhat 9.0(ext3, default configurations)
 Server: postgre7.3.4(default configurations) , PIII 800M, 1G Memory
 Client: tpcc test program,using ODBC API, PIII 800M, 1G Memory
 when using SQLServer, it can work on a workload of 40 Warehouse,
 but postgreSQL can not work even on 1 warehouse. I think there must be
 some problem with my postgreSQL, can you help me?
 I am in china, and my english is very poor, but i hope you can give
 me some advice, thanks.


 





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How to force PostgreeSQL to work faster?

2003-08-14 Thread Tomka Gergely
Hi!

Please send me the test db and the queries; with precise information
maybe the developers can help.

-- 
Tomka Gergely
And now, what's to become of us without barbarians?
Those people were a kind of solution, after all...



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] How to efficiently duplicate a whole schema?

2003-08-14 Thread Sebastien Lemieux
Hi,

  I'm running on Redhat 7.2 with postgresql 7.3.2 and I have two schemas in
the same database, 'db' and 'db_dev'.  Both contain a set of 20 tables for
a total of less than 50 Mb of data each (on the order of 50k rows in
total).  Once in a while (often these days!), I need to synchronize the
dev version from the production 'db'.  Currently, I do this by setting
constraints to deferred, deleting everything in db_dev, then issuing a series
of insert ... select ... statements to copy data from each table in db to the
equivalent table in db_dev.
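
For reference, a minimal sketch of that delete/insert approach for one table,
here done as a single transaction per table (some_table stands in for each of
the 20 tables):

BEGIN;
SET CONSTRAINTS ALL DEFERRED;
DELETE FROM db_dev.some_table;
INSERT INTO db_dev.some_table SELECT * FROM db.some_table;
COMMIT;

One common variation is to drop the indexes on the db_dev tables before the
bulk INSERT and recreate them afterwards, which often speeds the copy up
considerably.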

  This approach used to run in less than 30 seconds in MySQL, but in 
PostgreSQL it currently takes around 30 minutes.  The postmaster process 
is running at 100% cpu all the time.  I enclosed all the delete statement 
in one transaction and all the insert statements in a second transaction.  
All the time is taken at the commit of both transaction.

  Is there a more straightforward way to synchronize a development 
database to a production one?  Is there any way to increase the performance 
of this delete/insert combination?  I've got indexes and constraints on 
most tables, could that be the problem?  At some point in the future, I 
will also need to make a copy of a whole schema ('db' into 'db_backup'), 
what would be an efficient way to do that?

  These are the parameters I've adjusted in the postgresql.conf:

max_connections = 16
shared_buffers = 3000
max_fsm_relations = 2000
max_fsm_pages = 2
sort_mem = 2
vacuum_mem = 2
effective_cache_size = 15000

  And this is the memory state of the machine:

[EMAIL PROTECTED] free
             total       used       free     shared    buffers     cached
Mem:       2059472    2042224      17248      24768     115712    1286572
-/+ buffers/cache:     639940    1419532
Swap:      2096440     490968    1605472

thanks,

-- 
Sebastien Lemieux
Bioinformatics, post-doc
Elitra-canada


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Some vacuum tuning help

2003-08-14 Thread Christopher Browne
Matthew T. O'Connor wrote:
 Fair point, my only concern is that a backend integrated
 pg_autovacuum would be radically different from the current libpq
 based client application.

Unfortunately, a configurable-via-tables pg_autovacuum is also going
to be quite different from the current unconfigurable version.

If we were to make it configurable, I would suggest doing so via
specifying a database and schema into which it would then insert a set
of tables to provide whatever information was considered worth
'fiddling' with.

But at that point, it makes sense to add in quite a bit of
configurable behaviour, such as:

 - Specifying that certain tables should _never_ be automatically 
vacuumed.

 - Establishing a queue of tables that pg_autovacuum plans to
vacuum, so that users could add in desired vacuums (after the
other stuff being handled, force in a vacuum of app_table_foo).
That way, vacuums can be 'forced in' without introducing the
possibility that multiple vacuums might be done at once...

 - Making information about what vacuums have been done/planned
persistent across runs of pg_autovacuum, and even across
shutdowns of the DBMS.

This changes behaviour enough that I'm not sure it's the same
program as the unconfigurable version.  Almost every option would be
substantially affected by the logic:

 if (CONFIG_DATA_IN_DB) {
   /* Logic path that uses data in Vacuum Schema */
 } else {
   /* More banal logic */
 }

If I can store configuration in the database, then I'd like to also
make up a view or two, and possibly even base the logic used on views
that combine configuration tables with system views.  In effect, that
makes for a _third_ radically different option.
-- 
output = reverse(ofni.smrytrebil @ enworbbc)
http://dev6.int.libertyrms.com/
Christopher Browne
(416) 646 3304 x124 (land)

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] On Linux Filesystems

2003-08-14 Thread Andrew Sullivan
On Mon, Aug 11, 2003 at 10:58:18PM -0400, Christopher Browne wrote:
 1.  Nobody has gone through any formal proofs, and there are few
 systems _anywhere_ that are 100% reliable.  

I think the problem is that ext2 is known to be not perfectly crash
safe.  That is, fsck on reboot after a crash can cause, in some
extreme cases, recently-fsynced data to end up in lost+found/.  The
data may or may not be recoverable from there.

I don't think anyone would object to such a characterisation of ext2. 
It was not designed, ever, for perfect data safety -- it was designed
as a reasonably good compromise for most cases.  _Every_ filesystem
entails some compromises.  This happens to be the one entailed by
ext2.

For production use with valuable data, for my money (or, more
precisely, my time when a system panics for no good reason), it is
always worth the additional speed penalty to use something like
metadata journalling.  Maybe others have more time to spare.

 perhaps even including performance metrics for *BSD.  That, not
 Linux-baiting, is the answer...

I didn't see anyone Linux-baiting.

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
[EMAIL PROTECTED]  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How to efficiently duplicate a whole schema?

2003-08-14 Thread scott.marlowe
On Wed, 6 Aug 2003, Tom Lane wrote:

 scott.marlowe [EMAIL PROTECTED] writes:
  On Wed, 6 Aug 2003, Tom Lane wrote:
  One obvious question is whether you have your foreign keys set up
  efficiently in the first place.  As a rule, the referenced and
  referencing columns should have identical datatypes and both should
  be indexed.  (PG will often let you create foreign key constraints
  that don't meet these rules ... but performance will suffer.)
 
  Is this one of those things that should spit out a NOTICE when it happens? 
  I.e. when a table is created with a references and uses a different type 
  than the parent, would it be a good idea to issue a NOTICE: parent and 
  child fields are not of the same type
 
 I could see doing that for unequal data types, but I'm not sure if it's
 reasonable to do it for lack of index.  Usually you won't have created
 the referencing column's index yet when you create the FK constraint,
 so any warning would just be noise.  (The referenced column's index *is*
 checked for, since we require it to be unique.)

Sure.  I wasn't thinking of the index issue anyway, just the type 
mismatch.
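
For reference, a small sketch of the rule being discussed (the table and column
names are invented):

CREATE TABLE parent (
    id bigint PRIMARY KEY                     -- referenced column must be unique
);
CREATE TABLE child (
    child_id  serial PRIMARY KEY,
    parent_id bigint REFERENCES parent (id)   -- same datatype as parent.id
);
-- PostgreSQL does not index the referencing column automatically:
CREATE INDEX child_parent_id_idx ON child (parent_id);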


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Fri, 8 Aug 2003, Andrew Sullivan wrote:

 On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:
  
  Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
  (ext2?) with the partition that holds your data and WAL.
 
 I would give you exactly the opposite advice: _never_ use a
 non-journalling fs for your data and WAL.  I suppose if you can
 afford to lose some transactions, you can do without journalling. 
 Otherwise, you're just borrowing trouble, near as I can tell.

I'd argue that a reliable filesystem (ext2) is still better than a 
questionable journaling filesystem (ext3 on kernels 2.4.20).

This isn't saying to not use journaling, but I would definitely test it 
under load first to make sure it's not gonna lose data or get corrupted.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Jonathan Gardner
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Friday 08 August 2003 03:28, mixo wrote:
 I have just installed redhat linux 9 which ships with Pg
 7.3.2. Pg has to be setup so that data inserts (blobs) should
 be able to handle at least 8M at a time. The machine has
 two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
 a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
 What would be the recommended setup for good performance
 considering that the db will have about 15 users for
 9 hours in a day, and about 10 or so users throughout the day
 who won't be consistently using the db.


Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
(ext2?) with the partition that holds your data and WAL.

Consider having a separate partition for the WAL as well.

These are things that are more difficult to change later on. Everything else 
is tweaking.

Is it absolutely necessary to store 8MB files in the database? I find it 
cumbersome. Storing them on a file server has been a better alternative for 
me.

- -- 
Jonathan Gardner [EMAIL PROTECTED]
Live Free, Use Linux!
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE/M9J0WgwF3QvpWNwRAlT5AJ9EmDourbCiqj7MFOqfBospc2dW7gCfZKz0
JQjn/2KAeh1SPJfN601LoFg=
=PW6k
-END PGP SIGNATURE-

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Sean Chittenden
   Well, yeah.  But given the Linux propensity for introducing major
   features in minor releases (and thereby introducing all the
   attendant bugs), I'd think twice about using _any_ Linux feature
   until it's been through a major version (e.g. things introduced in
   2.4.x won't really be stable until 2.6.x) -- and even there one is
   taking a risk[1].
  
  Dudes, seriously - switch to FreeBSD :P
 
 Yeah, it's nice to have a BUG FREE OS huh? ;^)
 
 And yes, I've used FreeBSD, it's quite good, but I kept getting the
 feeling it wasn't quite done.  Especially the installation
 documentation.

While the handbook isn't the same as reading the actual source, nor the
only FreeBSD documentation, it certainly is quite good (to the point
that publishers see a small market for publishing FreeBSD books because the
documentation provided by the project is so good), IMHO.

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/

If anyone on this list has any issues with the documentation, please
take them up with me _privately_ and I will do my best to either
address or correct the problem.

Now, back to our regularly scheduled and on topic programming...  -sc

-- 
Sean Chittenden
(PostgreSQL|FreeBSD).org - The Power To Serve

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Yaroslav Mazurak
		Hi All!

	I have installed PostgreSQL 7.3.2 on FreeBSD 4.7, running on PC with 
CPU Pentium II 400MHz and 384Mb RAM.

	The problem is that the SQL statement (see below) runs too long. With the
current WHERE clause, SUBSTR(ncks, 2, 2) IN ('NL', 'NM') returns 25 records.
With 1 record, the SELECT takes about 50 minutes and approx. 120Mb of RAM.
With 25 records the SELECT takes about 600Mb of memory and ends after
about 10 hours with the error: Memory exhausted in AllocSetAlloc(32).

	***
	How can I speed up processing? Why does this query (IMHO not too
complex) execute for so long? :(
	***

	Information about configuration, data structures and table sizes see 
below. Model picture attached.

	Current postgresql.conf settings (some) are:

=== Cut ===
max_connections = 8
shared_buffers = 8192
max_fsm_relations = 256
max_fsm_pages = 65536
max_locks_per_transaction = 16
wal_buffers = 256
sort_mem = 131072
vacuum_mem = 16384
checkpoint_segments = 4
checkpoint_timeout = 300
commit_delay = 32000
commit_siblings = 4
fsync = false
enable_seqscan = false

effective_cache_size = 65536
=== Cut ===
	SELECT statement is:

SELECT  showcalc('B00204', dd, r020, t071) AS s04
FROM    v_file02wide
WHERE   a011 = 3
AND inrepdate(data)
AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
AND r030 = 980;
	Query plan is:

QUERY PLAN 

--
 Aggregate  (cost=174200202474.99..174200202474.99 rows=1 width=143)
   -  Hash Join  (cost=174200199883.63..174200202474.89 rows=43 width=143)
 Hash Cond: (outer.id_k041 = inner.id_k041)
 -  Hash Join  (cost=174200199880.57..174200202471.07 rows=43 
width=139)
   Hash Cond: (outer.id_r030 = inner.id_r030)
   -  Hash Join  (cost=174200199865.31..174200202410.31 
rows=8992 width=135)
 Hash Cond: (outer.id_r020 = inner.id_r020)
 -  Hash Join 
(cost=174200199681.91..174200202069.55 rows=8992 width=124)
   Hash Cond: (outer.id_dd = inner.id_dd)
   -  Merge Join 
(cost=174200199676.04..174200201906.32 rows=8992 width=114)
 Merge Cond: (outer.id_v = inner.id_v)
 Join Filter: ((outer.data = CASE 
WHEN (inner.dataa IS NOT NULL) THEN inner.dataa WHEN (outer.data 
IS NOT NULL) THEN outer.data ELSE NULL::date END) AND (outer.data = 
CASE WHEN (inner.datab IS NOT NULL) THEN inner.datab WHEN 
(outer.data IS NOT NULL) THEN outer.data ELSE NULL::date END))
 -  Sort  (cost=42528.39..42933.04 
rows=161858 width=65)
   Sort Key: filexxr.id_v
   -  Hash Join 
(cost=636.25..28524.10 rows=161858 width=65)
 Hash Cond: (outer.id_obl 
= inner.id_obl)
 -  Hash Join 
(cost=632.67..25687.99 rows=161858 width=61)
   Hash Cond: 
(outer.id_r = inner.id_r)
   -  Index Scan using 
index_file02_k041 on file02  (cost=0.00..18951.63 rows=816093 width=32)
   -  Hash 
(cost=615.41..615.41 rows=6903 width=29)
 -  Index Scan 
using index_filexxr_a011 on filexxr  (cost=0.00..615.41 rows=6903 width=29)
   Index 
Cond: (id_a011 = 3)
   Filter: 
inrepdate(data)
 -  Hash  (cost=3.47..3.47 
rows=43 width=4)
   -  Index Scan using 
kod_obl_pkey on kod_obl obl  (cost=0.00..3.47 rows=43 width=4)
 -  Sort 
(cost=174200157147.65..174200157150.57 rows=1167 width=49)
   Sort Key: dov_tvbv.id_v
   -  Merge Join 
(cost=0.00..174200157088.20 rows=1167 width=49)
 Merge Cond: 
(outer.id_bnk = inner.id_bnk)
 -  Index Scan using 
dov_bank_pkey on dov_bank  (cost=0.00..290100261328.45 rows=1450 width=13)
   Filter: (subplan)
   SubPlan
 -  Materialize 
(cost=10090.02..10090.02 rows=29 width=11)
   -  Seq Scan 
on dov_bank  

Re: [PERFORM] How can I Improve performance in Solaris?

2003-08-14 Thread ingrid martinez
The queries that I execute are only inserts; I use a batch of 300 and then do
a commit.

insert into FLOWS values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

and

postgresql.conf looks like this


#
# PostgreSQL configuration file
# -
#
# This file consists of lines of the form
#
#   name = value
#
# (The `=' is optional.) White space is collapsed, comments are
# introduced by `#' anywhere on a line.  The complete list of option
# names and allowed values can be found in the PostgreSQL
# documentation.  The commented-out settings shown in this file
# represent the default values.

# Any option can also be given as a command line switch to the
# postmaster, e.g., 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.


#


#
#   Connection Parameters
#
#tcpip_socket = false
#ssl = false

#max_connections = 32

#port = 5432
#hostname_lookup = false
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777

#virtual_host = ''

#krb_server_keyfile = ''


#
#   Shared Memory Size
#
#shared_buffers = 64# 2*max_connections, min 16
#max_fsm_relations = 100# min 10, fsm is free space map
#max_fsm_pages = 1  # min 1000, fsm is free space map
#max_locks_per_transaction = 64 # min 10
#wal_buffers = 8# min 4

#
#   Non-shared Memory Sizes
#
#sort_mem = 512 # min 32
#vacuum_mem = 8192  # min 1024


#
#   Write-ahead log (WAL)
#
#wal_files = 0 # range 0-64
#wal_sync_method = fsync   # the default varies across platforms:
#  # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0 # range 0-16
#commit_delay = 0  # range 0-10
#commit_siblings = 5   # range 1-1000
#checkpoint_segments = 3   # in logfile segments (16MB each), min 1
#checkpoint_timeout = 300  # in seconds, range 30-3600
#fsync = true


#
#   Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

#ksqo = false

#effective_cache_size = 1000  # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025


#
#   GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0 # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0# default based on #tables in query, range
128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed


#
#   Debug display
#
#silent_mode = false

#log_connections = false
#log_timestamp = false
#log_pid = false

#debug_level = 0 # range 0-16

#debug_print_query = false
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false

# requires USE_ASSERT_CHECKING
#debug_assertions = true


#
#   Syslog
#
# requires ENABLE_SYSLOG
#syslog = 0 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'


#
#   Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_query_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false


#
#   Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
#stats_command_string = false
#stats_row_level = false
#stats_block_level = false


#
#   Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0


#
#   Misc
#
#dynamic_library_path = '$libdir'
#australian_timezones = false
#authentication_timeout = 60# min 1, max 600
#deadlock_timeout = 1000
#default_transaction_isolation = 'read committed'
#max_expr_depth = 1 # min 10
#max_files_per_process = 1000   # min 25
#password_encryption = false
#sql_inheritance = true
#transform_null_equals = false





- Original Message - 
From: Andrew Sullivan [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, August 13, 2003 9:32 AM
Subject: Re: [PERFORM] How can I Improve performance in Solaris?


 On Wed, Aug 13, 2003 at 09:03:31AM -0500, ingrid martinez wrote:
  I want to know how I can improve the performance of postgres; I
  have a java class that inserts records every 30 min but it is very
  slow.

 What does the query do?  How is postgres configured?

 A

 -- 
 
 Andrew Sullivan 204-4141 Yonge Street
 Liberty RMS   Toronto, Ontario Canada
 [EMAIL PROTECTED]  M2P 2A8
  +1 416 646 3304 x110


 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Partly.  The numbers suggest that in ANALYZE's default sample of 3000
 rows, it's only finding about a dozen non-null tgroup_ids (yielding the
 0.996 null_frac value); and that in one case all dozen are different and
 in the other case there are two duplicates.  It would help if you
 boosted the stats target for this column by a factor of 10.  (You can
 do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
 show that you did so.)

Hmmm.  No dice.   I raised the statistics target to 1000, which increased n_distinct 
to 108, which is pretty close to accurate.  However, the planner still 
insists on using a seq scan on case_clients unless I drop random_page_cost to 
1.5 (which is up from 1.2 but still somewhat unreasonable).
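
For reference, the knobs being discussed translate to roughly this (the column
name tgroup_id and its table are assumed from the description above):

ALTER TABLE case_clients ALTER COLUMN tgroup_id SET STATISTICS 1000;
ANALYZE case_clients;
SET random_page_cost = 1.5;   -- per-session, to test the effect on the plan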

 But the other part of the problem is that in 7.2, the join selectivity
 estimator is way off when you are joining a unique column (like the pkey
 on the other side) to a column with a very large fraction of nulls.
 We only discovered this recently; it's fixed as of 7.3.3:

OK, I'll talk to the client about upgrading.

-- 
-Josh Berkus

__AGLIO DATABASE SOLUTIONS___
Josh Berkus
   Complete information technology  [EMAIL PROTECTED]
and data management solutions   (415) 565-7293
   for law firms, small businesses   fax 621-2533
and non-profit organizations.   San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] partitioning for postgresql

2003-08-14 Thread scott.marlowe
On Wed, 6 Aug 2003, Wilson A. Galafassi Jr. wrote:

 hello!!!
 what is the suggested partitioning schema for postgresql??
 the size of my db is 5GB and i have a 36GB scsi disk!

The first recommendation is to run Postgresql on a RAID set for 
reliability.  

I'm assuming you're building a machine and need to put both the OS and 
Postgresql database on that one disk.

If that's the case, just put the OS on however you like (lotsa different 
ways to partition for the OS) and leave about 30 gig for Postgresql to run 
in, then just put the whole database $PGDATA directory on that partition.

I'd recommend running ext3 with meta data journaling only for speed,
security, and ease of setup and use.  XFS is the next choice, which is a 
little harder to setup, as it's not included in most distros, but is 
definitely faster than ext3 at most stuff.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Tue, 2003-08-12 at 13:39, Bruce Momjian wrote:
 OK, I got some hard evidence.  Here is a discussion on the Linux kernel
 mailing list with postings from Allen Cox (ac Linux kernels) and Stephen
 Tweedie (ext3 author).
 
   http://www.tux.org/hypermail/linux-kernel/1999week14/subject.html#start
 
 Search for softupdates and ext2.
 
 Here is the original email in the thread:
 
   http://www.tux.org/hypermail/linux-kernel/1999week14/0498.html
 
 Summary is at:
 
   http://www.tux.org/hypermail/linux-kernel/1999week14/0571.html
 
 and conclusion in:
 
   http://www.tux.org/hypermail/linux-kernel/1999week14/0504.html
 
 I now remember the issue --- ext2 makes all disk changes asynchronously
 (unless you mount it via sync, which is slow).  This means that the file
 system isn't always consistent on disk.  
 
 UFS has always synced metadata (file/directory creation) to the disk so
 the disk was always consistent, but doesn't sync the data to the disk,
 for performance reasons.  With soft updates, the metadata writes are
 delayed, and written to disk in an order that keeps the file system
 consistent.

 Is this enough evidence, or should I keep researching?

This is all 4 years old, though.  Isn't that why the ext3 layer was
created, and filesystems like reiserFS, XFS and (kinda) JFS were added
to Linux?

 ---
 
 Neil Conway wrote:
  On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
   I don't use Linux and was just repeating what I had heard from others,
   and read in postings.  I don't have any first-hand experience with ext2
   (except for a laptop I borrowed that wouldn't boot after being shut
   off), but others on this mailing list have said the same thing.
  
  Right, and I understand the need to answer users asking about
  which filesystem to use, but I'd be cautious of bad-mouthing
  another OSS project without any hard evidence to back up our
  claim (of course if we have such evidence, then fine -- I
  just haven't seen it). It would be like $SOME_LARGE_OSS
  project saying Don't use our project with PostgreSQL, as
  [EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
  UnixWare -- kind of annoying, right?
  
(a) ext3 does metadata-only journalling by default
   
   If that is true, why was I told people have to mount their ext3 file
   systems with metadata-only.  Again, I have no experience myself, but why
   are people telling me this?
  
  Perhaps they were suggesting that people mount ext3 using
  data=writeback, rather than the default of data=ordered.
  
  BTW, I've heard from a couple different people that using
  ext3 with data=journalled (i.e. enabling journalling of both
  data and metadata) actually makes PostgreSQL faster, as
  it means that ext3 can skip PostgreSQL's fsync request
  since ext3's log is flushed to disk already. I haven't
  tested this myself, however.
  
  -Neil

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| Man, I'm pretty.  Hoo Hah!  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How can I Improve performance in Solaris?

2003-08-14 Thread scott.marlowe
More than likely you are suffering from an affliction known as type 
mismatch.  This is listed as tip 9 here on the performance list (funny, it 
was sent at the bottom of your reply :-)

What happens is that when you do:

select * from some_table where id=123;

where id is a bigint the query planner assumes you must want 123 
cast to int4, which doesn't match int8 (aka bigint) and uses a sequential 
scan to access that row.  I.e. it reads the whole table in.

You can force the planner to do the right thing here in a couple of ways:

select * from some_table where id=123::bigint;

-- OR --

select * from some_table where id='123';
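
Applied to the flows table from earlier in the thread, the change can be
confirmed with EXPLAIN (the literal 123 is only an example value):

EXPLAIN SELECT * FROM flows WHERE flidload = '123';
-- should now show an index scan on flows_pkey rather than a seq scan on flows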

On Wed, 13 Aug 2003, ingrid martinez wrote:

 the primary key is   flidload
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Neil Conway
On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
 I don't use Linux and was just repeating what I had heard from others,
 and read in postings.  I don't have any first-hand experience with ext2
 (except for a laptop I borrowed that wouldn't boot after being shut
 off), but others on this mailing list have said the same thing.

Right, and I understand the need to answer users asking about
which filesystem to use, but I'd be cautious of bad-mouthing
another OSS project without any hard evidence to back up our
claim (of course if we have such evidence, then fine -- I
just haven't seen it). It would be like $SOME_LARGE_OSS
project saying Don't use our project with PostgreSQL, as
[EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
UnixWare -- kind of annoying, right?

  (a) ext3 does metadata-only journalling by default
 
 If that is true, why was I told people have to mount their ext3 file
 systems with metadata-only.  Again, I have no experience myself, but why
 are people telling me this?

Perhaps they were suggesting that people mount ext3 using
data=writeback, rather than the default of data=ordered.

BTW, I've heard from a couple different people that using
ext3 with data=journalled (i.e. enabling journalling of both
data and metadata) actually makes PostgreSQL faster, as
it means that ext3 can skip PostgreSQL's fsync request
since ext3's log is flushed to disk already. I haven't
tested this myself, however.

-Neil


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Analyze makes queries slow...

2003-08-14 Thread Stef
Hi all,

I posted this problem on the sql list, and was referred to this list instead.
I have attached an sql statement that normally runs under 3 minutes.
That is, until I vacuum analyze the database (or just the tables in the query),
then the same query runs longer than 12 hours, and I have to kill it.

However, 90% of queries are faster after analyzing on this database;
there are two or three, including this one, that take forever.

I have tried to reverse engineer the explain plan from before analyzing,
to come up with an sql statement, using proper joins, to force the planner
to do the original join, but although I came close, I never got the same 
result as the original query.

I suspect that this might be caused by some of the crazy indexes that 
were built on some of these tables, but I can't really do much about that,
unless I can come up with a very good reason to nuke them.

I also attached the create table statements for all the tables, as well
as a row count of each.

Can somebody help me with guidelines or something similar, 
to understand exactly what is happening in the explain plan.

TIA
Stefan

 Aggregate  (cost=52.00..61.64 rows=32 width=241)
   -  Group  (cost=52.00..57.62 rows=321 width=241)
 -  Sort  (cost=52.00..52.80 rows=321 width=241)
   Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, 
ss.supplier_price_curr_cost, sk.sku_price_curr_cost
   -  Merge Join  (cost=36.38..38.62 rows=321 width=241)
 Merge Cond: ((outer.group_cde = inner.group_cde) AND 
(outer.brn_code = inner.brn_code))
 Join Filter: ((outer.price_tmpl_hdr_cde = 
inner.price_tmpl_hdr_cde) AND (outer.price_tmpl_hdr_reg = 
inner.price_tmpl_hdr_reg))
 -  Nested Loop  (cost=0.00..1407212.08 rows=63 width=179)
   Join Filter: ((inner.sku_mst_cde = outer.sku) AND 
(inner.group_cde = outer.group_code))
   -  Merge Join  (cost=0.00..1405644.89 rows=315 width=135)
 Merge Cond: (outer.group_code = inner.group_code)
 Join Filter: (outer.sku = inner.sku)
 -  Nested Loop  (cost=0.00..4826563.70 rows=8694 
width=108)
   -  Index Scan using master_fpp_values_idx2 on 
master_fpp_values m  (cost=0.00..3766902.34 rows=215650 width=54)
 Filter: (fpp_code = '200307'::text)
   -  Index Scan using pk_supplier_price on 
supplier_price ss  (cost=0.00..4.90 rows=1 width=54)
 Index Cond: ((ss.group_cde = 
outer.group_code) AND (ss.sku_mst_cde = outer.sku) AND (ss.supplier_cde = 
outer.supplier_code))
 -  Index Scan using master_sku_descr_idx3 on 
master_sku_descr s  (cost=0.00..2535.04 rows=10758 width=27)
   Filter: (control_code = '0'::text)
   -  Index Scan using idx_sku_price on sku_price sk  
(cost=0.00..4.96 rows=1 width=44)
 Index Cond: ((sk.group_cde = outer.group_cde) AND 
(sk.sku_mst_cde = outer.sku_mst_cde) AND (sk.price_tmpl_hdr_cde = 
outer.price_tmpl_hdr_cde) AND (sk.price_tmpl_hdr_reg = outer.price_tmpl_hdr_reg))
 -  Sort  (cost=36.38..36.87 rows=198 width=62)
   Sort Key: p.group_cde, p.branch_cde
   -  Hash Join  (cost=18.46..28.82 rows=198 width=62)
 Hash Cond: (outer.brn_code = inner.branch_cde)
 -  Merge Join  (cost=13.94..20.34 rows=198 width=33)
   Merge Cond: (outer.country_code = 
inner.from_ctry)
   -  Index Scan using master_branch_descr_idx4 
on master_branch_descr b  (cost=0.00..33.12 rows=198 width=15)
   -  Sort  (cost=13.94..13.95 rows=4 width=18)
 Sort Key: f.from_ctry
 -  Index Scan using forex_idx1 on forex 
f  (cost=0.00..13.90 rows=4 width=18)
   Index Cond: ((to_ctry = 
'ZAF'::text) AND (fpp_code = '200307'::text))
 -  Hash  (cost=4.02..4.02 rows=202 width=29)
   -  Seq Scan on price_tmpl_det p  
(cost=0.00..4.02 rows=202 width=29)
(34 rows)

 Aggregate  (cost=163.58..163.61 rows=1 width=699)
   -  Group  (cost=163.58..163.60 rows=1 width=699)
 -  Sort  (cost=163.58..163.58 rows=1 width=699)
   Sort Key: m.group_code, m.sku, m.brn_code, m.stktype_code, 
ss.supplier_price_curr_cost, sk.sku_price_curr_cost
   -  Nested Loop  (cost=115.56..163.57 rows=1 width=699)
 Join Filter: ((outer.sku = inner.sku) 

Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Andrew Sullivan
On Tue, Aug 12, 2003 at 02:39:19PM -0400, Bill Moran wrote:
 Meaning ... just tell it a raw partition to keep the data on and
 Postgre would create its own filesystem ... obviously, doing that
 would allow Postgre to bypass all the failings of all filesystems
  and rely entirely upon its own rules.
 
 Or are modern filesystems advanced enough that doing something like
 that would lose more than it would gain?

The latter, mostly.  This has been debated repeatedly on -hackers. 
If you want raw access, then you have to implement some other kind
of specialised filesystem of your own.  And you have to have all
sorts of nice tools to cope with the things that (for instance) fsck
handles.  I think the reaction of most developers has been, Why
reinvent the wheel?

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
[EMAIL PROTECTED]  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bill Moran
Shridhar Daithankar wrote:
On 11 Aug 2003 at 23:42, Ron Johnson wrote:


On Mon, 2003-08-11 at 19:50, Christopher Kings-Lynne wrote:

Well, yeah.  But given the Linux propensity for introducing major
features in minor releases (and thereby introducing all the
attendant bugs), I'd think twice about using _any_ Linux feature
until it's been through a major version (e.g. things introduced in
2.4.x won't really be stable until 2.6.x) -- and even there one is
taking a risk[1].
Dudes, seriously - switch to FreeBSD :P
But, like, we want a *good* OS... 8-0


Joke aside, I guess since postgresql is pretty much reliant on file system for 
basic file functionality, I guess it's time to test Linux 2.6 and compare it.

And don't forget, for large databases, there is still XFS out there which is 
probably the ruler at upper end..
This is going to push the whole thing a little off-topic, but I'm curious to
know the answer.
Has it ever been proposed or attempted to run PostgreSQL without any filesystem
(or any other database for that matter ...)?
Meaning ... just tell it a raw partition to keep the data on and Postgres would
create its own filesystem ... obviously, doing that would allow Postgres to
bypass all the failings of all filesystems and rely entirely upon its own
rules.
Or are modern filesystems advanced enough that doing something like that would
lose more than it would gain?
Just thinking out loud.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] On Linux Filesystems

2003-08-14 Thread Christopher Browne
Bruce Momjian commented:

 Uh, the ext2 developers say it isn't 100% reliable ... I mentioned
 it while I was visiting Red Hat, and they didn't refute it.

1.  Nobody has gone through any formal proofs, and there are few
systems _anywhere_ that are 100% reliable.  NASA has occasionally lost
spacecraft to software bugs, so nobody will be making such rash claims
about ext2.

2.  Several projects have taken on the task of introducing journalled
filesystems, most notably ext3 (sponsored by RHAT via Stephen Tweedy)
and ReiserFS (oft sponsored by SuSE).  (I leave off JFS/XFS since they
existed long before they had any relationship with Linux.)

Participants in such projects certainly have interest in presenting
the notion that they provide improved reliability over ext2.

3.  There is no apologist for ext2 who will (stupidly and
futilely) claim it to be flawless.  Nor is there substantial interest
in improving it; the sort of people who would be interested in that sort
of thing are working on the other FSes.

This also means that there's no one interested in going into the
guaranteed-to-be-unsung effort involved in trying to prove ext2 to be
formally reliable.

4.  It would be silly to minimize the impact of commercial interest.
RHAT has been paying for the development of a would-be ext2 successor.
For them to refute your comments wouldn't be in their interests.

Note that these are warm and fuzzy comments, the whole lot.  The
80-some thousand lines of code involved in ext2, ext3, reiserfs, and
jfs are no more amenable to absolute mathematical proof of reliability
than the corresponding BSD FFS code.

6. Such efforts would be futile, anyways.  Disks are mechanical
devices, and, as such, suffer from substantial reliability issues
irrespective of the reliability of the software.  I have lost sleep on
too many occasions due to failures of:
 a) Disk drives,
 b) Disk controllers [the worst Oracle failure I encountered resulted
from this], and
 c) OS memory management.

I used ReiserFS back in its bleeding edge days, and find myself a
lot more worried about losing data to flakey disk controllers.

It frankly seems insulting to focus on ext2 in this way when:

 a) There aren't _hard_ conclusions to point to, just soft ones;

 b) The reasons for you hearing vaguely negative things about ext2
are much more likely political than they are technical.

I wish there were more hard and fast conclusions to draw, to be able
to conclusively say that one or another Linux filesystem was
unambiguously preferable for use with PostgreSQL.  There are not
conclusive metrics, either in terms of speed or of some notion of
reliability.  I'd expect ReiserFS to be the poorest choice, and for
XFS to be the best, but I only have fuzzy reasons, as opposed to
metrics.

The absence of measurable metrics of the sort is _NOT_ a proof that
(say) FreeBSD is conclusively preferable, whatever your own
preferences (I'll try to avoid characterizing it as prejudices, as
that would be unkind) may be.  That would represent a quite separate
debate, and one that doesn't belong here, certainly not on a thread
where the underlying question was "Which Linux FS is preferred?"

If the OSDB TPC-like benchmarks can get packaged up well enough to
easily run and rerun them, there's hope of getting better answers,
perhaps even including performance metrics for *BSD.  That, not
Linux-baiting, is the answer...
-- 
select 'cbbrowne' || '@' || 'acm.org';
http://www.ntlug.org/~cbbrowne/sap.html
(eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
  ; will pick up previous value to var set!-ed,
  ; the unassigned object.
-- from BBN-CL's cl-parser.scm

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Also, there doesn't seem to be any way in 7.2 for me to find out what the 
 current statistics target for a column is.   What am I missing?

There still isn't a handy command for it --- you have to look at
pg_attribute.attstattarget for the column.
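
For a 7.2 installation, a catalog query along these lines will show it (the
table and column names here are just the ones from this thread, purely as an
example):

  SELECT a.attname, a.attstattarget
    FROM pg_attribute a, pg_class c
   WHERE c.relname = 'cases'        -- the table in question
     AND a.attrelid = c.oid
     AND a.attname = 'tgroup_id';   -- the column in question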

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Richard Huxton
On Thursday 07 August 2003 17:30, Yaroslav Mazurak wrote:
 Hi All!


   First, thanks for answers!

 Richard Huxton wrote:
  On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
 IIRC there is a limit on filesystem cache on freeBSD. 300MB by default.
 If that is the case, you might have to raise it to make
 effective_cache_size really effective..
 
 Try various sysctls says nothing for me. I want use *all available
 RAM* (of course, without needed for OS use) for PostgreSQL.
 
  PG will be using the OS' disk caching.

   I think all applications using OS disk caching. ;)
   Or you want to say that PostgreSQL tuned for using OS-specific cache
 implementation?
   Do you know method for examining real size of OS filesystem cache? If I
 understood right, PostgreSQL dynamically use all available RAM minus
 shared_buffers minus k * sort_mem minus effective_cache_size?
   I want configure PostgreSQL for using _maximum_ of available RAM.

PG's memory use can be split into four areas (note - I'm not a developer so 
this could be wrong).
1. Shared memory - vital so that different connections can communicate with 
each other. Shouldn't be too large, otherwise PG spends too long managing its 
shared memory rather than working on your queries.
2. Sort memory - If you have to sort results during a query it will use up to 
the amount you define in sort_mem and then use disk if it needs any more. 
This is for each sort.
3. Results memory - If you're returning 8000 rows then PG will assemble these 
and send them to the client which also needs space to store the 8000 rows.
4. Working memory - to actually run the queries - stack and heap space to keep 
track of its calculations etc.

Your best bet is to start off with some smallish reasonable values and step 
them up gradually until you don't see any improvement. What is vital is that 
the OS can cache enough disk-space to keep all your commonly used tables and 
indexes in memory - if it can't then you'll see performance drop rapidly as 
PG has to keep accessing the disk.
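
As one possible way of doing that stepping-up without touching postgresql.conf
straight away, the per-session settings can be experimented with first (the
numbers below are only placeholders, not recommendations):

  SHOW shared_buffers;               -- server-wide; changing it needs postgresql.conf and a restart
  SET sort_mem = 4096;               -- 4MB per sort, this session only
  SET effective_cache_size = 12000;  -- planner hint: roughly 96MB of OS cache, in 8K pages
  EXPLAIN ANALYZE SELECT count(*) FROM pg_class;  -- substitute the real query here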

For the moment, I'd leave the settings roughly where they are while we look at 
the query, then once that's out of the way we can fine-tune the settings.

[snip suggestion to break the query down]
   Yes, you're right. I've tested a few statements and obtain interesting
 results.
   SELECT * FROM v_file02wide WHERE... executes about 34 seconds.
   SELECT showcalc(...); executes from 0.7 seconds (without recursion) up
 to 6.3 seconds if recursion is used! :(
   This mean, that approximate execute time for fully qualified SELECT
 with about 8K rows is... about 13 hours! :-O

Hmm - not good.

   Hence, problem is in my function showcalc:

That's certainly the place to start, although we might be able to do something 
with v_file02wide later.

 CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
 NUMERIC(16)) RETURNS NUMERIC(16)
 LANGUAGE SQL STABLE AS '
 -- Parameters: code, dd, r020, t071
   SELECT COALESCE(
   (SELECT sc.koef * $4
   FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE s.kod = $1
   AND NOT SUBSTR(acc_mask, 1, 1) = ''[''
   AND SUBSTR(acc_mask, 1, 4) = $3
   AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)),
   (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
 LENGTH(acc_mask) - 2), $2, $3, $4), 0))
   FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE s.kod = $1
   AND SUBSTR(acc_mask, 1, 1) = ''[''),
   0) AS showing;
 ';

   BTW, cross join , with WHERE clause don't improve performance
 relative to NATURAL JOIN.
   Additionally, with user-defined function beginchar (SUBSTR(..., 1, 1)),
 used for indexing, showcalc executes about 16 seconds. With function
 SUBSTR the same showcalc executes 6 seconds.

Fair enough - substr should be fairly efficient.

[snip explanation of table structures and usage]

I'm not going to claim I understood everything in your explanation, but there 
are a couple of things I can suggest. However, before you go and do any of 
that, can I ask you to post an EXPLAIN ANALYSE of two calls to your 
showcalc() function (once for a simple account, once for one with recursion)? 
You'll need to cut and paste the query as standard SQL since the explain 
won't look inside the function body.
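
As a sketch of what that cut-and-paste might look like, here is the
non-recursive branch of showcalc with literal values substituted for the
parameters ('B00204' comes from earlier in the thread; the other literals are
only placeholders for real dd/r020/t071 values):

  EXPLAIN ANALYZE
  SELECT sc.koef * 1000000
    FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE s.kod = 'B00204'
     AND NOT SUBSTR(acc_mask, 1, 1) = '['
     AND SUBSTR(acc_mask, 1, 4) = '0000'
     AND SUBSTR(acc_mask, 5, 1) = SUBSTR('00', 1, 1);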

OK - bear in mind that these suggestions are made without the benefit of the 
explain analyse:

1. You could try splitting out the various tags of your mask into different 
fields - that will instantly eliminate all the substr() calls and might make 
a difference. If you want to keep the mask for display purposes, we could 
build a trigger to keep it in sync with the separate flags.

2. Use a calculations table and build your results step by step. So - 
calculate all 

Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Peter Darley
Josh,
I'm sure that you've thought of this, but it sounds like you may not have
done an analyze in your new DB.
Thanks,
Peter Darley

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Josh Berkus
Sent: Monday, August 11, 2003 3:48 PM
To: Ron Johnson; PgSQL Performance ML
Subject: Re: [PERFORM] Odd problem with performance in duplicate
database


Ron,

 If the databases are on different machines, maybe the postgres.conf
 or pg_hba.conf files are different, and the buffer counts is affect-
 ing the optimizer?

The databases are on the same machine, using the same postmaster.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 Rod Taylor wrote:
 On Fri, 2003-08-08 at 14:53, Andrew Sullivan wrote:
 I would give you exactly the opposite advice: _never_ use a
 non-journalling fs for your data and WAL.  I suppose if you can
 afford to lose some transactions, you can do without journalling. 
 Otherwise, you're just borrowing trouble, near as I can tell.
 
 Agreed.. WAL cannot recover something when WAL no longer exists due to a
 filesystem corruption.

 It is true that ext2 isn't good because the file system may not recover,
 but BSD UFS isn't a journalled file system, but does guarantee file
 system recovery after a crash --- it is especially good using soft
 updates.

The main point here is that the filesystem has to be able to take care
of itself; we expect it not to lose any files or forget where the data
is.  If it wants to use journalling to accomplish that, fine.

Journalling file contents updates, as opposed to filesystem metadata,
should be redundant with what we do in WAL.  So I'd recommend
journalling metadata only, if that option is available (and if Postgres
stuff is the only stuff on the disk...)

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Kings-Lynne
 Well, yeah.  But given the Linux propensity for introducing major
 features in minor releases (and thereby introducing all the
 attendant bugs), I'd think twice about using _any_ Linux feature
 until it's been through a major version (e.g. things introduced in
 2.4.x won't really be stable until 2.6.x) -- and even there one is
 taking a risk[1].

Dudes, seriously - switch to FreeBSD :P

Chris


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] How can I Improve performance in Solaris?

2003-08-14 Thread Andrew Sullivan
On Wed, Aug 13, 2003 at 10:17:45AM -0500, ingrid martinez wrote:
 The query that execute is only inserts, I use a batch of 300 and then do
 commit.
 
 insert into FLOWS values(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)

Are there any foreign keys, c?

 
 and
 
 postgresql.conf looks like this

[ . . .]

The configuration is the default.  You'll certainly want to increase
the shared memory and fiddle with some of the other usual pieces. 
There is some discussion of the config file at
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html. 
Unless the INSERTs are causing SELECTs, though, I can't see what
exactly might be causing you so much difficulty.
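
One quick thing worth checking: in 7.x, foreign-key enforcement is implemented
with triggers, so listing the triggers on the table shows anything that has to
fire for every inserted row (table name as used earlier in the thread):

  SELECT tgname, tgisconstraint
    FROM pg_trigger
   WHERE tgrelid = (SELECT oid FROM pg_class WHERE relname = 'flows');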

A

-- 

Andrew Sullivan 204-4141 Yonge Street
Liberty RMS   Toronto, Ontario Canada
[EMAIL PROTECTED]  M2P 2A8
 +1 416 646 3304 x110


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] How can I Improve performance in Solaris?

2003-08-14 Thread ingrid martinez
the primary key is   flidload


- Original Message - 
From: scott.marlowe [EMAIL PROTECTED]
To: ingrid martinez [EMAIL PROTECTED]
Cc: Andrew Sullivan [EMAIL PROTECTED];
[EMAIL PROTECTED]
Sent: Wednesday, August 13, 2003 11:47 AM
Subject: Re: [PERFORM] How can I Improve performance in Solaris?


 On Wed, 13 Aug 2003, ingrid martinez wrote:

  Floes table looks like this
 
  Table flows
  Column|   Type   | Modifiers
  --+--+---
   flidload | bigint   | not null
   firsttime| bigint   |
   fldestpeeraddress| character varying(30)|
   fldesttransaddress   | bigint   |
   fldesttranstype  | smallint |
   fldfromoctets| bigint   |
   fldscodepoint| smallint |
   fldtooctets  | bigint   |
   flfrompdus   | bigint   |
   flid | text |
   flidrule | bigint   |
   flsourcepeeraddress  | character varying(30)|
   flsourcetransaddress | bigint   |
   flsourcetranstype| smallint |
   fltime   | timestamp with time zone |
   fltopdus | bigint   |
   lasttime | bigint   |
   sourceinterface  | smallint |
   destinterface| smallint |
   sourceasn| smallint |
   destasn  | smallint |
  Primary key: flows_pkey

 Which columns are in the pkey?


 ---(end of broadcast)---
 TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Richard Welty
On Tue, 12 Aug 2003 13:09:42 -0700 Josh Berkus [EMAIL PROTECTED] wrote:
 This idea has been discussed numerous times on the HACKERS list, and is
 a 
 (pretty much) closed issue.   While Oracle and SQL Server use their own 
 filesystems, PostgreSQL will not because:
...
 2) The filesystem projects out there are (mostly) well-staffed and are 
 constantly advancing using specialized technology and theory.  There's
 no way 
 that the PostgreSQL team can do a better job in our spare time.

i consider this a fair answer, but i have a slightly different question to
ask, inspired by my discussions with a good friend who is a top notch
Informix DBA.

there are advantages to being able to split the database across a slew of
disk drives. if we accept the notion of using the native OS filesystem on
each, it would seem that being able to direct various tables and indices to
specific drives might be a valuable capability. i know that i could go into
/var/lib/pgsql/data/base and fan the contents out, but this is unwieldy and
impractical. has any consideration been given to providing a way to manage
such a deployment?

or is it the judgement of the hackers community that a monsterous raid-10
array offers comparable performance?

i forget how large the data store on my friend's current project is, but
i'll check. knowing the size and transaction rate he's dealing with might
put a finer point on this discussion.

richard
--
Richard Welty [EMAIL PROTECTED]
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Christopher Kings-Lynne
 there are advantages to being able to split the database across a slew of
 disk drives. if we accept the notion of using the native OS filesystem on
 each, it would seem that being able to direct various tables and indices
to
 specific drives might be a valuble capability. i know that i could go into
 /var/lib/pgsql/data/base and fan the contents out, but this is unweildy
and
 impractical. has any consideration been given to providing a way to manage
 such a deployment?

We've got a little bunch of us tinkering with a tablespace implementation.
However, it's been stalled for a while now.

Chris


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] How Many Inserts Per Transactions

2003-08-14 Thread scott.marlowe
On Tue, 5 Aug 2003, Trevor Astrope wrote:

 I was wondering if anyone found a sweet spot regarding how many inserts to 
 do in a single transaction to get the best performance? Is there an 
 approximate number where there isn't any more performance to be had or 
 performance may drop off?
 
 It's just a general question...I don't have any specific scenario, other
 than there are multiple backends doing many inserts.

I've found that after 1,000 or so inserts, there's no great increase in 
speed.
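
At the SQL level the batching is nothing more than wrapping the statements in
one transaction; a throwaway sketch (the table here is invented purely for
illustration):

  CREATE TABLE batch_test (id integer, payload text);

  BEGIN;
  INSERT INTO batch_test VALUES (1, 'row 1');
  INSERT INTO batch_test VALUES (2, 'row 2');
  -- ... and so on, up to roughly 1,000 rows per transaction ...
  COMMIT;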


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Josh Berkus
Greg,

 FWIW, Informix can be run using a cooked (Unix) file for storing data or
 it uses raw disk space and bypasses the ordinary (high level) UNIX
 controllers and does its own reads/writes. About 10 times faster and safer.
  Of course, it may have taken a lot of programmer time to make that solid.
 But the performance gains are significant.

Yes, but it's still slower than PostgreSQL on medium-end hardware.  ;-)

This idea has been discussed numerous times on the HACKERS list, and is a 
(pretty much) closed issue.   While Oracle and SQL Server use their own 
filesystems, PostgreSQL will not because:

1) It would interfere with our cross-platform compatibility.  PostgreSQL runs 
on something like 20 OSes.

2) The filesystem projects out there are (mostly) well-staffed and are 
constantly advancing using specialized technology and theory.  There's no way 
that the PostgreSQL team can do a better job in our spare time.

3) Development of our own filesystem would then require PostgreSQL to create 
and maintain a whole hardware compatibility library, and troubleshoot 
problems on exotic hardware and weird RAID configurations.

4) A database FS also often causes side-effect problems; for example, one 
cannot move or copy a SQL Server partition without destroying it.

Of course, that could all change if some corp with deep pockets steps in an 
decides to create a postgresFS and funds and staffs the effort 100%.  But 
it's unlikely to be a priority for the existing development team any time in 
the forseeable future.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] EXTERNAL storage and substring on long strings

2003-08-14 Thread Tom Lane
Scott Cain [EMAIL PROTECTED] writes:
 A few days ago, I asked for advice on speeding up substring queries on
 the GENERAL mailing list.  Joe Conway helpfully pointed out the ALTER
 TABLE STORAGE EXTERNAL documentation.  After doing the alter,
 the queries got slower!  Here is the background:

Ah-hah, I've sussed it ... you didn't actually change the storage
representation.  You wrote:

 Now, I'll change the storage:
   alter table feature alter column residues set storage external;
 To make sure that really happens, I run an update on feature:
   update feature set residues = residues where feature_id8;
 and then VACUUM ANALYZE again.

This sounds good --- in fact, I think we all just accepted it when we
read it --- but in fact *that update didn't decompress the toasted data*.
The tuple toaster sees that the same toasted value is being stored back
into the row, and so it just re-uses the existing toasted data; it does
not stop to notice that the column storage preference has changed.

To actually get the storage to change, you need to feed the value
through some function or operator that will decompress it.  Then it
won't get recompressed when it's stored.  One easy way (since this
is a text column) is

update feature set residues = residues || '' where feature_id8;

To verify that something really happened, try doing VACUUM VERBOSE on
the table before and after.  The quoted number of tuples in the toast
table should rise substantially.
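
Putting the whole recipe together (table and column names are the ones from
Scott's schema; the WHERE clause from his original test is omitted here, so
this rewrites every row):

  ALTER TABLE feature ALTER COLUMN residues SET STORAGE EXTERNAL;
  VACUUM VERBOSE feature;                        -- note the toast-table tuple count
  UPDATE feature SET residues = residues || '';  -- forces decompression on the way back in
  VACUUM VERBOSE feature;                        -- the toast tuple count should now be much larger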

I did the following comparisons on the test data you made available,
using two tables in which one has default storage and one has external
(not compressed) storage:

scott=# \timing
Timing is on.
scott=# select length (dna) from edna;
  length
---
 245203899
(1 row)

Time: 1.05 ms
scott=# select length (dna) from ddna;
  length
---
 245203899
(1 row)

Time: 1.11 ms
scott=# select length(substring(dna from 100 for 2)) from edna;
 length

  2
(1 row)

Time: 30.43 ms
scott=# select length(substring(dna from 100 for 2)) from ddna;
 length

  2
(1 row)

Time: 37383.02 ms
scott=#

So it looks like the external-storage optimization for substring() does
work as expected, once you get the data into the right format ...

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Tom,
 Okay, here's our problem:
 
 live DB: tgroup_id  n_distinct = -1
 
 test DN: tgroup_id  n_distinct = 11
 
 The former estimate actually means that it thinks tgroup_id is a unique
 column, whereas the latter says there are only 11 distinct values in the
 column.  I assume the former is much nearer to the truth (how many rows
 in cases, and how many distinct tgroup_id values)?

 The real case is that there are 113 distinct tgroup_ids, which cover
 about 10% of the population of cases.  The other 90% is NULL.  The
 average tgroup_id is shared between 4.7 cases.

 So this seems like sampling error.

Partly.  The numbers suggest that in ANALYZE's default sample of 3000
rows, it's only finding about a dozen non-null tgroup_ids (yielding the
0.996 null_frac value); and that in one case all dozen are different and
in the other case there are two duplicates.  It would help if you
boosted the stats target for this column by a factor of 10.  (You can
do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
show that you did so.)
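
In 7.2 syntax, boosting the target and refreshing the stats looks roughly like
this (table and column names from Josh's report; 100 is simply ten times the
usual default):

  ALTER TABLE cases ALTER COLUMN tgroup_id SET STATISTICS 100;
  ANALYZE cases;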

But the other part of the problem is that in 7.2, the join selectivity
estimator is way off when you are joining a unique column (like the pkey
on the other side) to a column with a very large fraction of nulls.
We only discovered this recently; it's fixed as of 7.3.3:

2003-04-15 01:18  tgl

* src/backend/utils/adt/selfuncs.c (REL7_3_STABLE): eqjoinsel's
logic for case where MCV lists are not present should account for
NULLs; in hindsight this is obvious since the code for the
MCV-lists case would reduce to this when there are zero entries in
both lists.  Per example from Alec Mitchell.

Possibly you could backpatch that into 7.2, although I'd think an update
to 7.3.4 would be a more profitable use of time.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[PERFORM] about performance of postgreSQL

2003-08-14 Thread xin fu
Dear master:
 I have learned postgreSQL for several days, and now I meet some problems. When I
use a TPCC (Transaction Processing Performance Council) test program to test the
performance of postgreSQL, postgreSQL works very slowly: it needs almost 1 minute
to finish a transaction, and the CPU usage is almost 100%.

The test environment is:
 OS: redhat 9.0 (ext3, default configurations)
 Server: postgres 7.3.4 (default configurations), PIII 800M, 1G Memory
 Client: tpcc test program, using ODBC API, PIII 800M, 1G Memory

 When using SQLServer, it can work on a workload of 40 warehouses, but
postgreSQL can not work even on 1 warehouse. I think there must be
some problem with my postgreSQL, can you help me?
I am in china, and my english is very poor, but i hope you can give
me some advice, thanks.

Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Partly.  The numbers suggest that in ANALYZE's default sample of 3000
 rows, it's only finding about a dozen non-null tgroup_ids (yielding the
 0.996 null_frac value); and that in one case all dozen are different and
 in the other case there are two duplicates.  It would help if you
 boosted the stats target for this column by a factor of 10.  (You can
 do that in 7.2, btw --- IIRC the only problem is that a pg_dump won't
 show that you did so.)

Also, there doesn't seem to be any way in 7.2 for me to find out what the 
current statistics target for a column is.   What am I missing?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Tue, 12 Aug 2003, Neil Conway wrote:

 On Tue, Aug 12, 2003 at 12:52:46AM -0400, Bruce Momjian wrote:
  I don't use Linux and was just repeating what I had heard from others,
  and read in postings.  I don't have any first-hand experience with ext2
  (except for a laptop I borrowed that wouldn't boot after being shut
  off), but others on this mailing list have said the same thing.
 
 Right, and I understand the need to answer users asking about
 which filesystem to use, but I'd be cautious of bad-mouthing
 another OSS project without any hard evidence to back up our
 claim (of course if we have such evidence, then fine -- I
 just haven't seen it). It would be like $SOME_LARGE_OSS
 project saying Don't use our project with PostgreSQL, as
 [EMAIL PROTECTED] had data corruption with PostgreSQL 6.3 on
 UnixWare -- kind of annoying, right?

Wow, you put my thoughts exactly into words for me, thanks Neil.

   (a) ext3 does metadata-only journalling by default
  
  If that is true, why was I told people have to mount their ext3 file
  systems with metadata-only.  Again, I have no experience myself, but why
  are people telling me this?
 
  Perhaps they were suggesting that people mount ext3 using
 data=writeback, rather than the default of data=ordered.
 
 BTW, I've heard from a couple different people that using
 ext3 with data=journalled (i.e. enabling journalling of both
 data and metadata) actually makes PostgreSQL faster, as
 it means that ext3 can skip PostgreSQL's fsync request
 since ext3's log is flushed to disk already. I haven't
 tested this myself, however.

Now that you mention it, that makes sense.  I might have to test ext3 now 
that the 2.6 kernel is on the way, i.e. the 2.4 kernel should be settling 
down by now.


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Bruce Momjian

Uh, the ext2 developers say it isn't 100% reliable --- at least that is
what I was told.  I don't know any of them personally, but I mentioned it
while I was visiting Red Hat, and they didn't refute it.

Now, the failure window might be quite small, but I have seen it happen
myself, and have heard it from others.

---

Reece Hart wrote:
 On Mon, 2003-08-11 at 15:16, Bruce Momjian wrote:
 
  That _would_ work if ext2 was a reliable file system --- it is not.
 
 
 Bruce-
 
 I'd like to know your evidence for this. I'm not refuting it, but I'm a
 7 year linux user (including several clusters, all of which have run
 ext2 or ext3) and keep a fairly close ear to kernel newsgroups,
 announcements, and changelogs. I am aware that there have very
 occasionally been corruption problems, but my understanding is that
 these are fixed (and quickly). In any case, I'd say that your assertion
 is not widely known and I'd appreciate some data or references.
 
 As for PostgreSQL on ext2 and ext3, I recently switched from ext3 to
 ext2 (Stephen Tweedy was insightful to facilitate this backward
 compatibility). I did this because I had a 45M row update on one table
 that was taking inordinate time (killed after 10 hours), even though
 creating the database from backup takes ~4 hours including indexing (see
 pgsql-perform post on 2003/07/22). CPU usage was ~2% on an otherwise
 unloaded, fast, SCSI160 machine. vmstat io suggested that PostgreSQL was
 writing something on the order of 100x as many blocks as being read. My
 untested interpretation was that the update bookkeeping as well as data
 update were all getting journalled, the journal space would fill, get
 sync'd, then repeat. In effect, all blocks were being written TWICE just
 for the journalling, never mind the overhead for PostgreSQL
 transactions. This emphasizes that journals probably work best with
 short burst writes and syncing during lulls rather than sustained
 writes.
 
 I ended up solving the update issue without really updating, so ext2
 timings aren't known. So, you may want to test this yourself if you're
 concerned.
 
 -Reece
 
 
 -- 
 Reece Hart, Ph.D.   [EMAIL PROTECTED], http://www.gene.com/
 Genentech, Inc. 650/225-6133 (voice), -5389 (fax)
 Bioinformatics and Protein Engineering
 1 DNA Way, MS-93http://www.in-machina.com/~reece/
 South San Francisco, CA  94080-4990 [EMAIL PROTECTED], GPG: 0x25EC91A0

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] On Linux Filesystems

2003-08-14 Thread Bruce Momjian

As I remember, there were clear cases that ext2 would fail to recover,
and it was known to be a limitation of the file system implementation. 
Some of the ext2 developers were in the room at Red Hat when I said
that, so if it was incorrect, they would hopefully have spoken up.  I
addressed the comments directly to them.

To be recoverable, you have to be careful how you sync metadata to
disk.  All the journalling file systems, and the BSD UFS, do that.  I am
told ext2 does not.  I don't know much more than that.

As I remember years ago, ext2 was faster than UFS, but that was largely
because ext2 didn't guarantee failure recovery.  Now, with UFS soft
updates, they have similar performance characteristics, but UFS is still
crash-safe.

However, I just tried google and couldn't find any documented evidence
that ext2 isn't crash-safe, so maybe I am wrong.

---

Christopher Browne wrote:
 Bruce Momjian commented:
 
  Uh, the ext2 developers say it isn't 100% reliable ... I mentioned
  it while I was visiting Red Hat, and they didn't refute it.
 
 1.  Nobody has gone through any formal proofs, and there are few
 systems _anywhere_ that are 100% reliable.  NASA has occasionally lost
 spacecraft to software bugs, so nobody will be making such rash claims
 about ext2.
 
 2.  Several projects have taken on the task of introducing journalled
 filesystems, most notably ext3 (sponsored by RHAT via Stephen Tweedy)
 and ReiserFS (oft sponsored by SuSE).  (I leave off JFS/XFS since they
 existed long before they had any relationship with Linux.)
 
 Participants in such projects certainly have interest in presenting
 the notion that they provide improved reliability over ext2.
 
 3.  There is no apologist for ext2 that will either (stupidly and
 futilely) claim it to be flawless.  Nor is there substantial interest
 in improving it; the sort people that would be interested in that sort
 of thing are working on the other FSes.
 
 This also means that there's no one interested in going into the
 guaranteed-to-be-unsung effort involved in trying to prove ext2 to be
 formally reliable.
 
 4.  It would be silly to minimize the impact of commercial interest.
 RHAT has been paying for the development of a would-be ext2 successor.
 For them to refute your comments wouldn't be in their interests.
 
 Note that these are warm and fuzzy comments, the whole lot.  The
 80-some thousand lines of code involved in ext2, ext3, reiserfs, and
 jfs are no more amenable to absolute mathematical proof of reliability
 than the corresponding BSD FFS code.
 
 6. Such efforts would be futile, anyways.  Disks are mechanical
 devices, and, as such, suffer from substantial reliability issues
 irrespective of the reliability of the software.  I have lost sleep on
 too many occasions due to failures of:
  a) Disk drives,
  b) Disk controllers [the worst Oracle failure I encountered resulted
 from this], and
  c) OS memory management.
 
 I used ReiserFS back in its bleeding edge days, and find myself a
 lot more worried about losing data to flakey disk controllers.
 
 It frankly seems insulting to focus on ext2 in this way when:
 
  a) There aren't _hard_ conclusions to point to, just soft ones;
 
  b) The reasons for you hearing vaguely negative things about ext2
 are much more likely political than they are technical.
 
 I wish there were more hard and fast conclusions to draw, to be able
 to conclusively say that one or another Linux filesystem was
 unambiguously preferable for use with PostgreSQL.  There are not
 conclusive metrics, either in terms of speed or of some notion of
 reliability.  I'd expect ReiserFS to be the poorest choice, and for
 XFS to be the best, but I only have fuzzy reasons, as opposed to
 metrics.
 
 The absence of measurable metrics of the sort is _NOT_ a proof that
 (say) FreeBSD is conclusively preferable, whatever your own
 preferences (I'll try to avoid characterizing it as prejudices, as
 that would be unkind) may be.  That would represent a quite separate
 debate, and one that doesn't belong here, certainly not on a thread
 where the underlying question was Which Linux FS is preferred?
 
 If the OSDB TPC-like benchmarks can get packaged up well enough to
 easily run and rerun them, there's hope of getting better answers,
 perhaps even including performance metrics for *BSD.  That, not
 Linux-baiting, is the answer...
 -- 
 select 'cbbrowne' || '@' || 'acm.org';
 http://www.ntlug.org/~cbbrowne/sap.html
 (eq? 'truth 'beauty)  ; to avoid unassigned-var error, since compiled code
   ; will pick up previous value to var set!-ed,
   ; the unassigned object.
 -- from BBN-CL's cl-parser.scm
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  

Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Tue, 12 Aug 2003, Christopher Kings-Lynne wrote:

  Well, yeah.  But given the Linux propensity for introducing major
  features in minor releases (and thereby introducing all the
  attendant bugs), I'd think twice about using _any_ Linux feature
  until it's been through a major version (e.g. things introduced in
  2.4.x won't really be stable until 2.6.x) -- and even there one is
  taking a risk[1].
 
 Dudes, seriously - switch to FreeBSD :P

Yeah, it's nice to have a BUG FREE OS huh? ;^)

And yes, I've used FreeBSD, it's quite good, but I kept getting the 
feeling it wasn't quite done.  Especially the installation documentation.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Some vacuum tuning help

2003-08-14 Thread Neil Conway
On Wed, Aug 06, 2003 at 12:45:34AM -0400, Tom Lane wrote:
 For core code, the answer would be a big NYET.  We do not do feature
 additions in point releases, only bug fixes.  While contrib code is more
 under the author's control than the core committee's control, I'd still
 say that you'd be making a big mistake to not follow that basic
 guideline.  People expect release x.y.z+1 to be the same as x.y.z except
 for bug fixes.  Introducing any new bugs into x.y.z+1 would cause a
 large loss in your credibility.

... and since contrib packages are distributed along with PG, it would
also be a loss to PG's credibility. IMHO, core should disallow feature
additions in point releases for contrib modules, as well as the core
code, except for very unusual situations. If contrib authors don't like
this facet of our release engineering process, they can always
distribute their code via some other outlet (gborg, SF, etc.).

-Neil


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Ron Johnson
On Wed, 2003-08-13 at 10:46, Josh Berkus wrote:
 Jeff,
 
[snip]
  The other advantage (which I hinted to above) with raw disks is being able
  to optimize queries to take advantage of it.  Informix is multithreaded
  and it will spawn off multiple readers to do say, a seq scan (and merge
  the results at the end).
 
 I like this idea.  Has it ever been discussed for PostgreSQL?  Hmmm  we'd 
 need to see some tests demonstrating that this approach was still a technical 
 advantage given the improvements in RAID  and FS technology since Informix 
 was designed.

Wouldn't PG 1st need horizontal partitioning, and as a precursor to
that, tablespaces?

-- 
+---+
| Ron Johnson, Jr.Home: [EMAIL PROTECTED]   |
| Jefferson, LA  USA|
|   |
| Man, I'm pretty.  Hoo Hah!  |
|Johnny Bravo   |
+---+



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Richard Welty
On Wed, 13 Aug 2003 09:48:18 +0800 Christopher Kings-Lynne [EMAIL PROTECTED] wrote:
 We've got a little bunch of us tinkering with a tablespace
 implementation.
 However, it's been staller for a while now.

interesting. i'm involved in the very early stages of a startup that is
likely to do a prototype using Java and PostgreSQL.

tablespace and replication are issues that would weigh heavily in a
decision to stick with PostgreSQL after the prototype.

richard
--
Richard Welty [EMAIL PROTECTED]
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Christopher Browne
[EMAIL PROTECTED] (Jeff) writes:
 On Tue, 12 Aug 2003, Christopher Browne wrote:
 Are you _certain_ that's still true?  Have you a metric that shows
 Informix being 10x faster on a modern system?  That would be quite
 surprising...

 We were forced (for budget reason) to switch from raw disk to cooked
 files on our informix db. We took a huge hit - about 5-6x slower.
 Granted part of that was because informix takes number of spindles,
 etc into account when generating query plans and the fact running
 UPDATE STATISTICS (think Vacuum analyze) on the version we run locks
 the table exclusively. And it is unacceptable to have our main
 table unavailable for hours and hours while the update runs. (For
 the record: its a 8cpu sun e4500 running sol2.6.  The raw disks were
 on a hitachi fibre array and the cooked files were on a raid5
 (scsi). Forget how many spindles in the raid.  There were 20 raw
 disks)

Sounds like what you were forced to do was to do TWO things:

 1.  Switch from raw disk to cooked files, and
 2.  Switch from a fibre array to a RAID array

You're attributing the 5-6x slowdown to 1., when it seems likely that
2. is a far more significant multiple.

What with there being TWO big changes that took place that might be
expected to affect performance, it seems odd to attribute a
factor-of-many change to just one aspect of that.

 Informix, etc. have spent a lot of time and money working on it.
 They also have the advantage of having many paid fulltime developers
 who are doing this for a job, not as a weekend hobby (Compared to
 the what? 2-3 full time PG developers).

flame on
Sure, and I'm sure the PG developers hardly know _anything_ about
implementing databases, either.
flame off

Certainly IBM (who bought Informix) has lots of time and money to
devote to enhancements.  But I think you underestimate the time,
skill, and effort involved with PG work.  It's quite typical for
people to imagine free software projects to basically be free-wheeling
efforts mostly involving guys who still have acne and haven't much
knowledge of the area.  Reality, for the projects that are of some
importance, is staggeringly different from that.  The number of people
with graduate degrees tends to surprise everyone.

The developers may not have time to add frivolous things to the
system, like building sophisticated Java-based GUI installers, XML
processors, or such.  That does, however, improve their focus, and so
PostgreSQL does not suffer from the way Oracle has fifty different
bundlings most of which nobody understands.

 The other advantage (which I hinted to above) with raw disks is
 being able to optimize queries to take advantage of it.  Informix is
 multithreaded and it will spawn off multiple readers to do say, a
 seq scan (and merge the results at the end).

 So if you have a table across say, 3 disks and you need to do a seq
 scan it will spawn three readers to do the read. Result: nice and
 fast (Yes, It may not always spawn the three readers, only when it
 thinks it will be a good thing to do)

Andrew Sullivan's fairly regular response is that he tried (albeit not
VASTLY extensively) to distinguish between disks when working with
fibre arrays, and he couldn't measure an improvement in shifting WAL
(the OBVIOUS thing to shift) to separate disks.

There's a lot of guesswork as to precisely why that result falls out.

One of the better guesses seems to be that if you've got enough
battery-backed memory cache on the array, that lets updates get pushed
to cache so fast that it doesn't too much matter which disk they hit.

If you've got enough spindles, and build much of the array in a
striped manner, you'll get data splitting across disks without having
to specify any table options to force it to happen.

You raise a good point vis-a-vis the thought of spawning multiple
readers; that could conceivably be a useful approach to improve
performance for very large queries.  If you could stripe the tables
in some manner so they could be doled out to multiple worker
processes, that could indeed provide some benefits.  If there are
three workers, they might round-robin to grab successive pages from
the table to do their work, and then end with a merge step.

That's probably a 7.7 change, mind you :-), but once other simpler
approaches to making the engine faster have been exhausted, that's the
sort of thing to look into next.

 I think for PG the effort would be much better spent on other
 features...  like replication and whatnot.

At this point, sure.
-- 
let name=cbbrowne and tld=ntlug.org in String.concat @ [name;tld];;
http://www3.sympatico.ca/cbbrowne/lisp.html
Using Java  as a general purpose application  development language is
like  going big  game  hunting  armed with  Nerf  weapons. 
-- Author Unknown

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Filesystems WAS: Perfomance Tuning

2003-08-14 Thread Gavin Sherry
On Tue, 12 Aug 2003, Bruce Momjian wrote:

 
 I think Gavin Sherry is working on this.  I am CC'ing him.
 
 ---

Yes I am working on this. I am about 50% of the way through the patch but
have been held up with other work. For those who are interested, it
basically allow:

1) creation of different 'storage' locations. Tables and indexes can be
created in different storage locations. Storage locations can also be
assigned to schemas and databases. Tables and indexes will default to the
schema storage location if STORAGE 'store name' is not provided to CREATE
 This will cascade to the default database storage location if
the schema was not created with STORAGE 'store name'.

2) the patch will allow different storage locations to have different
rand_cost parameters passed to the planner.

3) the patch *will not* address issues concerning quotas, resource
management, WAL/clog, temp or sort spaces.

Will keep everyone posted if/when I finish.

Thanks,

Gavin


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Folks,

More followup on this:

The crucial difference between the two execution plans is this clause:

test db has:
->  Seq Scan on case_clients  (cost=0.00..3673.48 rows=11274 width=11) (actual time=0.02..302.20 rows=8822 loops=855)

whereas live db has:

->  Index Scan using idx_caseclients_case on case_clients  (cost=0.00..5.10 rows=1 width=11) (actual time=0.03..0.04 rows=1 loops=471)

using an enable_seqscan = false fixes this, but is obviously not a long-term 
solution.   

I've re-created the test system from an immediate copy of the live database, 
and checked that the main tables and indexes were reproduced faithfully.

Lowering random_page_cost seems to do the trick.  But I'm still mystified; why 
would one identical database pick a different plan than its copy?
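
For reference, the kind of session-local experiment being described looks like
this (nothing persists beyond the session, and the count(*) query is only a
stand-in for the real problem query):

  SET enable_seqscan TO off;
  EXPLAIN ANALYZE SELECT count(*) FROM case_clients;
  SET enable_seqscan TO on;
  SET random_page_cost = 1.2;
  EXPLAIN ANALYZE SELECT count(*) FROM case_clients;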

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Jacek Rembisz
On Mon, Aug 11, 2003 at 03:58:41PM +0200, Stef wrote:

 I have attached an sql statement that normally runs under 3 minutes.
 That is, until I vacuum analyze the database (or just the tables in the query),
 then the same query runs longer than 12 hours, and I have to kill it.

Hmm, I have noticed a similar problem with a query with an order by ... limit 
clause. Although it runs only 10 times slower after analyze :)

The query joins one big table (20 000 rows) with several small tables
(200-4000 rows), then orders by the primary key of the big table, limit 20.

Without this order by ... limit clause the query is 5 times faster after
analyze.

Looking into explain analyze outputs:
1. Before vacuum analyze the planner chooses a nested loop; the deepest part is:
  ->  Nested Loop  (cost=0.00..116866.54 rows=19286 width=96) (actual time=0.14..1.39 rows=21 loops=1)
        ->  Index Scan Backward using big_table_pkey on big_table k  (cost=0.00..1461.15 rows=19286 width=52) (actual time=0.07..0.47 rows=21 loops=1)
        ->  Index Scan using 4000rows_table_pkey on 4000rows_table zs  (cost=0.00..5.97 rows=1 width=44) (actual time=0.02..0.02 rows=0 loops=21)

2. After analyze it uses hash joins.

When I remove this order by ... limit clause, the query after analyze takes 
the same time and the query before analyze is much slower.

I won't blame the planner. How could it learn that it should first take 
those 20 rows and then perform the joins? There is a where clause with a 
complex exists(subquery) condition on one of the big_table fields,
but removing this condition does not change the query plan.

Pure joining without any additional conditions, with only the primary key of 
the big table in the select clause, runs 4 times slower than the whole query 
before vacuum analyze :)

Does the planner take the limit clause into consideration at all?

Probably I'm missing something. I don't know much about the planner.
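
One way to see whether the limit is being taken into account is simply to
compare the two plans side by side (the "id" column here is only a placeholder
for the big table's primary key):

  EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY id DESC LIMIT 20;
  EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY id DESC;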

Finally, I have redesigned the query.

Regards,
Jacek


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Odd problem with performance in duplicate database

2003-08-14 Thread Josh Berkus
Tom,

 Let's see the pg_stats rows for case_clients in both databases.  The
 entries for trial_groups might be relevant too.

My reading is that the case is borderline; that is, because the correlation 
is about 10-20% higher on the test database (since it was restored clean 
from backup) the planner is resorting to a seq scan.

At which point the spectre of random_page_cost less than 1.0 rears its ugly 
head again.  Because the planner seems to regard this as a borderline case, 
but it's far from borderline ... index scan takes 260ms, seq scan takes 
244,000ms.   Yet my random_page_cost is set pretty low already, at 1.5.

It seems like I'd have to set random_page_cost to less than 1.0 to make sure 
that the planner never used a seq scan.  Which kinda defies the meaning of 
the setting.

*sigh* wish the client would pay for an upgrade 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread scott.marlowe
On Mon, 11 Aug 2003, Bruce Momjian wrote:

 scott.marlowe wrote:
  On Fri, 8 Aug 2003, Andrew Sullivan wrote:
  
   On Fri, Aug 08, 2003 at 09:40:20AM -0700, Jonathan Gardner wrote:

Redhat puts ext3 on by default. Consider switching to a non-journaling FS 
(ext2?) with the partition that holds your data and WAL.
   
   I would give you exactly the opposite advice: _never_ use a
   non-journalling fs for your data and WAL.  I suppose if you can
   afford to lose some transactions, you can do without journalling. 
   Otherwise, you're just borrowing trouble, near as I can tell.
  
  I'd argue that a reliable filesystem (ext2) is still better than a 
  questionable journaling filesystem (ext3 on kernels 2.4.20).
  
  This isn't saying to not use jounraling, but I would definitely test it 
  under load first to make sure it's not gonna lose data or get corrupted.
 
 That _would_ work if ext2 was a reliable file system --- it is not.
 
 This is the problem of Linux file systems --- they have unreliable, and
 journalled, with nothing in between, except using a journalling file
 system and having it only journal metadata.

Nevertheless, on Linux, which is what we use, it is by far more reliable 
than ext3 or reiserfs.  In four years of use I've lost zero files to any 
of its bugs.  Of course, maybe that's RedHat patching the kernel for me or 
something. :-)  they seem to hire some pretty good hackers.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Neil Conway
On Mon, Aug 11, 2003 at 06:59:30PM -0400, Bruce Momjian wrote:
 Uh, the ext2 developers say it isn't 100% reliable --- at least that is
 that was told.  I don't know any personally, but I mentioned it while I
 was visiting Red Hat, and they didn't refute it.

IMHO, if we're going to say "don't use X on production PostgreSQL
systems", we need to have some better evidence than "no one has
said anything to the contrary, and I heard X is bad." If we can't
produce such evidence, we shouldn't say anything at all, and users
can decide what to use for themselves.

(Not that I'm agreeing or disagreeing about ext2 in particular...)

  My
  untested interpretation was that the update bookkeeping as well as data
  update were all getting journalled, the journal space would fill, get
  sync'd, then repeat. In effect, all blocks were being written TWICE just
  for the journalling, never mind the overhead for PostgreSQL
  transactions.

Journalling may or may not have been the culprit, but I doubt everything
was being written to disk twice:

(a) ext3 does metadata-only journalling by default

(b) PostgreSQL only fsyncs WAL records to disk, not the data itself

-Neil


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Analyze makes queries slow...

2003-08-14 Thread Tom Lane
Stef [EMAIL PROTECTED] writes:
 = Could we see the results of EXPLAIN ANALYZE, rather than just EXPLAIN,
 = for the un-analyzed case? 

 Attached the output of this.

Hmm... not immediately obvious where it's going wrong.  Could you try
this (after ANALYZE):

set enable_mergejoin to off;
explain analyze   ... query ...

If it finishes in a reasonable amount of time, send the explain output.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread matt

 Mem: 71M Active, 23M Inact, 72M Wired, 436K Cache, 48M Buf, 208M Free
 Swap: 368M Total, 2852K Used, 366M Free
 
   It's right that I can figure that I can use 384M (total RAM) - 72M 
 (wired) - 48M (buf) = 264M for PostgreSQL.
   Hence, if I set effective_cache_size to 24M (3072 8K blocks), 
 reasonable value (less than 240M, say 48M) for sort_mem, some value for 
 shared_buffers (i.e. 24M, or 6144 4K blocks (FreeBSD), or 3072 8K blocks 
 (PostgreSQL)), and rest of RAM 264M (total free with OS cache) - 24M 
 (reserved for OS cache) - 48M (sort) - 24M (shared) = 168M PostgreSQL 
 allocate dynamically by himself?

Totally, utterly the wrong way around.

Start with 384M, subtract whatever is in use by other processes,
excepting kernel disk cache, subtract your PG shared buffers, subtract
(PG proc size + PG sort mem)*(max number of PG processes you need to run
- should be same as max_connections if thinking conservatively), leave
some spare room so you can ssh in without swapping, and *the remainder*
is what you should set effective_cache_size to.  This is all in the
docs.
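
A worked example with made-up numbers, purely to show the order of the
subtraction (every figure below is illustrative, not a recommendation):

    384 MB  total RAM
  -  40 MB  other processes / OS overhead
  -  16 MB  shared_buffers (2000 x 8KB)
  - 192 MB  32 connections x (2MB backend + 4MB sort_mem)
  -  16 MB  headroom so you can still ssh in
  --------
  = 120 MB  left over for the kernel disk cache
            => effective_cache_size of roughly 15000 (8KB pages)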

The key thing is:  set effective_cache_size *last*.  Note that Postgres
assumes your OS is effective at caching disk blocks, so if that
assumption is wrong you lose performance.

Also, why on _earth_ would you need 48MB for sort memory?  Are you
seriously going to run a query that returns 48M of data and then sort
it, on a machine with 384M of RAM?

M




---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Perfomance Tuning

2003-08-14 Thread Paul Thomas
On 08/08/2003 11:28 mixo wrote:
I have just installed redhat linux 9 which ships with Pg
7.3.2. Pg has to be setup so that data inserts (blobs) should
be able to handle at least 8M at a time. The machine has
two P III 933MHz CPU's, 1.128G RAM (512M*2 + 128M), and
a 36 Gig hd with 1 Gig swap and 3 equal size ext3 partitions.
What would be the recomended setup for good performance
considering that the db will have about 15 users for
9 hours in a day, and about 10 or so users throughout the day
who wont be conistenly using the db.


It doesn't sound like a particularly heavy loading to me. I'd start off
with something like

shared_buffers = 2000
sort_mem = 1024
max_connections = 100

and see how it performs under normal business loading.

--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller Business |
| Computer Consultants         | http://www.thomas-micro-systems-ltd.co.uk   |
+------------------------------+---------------------------------------------+

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Richard Huxton
On Thursday 07 August 2003 09:24, Yaroslav Mazurak wrote:
  IIRC there is a limit on filesystem cache on freeBSD. 300MB by default.
  If that is the case, you might have to raise it to make
  effective_cache_size really effective..

   "Try various sysctls" tells me nothing specific. I want to use *all
 available RAM* (minus what the OS itself needs, of course) for PostgreSQL.

PG will be using the OS' disk caching.

   While idle time top says:

 Mem: 14M Active, 1944K Inact, 28M Wired, 436K Cache, 48M Buf, 331M Free
 Swap: 368M Total, 17M Used, 352M Free, 4% Inuse

   After 1 minute of EXPLAIN ANALYZE SELECT SUM(showcalc('B00204', dd,
 r020, t071)) FROM v_file02wide WHERE a011 = 3 AND inrepdate(data) AND
 b030 IN (SELECT b030 FROM dov_bank WHERE dov_bank_box_22(box) IN ('NL',
 'NM')) AND r030 = 980; executing:

 Mem: 64M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 221M Free
 Swap: 368M Total, 3192K Used, 365M Free

   PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU  COMMAND
 59063 postgres   49    0 65560K 55492K RUN      1:06 94.93% 94.63%  postgres

   After 12 minutes of query executing:

 Mem: 71M Active, 17M Inact, 72M Wired, 436K Cache, 48M Buf, 214M Free
 Swap: 368M Total, 3192K Used, 365M Free

   PID USERNAME  PRI NICE   SIZE    RES STATE    TIME   WCPU    CPU  COMMAND
 59063 postgres   56    0 73752K 62996K RUN     12:01 99.02% 99.02%  postgres

   I suspect that swap-file size is too small for my query... but query
 isn't too large, about 8K rows only. :-|

Looks fine - PG isn't growing too large and your swap usage seems steady. We 
can try upping the sort memory later, but given the amount of data you're 
dealing with I'd guess 64MB should be fine.
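
(If we do, it can be tried per-session rather than by editing postgresql.conf -
a sketch only:

  SET sort_mem = 65536;    -- 64MB, this session only
  EXPLAIN ANALYZE ... the query ... ;
)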

I think we're going to have to break the query down a little and see where the 
issue is.

What's the situation with:
EXPLAIN ANALYZE SELECT some_field FROM v_file02wide WHERE a011 = 3 AND 
inrepdate(data) AND b030 IN (SELECT b030 FROM dov_bank WHERE 
dov_bank_box_22(box) IN ('NL', 'NM')) AND r030 = 980;

and:
EXPLAIN ANALYZE SELECT SUM(showcalc(parameters)) FROM something simple

Hopefully one of these will run in a reasonable time, and the other will not. 
Then we can examine the slow query in more detail. Nothing from your previous 
EXPLAIN (email of yesterday 13:42) looks unreasonable but something must be 
going wild in the heart of the query, otherwise you wouldn't be here.

-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Tom Lane
Yaroslav Mazurak [EMAIL PROTECTED] writes:
 fsync = false

 I'd turn fsync back on - unless you don't mind losing your data after a crash.

   This is a temporary performance measure - I want to get the SELECT query
 result first, but current performance is too low.

Disabling fsync will not help SELECT performance one bit.  It would only
affect transactions that modify the database.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] PostgreSQL performance problem - tuning

2003-08-14 Thread Richard Huxton
On Wednesday 06 August 2003 08:34, Yaroslav Mazurak wrote:
   Hi All!


   I have installed PostgreSQL 7.3.2 on FreeBSD 4.7, running on PC with
 CPU Pentium II 400MHz and 384Mb RAM.

Version 7.3.4 is just out - probably worth upgrading as soon as it's 
convenient.

   The problem is that the SQL statement (see below) runs too long. With
 the current WHERE clause, SUBSTR(ncks, 2, 2) IN ('NL', 'NM') returns 25
 records. With 1 record, the SELECT takes about 50 minutes and approx. 120Mb
 of RAM. With 25 records the SELECT takes about 600Mb of memory and ends after
 about 10 hours with the error: Memory exhausted in AllocSetAlloc(32).
[snip]

   Current postgresql.conf settings (some) are:

 === Cut ===
 max_connections = 8

 shared_buffers = 8192
 max_fsm_relations = 256
 max_fsm_pages = 65536
 max_locks_per_transaction = 16
 wal_buffers = 256

 sort_mem = 131072
This sort_mem value is *very* large - that's 128MB for *each sort* that gets
done. I'd suggest trying something in the range 1,000-10,000. What's probably
happening with the error above is that PG is allocating ridiculous amounts of
memory, the machine is going into swap, and everything eventually grinds to a
halt.
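
Something like this in postgresql.conf would be a saner starting point to
experiment from (just a sketch, tune from there):

  sort_mem = 4096     # 4MB per sort, per backend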

 vacuum_mem = 16384

 checkpoint_segments = 4
 checkpoint_timeout = 300
 commit_delay = 32000
 commit_siblings = 4
 fsync = false

I'd turn fsync back on - unless you don't mind losing your data after a crash.

 enable_seqscan = false

Don't tinker with these in a live system, they're only really for 
testing/debugging.
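
If you do want it for testing, set it per-session instead of in
postgresql.conf, e.g.:

  SET enable_seqscan TO off;
  EXPLAIN ANALYZE ... query under test ... ;
  RESET enable_seqscan;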

 effective_cache_size = 65536

So you typically get about 512MB (65536 x 8K blocks) of cache usage in 
top/free? That's more than this machine's total RAM.

 === Cut ===

   SELECT statement is:

 SELECTshowcalc('B00204', dd, r020, t071) AS s04
 FROM  v_file02wide
 WHERE a011 = 3
   AND inrepdate(data)
   AND SUBSTR(ncks, 2, 2) IN ('NL', 'NM')
   AND r030 = 980;

Hmm - mostly views and function calls, OK - I'll read on.

 (cost=174200202474.99..174200202474.99 rows=1 width=143) ->  Hash Join 
 ^^^
This is a BIG cost estimate and you've got lots more like them. I'm guessing 
it's because of the sort_mem / enable_seqscan settings you have. The numbers 
don't make sense to me - it sounds like you've pushed the cost estimator into 
a very strange corner.

   Function showcalc definition is:

 CREATE OR REPLACE FUNCTION showcalc(VARCHAR(10), VARCHAR(2), VARCHAR(4),
 NUMERIC(16)) RETURNS NUMERIC(16)
 LANGUAGE SQL AS '
 -- Parameters: code, dd, r020, t071
   SELECT COALESCE(
   (SELECT sc.koef * $4
   FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE s.kod LIKE $1
   AND NOT SUBSTR(acc_mask, 1, 1) LIKE ''[''
   AND SUBSTR(acc_mask, 1, 4) LIKE $3
   AND SUBSTR(acc_mask, 5, 1) LIKE SUBSTR($2, 1, 1)),
Obviously, you could use = for these three rather than LIKE (and the same
below) - see the sketch after the function definition.

   (SELECT SUM(sc.koef * COALESCE(showcalc(SUBSTR(acc_mask, 2,
 LENGTH(acc_mask) - 2), $2, $3, $4), 0))
   FROM showing AS s NATURAL JOIN showcomp AS sc
   WHERE s.kod LIKE $1
   AND SUBSTR(acc_mask, 1, 1) LIKE ''[''),
   0) AS showing;
 ';
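
As an untested sketch, the first branch inside the function body would become
something like this (NOT ... LIKE turns into <>, and the quotes stay doubled
because it's inside the function's string literal); the second branch changes
the same way:

    SELECT sc.koef * $4
    FROM showing AS s NATURAL JOIN showcomp AS sc
    WHERE s.kod = $1
      AND SUBSTR(acc_mask, 1, 1) <> ''[''
      AND SUBSTR(acc_mask, 1, 4) = $3
      AND SUBSTR(acc_mask, 5, 1) = SUBSTR($2, 1, 1)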

   View v_file02wide is:

 CREATE VIEW v_file02wide AS
 SELECT id_a011 AS a011, data, obl.ko, obl.nazva AS oblast, b030,
 banx.box AS ncks, banx.nazva AS bank,
   epr.dd, r020, r030, a3, r030.nazva AS valuta, k041,
   -- Sum equivalent in national currency
   t071 * get_kurs(id_r030, data) AS t070,
   t071
 FROM v_file02 AS vf02
   JOIN kod_obl AS obl USING(id_obl)
   JOIN (dov_bank NATURAL JOIN dov_tvbv) AS banx
   ON banx.id_v = vf02.id_v
   AND data BETWEEN COALESCE(banx.dataa, data)
   AND COALESCE(banx.datab, data)
   JOIN ek_pok_r AS epr USING(id_dd)
   JOIN kl_r020 USING(id_r020)
   JOIN kl_r030 AS r030 USING(id_r030)
   JOIN kl_k041 USING(id_k041);
You might want to rewrite the view so it doesn't use explicit JOIN statements,
i.e. FROM a, b WHERE a.id = b.ref rather than FROM a JOIN b ON id = ref.
At the moment, explicit JOINs force PG into making the joins in the order you
write them (I think this is changed in v7.4).
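
A sketch of the idea (untested, only the first joins shown - the rest follow
the same pattern, and I'm assuming id_obl and id_dd live in v_file02):

  CREATE VIEW v_file02wide AS
  SELECT ... same select list ...
  FROM v_file02 AS vf02, kod_obl AS obl, ek_pok_r AS epr, ...
  WHERE vf02.id_obl = obl.id_obl
    AND vf02.id_dd = epr.id_dd
    AND ... one such condition per USING/ON clause above ... ;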


   Function inrepdate is:

 CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
 LANGUAGE SQL AS '
   -- Returns true if given date is in repdate
   SELECT (SELECT COUNT(*) FROM repdate
     WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
       AND COALESCE(data2, CURRENT_DATE)) > 0;

You can probably replace this with:
  SELECT true FROM repdate WHERE $1 ...
You'll need to look at where it's used though.
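
An untested sketch along those lines, using EXISTS so the function still
returns exactly one boolean:

  CREATE OR REPLACE FUNCTION inrepdate(DATE) RETURNS BOOL
  LANGUAGE SQL AS '
    SELECT EXISTS (SELECT 1 FROM repdate
                   WHERE $1 BETWEEN COALESCE(data1, CURRENT_DATE)
                                AND COALESCE(data2, CURRENT_DATE));
  ';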

[snip table sizes]
   The table has indexes on almost all of the selected fields.

That's not going to help you for the SUBSTR(...) stuff, although you could use 
functional indexes (see manuals/list archives for details).
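
For example, since the thread already has a wrapper function dov_bank_box_22(),
something like this might let that test use an index (untested; IIRC 7.3
functional indexes need plain column arguments and an IMMUTABLE function, and
the index name here is made up):

  CREATE INDEX dov_bank_box_22_idx ON dov_bank (dov_bank_box_22(box));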

First thing is 

[PERFORM] partitioning for postgresql

2003-08-14 Thread Wilson A. Galafassi Jr.



hello!!!
what is the suggested partitioning scheme for postgresql??
the size of my db is 5GB and i have a 36GB scsi disk!
thanks
wilson


Re: [PERFORM] Odd performance results - more info

2003-08-14 Thread Stephan Szabo

On Tue, 5 Aug 2003, Medora Schauer wrote:

 I hope this piques someones curiosity.  I'd really like to know
 what is going on here...

I think you're getting caught by the typing of constants preventing
index scans.

  UPDATE shot_record SET trace_count = %d  \
  WHERE shot_line_num = %d  \
AND shotpoint = %d  \
AND index = %d ,
  0, shotline, shotpoint + i, 0);

I believe the int constants are generally going to be treated as
int4.  If you're comparing them to an int8 column, you're probably not
going to get an index scan.  Try explicitly casting the constants to
the appropriate type: CAST(%d AS int8).
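
So the first statement might become something like this (assuming those two
columns really are int8):

  UPDATE shot_record SET trace_count = %d
  WHERE shot_line_num = CAST(%d AS int8)
    AND shotpoint = CAST(%d AS int8)
    AND index = %d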


 snprintf(buffer, sizeof(buffer),
  UPDATE shot_record SET trace_count = %d  \
  WHERE shot_line_num = %f  \
AND shotpoint = %f  \
AND index = %d ,
  0, (float)shotline, (float)shotpoint + (float)i, 0);

Same general issue here, I think the floats are going to get treated
as float8 in 7.1, so you'll probably need an explicit cast.

As Joe said, try explain on the queries for more details.



---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-14 Thread Joe Conway
Scott Cain wrote:
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is
109M.
Thanks. I'll grab a copy from home later today and see if I can find 
some time to poke at it.

Joe



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]