Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large Performance Gain in WAL synching

2002-10-07 Thread Zeugswetter Andreas SB SD


  Keep in mind that we support platforms without O_DSYNC.  I am not
  sure whether there are any that don't have O_SYNC either, but I am
  fairly sure that we measured O_SYNC to be slower than fsync()s on
  some platforms.

This measurement is quite understandable, since the current software
does 8k writes, and the OS only has a chance to write bigger blocks in the
write+fsync case. In the O_SYNC case you need to group bigger blocks yourself
(bigger blocks are essential for maximum I/O throughput).

I am still convinced that writing bigger blocks would allow the fastest
solution. But reading the recent posts, it may be enough to change the
current loop ("for each dirty 8k WAL buffer, write 8k") into one or two
large write calls.
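
To make the idea concrete, here is a minimal sketch of what such a coalesced
write could look like (an illustration only; the buffer layout and names are
assumptions, not the actual backend code):

    #include <sys/types.h>
    #include <unistd.h>

    #define WAL_BLCKSZ 8192

    /* Flush a contiguous run of dirty 8k WAL pages [first..last] held in one
     * in-memory array: one big sequential write gives the OS a chance to
     * issue large I/Os, and a single fsync replaces one fsync per page.
     * Partial writes and retries are ignored for brevity.
     */
    static int
    flush_dirty_run(int fd, const char *wal_buffers, int first, int last)
    {
        const char *start = wal_buffers + (size_t) first * WAL_BLCKSZ;
        size_t      len   = (size_t) (last - first + 1) * WAL_BLCKSZ;

        if (write(fd, start, len) != (ssize_t) len)
            return -1;
        return fsync(fd);
    }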

Andreas

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] [GENERAL] Large databases, performance

2002-10-07 Thread Shridhar Daithankar

On 3 Oct 2002 at 8:54, Charles H. Woloszynski wrote:

 I'd be curious what happens when you submit more queries than you have 
 processors (you had four concurrent queries and four CPUs), if you care 
 to run any additional tests.  Also, I'd report the query time in 
 absolute (like you did) and also in 'Time/number of concurrent queries. 
  This will give you a sense of how the system is scaling as the workload 
 increases.  Personally I am more concerned about this aspect than the 
 load time, since I am going to guess that this is where all the time is 
 spent.  

OK, I am back from my cave after some more tests. Here are the results. I am
not repeating large parts of it, just answering your questions.

Don't ask me how these numbers changed. I am not the person who conducts the
tests, nor do I have access to the system. The rest (or most) of the setup
remains the same.

MySQL 3.23.52 with innodb transaction support: 

4 concurrent queries  :- 257.36 ms
40 concurrent queries :-  35.12 ms

Postgresql 7.2.2

4 concurrent queries  :- 257.43 ms
40 concurrent queries :-  41.16 ms

Though I cannot report Oracle numbers, suffice it to say that they fall in
between these two.

Oracle seems to be a hell of a lot faster than mysql/postgresql at loading raw
data, even when it's installed on reiserfs. We plan to run XFS tests later in
the hope that that would improve the mysql/postgresql load times.

In this run postgresql has a better load time than mysql/innodb (18270 sec v/s
17031 sec). Index creation times are faster as well (100 sec v/s 130 sec).
Don't know what parameters were changed.

The only worry is database size: postgresql is 111 GB v/s 87 GB for mysql. All
numbers include indexes. This is really going to be a problem when things are
deployed. Any idea how it can be brought down?

WAL is out, it's not counted.

Schema optimisation is a later issue. Right now all three databases are using
the same schema.

Will it help in this situation if I recompile postgresql with a block size of,
say, 32K rather than the 8K default? Will it save some overhead and offer
better performance in data loads etc.?
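
(For reference, and only as an assumption about where the knob lives: the page
size is a compile-time constant, so this would mean editing the source,
rebuilding, and running a fresh initdb. A sketch:)

    /* Assumed location: src/include/pg_config.h -- grep the tree for BLCKSZ
     * to be sure.  The value must be a power of 2; 32768 is the upper limit
     * the page layout allows.  A rebuild and a new initdb are required.
     */
    #define BLCKSZ 32768        /* default is 8192 */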

Will keep you guys updated..

Regards,
 Shridhar

---
Shridhar Daithankar
LIMS CPE Team Member, PSPL.
mailto:[EMAIL PROTECTED]
Phone:- +91-20-5678900 Extn.270
Fax  :- +91-20-5678901 
---


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing

Tom Lane kirjutas E, 07.10.2002 kell 01:07:
 
 To test this, I made a modified version of pgbench in which each
 transaction consists of a simple
   insert into table_NNN values(0);
 where each client thread has a separate insertion target table.
 This is about the simplest transaction I could think of that would
 generate a WAL record each time.
 
 Running this modified pgbench with postmaster parameters
   postmaster -i -N 120 -B 1000 --wal_buffers=250
 and all other configuration settings at default, CVS tip code gives me
 a pretty consistent 115-118 transactions per second for anywhere from
 1 to 100 pgbench client threads.  This is exactly what I expected,
 since the database (including WAL file) is on a 7200 RPM SCSI drive.
 The theoretical maximum rate of sync'd writes to the WAL file is
 therefore 120 per second (one per disk revolution), but we lose a little
 because once in awhile the disk has to seek to a data file.
 
 Inserting the above patch, and keeping all else the same, I get:
 
 $ mybench -c 1 -t 1 bench1
 number of clients: 1
 number of transactions per client: 1
 number of transactions actually processed: 1/1
 tps = 116.694205 (including connections establishing)
 tps = 116.722648 (excluding connections establishing)
 
 $ mybench -c 5 -t 2000 -S -n bench1
 number of clients: 5
 number of transactions per client: 2000
 number of transactions actually processed: 1/1
 tps = 282.808341 (including connections establishing)
 tps = 283.656898 (excluding connections establishing)

In an ideal world this would be 5*120 = 600 tps.

Do you have any good ideas about what holds it back for the other 300 tps?

If it has a CPU utilisation of only 50%, then there must still be some
moderate lock contention.

btw, what is the number for 1-5-10 clients with fsync off ? 

 $ mybench -c 10 -t 1000 bench1
 number of clients: 10
 number of transactions per client: 1000
 number of transactions actually processed: 1/1
 tps = 443.131083 (including connections establishing)
 tps = 447.406534 (excluding connections establishing)
 
 CPU loading goes from 80% idle at 1 client to 50% idle at 5 clients
 to 10% idle at 10 or more.
 
 So this does seem to be a nice win, and unless I hear objections
 I will apply it ...

3x speedup is not just nice, it's great ;)

--
Hannu




---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] cross-posts (was Re: [GENERAL] Large databases,

2002-10-07 Thread Larry Rosenman

On Sun, 2002-10-06 at 22:20, Tom Lane wrote:
 Curt Sampson [EMAIL PROTECTED] writes:
  ... Avoiding cross-posting would be nice, since I am getting lots of
  duplicate messages these days.
 
 Cross-posting is a fact of life, and in fact encouraged, on the pg
 lists.  I suggest adapting.  Try sending
   set all unique your-email-address
 to the PG majordomo server; this sets you up to get only one copy
 of each cross-posted message.
That doesn't seem to work any more:

 set all unique [EMAIL PROTECTED]
 The all mailing list is not supported at
 PostgreSQL User Support Lists.

What do I need to send now? 

Marc? 


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] cross-posts (was Re: [GENERAL] Large databases,

2002-10-07 Thread Michael Paesold

 On Sun, 2002-10-06 at 22:20, Tom Lane wrote:
  Curt Sampson [EMAIL PROTECTED] writes:
   ... Avoiding cross-posting would be nice, since I am getting lots of
   duplicate messages these days.
 
  Cross-posting is a fact of life, and in fact encouraged, on the pg
  lists.  I suggest adapting.  Try sending
  set all unique your-email-address
  to the PG majordomo server; this sets you up to get only one copy
  of each cross-posted message.
 That doesn't seem to work any more:

  set all unique [EMAIL PROTECTED]
  The all mailing list is not supported at
  PostgreSQL User Support Lists.

 What do I need to send now?

 Marc?

it is:
set ALL unique your-email

if you also don't want to get emails that have already been cc'd to you, you
can use:

set ALL eliminatecc your-email

for a full list of set options send:

help set

to majordomo.

Regards,
Michael Paesold



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] cross-posts (was Re: [GENERAL] Large databases,

2002-10-07 Thread Larry Rosenman

On Mon, 2002-10-07 at 07:01, Michael Paesold wrote:
  On Sun, 2002-10-06 at 22:20, Tom Lane wrote:
   Curt Sampson [EMAIL PROTECTED] writes:
... Avoiding cross-posting would be nice, since I am getting lots of
duplicate messages these days.
  
   Cross-posting is a fact of life, and in fact encouraged, on the pg
   lists.  I suggest adapting.  Try sending
   set all unique your-email-address
   to the PG majordomo server; this sets you up to get only one copy
   of each cross-posted message.
  That doesn't seem to work any more:
 
   set all unique [EMAIL PROTECTED]
   The all mailing list is not supported at
   PostgreSQL User Support Lists.
 
  What do I need to send now?
 
  Marc?
 
 it is:
 set ALL unique your-email
 
 if you also don't want to get emails that have already been cc'd to you, you
 can use:
 
 set ALL eliminatecc your-email
 
 for a full list of set options send:
 
 help set
 
 to majordomo.
Thanks.  That worked great.  (I use Mailman, and didn't realize the ALL
needed to be capitalized.)

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 in an ideal world this would be 5*120=600 tps. 
 Have you any good any ideas what holds it back for the other 300 tps ?

Well, recall that the CPU usage was about 20% in the single-client test.
(The reason I needed a variant version of pgbench is that this machine
is too slow to do more than 120 TPC-B transactions per second anyway.)

That says that the best possible throughput on this test scenario is 5
transactions per disk rotation --- the CPU is just not capable of doing
more.  I am actually getting about 4 xact/rotation for 10 or more
clients (in fact it seems to reach that plateau at 8 clients, and be
close to it at 7).  I'm inclined to think that the fact that it's 4 not
5 is just a matter of not quite there --- there's some additional CPU
overhead due to lock contention, etc, and any slowdown at all will cause
it to miss making 5.  The 20% CPU figure was approximate to begin with,
anyway.
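
(A small illustration of the ceiling being described, added for readers
following the arithmetic; it is not part of the original mail:)

    /* A 7200 RPM drive allows 7200/60 = 120 synchronous commit writes per
     * second; at roughly 20% CPU per transaction about 5 transactions can be
     * prepared per revolution.
     */
    static double
    commit_ceiling_tps(double rpm, double xacts_per_rev)
    {
        double revs_per_sec = rpm / 60.0;     /* 7200 RPM -> 120 rev/s */
        return revs_per_sec * xacts_per_rev;  /* 5/rev -> 600 tps, 4/rev -> ~480 tps */
    }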

The other interesting question is why we're not able to saturate the
machine with only 4 or 5 clients.  I think pgbench itself is probably
to blame for that: it can't keep all its backend threads constantly
busy ... especially not when several of them report back transaction
completion at essentially the same instant, as will happen under
ganged-commit conditions.  There will be intervals where multiple
backends are waiting for pgbench to send a new command.  That delay
in starting a new command cycle is probably enough for them to miss the
bus of getting included in the next commit write.

That's just a guess though; I don't have tools that would let me see
exactly what's happening.  Anyone else want to reproduce the test on
a different system and see what it does?

 If it has CPU utilisation of only 50% then there must be still some
 moderate lock contention. 

No, that's I/O wait I think, forced by the quantization of the number
of transactions that get committed per rotation.

 btw, what is the number for 1-5-10 clients with fsync off ? 

About 640 tps at 1 and 5, trailing off to 615 at 10, and down to 450
at 100 clients (now that must be lock contention...)

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Threaded Sorting

2002-10-07 Thread Hans-Jürgen Schönig

Bingo = great :).
The I/O problem seems to be solved :).

A table space concept would be top of the hitlist :).

The symlink version is not very comfortable and I think it would be a 
real hack.
Also: If we had a clean table space concept it would be real advantage.
In the first place it would be enough to define a directory (alter 
tablespace, changing sizes etc. could be a lot of work).

What could CREATE TABLESPACE look like?
Personally I like the Oracle Syntax.

Is it already time to work on the parser for CREATE/ALTER/DROP TABLESPACE?

Hans



Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:

Bingo!  Want to increase sorting performance, give it more I/O
bandwidth, and it will take 1/100th of the time to do threading.

Added to TODO:
  * Allow sorting to use multiple work directories

Yeah, I like that.  Actually it should apply to all temp files not only
sorting.

A crude hack would be to allow there to be multiple pg_temp_NNN/
subdirectories (read symlinks) in a database, and then the code would
automatically switch among these.

Probably a cleaner idea would be to somehow integrate this with
tablespace management --- if you could mark some tablespaces as intended
for temp stuff, the system could round-robin among those as it creates
temp files and/or temp tables.

			regards, tom lane



-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Threaded Sorting

2002-10-07 Thread Hans-Jürgen Schönig

Greg Copeland wrote:

I wouldn't hold your breath for any form of threading.  Since PostgreSQL
is process based, you might consider having a pool of sort processes
which address this but I doubt you'll get anywhere talking about threads
here.

Greg

  


I came across the problem yesterday. We thought about SMP and did some 
tests on huge tables. The postmaster was running full speed to get the 
stuff sorted - even on an IDE system.
I asked my friends who are doing a lot of work with Oracle on huge SMP 
machines. I was told that Oracle has a mechanism which can run efficient 
sorts on SMP machines. It seems to speed up sorting a lot.

If we could reduce the time needed to build up an index by 25% it would 
be a wonderful thing. Just think of a scenario:
1 thread: 24 hours
many threads: 18 hours

We  could gain 6 hours which is a LOT.
We have many people running PostgreSQL on systems having wonderful IO 
systems - in this case IO is not the bottleneck anymore.

I had a brief look at the code used for sorting. It is very well 
documented so maybe it is worth thinking about a parallel algorithm.

When talking about threads: a pool of processes for sorting? Maybe this
could be useful, but I doubt it is the best solution to avoid overhead.
Somewhere in the TODO it says that there will be experiments with a
threaded backend. This makes me think that threads are not a big no-no.

Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Threaded Sorting

2002-10-07 Thread Hans-Jürgen Schönig

Threads are not the best solution when it comes to portability. I prefer a
process model as well.
My concern was that a process model might be a bit too slow for that, but if
we had a pool of processes already in memory this would be a wonderful thing.
Using it for small amounts of data is pretty useless - I totally agree - but
when it comes to huge amounts of data it can be useful.

It is a mechanism for huge installations with a lot of data.

Hans



That was a fork IIRC.  Threading is not used in baseline PostgreSQL nor
are there any such plans that I'm aware of.  People from time to time ask
about threads for this or that and are always told what I'm telling
you.  The use of threads leads to portability issues, not to mention that
PostgreSQL is entirely built around the process model.

Tom is right to dismiss the notion of adding additional CPUs to
something that is already I/O bound, however, the concept itself should
not be dismissed.  Applying multiple CPUs to a sort operation is well
accepted and understood technology.

At this point, perhaps Tom or one of the other core developers having
insight in this area would be willing to address how readily such a
mechanism could be put in place.

Also, don't be so fast to dismiss what the process model can do.  There
is no reason to believe that having a process pool would not be able to
perform wonderful things if implemented properly.  Basically, the notion
would be that the backend processing the query would solicit assistance
from the sort pool if one or more processes were available.  At that
point, several methods could be employed to divide the work.  Some form
of threshold would also have to be created to prevent the pool from
being used when a single backend is capable of addressing the need.
Basically the idea is, you only have the pool assist with large tuple
counts, and then only when resources are available within the pool.  By
doing this, you avoid additional overhead for small sort efforts and
gain when it matters the most.
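
(To illustrate the dispatch decision described above, a hypothetical sketch;
none of these names exist in the backend, and the pool API is invented purely
for the example:)

    typedef struct SortPool SortPool;                         /* hypothetical pool handle */

    extern int  SortPoolWorkerAvailable(SortPool *pool);      /* hypothetical */
    extern void SortPoolRunSort(SortPool *pool, void *state); /* hypothetical */
    extern void perform_local_sort(void *state);              /* stands in for the existing path */

    /* Hand a sort to a helper process only when the input is large enough to
     * pay for the coordination overhead and a pool worker is actually idle;
     * otherwise sort locally exactly as today.
     */
    static void
    perform_sort(SortPool *pool, void *state, long ntuples, long threshold)
    {
        if (ntuples >= threshold && SortPoolWorkerAvailable(pool))
            SortPoolRunSort(pool, state);   /* large sorts, idle worker */
        else
            perform_local_sort(state);      /* small sorts: skip the overhead */
    }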


Regards,

   Greg

  



-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] Moving to PostGres

2002-10-07 Thread Benjamin Stewart

Hello,
I am looking at moving our company away from MS SQL. Have been looking at
DB2 and it looks to have some good features. Now wondering if POSTGRESQL
could be a viable alternative. I have a few questions though;
1. What is the postgresql equivalent to stored procedures, and can they be written
in another language such as Java?
2. How well is Java supported for developing DB applications using PG?
3. What are the limitations to PG compared to DB2, Oracle, Sybase ?

Thanks
Ben



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Moving to PostGres

2002-10-07 Thread

In article XU5o9.11017$[EMAIL PROTECTED],
Benjamin Stewart [EMAIL PROTECTED] wrote:

I am looking at moving our company away from MS SQL.


Here's a good place to start:

  http://techdocs.postgresql.org/redir.php?link=/techdocs/sqlserver2pgsql.php

--
http://www.spinics.net/linux/

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Bad rules

2002-10-07 Thread Steve King

Thank you very much for your enlightened comment, it worked a treat.

I do not seem to be able to find references to this kind of useful
information in the postgresql online manual or in books such as Bruce
Momjian's 'PostgreSQL: Introduction and Concepts'. Where is this info to be
found other than the mailing list?

Thanks again.
Regards
Steve


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: 04 October 2002 15:48
To: Steve King
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Bad rules


Steve King [EMAIL PROTECTED] writes:
 I am using postgres 7.2, and have rule on a table which causes a notify if
 an insert/update/delete is performed on the table.
 The table is very very small.
 When performing a simple (very simple) update on the table this takes
about
 3 secs, when I remove the rule it is virtually instantaneous.
 The rest of the database seems to perform fine, have you any ideas or come
 across this before??

Let's see the rule exactly?  NOTIFY per se is not slow in my experience.

(One thing to ask: have you done a VACUUM FULL on pg_listener in recent
memory?  Heavy use of LISTEN/NOTIFY does tend to bloat that table if you
don't keep after it with VACUUM.)

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Implicit Lock Row

2002-10-07 Thread Antoine Lobato




I am currently developing an interface that simulates indexed sequential file
management on top of PostgreSql, and I must reproduce the same philosophy of
record-level locking.
I am looking for a way to implicitly lock and unlock a single row of a table.
The locking of several rows, in the same table or in various tables, can last
a long time, so the locking cannot simply be wrapped in a transaction without
locking the whole table for other users.
Is there a viable solution for this with PostgreSql?
Is there an accessible underlying locking structure?
Thank you.



Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-07 Thread Doug McNaught

Tom Lane [EMAIL PROTECTED] writes:

 Doug McNaught [EMAIL PROTECTED] writes:

  In my understanding, it means all currently dirty blocks in the file
  cache are queued to the disk driver.  The queued writes will
  eventually complete, but not necessarily before sync() returns.  I
  don't think subsequent write()s will block, unless the system is low
  on buffers and has to wait until dirty blocks are freed by the driver.
 
 We don't need later write()s to block.  We only need them to not hit
 disk before the sync-queued writes hit disk.  So I guess the question
 boils down to what queued to the disk driver means --- has the order
 of writes been determined at that point?

It's certainly possible that new write(s) get put into the queue
alongside old ones--I think the Linux block layer tries to do this
when it can, for one.  According to the manpage, Linux used to wait
until everything was written to return from sync(), though I don't
*think* it does anymore.  But that's not mandated by the specs.

So I don't think we can rely on such behavior (not reordering writes
across a sync()), though it will probably happen in practice a lot of
the time.  AFAIK there isn't anything better than sync() + sleep() as
far as the specs go.  Yes, it kinda sucks.  ;)
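
(A tiny sketch of the portable fallback being described; the delay is an
arbitrary heuristic, not a guarantee that the queued writes have reached disk:)

    #include <unistd.h>

    /* Schedule all dirty kernel buffers for writing, then pause to give them
     * a chance to complete.  sync() is not required to wait, so this only
     * makes ordering violations unlikely, not impossible.
     */
    static void
    sync_and_hope(unsigned int seconds)
    {
        sync();
        sleep(seconds);
    }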

-Doug

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Case insensitive columns

2002-10-07 Thread Christoph Strebin

I have a problem similar to that described by Shaw Terwilliger on
2001-03-14 (Subject: Case Insensitive CHECK CONSTRAINTs):

I need some case insensitive char/varchar columns. A unique index on
lower(col_name) wo



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Threaded Sorting

2002-10-07 Thread Hans-Jürgen Schönig

Threads are bad - I know ...
I like the idea of a pool of processes instead of threads - from my 
point of view this would be useful.

I am planning to run some tests (GEQO, AIX, sorts) as soon as I have 
time to do so (still too much work ahead before :( ...).
If I had time I'd love to do something for the PostgreSQL community :(.

As far as sorting is concerned: It would be fine if it was possible to 
define an alternative location for temporary sort files using SET.
If you had multiple disks this would help in the case of concurrent 
sorts because this way people could insert and index many tables at once 
without having to access just one storage system.
This would be an easy way out of the IO limitation ... - at least for 
some problems.

Hans



Bruce Momjian wrote:


We haven't thought about it yet because there are too many buggy thread
implementations.  We are probably just now getting to a point where we
can consider it.  However, lots of databases have moved to threads for
all sorts of things and ended up with a royal mess of code.  Threads
can only improve things in a few areas of the backend so it would be
nice if we could limit the exposure to threads to those areas;  sorting
could certainly be one of them, but frankly, I think disk I/O is our
limiting factor there.  I would be interested to see some tests that
showed otherwise.
  


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Shridhar Daithankar

On 4 Oct 2002 at 21:13, Hans-Jürgen Schönig wrote:

 Bingo = great :).
 The I/O problem seems to be solved :).
 
 A table space concept would be top of the histlist :).
 
 The symlink version is not very comfortable and I think it would be a 
 real hack.
 Also: If we had a clean table space concept it would be real advantage.
 In the first place it would be enough to define a directory (alter 
 tablespace, changing sizes etc. could be a lot of work).
 
 How could CREATE TABLESPACE look like?
 Personally I like the Oracle Syntax.

Well, I (hopefully) understand the need for table spaces. But I absolutely
hate the way oracle does it. I am repeating all the points I posted before;
there was no follow-up. I hope I get some on this.

1) It tries to be a volume manager, assuming the OS handles volumes
inefficiently. Same mentality as handling all disk I/O by itself. Maybe that
was worth it when oracle did it, but is it worth it now?

2) It allows joining multiple volumes for performance reasons. If you want to
join multiple volumes for performance, let RAID handle it. Is that the job of
an RDBMS?

3) It puts multiple objects together. Why? I never fully understood having an
opaque file sitting on a drive v/s a neatly laid out directory structure. I
would always prefer the directory structure.

Can anybody please tell me in detail (not just by pointing towards TODO items):

1) What is a table space supposed to offer?

2) What does a directory structure not offer that a table space does?

3) How do they compare in terms of advantages/disadvantages?

Oracle familiarity is out; that's not even close to being a good merit IMO. If
postgresql moves to the oracle way of doing things ... well, I won't be as
happy as I am now.

Thanks for your patience..


Bye
 Shridhar

--
Newton's Little-Known Seventh Law:  A bird in the hand is safer than one 
overhead.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Implicit Lock Row

2002-10-07 Thread Shridhar Daithankar

On 5 Oct 2002 at 23:56, Antoine Lobato wrote:

 
 I am currently developing an interface that simulates indexed sequential file
 management on top of PostgreSql, and I must reproduce the same philosophy of
 record-level locking.
 I am looking for a way to implicitly lock and unlock a single row of a table.
 The locking of several rows, in the same table or in various tables, can last
 a long time, so the locking cannot simply be wrapped in a transaction without
 locking the whole table for other users.
 Is there a viable solution for this with PostgreSql?
 Is there an accessible underlying locking structure?

You can use select for update to lock rows.

HTH

Bye
 Shridhar

--
Strategy:   A long-range plan whose merit cannot be evaluated until sometime
after those creating it have left the organization.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Hans-Jürgen Schönig



Can anybody please tell me in detail.(Not just a pointing towards TODO items)

1) What a table space supposed to offer?


They allow you to define a maximum amount of storage for a certain set 
of data.
They help you to define the location of data.
They help you to define how much data can be used by which resource.

2) What a directory structure does not offer that table space does?


You need to use the command line in order to manage quotas - you might not
want that.
Quotas are handled differently on every platform (if available).
With tablespaces you can assign 30mb to user a, 120mb to user b etc. ...
Table spaces are a nice abstraction layer on top of the file system.


3) How do they compare for advantages/disadvantages..

Oracle familiarity is out. That's not even close to being good merit IMO. If 
postgresql moves to oracle way of doing things, .. well, I won't be as much 
hapy as I am now..

Thanks for your patience..


how would you handle table spaces? just propose it to the hackers' list ...
we should definitely discuss that ...
a bad implementation of table spaces would be painful ...


Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] [GENERAL] Large databases, performance

2002-10-07 Thread Manfred Koizar

On Mon, 07 Oct 2002 15:07:29 +0530, Shridhar Daithankar
[EMAIL PROTECTED] wrote:
Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All 
numbers include indexes. This is really going to be a problem when things are 
deployed. Any idea how can it be taken down? 

Shridhar,

if i'm not mistaken, a char(n)/varchar(n) column is stored as a 32-bit
integer specifying the length, followed by as many characters as the
length indicates.  On 32-bit Intel hardware this structure is aligned on a
4-byte boundary.
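
(To make that layout concrete, a rough sketch of the value format -- roughly
what the sources call a varlena; details may differ:)

    #include <stdint.h>

    /* A char(n)/varchar(n)/text value: a 32-bit total length (which includes
     * these 4 bytes) followed by the character data, the whole thing aligned
     * on a 4-byte boundary.
     */
    typedef struct
    {
        int32_t len;        /* total size in bytes, header included */
        char    data[1];    /* character data follows */
    } varsize_sketch;

    /* e.g. a char(10) value: 4 + 10 = 14 bytes of payload, padded to 16 by
     * the 4-byte alignment -- matching the "phys size" column below. */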

For your row layout this gives the following sizes (look at the phys
size column):

| Field     Field       Null   Indexed   phys size   mini
| name      type
|
| type      int         no     no          4           4
| esn       char(10)    no     yes        16          11
| min       char(10)    no     yes        16          11
| datetime  timestamp   no     yes         8           8
| opc0      char(3)     no     no          8           4
| opc1      char(3)     no     no          8           4
| opc2      char(3)     no     no          8           4
| dpc0      char(3)     no     no          8           4
| dpc1      char(3)     no     no          8           4
| dpc2      char(3)     no     no          8           4
| npa       char(3)     no     no          8           4
| nxx       char(3)     no     no          8           4
| rest      char(4)     no     no          8           5
| field0    int         yes    no          4           4
| field1    char(4)     yes    no          8           5
| field2    int         yes    no          4           4
| field3    char(4)     yes    no          8           5
| field4    int         yes    no          4           4
| field5    char(4)     yes    no          8           5
| field6    int         yes    no          4           4
| field7    char(4)     yes    no          8           5
| field8    int         yes    no          4           4
| field9    char(4)     yes    no          8           5
| ------------------------------------------------------
|                                         176         116

Ignoring nulls for now, you have to add 32 bytes for a v7.2 heap tuple
header and 4 bytes for ItemIdData per tuple, ending up with 212 bytes
per tuple, or ca. 85 GB heap space for 432,000,000 tuples.  Depending on
fill factor, similar calculations give some 30 GB for your index.
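
(The same arithmetic spelled out, assuming the 432-million-row table from
earlier in this thread:)

    /* per tuple:   176 bytes column data ("phys size" column above)
     *            +  32 bytes v7.2 heap tuple header
     *            +   4 bytes ItemIdData line pointer
     *            = 212 bytes
     * 212 bytes * 432,000,000 tuples ~= 91.6e9 bytes, i.e. the ~85 GB of
     * heap quoted above (plus ~30 GB of index).
     */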

Now if we had a datatype with only one byte for the string length,
char columns could be byte aligned and we'd have column sizes given
under mini in the table above.  The columns would have to be
rearranged according to alignment requirements.

Thus 60 bytes per heap tuple and 8 bytes per index tuple could be
saved, resulting in a database size of ~ 85 GB (index included).  And
I bet this would be significantly faster, too.

Hackers, do you think it's possible to hack together a quick and dirty
patch, so that string length is represented by one byte?  IOW can a
database be built that doesn't contain any char/varchar/text value
longer than 255 characters in the catalog?

If I'm not told that this is impossible, I'd give it a try.  Shridhar,
if such a patch can be made available, would you be willing to test
it?

What can you do right now?  Try using v7.3 beta and creating your
table WITHOUT OIDS.  This saves 8 bytes per tuple; not much, but
better save 4% than nothing.

Servus
 Manfred

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] pg_filedump

2002-10-07 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 I'm trying to get something from pg_filedump.  However, the version
 published in sources.redhat.com/rhdb doesn't grok a lot of changes in
 current CVS.  I changed all those and made it compile... but looks like
 that's only the easy part.  I get bogus values everywhere (block sizes,
 item numbers, etc).
 Does somebody know whether it's maintained for current versions?

AFAIK, no one has yet updated it for 7.3's changes in tuple header
format.  That needs to get done sometime soon ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Hans-Jürgen Schönig


2) What a directory structure does not offer that table space does?
  

You need to the command line in order to manage quotas - you might not 
want that.



Mount a directory on a partition. If the data exceeds that partition, there
would be a disk error - like a tablespace getting overflown. I have seen both
scenarios in action.
  


Of course it can be done somehow. However, with tablespaces it is more
db-like and you need not be familiar with the operating system itself.
Just think of a company running several different operating systems
(suns, linux, bsd, ...).
What do you think could be done in this case? My answer would be an
abstraction layer called table spaces ...

  

Quotas are handled differently on every platform (if available).



Yeah. But that's the sysadmin's responsibility, not the DBA's.


Maybe many people ARE the sysadmins of their PostgreSQL box ...
When developing a database with an open mind people should try to see a 
problem from more than just one perspective.
Why should anybody just forget about sysdbas???


With tablespaces you can assign 30mb to user a, 120mb to user b etc. ...
Table spaces are a nice abstraction layer on top of the file system.



Hmm.. And how does that fit into the database metaphor? What practical use is
that? I can't imagine, as I am a developer and not a DBA.

  


One of our customers did some minor hosting projects with PostgreSQL.
That's what he wanted to have, because it is a practical issue.
a. you don't want to have more than one instance per machine.
b. you want to assign a certain amount of space to a certain user
without using quotas. Just think of administration tools - tablespaces
are as simple as a select.

Per directory is a first step - a good step and a good idea - but
tablespaces are a useful invention. Just think of hosting companies,
hybrid environments, etc ...
Tablespaces are not a devil, and sysdbas may be developers ...


Hans


-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Shridhar Daithankar

On 7 Oct 2002 at 16:49, Hans-Jürgen Schönig wrote:

 Mount a directory on a partition. If the data exceeds on that partition, there 
 would be disk error. Like tablespace getting overflown. I have seen both the 
 scenarios in action..
 Of course it can be done somehow. However, with tablespaces it is more 
 db-like and you need not be familiar with the operating system itself.
 Just think of a company having several different operating systems 
 (suns, linux, bsd, ...).
 what do you think could be done in this case? my answer would be an 
 abstraction layer called table spaces ...

OK. Point noted. Suspended till next point.

 Quotas are handled differently on ever platform (if available).
 Yeah. But that's sysadmins responsibility not DBA's.
 Maybe many people ARE the sysadmins of their PostgreSQL box ...
 When developing a database with an open mind people should try to see a 
 problem from more than just one perspective.
 Why should anybody just forget about sysdbas???

If the DBA is the sysadmin, does it make a difference whether he learns about
mount/ln or table spaces? Yes it does. Table spaces are limited to databases,
but mount/ln is useful for any general-purpose sysadmin work.

That answers the last point as well, I guess..

 With tablespaces you can assign 30mb to use a, 120mb to user b etc. ...
 Table spaces are a nice abstraction layer to the file system.
 Hmm.. And how does that fit in database metaphor? What practical use is that? I 
 can't imagine as I am a developer and not a DBA.
 One of our customers did some minor hosting projects with PostgreSQL. 
 That's what he wanted to have because it is a practical issue.
 a. you don't want to have more than one instance per machine.
 b. you want to assign a certain amount of space to a certain user 
 without using quotas. just think of administration tools - tablespaces 
 are as simple as a select.

Agreed. Perfect point, and I didn't think of it.

But it can be done in a directory structure as well. Of course it's quite a
deviation from what one thinks of as a plain old directory structure. But if
this is one point where table spaces win, let's borrow it. There is a lot of
baggage in table spaces that can be left out.

Besides, AFAIU, tablespaces implement quotas using data files which are
pre-allocated. Pre-claiming space/resources is the evil in everything the
likes of oracle do, and it runs in exactly the opposite direction from the
postgresql philosophy.

If postgresql has to implement quotas on objects, it should do so without
pre-claiming space.

Besides, if postgresql offers quotas on a per-object basis in a
directory/object scheme, I am sure that's far more granular than tablespaces.
Choice is good.
 
 per directory is a first step - a good step and a good idea but 
 tablespaces are a useful invention. just think of hosting companies, 
 hybrid environments, etc ...
 tablespaces or not a devil and sysdbas may be developers ...

It's not about devils. It's about re-evaluating the need once again,
especially at the level of the tablespace concept itself.

Bye
 Shridhar

--
Oblivion together does not frighten me, beloved.-- Thalassa (in Anne 
Mulhall's body), Return to Tomorrow, stardate 4770.3.


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] [pgsql-performance] [GENERAL] Large databases, performance

2002-10-07 Thread Tom Lane

Shridhar Daithankar [EMAIL PROTECTED] writes:
 MySQL 3.23.52 with innodb transaction support: 

 4 concurrent queries  :-  257.36 ms
 40 concurrent queries :-  35.12 ms

 Postgresql 7.2.2 

 4 concurrent queries  :- 257.43 ms
 40 concurrent queries :- 41.16 ms

I find this pretty fishy.  The extreme similarity of the 4-client
numbers seems improbable, from what I know of the two databases.
I suspect your numbers are mostly measuring some non-database-related
overhead --- communications overhead, maybe?

 Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All 
 numbers include indexes. This is really going to be a problem when things are
 deployed. Any idea how can it be taken down? 

7.3 should be a little bit better because of Manfred's work on reducing
tuple header size --- if you create your tables WITHOUT OIDS, you should
save 8 bytes per row compared to earlier releases.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] [pgsql-performance] [GENERAL] Large databases, performance

2002-10-07 Thread Shridhar Daithankar

On 7 Oct 2002 at 10:30, Tom Lane wrote:

 Shridhar Daithankar [EMAIL PROTECTED] writes:
  MySQL 3.23.52 with innodb transaction support: 
 
  4 concurrent queries:-  257.36 ms
  40 concurrent queries   :-  35.12 ms
 
  Postgresql 7.2.2 
 
  4 concurrent queries:- 257.43 ms
  40 concurrent   queries :- 41.16 ms
 
 I find this pretty fishy.  The extreme similarity of the 4-client
 numbers seems improbable, from what I know of the two databases.
 I suspect your numbers are mostly measuring some non-database-related
 overhead --- communications overhead, maybe?

I don't know, but all three numbers - postgresql/mysql/oracle - are 25x.xx ms.
The clients were on the same machine as the server, so there is no real area
to point at.
 
  Only worry is database size. Postgresql is 111GB v/s 87 GB for mysql. All 
  numbers include indexes. This is really going to be a problem when things are
  deployed. Any idea how can it be taken down? 
 
 7.3 should be a little bit better because of Manfred's work on reducing
 tuple header size --- if you create your tables WITHOUT OIDS, you should
 save 8 bytes per row compared to earlier releases.

Got it..

Bye
 Shridhar

--
Sweater, n.:A garment worn by a child when its mother feels chilly.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Tom Lane

=?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= [EMAIL PROTECTED] writes:
 how would you handle table spaces?

The plan that's been discussed simply defines a tablespace as being a
directory somewhere; physical storage of individual tables would remain
basically the same, one or more files under the containing directory.

The point of this being, of course, that the DBA could create the
tablespace directories on different partitions or volumes in order to
provide the behavior he wants.

In my mind this would be primarily a cleaner, more flexible
reimplementation of the existing database location feature.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Hans-Jürgen Schönig



Quotas are handled differently on ever platform (if available).


Yeah. But that's sysadmins responsibility not DBA's.
  

Maybe many people ARE the sysadmins of their PostgreSQL box ...
When developing a database with an open mind people should try to see a 
problem from more than just one perspective.
Why should anybody just forget about sysdbas???



If DBA is sysadmin, does it make a difference if he learnes about mount/ln or 
table spaces. Yes it does. Table spaces are limited to databases but mount/ln 
is useful for any general purpose sysadmin work.

That answers the last point as well, I guess..
  


I agree but still, think of hybrid systems ..


Agreed. Perfect point and I didn't thought of it.

But it can be done in directory structure as well. Of course it's quite a 
deviation from what one thinks as plain old directory structure. But if this is 
one point where table spaces win, let's borrow that. There is lot of baggage in 
table spaces that can be left out..

Besides AFAIU, tablespaces implements quota using data files which are pre-
allocated. Pre-claiming space/resource  is the evil of everything likes of 
oracle do and runs in exact opposite direction of postgresql philosophy.

If postgresql has to implement quotas on object, it should do without 
preclaiming space.

Besides if postgresql offers quota on per object basis in directory/object 
scheme, I am sure that's far more granular than tablespaces. Choice is good..
 

I didn't think of pre-allocation - this is pretty much how Oracle would
do it.
I was thinking of having a maximum size or something like that.
Overhead such as EXTENTs and things like that doesn't seem too useful to
me - that's what a filesystem can be used for.

I agree with Tom: if a tablespace were a directory it would be pretty
simple and pretty useful. If people could define a maximum size it would
be more than perfect.
All I think is necessary is:
- having data in different, user-defined locations
- having the chance to define a maximum size for that tablespace.

Suggestion:
CREATE TABLESPACE: create a directory with a certain size (optional) -
nothing special here.
ALTER TABLESPACE: resize a table space; resizing is possible if the amount
of data in the tablespace < new size of the tablespace.
DROP TABLESPACE: remove a table space. The question in this case is -
what about the objects in the tablespace?
Objects cannot always be deleted (just think of inheritance and
parent tables).

It's not about devil. It's about revaluating need once again. Especially at the 
level of tablespace concept in itself.

  


That's why people should discuss it and think about it :).
People want a good implementation or no implementation :).
This is Open Source - it is designed to be discussed :).

Hans

-- 
*Cybertec Geschwinde u Schoenig*
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: Table spaces again [was Re: [HACKERS] Threaded Sorting]

2002-10-07 Thread Jim Buttafuoco

Is this not what I have been after for many months now?  I dropped the
tablespace/location idea before 7.2 because there didn't seem to be any
interest.  Please see my past emails for the SQL commands and on-disk
directory layout I have proposed.  I have a working 7.2 system with
tablespaces/locations (whatever you want to call them; I like locations
because tablespaces are an Oracle thing).  I would like to get this code
ported into 7.4.

Jim


 =?ISO-8859-1?Q?Hans-J=FCrgen_Sch=F6nig?= [EMAIL PROTECTED] writes:
  how would you handle table spaces?
 
 The plan that's been discussed simply defines a tablespace as being a
 directory somewhere; physical storage of individual tables would remain
 basically the same, one or more files under the containing directory.
 
 The point of this being, of course, that the DBA could create the
 tablespace directories on different partitions or volumes in order to
 provide the behavior he wants.
 
 In my mind this would be primarily a cleaner, more flexible
 reimplementation of the existing database location feature.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster





---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Moving to PostGres

2002-10-07 Thread Neil Conway

Benjamin Stewart [EMAIL PROTECTED] writes:
 1. What is the postgresql equiv to Stored procedures and can they be written
 in another langauage such s JAVA?

PostgreSQL supports user-defined functions; in 7.3 (currently in beta)
they can return sets of tuples.

You can define functions in Java using http://pljava.sf.net , or in a
variety of other languages (Perl, Python, Tcl, Ruby, C, PL/PgSQL, SQL,
sh, etc.)

 2. How well is Java supported for developing DB applications using
PG?

Pretty well, I guess. If you have a specific question, ask it.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



[HACKERS] reminder for those working on docs

2002-10-07 Thread Vince Vielhaber


Since 7.4 is getting real close and the docs are going to be going through
their final once-overs, please remember to have a look at the DocNote
comments that have been submitted.  Once 7.4 is released the current
notes will be gone.

http://www.postgresql.org/idocs/checknotes.php

The above url will show the notes and what they're in relation to with
a link to that particular piece of documentation.

Vince.
-- 
==
Vince Vielhaber -- KA8CSH    email: [EMAIL PROTECTED]    http://www.pop4.net
 56K Nationwide Dialup from $16.00/mo at Pop4 Networking
  http://www.camping-usa.com  http://www.cloudninegifts.com
   http://www.meanstreamradio.com   http://www.unknown-artists.com
==




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] [GENERAL] Large databases, performance

2002-10-07 Thread Manfred Koizar

On Mon, 07 Oct 2002 19:48:31 +0530, Shridhar Daithankar
[EMAIL PROTECTED] wrote:
I say if it's a char field, there should be no indicator of length as it's not 
required. Just store those many characters straight ahead..

This is out of reach for a quick hack ...

Sure. But the server machine is not available this week. Some other project is 
using it. So the results won't be out unless at least a week from now.

 :-)

This time results are correct. Postgresql loads data faster, indexes it faster 
and queries in almost same time.. Way to go..

Great!  And now let's work on making selects faster, too.

Servus
 Manfred

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] [pgsql-performance] [GENERAL] Large databases, performance

2002-10-07 Thread Tom Lane

Shridhar Daithankar [EMAIL PROTECTED] writes:
 I say if it's a char field, there should be no indicator of length as
 it's not required. Just store those many characters straight ahead..

Your assumption fails when considering UNICODE or other multibyte
character encodings.
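
(An added illustration of the point:)

    /* With a multibyte encoding, character count != byte count, so "just
     * store n characters" no longer gives a fixed size.  In UTF-8:
     *   'a' -> 1 byte  (0x61)
     *   'ä' -> 2 bytes (0xC3 0xA4)
     *   '€' -> 3 bytes (0xE2 0x82 0xAC)
     * A char(10) value may need anywhere from 10 to 30+ bytes, so an
     * explicit length word is still required.
     */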

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Antti Haapala


On 6 Oct 2002, Greg Copeland wrote:

 On Sat, 2002-10-05 at 14:46, Curtis Faith wrote:
 
  2) aio_write vs. normal write.
 
  Since as you and others have pointed out aio_write and write are both
  asynchronous, the issue becomes one of whether or not the copies to the
  file system buffers happen synchronously or not.

 Actually, I believe that write will be *mostly* asynchronous while
 aio_write will always be asynchronous.  In a buffer poor environment, I
 believe write will degrade into a synchronous operation.  In an ideal
 situation, I think they will prove to be on par with one another with a
 slight bias toward aio_write.  In less than ideal situations where
 buffer space is at a premium, I think aio_write will get the leg up.
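
(For readers unfamiliar with the interface under discussion, a minimal POSIX
AIO sketch; error handling is omitted, and on older Linux/glibc this lives in
librt, so link with -lrt:)

    #include <sys/types.h>
    #include <aio.h>
    #include <errno.h>
    #include <string.h>

    /* Queue an asynchronous write: aio_write() returns as soon as the request
     * is queued and the I/O proceeds in the background, unlike write(), which
     * may block when the buffer cache is under pressure.
     */
    static int
    start_async_write(struct aiocb *cb, int fd, void *buf, size_t len, off_t offset)
    {
        memset(cb, 0, sizeof(*cb));
        cb->aio_fildes = fd;
        cb->aio_buf    = buf;
        cb->aio_nbytes = len;
        cb->aio_offset = offset;
        return aio_write(cb);               /* 0 = queued, -1 = could not queue */
    }

    static int
    async_write_done(const struct aiocb *cb)
    {
        return aio_error(cb) != EINPROGRESS;    /* finished, successfully or not */
    }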

I browsed the web and came across this piece of text regarding a Linux KAIO
patch by Silicon Graphics...

The asynchronous I/O (AIO) facility implements interfaces defined by the
POSIX standard, although it has not been through formal compliance
certification. This version of AIO is implemented with support from kernel
modifications, and hence will be called KAIO to distinguish it from AIO
facilities available from newer versions of glibc/librt.  Because of the
kernel support, KAIO is able to perform split-phase I/O to maximize
concurrency of I/O at the device. With split-phase I/O, the initiating
request (such as an aio_read) truly queues the I/O at the device as the
first phase of the I/O request; a second phase of the I/O request,
performed as part of the I/O completion, propagates results of the
request.  The results may include the contents of the I/O buffer on a
read, the number of bytes read or written, and any error status.

Preliminary experience with KAIO have shown  over  35% improvement in
database performance tests. Unit tests (which only perform I/O) using KAIO
and Raw I/O have been successful in achieving 93% saturation with 12 disks
hung off 2  X 40 MB/s Ultra-Wide SCSI channels. We believe that these
encouraging results are a direct result of implementing  a significant
part of KAIO in the kernel using split-phase I/O while avoiding or
minimizing the use of any globally contented locks.

Well...

 In a worst-case scenario, it seems that aio_write does get a win.

 I personally would at least like to see an aio implementation and would
 be willing to even help benchmark it to benchmark/validate any returns
 in performance.  Surely if testing reflected a performance boost it
 would be considered for baseline inclusion?


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Greg Copeland

On Mon, 2002-10-07 at 10:38, Antti Haapala wrote:
 Browsed web and came across this piece of text regarding a Linux-KAIO
 patch by Silicon Graphics...
 

Ya, I have read this before.  The problem here is that I'm not aware of
which AIO implementation on Linux is the forerunner, nor do I have any
idea how its implementation or performance details differ from those of
other implementations on other platforms.  I know there are at least two
aio efforts underway for Linux.  There could yet be others.  Attempting
to cite specifics that only pertain to Linux and then, only with a
specific implementation which may or may not be in general use, is
questionable.  Because of this I simply left it as saying that I believe
my analysis is pessimistic.

Anyone have any idea whether Red Hat's Advanced Server uses KAIO or what?

 
 Preliminary experience with KAIO have shown  over  35% improvement in
 database performance tests. Unit tests (which only perform I/O) using KAIO
 and Raw I/O have been successful in achieving 93% saturation with 12 disks
 hung off 2  X 40 MB/s Ultra-Wide SCSI channels. We believe that these
 encouraging results are a direct result of implementing  a significant
 part of KAIO in the kernel using split-phase I/O while avoiding or
 minimizing the use of any globally contented locks.

The problem here is, I have no idea what they are comparing to (worst-case
read/writes, which we know PostgreSQL *mostly* isn't suffering
from).  If we assume that PostgreSQL's read/write operations are
somewhat optimized (as it currently sounds like they are), I'd seriously
doubt we'd see that big of a difference.  On the other hand, I'm hoping
that if an aio postgresql implementation does get done we'll see
something like a 5%-10% performance boost.  Even still, I have nothing
to pin that on other than hope.  If we do see a notable performance
increase for Linux, I have no idea what it will do for other platforms.

Then, there are all of the issues that Tom brought up about
bloat/uglification and maintainability.  So, while I certainly do keep
those remarks in my mind, I think it's best to simply encourage the
effort (or something like it) and help determine where we really sit by
means of empirical evidence.


Greg






Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Neil Conway

Greg Copeland [EMAIL PROTECTED] writes:
 Ya, I have read this before.  The problem here is that I'm not aware of
 which AIO implementation on Linux is the forerunner nor do I have any
 idea how it's implementation or performance details defer from that of
 other implementations on other platforms.

The implementation of AIO in 2.5 is the one by Ben LaHaise (not
SGI). Not sure what the performance is like -- although it's been
merged into 2.5 already, so someone can do some benchmarking. Can
anyone suggest a good test?

Keep in mind that glibc has had a user-space implementation for a
little while (although I'd guess the performance to be unimpressive),
so AIO would not be *that* kernel-version specific.

 Anyone have any idea of Red Hat's Advanced Server uses KAIO or what?

RH AS uses Ben LaHaise's implemention of AIO, I believe.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Table spaces again [was Re: Threaded Sorting]

2002-10-07 Thread Michael Paesold

Shridhar Daithankar [EMAIL PROTECTED] wrote:
[snip]
 On 7 Oct 2002 at 15:52, Hans-Jürgen Schönig wrote:
[snip]
  With tablespaces you can assign 30mb to user a, 120mb to user b etc. ...
  Table spaces are a nice abstraction layer to the file system.

 Hmm.. And how does that fit into the database metaphor? What practical use
 is that? I can't imagine, as I am a developer and not a DBA.

Virtual hosting at ISP's for example.

 I believe giving each database its own transaction log would be a great
 advantage of this scheme.

Well, if you think of Tom's recent patch (ganged WAL writes), from a
performance point of view, this would only be good if each transaction
log had its own disk. Otherwise a single transaction log is still better.

I think tablespaces are a good idea. I also prefer associating tablespaces
with directory structures over the Oracle style.

Regards,
Michael Paesold


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



[HACKERS] BTree metapage lock and freelist structure

2002-10-07 Thread Alvaro Herrera

Hello hackers,

I'm thinking about the btree metapage locking problem.

In the current situation there are three operations that lock the
metapage:
1. when looking for the root page (share lock, getroot)
2. when setting a new root page (exclusive lock, setroot)
3. when extending the relation (exclusive lock, extend).

Now, I want to add three more operations:
4. add a page to the freelist (addfree)
5. get a page from the freelist (getfree)
6. shrink a relation (shrink).

The shrink operation only happens when one tries to add the last page of
the relation to the freelist.  I don't know if that's very common, but
in case of relation truncating or massive deletion this is important.


What I want is to be able to do getroot and setroot without being
blocked by any of the other four.  Of course the other four are all
mutually exclusive.

There doesn't seem to be a way to acquire two different locks on the
same page, so I propose to lock the InvalidBlockNumber for the btree, and
use that as the lock to be obtained before doing extend, addfree,
getfree or shrink.  The setroot/getroot operations would still use the
locking on BTREE_METAPAGE, so they can proceed whether the
InvalidBlockNumber is blocked or not.


On a different topic:  I think the freelist structure should be
represented as an int32 count (btm_numfreepages) in
BTMetaPageData, and a pointer to the first BlockNumber.  Adding a new
page is done by sequentially scanning the list until a zero is found or
the end of the block is reached.  Getting a page sequentially scans the
same list until a blocknumber > 0 is found (iff btm_numfreepages is
greater than zero).  This allows for ~2000 free pages (a limit very
unlikely to actually be reached if the shrink operation is in place).
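
For illustration, here is a rough sketch of that layout and the two scans.
The field and helper names are made up for the sketch; this is not the
actual BTMetaPageData definition:

#include <stdint.h>

typedef uint32_t BlockNumber;

#define FREELIST_SLOTS 2000     /* roughly what fits in the rest of an 8K metapage */
#define EMPTY_SLOT     0        /* "a zero" marks an unused entry */

typedef struct BTMetaFreeList
{
    int32_t     btm_numfreepages;              /* live entries below */
    BlockNumber btm_freelist[FREELIST_SLOTS];  /* runs to the end of the page */
} BTMetaFreeList;

/* addfree: record a freed page in the first empty slot, if any */
static int
freelist_add(BTMetaFreeList *m, BlockNumber blk)
{
    int i;

    for (i = 0; i < FREELIST_SLOTS; i++)
    {
        if (m->btm_freelist[i] == EMPTY_SLOT)
        {
            m->btm_freelist[i] = blk;
            m->btm_numfreepages++;
            return 1;
        }
    }
    return 0;                   /* list full; caller extends the relation instead */
}

/* getfree: hand out a recorded free page, or 0 if there is none */
static BlockNumber
freelist_get(BTMetaFreeList *m)
{
    int i;

    if (m->btm_numfreepages <= 0)
        return EMPTY_SLOT;
    for (i = 0; i < FREELIST_SLOTS; i++)
    {
        if (m->btm_freelist[i] != EMPTY_SLOT)
        {
            BlockNumber blk = m->btm_freelist[i];

            m->btm_freelist[i] = EMPTY_SLOT;
            m->btm_numfreepages--;
            return blk;
        }
    }
    return EMPTY_SLOT;
}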

Comments?  Another solution would be to have a separate page for the
freelist, but it seems to be a waste.

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"Because frankly, if you had to pass an exam to know how to handle
yourself... who would be the tough guy holding a license?"  (Mafalda)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: Parallel Executors [was RE: [HACKERS] Threaded Sorting]

2002-10-07 Thread Jan Wieck

Curtis Faith wrote:

 The current transaction/user state seems to be stored in process
 global space. This could be changed to be a pointer to a struct
 stored in a back-end specific shared memory area which would be
 accessed by the executor process at execution start. The backend
 would destroy and recreate the shared memory and restart execution
 in the case where an executor process dies much like the postmaster
 does with backends now.
 
 To the extent the executor process might make changes to the state,
 which I'd try to avoid if possible (don't know if it is), the
 executors could obtain locks, otherwise if the executions were
 constrained to isolated elements (changes to different indexes for
 example) it seems like it would be possible using an architecture
 where you have:

Imagine there is a PL/Tcl function. On the first call in a session, the
PL/Tcl interpreter gets created (that's during execution, okay?). Now
the procedure that's called inside of that interpreter creates a
global variable ... a global Tcl variable inside of that interpreter,
which is totally unknown to the backend since it doesn't know what Tcl
is at all, and that variable is nothing but an entry in a private hash
table inside of that interpreter. On a subsequent call to any PL/Tcl
function during that session, it might be good if that darn hashtable
entry still exists.

How do you propose to let this happen?

And while at it, the Tcl procedure next calls spi_exec, causing the
PL/Tcl function handler to call SPI_exec(), so your isolated executor
all of a sudden becomes a fully operational backend, doing the
parsing, planning and optimizing, or what?


Jan

-- 

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: Parallel Executors [was RE: [HACKERS] Threaded Sorting]

2002-10-07 Thread Curtis Faith

 Curtis Faith wrote:

  The current transaction/user state seems to be stored in process
  global space. This could be changed to be a sointer to a struct
  stored in a back-end specific shared memory area which would be
  accessed by the executor process at execution start. The backend
  would destroy and recreate the shared memory and restart execution
  in the case where an executor process dies much like the postmaster
  does with backends now.
 
  To the extent the executor process might make changes to the state,
  which I'd try to avoid if possible (don't know if it is), the
  executors could obtain locks, otherwise if the executions were
  constrained to isolated elements (changes to different indexes for
  example) it seems like it would be possible using an architecture
  where you have:

Jan Wieck replied:
 Imagine there is a PL/Tcl function. On the first call in a session, the
 PL/Tcl interpreter gets created (that's during execution, okay?). Now
 the procedure that's called inside of that interpreter creates a
 global variable ... a global Tcl variable inside of that interpreter,
 which is totally unknown to the backend since it doesn't know what Tcl
 is at all, and that variable is nothing but an entry in a private hash
 table inside of that interpreter. On a subsequent call to any PL/Tcl
 function during that session, it might be good if that darn hashtable
 entry exists.

 How do you propose to let this happen?

 And while at it, the Tcl procedure next calls spi_exec, causing the
 PL/Tcl function handler to call SPI_exec(), so your isolated executor
 all of a sudden becomes a fully operational backend, doing the
 parsing, planning and optimizing, or what?

You bring up a good point, we couldn't do what I propose for all
situations. I had never anticipated that splitting things up would be the
rule. For example, the optimizer would have to decide whether it made sense
to split up a query from a strictly performance perspective. So now, if we
consider the fact that some things could not be done with split backend
execution, the logic becomes:

if ( splitting is possible && splitting is faster )
    do the split execution;
else
    do the normal execution;

Since the design already splits the backend internally into a separate
execution phase, it seems like one could keep the current
implementation for the typical case where splitting doesn't buy anything or
cases where there is complex state information that needs to be maintained.
If there are no triggers or functions that will be accessed by a given
query then I don't see your concerns applying.

If there are triggers or other conditions which preclude multi-process
execution, we can keep exactly the same behavior as now. The plan execution
entry could easily be a place where it either A) did the same thing it
currently does or B) passed execution off to a pool as per the original
proposal.

I have to believe that most SELECTs won't be affected by your concerns.
Additionally, even in the case of an UPDATE, many times there are large
portions of the operation's actual work that wouldn't be affected even if
there are lots of triggers on the tables being updated. The computation of
the inside of the WHERE could often be split out without causing any
problems with context or state information. The master executor could
always be the original backend as it is now and this would be the place
where the UPDATE part would be processed after the WHERE tuples had been
identified.

As with any optimization, it is more complicated and won't handle all the
cases. It's just an idea to handle common cases that would otherwise be
much slower.

That having been said, I'm sure there are much lower hanging fruit on the
performance tree and likely will be for a little while.

- Curtis


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 On Sun, 2002-10-06 at 11:46, Tom Lane wrote:
  I can't personally get excited about something that only helps if your
  server is starved for RAM --- who runs servers that aren't fat on RAM
  anymore?  But give it a shot if you like.  Perhaps your analysis is
  pessimistic.

 snipped I don't find it far fetched to
 imagine situations where people may commit large amounts of memory for
 the database yet marginally starve available memory for file system
 buffers.  Especially so on heavily I/O bound systems or where sporadicly
 other types of non-database file activity may occur.

 snipped Of course, that opens the door for simply adding more memory
 and/or slightly reducing the amount of memory available to the database
 (thus making it available elsewhere).  Now, after all that's said and
 done, having something like aio in use would seemingly allowing it to be
 somewhat more self-tuning from a potential performance perspective.

Good points.

Now for some surprising news (at least it surprised me).

I researched the file system source on my system (FreeBSD 4.6) and found
that the behavior was optimized for non-database access to eliminate
unnecessary writes when temp files are created and deleted rapidly. It was
not optimized to get data to the disk in the most efficient manner.

The syncer on FreeBSD appears to place dirtied filesystem buffers into
work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer
processes one of the queues and increments a counter syncer_delayno.

On my system the setting for SYNCER_MAXDELAY is 32. So each second, 1/32nd
of the writes that were buffered are processed. If the syncer gets behind
and the writes queued for a given second take more than one second to
process, the syncer does not wait but begins processing the next queue.

AFAICT this means that there is no opportunity to have writes combined by
the  disk since they are processed in buckets based on the time the writes
came in.
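
To make the bucketing concrete, here is a toy model of the wheel behavior
described above. The names and counters are made up; the real FreeBSD code
works on kernel buffer/vnode structures, not plain integers:

#define WHEEL_SLOTS 32

static int dirty_count[WHEEL_SLOTS];    /* stand-in for the per-slot work lists */
static int current_slot = 0;

/* a newly dirtied buffer is queued "delay" seconds into the future */
void
queue_dirty_buffer(int delay)
{
    dirty_count[(current_slot + delay) % WHEEL_SLOTS]++;
}

/*
 * Once a second, exactly one slot is pushed out to disk, so writes leave the
 * cache bucketed by when they were dirtied, not ordered to suit the disk arm.
 * Returns how many buffers this tick flushed.
 */
int
run_syncer_tick(void)
{
    int flushed = dirty_count[current_slot];

    dirty_count[current_slot] = 0;
    current_slot = (current_slot + 1) % WHEEL_SLOTS;
    return flushed;
}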

Also, it seems very likely that many installations won't have enough
buffers for 30 seconds worth of changes and that there would be some level
of SYNCHRONOUS writing because of this delay and the syncer process getting
backed up. This might happen once per second as the buffers get full and
the syncer has not yet started for that second interval.

Linux might handle this better. I saw some emails exchanged a year or so
ago about starting writes immediately in a low-priority way but I'm not
sure if those patches got applied to the linux kernel or not. The source I
had access to seems to do something analogous to FreeBSD but using fixed
percentages of the dirty blocks or a minimum number of blocks. They appear
to be handled in LRU order however.

On-disk caches are much much larger these days so it seems that some way of
getting the data out sooner would result in better write performance for
the cache. My newer drive is a 10K RPM IBM Ultrastar SCSI and it has a 4M
cache. I don't see these caches getting smaller over time so not letting
the disk see writes will become more and more of a performance drain.

- Curtis


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] Hot Backup

2002-10-07 Thread Sandeep Chadha

Hello to all the Doers of Postgres!!!

Last time I went through the forums, people spoke highly of 7.3 and its capability to 
do hot backups. My problem is that if the database goes down and I lose my main data store, 
then I will lose all transactions back to the time I did the pg_dump.

Other databases (i.e., Oracle) solve this by retaining their archive logs in some 
physically separate storage. So, when you lose your data, you can restore the data 
from backup, then apply your archive log, and avoid losing any committed 
transactions. 

PostgreSQL has been lacking this all along. I've installed postgres 7.3b2 and still 
don't see any archives flushed to any other place. Please let me know how the hot 
backup procedure is implemented in the current 7.3 beta(2) release.


Thanks.

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane

I wrote:
 That says that the best possible throughput on this test scenario is 5
 transactions per disk rotation --- the CPU is just not capable of doing
 more.  I am actually getting about 4 xact/rotation for 10 or more
 clients (in fact it seems to reach that plateau at 8 clients, and be
 close to it at 7).

After further thought I understand why it takes 8 clients to reach full
throughput in this scenario.  Assume that we have enough CPU oomph so
that we can process four transactions, but not five, in the time needed
for one revolution of the WAL disk.  If we have five active clients
then the behavior will be like this:

1. Backend A becomes ready to commit.  It locks WALWriteLock and issues
a write/flush that will only cover its own commit record.  Assume that
it has to wait one full disk revolution for the write to complete (this
will be the steady-state situation).

2. While A is waiting, there is enough time for B, C, D, and E to run
their transactions and become ready to commit.  All eventually block on
WALWriteLock.

3. When A finishes its write and releases WALWriteLock, B will acquire
the lock and initiate a write that (with my patch) will cover C, D, and
E's commit records as well as its own.

4. While B is waiting for the disk to spin, A receives a new transaction
from its client, processes it, and becomes ready to commit.  It blocks
on WALWriteLock.

5. When B releases the lock, C, D, E acquire it and quickly fall
through, seeing that they need do no work.  Then A acquires the lock.
GOTO step 1.

So with five active threads, we alternate between committing one
transaction and four transactions on odd and even disk revolutions.

It's pretty easy to see that with six or seven active threads, we
will alternate between committing two or three transactions and
committing four.  Only when we get to eight threads do we have enough
backends to ensure that four transactions are available to commit on
every disk revolution.  This must be so because the backends that are
released at the end of any given disk revolution will not be able to
participate in the next group commit, if there is already at least
one backend ready to commit.
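
To make that concrete, here is a self-contained pseudo-C sketch of the
ganged write each committing backend performs under the patch.  The types
and helper names are invented for the sketch, and LSN comparisons are
written as plain integers, so this is not the actual xlog.c code:

#include <stdint.h>

typedef uint64_t Lsn;                   /* stand-in for XLogRecPtr */

static Lsn flushed_upto;                /* shared: highest WAL position known on disk */
static Lsn inserted_upto;               /* shared: end of WAL inserted so far */

extern void wal_write_lock(void);       /* stand-in for acquiring WALWriteLock */
extern void wal_write_unlock(void);
extern void write_and_fsync_wal(Lsn upto);

void
flush_my_commit(Lsn my_commit_lsn)
{
    if (my_commit_lsn <= flushed_upto)
        return;                         /* an earlier ganged write already covered us */

    wal_write_lock();
    if (my_commit_lsn > flushed_upto)   /* re-check: the lock holder may have done it */
    {
        /*
         * Flush everything inserted so far.  Backends that queued up behind
         * the current lock holder get their commit records written by this
         * one call -- that is the "ganged" write.
         */
        Lsn upto = inserted_upto;

        write_and_fsync_wal(upto);
        flushed_upto = upto;
    }
    wal_write_unlock();
}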

So this solution isn't perfect; it would still be nice to have a way to
delay initiation of the WAL write until just before the disk is ready
to accept it.  I dunno any good way to do that, though.

I went ahead and committed the patch for 7.3, since it's simple and does
offer some performance improvement.  But maybe we can think of something
better later on...

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Hot Backup

2002-10-07 Thread Neil Conway

Sandeep Chadha [EMAIL PROTECTED] writes:
 Postgresql has been lacking this all along. I've installed postgres
 7.3b2 and still don't see any archive's flushed to any other
 place. Please let me know how is hot backup procedure implemented in
 current 7.3 beta(2) release.

AFAIK no such hot backup feature has been implemented for 7.3 -- you
appear to have been misinformed.

That said, I agree that would be a good feature to have.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith

Tom, first of all, excellent job improving the current algorithm. I'm glad
you looked at the WALCommitLock code.

 This must be so because the backends that are
 released at the end of any given disk revolution will not be able to
 participate in the next group commit, if there is already at least
 one backend ready to commit.

This is the major reason for my original suggestion about using aio_write.
The writes don't block each other and there is no need for a kernel level
exclusive locking call like fsync or fdatasync.
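
For reference, here is a minimal sketch of the POSIX AIO calls being
proposed.  The descriptor, buffer, and offset handling are placeholders,
and it assumes the WAL file was opened with O_DSYNC so that completion
really means the record is on stable storage:

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* queue one WAL block for writing; returns 0 on success, -1 on error */
int
start_wal_write(int wal_fd, const void *buf, size_t len, off_t offset,
                struct aiocb *cb)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = wal_fd;
    cb->aio_buf = (void *) buf;
    cb->aio_nbytes = len;
    cb->aio_offset = offset;

    return aio_write(cb);       /* queues the write; does not block the backend */
}

/* a committing backend waits only for its own record's completion */
ssize_t
wait_for_wal_write(struct aiocb *cb)
{
    const struct aiocb *list[1];

    list[0] = cb;
    while (aio_error(cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);

    return aio_return(cb);      /* bytes written, or -1 on error */
}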

Even the theoretical limit you mention of one transaction per revolution
per committing process seems like a significant bottleneck.

Is committing 1 and 4 transactions on alternating revolutions good? It's
certainly better than 1 per revolution.

However, what if we could have done 3 transactions per process in the time
it took for a single revolution?

Then we are looking at (1 + 4)/ 2 = 2.5 transactions per revolution versus
the theoretical maximum of (3 * 5) = 15 transactions per revolution if we
can figure out a way to do non-blocking writes that we can guarantee are on
the disk platter so we can return from commit.

Setting aside whether or not aio is viable: do you not agree that
eliminating the blocking could potentially result in a 6X improvement for
the 5-process case?


 So this solution isn't perfect; it would still be nice to have a way to
 delay initiation of the WAL write until just before the disk is ready
 to accept it.  I dunno any good way to do that, though.

I still think that it would be much faster to just keep writing the WAL log
blocks when they fill up and have a separate process wake the committing
process when the write completes. This would eliminate WAL writing as a
bottleneck.

I have yet to hear anyone say that this can't be done, only that we might
not want to do it because the code might not be clean.

I'm generally only happy when I can finally remove a bottleneck completely,
but speeding one up by 3X like you have done is pretty damn cool for a day
or two's work.

- Curtis


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] v7.2.3 - tag'd, packaged ... need it checked ...

2002-10-07 Thread Peter Eisentraut

Marc G. Fournier writes:

 Looks good from my end, Peter, I pulled the same docs that I pulled for
 v7.2.2, which I hope is okay?

Probably not, because the version number needs to be changed and they need
to be rebuilt for each release.

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Hot Backup

2002-10-07 Thread Sandeep Chadha

Hmmm. Then are there any new enhancements as far as backups are concerned between 
the current 7.2.x and 7.3.x?
For example, can we take a tar backup while the database is up and running, or is there another such feature?

Thanks a bunch in advance.

-Original Message-
From: Neil Conway [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 07, 2002 1:48 PM
To: Sandeep Chadha
Cc: Tom Lane; [EMAIL PROTECTED]; pgsql-general
Subject: Re: [HACKERS] Hot Backup


Sandeep Chadha [EMAIL PROTECTED] writes:
 Postgresql has been lacking this all along. I've installed postgres
 7.3b2 and still don't see any archive's flushed to any other
 place. Please let me know how is hot backup procedure implemented in
 current 7.3 beta(2) release.

AFAIK no such hot backup feature has been implemented for 7.3 -- you
appear to have been misinformed.

That said, I agree that would be a good feature to have.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane

Curtis Faith [EMAIL PROTECTED] writes:
 Even the theoretical limit you mention of one transaction per revolution
 per committing process seem like a significant bottleneck.

Well, too bad.  If you haven't gotten your commit record down to disk,
then *you have not committed*.  This is not negotiable.  (If you think
it is, then turn off fsync and quit worrying ;-))

An application that is willing to have multiple transactions in flight
at the same time can open up multiple backend connections to issue those
transactions, and thereby perhaps beat the theoretical limit.  But for
serial transactions, there is not anything we can do to beat that limit.
(At least not with the log structure we have now.  One could imagine
dropping a commit record into the nearest one of multiple buckets that
are carefully scattered around the disk.  But exploiting that would take
near-perfect knowledge about disk head positioning; it's even harder to
solve than the problem we're considering now.)

 I still think that it would be much faster to just keep writing the WAL log
 blocks when they fill up and have a separate process wake the commiting
 process when the write completes. This would eliminate WAL writing as a
 bottleneck.

You're failing to distinguish total throughput to the WAL drive from
response time seen by any one transaction.  Yes, a policy of writing
each WAL block once when it fills would maximize potential throughput,
but it would also mean a potentially very large delay for a transaction
waiting to commit.  The lower the system load, the worse the performance
on that scale.

The scheme we now have (with my recent patch) essentially says that the
commit delay seen by any one transaction is at most two disk rotations.
Unfortunately it's also at least one rotation :-(, except in the case
where there is no contention, ie, no already-scheduled WAL write when
the transaction reaches the commit stage.  It would be nice to be able
to say at most one disk rotation instead --- but I don't see how to
do that in the absence of detailed information about disk head position.

Something I was toying with this afternoon: assume we have a background
process responsible for all WAL writes --- not only filled buffers, but
the currently active buffer.  It periodically checks to see if there
are unwritten commit records in the active buffer, and if so schedules
a write for them.  If this could be done during each disk rotation,
just before the disk reaches the active WAL log block, we'd have an
ideal solution.  And it would not be too hard for such a process to
determine the right time: it could measure the drive rotational speed
by observing the completion times of successive writes to the same
sector, and it wouldn't take much logic to empirically find the latest
time at which a write can be issued and have a good probability of
hitting the disk on time.  (At least, this would work pretty well given
a dedicated WAL drive, else there'd be too much interference from other
I/O requests.)
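
A rough sketch of that measurement, assuming a dedicated WAL drive whose
file descriptor was opened with O_SYNC (error handling omitted, constants
arbitrary): since each rewrite of the same sector must wait for the platter
to come back around, the smallest interval observed approximates one
revolution.

#include <sys/types.h>
#include <sys/time.h>
#include <unistd.h>

#define PROBES 32

double
estimate_rotation_seconds(int fd, const char *block, size_t blksz)
{
    struct timeval t0, t1;
    double best = 1.0;          /* start absurdly high; real values are a few ms */
    double gap;
    int i;

    for (i = 0; i < PROBES; i++)
    {
        gettimeofday(&t0, NULL);
        lseek(fd, (off_t) 0, SEEK_SET);     /* rewrite the very same sector */
        write(fd, block, blksz);            /* O_SYNC: returns once it is on the platter */
        gettimeofday(&t1, NULL);

        gap = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1000000.0;
        if (gap < best)
            best = gap;
    }
    return best;                /* roughly one revolution, e.g. ~8.3 ms at 7200 rpm */
}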

However, this whole scheme falls down on the same problem we've run into
before: user processes can't schedule themselves with millisecond
accuracy.  The writer process might be able to determine the ideal time
to wake up and make the check, but it can't get the Unix kernel to
dispatch it then, at least not on most Unixen.  The typical scheduling
slop is one time slice, which is comparable to if not more than the
disk rotation time.

ISTM aio_write only improves the picture if there's some magic in-kernel
processing that makes this same kind of judgment as to when to issue the
ganged write for real, and is able to do it on time because it's in
the kernel.  I haven't heard anything to make me think that that feature
actually exists.  AFAIK the kernel isn't much more enlightened about
physical head positions than we are.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
 Good points.
 
 Now for some surprising news (at least it surprised me).
 
 I researched the file system source on my system (FreeBSD 4.6) and found
 that the behavior was optimized for non-database access to eliminate
 unnecessary writes when temp files are created and deleted rapidly. It was
 not optimized to get data to the disk in the most efficient manner.
 
 The syncer on FreeBSD appears to place dirtied filesystem buffers into
 work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer
 processes one of the queues and increments a counter syncer_delayno.
 
 On my system the setting for SYNCER_MAXDELAY is 32. So each second 1/32nd
 of the writes that were buffered are processed. If the syncer gets behind
 and the writes for a given second exceed one second to process the syncer
 does not wait but begins processing the next queue.
 
 AFAICT this means that there is no opportunity to have writes combined by
 the  disk since they are processed in buckets based on the time the writes
 came in.

This is the trickle syncer.  It prevents bursts of disk activity every
30 seconds.  It is for non-fsync writes, of course, and I assume if the
kernel buffers get low, it starts to flush faster.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing

On Tue, 2002-10-08 at 00:12, Curtis Faith wrote:
 Tom, first of all, excellent job improving the current algorithm. I'm glad
 you look at the WALCommitLock code.
 
  This must be so because the backends that are
  released at the end of any given disk revolution will not be able to
  participate in the next group commit, if there is already at least
  one backend ready to commit.
 
 This is the major reason for my original suggestion about using aio_write.
 The writes don't block each other and there is no need for a kernel level
 exclusive locking call like fsync or fdatasync.
 
 Even the theoretical limit you mention of one transaction per revolution
 per committing process seem like a significant bottleneck.
 
 Is committing 1 and 4 transactions on every revolution good? It's certainly
 better than 1 per revolution.

Of course committing all 5 at each rev would be better ;)

 However, what if we could have done 3 transactions per process in the time
 it took for a single revolution?

I may be missing something obvious, but I don't see a way to get more
than 1 trx/process/revolution, as each previous transaction in that
process must be written to disk before the next can start, and the only
way it can be written to the disk is when the disk heads are on the
right place and that happens exactly once per revolution. 

In theory we could devise some clever page interleave scheme that would
allow us to go like this: fill one page - write page to disk, commit
trx's - fill the page in next 1/3 of rev - write next page to disk ... ,
but this will work only for some limited set of WAL page sizes.

It could be possible to get near 5 trx/rev for 5 backends if we do the
following (A-E are backends from Tom's explanation):

1. write the page for A's trx to its proper pos P (where P is the page
number)

2. if after sync for A returns and we already have more transactions
waiting for write()+sync() of the same page, immediately write the
_same_ page to pos P+N (where N is a tunable parameter). If N is small
enough then P+N will be on the same cylinder for most cases and thus
will get transactions B-E also committed on the same rev.

3. make sure that the last version will also be written to its proper
place before the end of log will overwrite P+N. (This may be tricky.)

4. When restoring from WAL, always check for a page at EndPos+N for a
possible newer version of last page.

This scheme requires page numbers+page versions to be stored in each
page and could get us near 1 trx/backend/rev performance, but it's hard
to tell if it is really useful in real life.

This could also possibly be extended to more than one end page and
more than one continuation end page copy to get better than 1
trx/backend/rev.

-
Hannu



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 This is the trickle syncer.  It prevents bursts of disk activity every
 30 seconds.  It is for non-fsync writes, of course, and I assume if the
 kernel buffers get low, it starts to flush faster.

AFAICT, the syncer only speeds up when virtual memory paging fills the
buffers past
a threshold and even in that event it only speeds it up by a factor of two.

I can't find any provision for speeding up flushing of the dirty buffers
when they fill for normal file system writes, so I don't think that
happens.

- Curtis


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing

On Tue, 2002-10-08 at 01:27, Tom Lane wrote:

 The scheme we now have (with my recent patch) essentially says that the
 commit delay seen by any one transaction is at most two disk rotations.
 Unfortunately it's also at least one rotation :-(, except in the case
 where there is no contention, ie, no already-scheduled WAL write when
 the transaction reaches the commit stage.  It would be nice to be able
 to say at most one disk rotation instead --- but I don't see how to
 do that in the absence of detailed information about disk head position.
 
 Something I was toying with this afternoon: assume we have a background
 process responsible for all WAL writes --- not only filled buffers, but
 the currently active buffer.  It periodically checks to see if there
 are unwritten commit records in the active buffer, and if so schedules
 a write for them.  If this could be done during each disk rotation,
 just before the disk reaches the active WAL log block, we'd have an
 ideal solution.  And it would not be too hard for such a process to
 determine the right time: it could measure the drive rotational speed
 by observing the completion times of successive writes to the same
 sector, and it wouldn't take much logic to empirically find the latest
 time at which a write can be issued and have a good probability of
 hitting the disk on time.  (At least, this would work pretty well given
 a dedicated WAL drive, else there'd be too much interference from other
 I/O requests.)
 
 However, this whole scheme falls down on the same problem we've run into
 before: user processes can't schedule themselves with millisecond
 accuracy.  The writer process might be able to determine the ideal time
 to wake up and make the check, but it can't get the Unix kernel to
 dispatch it then, at least not on most Unixen.  The typical scheduling
 slop is one time slice, which is comparable to if not more than the
 disk rotation time.

The standard Linux time slice has been 100Hz, but it has been configurable
for some time.

The latest RedHat (8.0) is built with 500Hz, which makes about 4
slices/rev for 7200 rpm disks (a 7200 rpm disk takes ~8.3 ms per
revolution against a 2 ms slice; at 15000 rpm it's ~4 ms per revolution,
or about 2 slices).
 
 ISTM aio_write only improves the picture if there's some magic in-kernel
 processing that makes this same kind of judgment as to when to issue the
 ganged write for real, and is able to do it on time because it's in
 the kernel.  I haven't heard anything to make me think that that feature
 actually exists.  AFAIK the kernel isn't much more enlightened about
 physical head positions than we are.

At least for open source kernels it could be possible to

1. write a patch to the kernel 

or 

2. get the authors of kernel aio interested in doing it.

or

3. the third possibility would be using some real-time (RT) OS or mixed
RT/conventional OS where some threads can be scheduled for hard-RT .
In an RT os you are supposed to be able to do exactly what you describe.


I think that 2 and 3 could be outsourced (the respective developers
talked into supporting it), as both the KAIO and RT Linuxen/BSDs folks are
probably also interested in high-profile applications, so they could boast
that using our stuff enabled the PostgreSQL database to run twice as fast.

Anyway, getting to near-hardware speeds for a database will need more
specific support from the OS than web browsing or compiling does.

---
Hannu








---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith

 Well, too bad.  If you haven't gotten your commit record down to disk,
 then *you have not committed*.  This is not negotiable.  (If you think
 it is, then turn off fsync and quit worrying ;-))

I've never disputed this, so if I seem to be suggesting that, I've been
unclear. I'm just assuming that the disk can get a confirmation back to the
INSERTing process in much less than one rotation. This would allow that
process to start working again, perhaps in time to complete another
transaction.

 An application that is willing to have multiple transactions in flight
 at the same time can open up multiple backend connections to issue those
 transactions, and thereby perhaps beat the theoretical limit.  But for
 serial transactions, there is not anything we can do to beat that limit.
 (At least not with the log structure we have now.  One could imagine
 dropping a commit record into the nearest one of multiple buckets that
 are carefully scattered around the disk.  But exploiting that would take
 near-perfect knowledge about disk head positioning; it's even harder to
 solve than the problem we're considering now.)

Consider the following scenario:

Time measured in disk rotations.

Time 1.00 - Process A Commits - Causing aio_write to log and wait
Time 1.03 - aio_completes for Process A write - wakes process A
Time 1.05 - Process A Starts another transaction.
Time 1.08 - Process A Commits
etc.

I agree that a process can't proceed from a commit until it receives
confirmation of the write, but if the write has hit the disk before a full
rotation then the process should be able to continue processing new
transactions.

 You're failing to distinguish total throughput to the WAL drive from
 response time seen by any one transaction.  Yes, a policy of writing
 each WAL block once when it fills would maximize potential throughput,
 but it would also mean a potentially very large delay for a transaction
 waiting to commit.  The lower the system load, the worse the performance
 on that scale.

You are assuming fsync or fdatasync behavior, I am not. There would be no
delay under the scenario I describe. The transaction would exit commit as
soon as the confirmation of the write is received from the aio system. I
would hope that with a decent aio implementation this would generally be
much less than one rotation.

I think that the single transaction response time is very important and
that's one of the chief problems I sought to solve when I proposed
aio_writes for logging in my original email many moons ago.

 ISTM aio_write only improves the picture if there's some magic in-kernel
 processing that makes this same kind of judgment as to when to issue the
 ganged write for real, and is able to do it on time because it's in
 the kernel.  I haven't heard anything to make me think that that feature
 actually exists.  AFAIK the kernel isn't much more enlightened about
 physical head positions than we are.

All aio_write has to do is pass the write off to the device as soon as
it gets it, bypassing the system buffers. The code in the disk's
hardware is very good at knowing when the disk head is coming. IMHO,
bypassing the kernel's less-than-enlightened writing system is the main
point of using aio_write.

- Curtis


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Greg Copeland

On Mon, 2002-10-07 at 16:06, Curtis Faith wrote:
  Well, too bad.  If you haven't gotten your commit record down to disk,
  then *you have not committed*.  This is not negotiable.  (If you think
  it is, then turn off fsync and quit worrying ;-))
 

At this point, I think we've come full circle.  Can we all agree that
this concept is a *potential* source of improvement in a variety of
situations?  If we can agree on that, perhaps we should move to the next
stage in the process, validation?

How long do you think it would take to develop something worthy of
testing?  Do we have known test cases which will properly (in)validate
the approach that everyone will agree to?  If code is reasonably clean
so as to pass the smell test and shows a notable performance boost, will
it be seriously considered for inclusion?  If so, I assume it would
become a configure option (--with-aio)?


Regards,

Greg






Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Justin Clift

Greg Copeland wrote:
snip
 If so, I assume it would become a configure option (--with-aio)?

Or maybe a GUC use_aio ?

:-)

Regards and best wishes,

Justin Clift

 
 Regards,
 
 Greg
 
   

-- 
My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there.
   - Indira Gandhi

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Greg Copeland

On Mon, 2002-10-07 at 15:28, Bruce Momjian wrote:
 This is the trickle syncer.  It prevents bursts of disk activity every
 30 seconds.  It is for non-fsync writes, of course, and I assume if the
 kernel buffers get low, it starts to flush faster.

Doesn't this also increase the likelihood that people will be running in
a buffer-poor environment more frequently than I previously asserted,
especially in very heavily I/O bound systems?  Unless I'm mistaken, that
opens the door for a general case of why an aio implementation should be
looked into.

Also, on a side note, IIRC, Linux kernel 2.5.x has a new priority
elevator which is said to be MUCH better at saturating disks than ever
before.  Once 2.6 (or whatever its number will be) is released, it may
not be as much of a problem as it seems to be for FreeBSD (I think
that's the one you're using).


Greg






Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane

Curtis Faith [EMAIL PROTECTED] writes:
 Well, too bad.  If you haven't gotten your commit record down to disk,
 then *you have not committed*.  This is not negotiable.  (If you think
 it is, then turn off fsync and quit worrying ;-))

 I've never disputed this, so if I seem to be suggesting that, I've beee
 unclear. I'm just assuming that the disk can get a confirmation back to the
 INSERTing process in much less than one rotation.

You've spent way too much time working with lying IDE drives :-(

Do you really trust a confirm-before-write drive to make that write if
it loses power?  I sure don't.

If you do trust your drive to hold that data across a crash, then ISTM
the whole problem goes away anyway, as writes will complete quite
independently of disk rotation.  My Linux box has no problem claiming
that it's completing several thousand TPS with a single client ... and
yes, fsync is on, but it's using an IDE drive, and I don't know how to
disable confirm-before-write on that drive.  (That's why I did these
tests on my old slow HP hardware.)  Basically, the ganging of commit
writes happens inside the disk controller on a setup like that.  You
still don't need aio_write --- unless perhaps to reduce wastage of IDE
bus bandwidth by repeated writes, but that doesn't seem to be a scarce
resource in this context.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Greg Copeland

Well, I was thinking that aio may not be available on all platforms,
thus the conditional compile option.   On the other hand, wouldn't you
pretty much want it either on or off for all instances?  I can see that
it would be nice for testing though.  ;)

Greg

On Mon, 2002-10-07 at 16:23, Justin Clift wrote:
 Greg Copeland wrote:
 snip
  If so, I assume it would become a configure option (--with-aio)?
 
 Or maybe a GUC use_aio ?
 
 :-)
 
 Regards and best wishes,
 
 Justin Clift
 
  
  Regards,
  
  Greg
  

 
 -- 
 My grandfather once told me that there are two kinds of people: those
 who work and those who take the credit. He told me to try to be in the
 first group; there was less competition there.
- Indira Gandhi






Re: [HACKERS] 7.2.3 patching done

2002-10-07 Thread Alvaro Herrera

On Mon, Oct 07, 2002 at 09:22:38PM +0200, Peter Eisentraut wrote:
 Tom Lane writes:
 
  But the source distribution hasn't *got* any binary files.
 
 There are some under doc/src/graphics, and then there are
 doc/postgres.tar.gz and doc/man.tar.gz.

And what about publishing xdelta patches?

-- 
Alvaro Herrera (alvherre[a]dcc.uchile.cl)
"Listen and you will forget; see and you will remember; do and you will understand" (Confucius)

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith

 I may be missing something obvious, but I don't see a way to get more
 than 1 trx/process/revolution, as each previous transaction in that
 process must be written to disk before the next can start, and the only
 way it can be written to the disk is when the disk heads are on the
 right place and that happens exactly once per revolution.

Okay, consider the following scenario.

1) Process A commits when the platter is at 0 degrees.
2) There are enough XLog writes from other processes to fill 1/4 platter
rotation worth of log or 90 degrees. The SCSI drive writes the XLog commit
record and keeps writing other log entries as the head rotates.
3) Process A receives a confirmation of the write before the platter
rotates 60 degrees.
4) Process A continues and adds another commit before the platter rotates
to 90 degrees.

This should be very possible and more and more likely in the future as CPUs
get faster and faster relative to disks.

I'm not suggesting this would happen all the time, just that it's possible
and that an SMP machine with good CPUs and a fast I/O subsystem should be
able to keep the log writing at close to I/O bandwidth limits.

The case of bulk inserts is one where I would expect that for simple tables
we should be able to peg the disks given today's hardware and enough
inserting processes.

- Curtis


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



[HACKERS] Where to call SetQuerySnapshot

2002-10-07 Thread Tom Lane

I noticed that the new EXECUTE statement does not call SetQuerySnapshot,
which seems like a bad thing.  The omission is masked by the defensive
code in CopyQuerySnapshot, which will automatically do SetQuerySnapshot
if it hasn't been done yet in the current transaction.  However, this
doesn't provide the right behavior in read-committed mode: if we are
inside a transaction and not the first query, I'd think EXECUTE should
take a new snapshot; but it won't.

Comparable code can be found in COPY OUT, for which tcop/utility.c
does SetQuerySnapshot() before calling the command-specific routine.

Questions for the list:

1. Where is the cleanest place to call SetQuerySnapshot() for utility
statements that need it?  Should we follow the lead of the existing
COPY code, and add the call to the ExecuteStmt case in utility.c?
Or should we move the calls into the command-specific routines (DoCopy
and ExecuteQuery)?  Or perhaps it should be done in postgres.c, which
has this responsibility for non-utility statements?
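
(For illustration, following the existing COPY lead would amount to roughly
the following in utility.c's ExecuteStmt case -- the call signature and
variable names here are from memory and may not match the 7.3 sources
exactly:

    case T_ExecuteStmt:
        /* take a fresh snapshot first, as utility.c already does for COPY OUT */
        SetQuerySnapshot();
        ExecuteQuery((ExecuteStmt *) parsetree, dest);
        break;
)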

2. Would it be a good idea to change CopyQuerySnapshot to elog(ERROR)
instead of silently creating a snapshot when none has been made?
I think I was the one who put in its auto-create-snapshot behavior,
but I'm now feeling like that was a mistake.  It hides omissions that
we should want to find.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] Proposed LogWriter Scheme, WAS: Potential Large

2002-10-07 Thread Ken Hirsch


I sent this yesterday, but it seems not to have made it to the list... 


I have a couple of comments orthogonal to the present discussion. 

1) It would be fairly easy to write log records over a network to a 
dedicated process on another system.  If the other system has an 
uninterruptible power supply, this is about as safe as writing to disk. 

This would get rid of the need for any fsync on the log at all.  There 
would be extra code needed on restart to get the end of the log from the 
other system, but it doesn't seem like much. 

I think this would be an attractive option to a lot of people.  Most 
people have at least two systems, and the requirements of the logging 
system would be minimal. 
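
A bare-bones sketch of the idea, with placeholder addressing and record 
framing (none of this is existing PostgreSQL code; the remote logger simply 
holds acknowledged records in UPS-backed memory): 

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* connect to the remote logger process (address handling kept minimal) */
int
connect_log_server(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = inet_addr(ip);

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Ship one log record and wait for a one-byte ack.  Only after the ack is
 * the commit considered durable -- the remote box keeps the record in
 * UPS-backed memory and writes it out at its leisure, so no local fsync
 * of the log is needed at all.
 */
int
ship_log_record(int fd, const void *rec, size_t len)
{
    char ack;

    if (write(fd, rec, len) != (ssize_t) len)
        return -1;
    if (read(fd, &ack, 1) != 1)
        return -1;
    return 0;
}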


2) It is also possible, with kernel modifications, to have special 
logging partitions where log records are written where the head is. 
Tzi-cker Chueh and Lan Huang at Stony Brook 
(http://www.cs.sunysb.edu/~lanhuang/research.htm) have written this, 
although I don't think they have released any code. 

(A similar idea called WADS is mentioned in Gray & Reuter's book.) 

If the people at Red Hat are interested in having some added value for 
using PostgreSQL on Red Hat Linux, this would be one idea.  It could 
also be used to speed up ext3 and other journaling file systems. 





---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 Greg Copeland [EMAIL PROTECTED] writes:
  Doesn't this also increase the likelihood that people will be
  running in a buffer-poor environment more frequently than I
  previously asserted, especially in very heavily I/O bound
  systems?  Unless I'm mistaken, that opens the door for a
  general case of why an aio implementation should be looked into.

Neil Conway replies:
 Well, at least for *this specific sitation*, it doesn't really change
 anything -- since FreeBSD doesn't implement POSIX AIO as far as I
 know, we can't use that as an alternative.

I haven't tried it yet but there does seem to be an aio implementation that
conforms to POSIX in FreeBSD 4.6.2.  It's part of the kernel and can be
found in:
/usr/src/sys/kern/vfs_aio.c

 However, I'd suspect that the FreeBSD kernel allows for some way to
 tune the behavior of the syncer. If that's the case, we could do some
 research into what settings are more appropriate for FreeBSD, and
 recommend those in the docs. I don't run FreeBSD, however -- would
 someone like to volunteer to take a look at this?

I didn't see anything obvious in the docs but I still believe there's some
way to tune it. I'll let everyone know if I find some better settings.

 BTW Curtis, did you happen to check whether this behavior has been
 changed in FreeBSD 5.0?

I haven't checked but I will.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Neil Conway

Greg Copeland [EMAIL PROTECTED] writes:
 Doesn't this also increase the likelihood that people will be running in
 a buffer-poor environment more frequently than I previously asserted,
 especially in very heavily I/O bound systems?  Unless I'm mistaken, that
 opens the door for a general case of why an aio implementation should be
 looked into.

Well, at least for *this specific situation*, it doesn't really change
anything -- since FreeBSD doesn't implement POSIX AIO as far as I
know, we can't use that as an alternative.

However, I'd suspect that the FreeBSD kernel allows for some way to
tune the behavior of the syncer. If that's the case, we could do some
research into what settings are more appropriate for FreeBSD, and
recommend those in the docs. I don't run FreeBSD, however -- would
someone like to volunteer to take a look at this?

BTW Curtis, did you happen to check whether this behavior has been
changed in FreeBSD 5.0?

 Also, on a side note, IIRC, linux kernel 2.5.x has a new priority
 elevator which is said to be MUCH better as saturating disks than ever
 before.

Yeah, there are lots of new and interesting features for database
systems in the new kernel -- I'm looking forward to when 2.6 is widely
deployed...

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
  This is the trickle syncer.  It prevents bursts of disk activity every
  30 seconds.  It is for non-fsync writes, of course, and I assume if the
  kernel buffers get low, it starts to flush faster.
 
 AFAICT, the syncer only speeds up when virtual memory paging fills the
 buffers past
 a threshold and even in that event it only speeds it up by a factor of two.
 
 I can't find any provision for speeding up flushing of the dirty buffers
 when they fill for normal file system writes, so I don't think that
 happens.

So you think if I try to write a 1 gig file, it will write enough to
fill up the buffers, then wait while the sync'er writes out a few blocks
every second, free up some buffers, then write some more?

Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
it can't get a buffer, it will async write a dirty buffer to disk.

As far as this AIO conversation is concerned, I want to see someone come
up with some performance improvement that we can only do with AIO. 
Unless I see it, I am not interested in pursuing this thread.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] [pgsql-performance] [GENERAL] Large databases, performance

2002-10-07 Thread Shridhar Daithankar

On 7 Oct 2002 at 11:21, Tom Lane wrote:

 Shridhar Daithankar [EMAIL PROTECTED] writes:
  I say if it's a char field, there should be no indicator of length as
  it's not required. Just store those many characters straight ahead..
 
 Your assumption fails when considering UNICODE or other multibyte
 character encodings.

Correct, but is it possible to have a real char string when the database is not 
Unicode, or, to be exact, when the locale defines the size of a char?

In my case varchar does not make sense, as all strings are guaranteed to be of 
a defined length. While the argument you have put forward is correct, it's causing 
a disk space leak, so to speak.

Bye
 Shridhar

--
Boucher's Observation:  He who blows his own horn always plays the music
several 
octaves higher than originally written.


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [GENERAL] [HACKERS] Hot Backup

2002-10-07 Thread Shridhar Daithankar

On 7 Oct 2002 at 13:48, Neil Conway wrote:

 Sandeep Chadha [EMAIL PROTECTED] writes:
  Postgresql has been lacking this all along. I've installed postgres
  7.3b2 and still don't see any archive's flushed to any other
  place. Please let me know how is hot backup procedure implemented in
  current 7.3 beta(2) release.
 AFAIK no such hot backup feature has been implemented for 7.3 -- you
 appear to have been misinformed.

Is replication an answer to hot backup?

Bye
 Shridhar

--
ink, n.:A villainous compound of tannogallate of iron, gum-arabic,  and 
water, 
chiefly used to facilitate the infection of idiocy and promote intellectual 
crime.  -- H.L. Mencken


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly