Re: [HACKERS] postmaster.exe vs postgres.exe (was: CVS HEAD busted on Windows?)

2006-06-23 Thread Bruce Momjian
Tom Lane wrote:
 Dave Page dpage@vale-housing.co.uk writes:
  though - Magnus and
  I were wondering if Peter's change means we no longer need to ship
  postmaster.exe and postgres.exe with pgInstaller. Presumably
  we can just use postgres.exe for everything now?
 
  Won't we still need to know if we are called as postmaster or 
  postgres?
 
 No.  The entire point of the recent changes is that the behavior no
 longer depends on the name of the executable, only on the switches.
 
 In the Unix distributions, the only reason to keep the postmaster
 symlink is to avoid breaking old start scripts that invoke postmaster.
 We may be able to drop the symlink eventually, though I see no reason
 to be in a hurry about it.
 
 In the Windows case, I think you'd have to ask if there are any start-
 script-equivalents outside your control that you're worried about
 breaking.  Given the distribution-size penalty you face by having two
 copies, obviously you're more motivated to drop the extra .exe sooner
 than we'll probably do in the Unix distros.

Can't the installer just copy postgres.exe to postmaster.exe during
install?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Dave Page
 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Andrew Dunstan
 Sent: 22 June 2006 23:09
 To: Tom Lane
 Cc: Bort, Paul; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [CORE] GPL Source and Copyright Questions
 
 
 
 Something has broken Cygwin in the last 18 days ;-(

Is there any real reason to continue to support Cygwin? We've always
said it's not a first class port, and now that we have the native port it
seems somewhat pointless to expend further effort on it.

Regards, Dave.



Re: [HACKERS] postmaster.exe vs postgres.exe (was: CVS HEAD busted on Windows?)

2006-06-23 Thread Dave Page
 

 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
 Sent: 23 June 2006 07:09
 To: Tom Lane
 Cc: Dave Page; Andrew Dunstan; Peter Eisentraut; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] postmaster.exe vs postgres.exe (was: 
 CVS HEAD busted on Windows?)
 
 Can't the installer just copy postgres.exe to postmaster.exe during
 install?

That's not something that Windows Installer does - we'd have to write
some code to do it at the end of the installation, then call it as a
custom action. Actually it'd probably be fairly trivial, but I'm having
a hard time imagining why anyone would be relying on the existence of
postmaster.exe anyway, unless they were packaging their own release in
which case it's their problem anyhoo.

Regards, Dave.



Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Magnus Hagander
Andrew Dunstan [EMAIL PROTECTED] writes:
 Frankly this patch has significant infelicities. For example, 
 what is the reason for removing the standard protection 
 against double inclusion that header files should usually 
 have from pg_config.h.win32?

I've got to admit, I don't recall that. It may be an oversight - I keep
mixing up pg_config.h.win32 and port.h and port/win32.h in my head ;-) 

You will notice that the other two of those don't have it.

But it shouldn't be a problem - AFAIK it's only loaded from c.h, and
that one has protection.


 I assume that no test was done to see if this broke Cygwin, 
 despite the fact that if you have a Windows box to test on, 
 checking that you haven't broken Cygwin should not be too 
 difficult..

Given the amount of damage I've seen it do, Cygwin is definitely *not*
making it onto my development machines. So no, I didn't build it on
Cygwin. However, I didn't intend for it to break, of course - I tried
manual checking, which I obviously didn't do well enough.

I'll try to set it up in a VM for testing.

 And why do win32 and cygwin now not include at all pg_config_os.h?

It's a way to avoid the step of copying win32\port.h in msvc. configure
copies it to pg_config_os.h. Since for win32 platforms (unfortunately, at
this point it considers cygwin win32..) that will always be port/win32.h,
it explicitly includes that one instead.


Tom writes:

 Magnus, this was your patch, can you see about fixing the 
 collateral damage to the Cygwin build?

Will do. It may be a couple of days before I can set up a VM with Cygwin,
but I'll try to get it done as soon as I can.

//Magnus



[HACKERS] Shared library conflicts

2006-06-23 Thread Thomas Hallgren
There was a discussion some time back concerning the linking of the 
postgres backend. Some libraries were linked although they were not 
needed, mainly because it was convenient. I had a problem with PL/Java 
since the Sun JVM ships with its own version of libz.so (they call it 
libzip.so). Sun fixed the problem, simply by adding a namespace to all 
functions in their own library, so I didn't stress the issue any further. 
Now this bites me again. Sun forgot to remove the old functions from their 
libzip.so on the amd64 platform.


I have of course reported this to Sun and I expect it to be fixed in 
future releases of Java but shouldn't this be fixed for PostgreSQL too? 
Perhaps we should consider this a more generic problem. The more 
libraries that are linked with the backend, the greater the risk that 
add-on modules will run into conflicts. What effort would be involved to 
fix this once and for all?


Another related question. What happens when I use --without-zlib? Does 
it have any effect besides disabling compression for the dump/restore 
utilities? Is there any way to make it affect the backend only?


Kind Regards,
Thomas Hallgren




Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Michael Meskes
On Thu, Jun 22, 2006 at 11:37:08AM -0400, Tom Lane wrote:
  item #3: Carsten Wolff copyright in informix.c file
  The file informix.c contains a copyright from Carsten Wolff.  Did Carsten 
  directly contribute this file to the PostgreSQL project?
 
 This code was added by Michael Meskes in informix.c rev 1.6
 (2003-05-06).  Michael, any info on the exact copyright status?

Yes. In fact the copyright belongs to credativ GmbH, the company that
paid Carsten for his work. As you may or may not know, I'm the CEO of
that company and can assure you that his work was contributed to the
PostgreSQL project.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
  Good advice, except if the table is huge :-)
 
 ... Then the table shouldn't be designed to be huge.  That represents
 a design error.
[snip]
 This demonstrates that archival material and active data should be
 kept separately.
 
 They have different access patterns; kludging them into the same table
 turns out badly.

Well, then please help me find a better design because I can't see one...
what we have here is a big membership table for email lists. When
there's a sendout, the memberships of the affected group are heavily
read/updated; otherwise they are idle. None of the memberships is
archive data, they are all active data... the only problem is that there
are so many. Is it so hard to believe that 100 million rows is all
active data, but only used in bursts once per week (that's an example,
some groups are more active, others less)?

Cheers,
Csaba.





Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread PFC



Well, then please help me find a better design cause I can't see one...
what we have here is a big membership table of email lists. When
there's a sendout then the memberships of the affected group are heavily
read/updated, otherwise they are idle. None of the memberships is
archive data, they are all active data... the only problem is that they
are so many. Is it so hard to believe that 100 million rows is all
active data, but only used in bursts once per week (that's an example,
some groups are more active, others less) ?


	I suppose you have a table memberships (user_id, group_id) or something
like it; it should have as few columns as possible; then try regularly
clustering on group_id (maybe once a week) so that all the records for a
particular group are close together. Getting the members of a group to
send them an email should be faster (fewer random seeks).


	For tables with very few small fields (like a few integers), the
26-something bytes of row overhead is significant; MySQL can be faster
because MyISAM tables have no transaction support and thus have very
little to store besides actual row data, and the table can then fit
in RAM...
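
A minimal sketch of that layout (table, column and index names here are
invented, not the poster's actual schema); CLUSTER rewrites the table in
index order while holding an exclusive lock, so one group's rows end up
on adjacent pages:

CREATE TABLE memberships (
    user_id  integer NOT NULL,
    group_id integer NOT NULL
);
CREATE INDEX memberships_group_idx ON memberships (group_id);

-- Re-run periodically (e.g. weekly) to restore the physical ordering.
CLUSTER memberships_group_idx ON memberships;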




Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
   I suppose you have a table memberships (user_id, group_id) or something 
  
 like it ; it should have as few columns as possible ; then try regularly  
 clustering on group_id (maybe once a week) so that all the records for a  
 particular group are close together. Getting the members of a group to  
 send them an email should be faster (less random seeks).

It is like this, plus some more bookkeeping data which must be there...
we could split the table into smaller records or into updatable/stable
fields, but at the end of the day it doesn't make much sense; usually
all the data is needed, and I wonder if several big/shallow tables instead
of one big/wide one make sense...

Regular clustering is out of the question as it would render the system
unusable for hours. There's no zero-activity hour we could use for such
stuff. There's always something happening, only the overall load is
smaller at night...

Thanks,
Csaba.





Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Luke Lonergan
Csaba, 

 Regularly clustering is out of question as it would render 
 the system unusable for hours. There's no 0 activity hour 
 we could use for such stuff. There's always something 
 happening, only the overall load is smaller at night...

We are planning to implement a btree-organized table, which keeps the
data organized by storing it with the index, using btree insertion to
place the rows.  This will make your design more attractive.

- Luke




Re: [HACKERS] Shared library conflicts

2006-06-23 Thread William ZHANG

Thomas Hallgren [EMAIL PROTECTED]

 Another related question. What happens when I use --without-zlib? Does it 
 have any effect on besides disabling compression for the dump/restore 
 utilities? Is there anyway to make it affect the backend only?

--without-zlib affects LIBS and HAVE_LIBZ defined by configure.
I searched for HAVE_LIBZ in the source code, and found that only pg_dump uses it.
I think we can change LIBS to not include -lz in configure, and add -lz
for pg_dump in its Makefile.

With regards,
William ZHANG


 Kind Regards,
 Thomas Hallgren







Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 The example is a very active web site, the flow is this:

 query for session information
 process HTTP request
 update session information

 This happens for EVERY http request. Chances are that you won't have
 concurrent requests for the same row, but you may have well over 100
 HTTP
 server processes/threads answering queries in your web server farm.

 You're crazy :)  Use memcache, not the DB :)


I actually have what I consider a better and more complete session handler
system: MCache, formerly MSession (http://www.mohawksoft.org/?q=node/8). I
mean, it implements a LOT of nifty features: loadable function modules,
collision-safe counters and operators, the ability to save session data to
file or SQL database at varying levels of caching, but that doesn't
mean it is used.

Technologies like memcached and my mcache are a separate data store. Your
session data is not usable anywhere but in your web system. I have gone as
far as to write a session serializer for PHP that outputs XML, a
PostgreSQL plugin that can extract data from the XML session string, and a
set of functions for interfacing mcache with PostgreSQL and I still have a
hard time convincing clients that this is the right way to go.

While we all know session data is, at best, ephemeral, people still want
some sort of persistence; thus, you need a database. For mcache I have a
couple of plugins that cover a wide range of options, from read/write at
startup and shutdown, to a full write-through cache to a database.

In general, my clients don't want this; they want the database to store
their data. When you try to explain to them that a database may not be the
right place to store this data, they ask why; sadly, they have little hope
of understanding the nuances and remain unconvinced.



Re: [HACKERS] Shared library conflicts

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 10:45:29AM +0200, Thomas Hallgren wrote:
 There was a discussion some time back concerning the linking of the 
 postgres backend. Some libraries where linked although they where not 
 needed, mainly because it was convenient.

AIUI, this was fixed in -HEAD. In the Makefile there's a filter command
to strip libraries not needed. One of them is libz.

From src/backend/Makefile:
# The backend doesn't need everything that's in LIBS, however
LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, 
$(LIBS))

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Shared library conflicts

2006-06-23 Thread Thomas Hallgren
That's good news. Any chance of getting this fix backported to 8.1? Or 
at least, the libz part of it?


Regards,
Thomas Hallgren


Martijn van Oosterhout wrote:

On Fri, Jun 23, 2006 at 10:45:29AM +0200, Thomas Hallgren wrote:
  
There was a discussion some time back concerning the linking of the 
postgres backend. Some libraries where linked although they where not 
needed, mainly because it was convenient.



AIUI, this was fixed in -HEAD. In the Makefile there's a filter command
to strip libraries not needed. One of them is libz.

From src/backend/Makefile:
# The backend doesn't need everything that's in LIBS, however
LIBS := $(filter-out -lz -lreadline -ledit -ltermcap -lncurses -lcurses, 
$(LIBS))

Have a nice day,
  





Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
  I suppose you have a table memberships (user_id, group_id) or something
 like it ; it should have as few columns as possible ; then try regularly
 clustering on group_id (maybe once a week) so that all the records for a
 particular group are close together. Getting the members of a group to
 send them an email should be faster (less random seeks).

 It is like this, and some more bookkeeping data which must be there...
 we could split the table for smaller records or for updatable/stable
 fields, but at the end of the day it doesn't make much sense, usually
 all the data is needed and I wonder if more big/shallow tables instead
 of one big/wider makes sense...

 Regularly clustering is out of question as it would render the system
 unusable for hours. There's no 0 activity hour we could use for such
 stuff. There's always something happening, only the overall load is
 smaller at night...


Let me ask a question: you have this hundred million row table. OK, how
much of that table is read/write? Would it be possible to divide the
table into two (or more) tables where one is basically static, with only
infrequent inserts and deletes, and the other is highly updated?

The big thing in performance is the amount of disk I/O; if you have a
smaller active table with only a single index, then you may be able to cut
your disk I/O time really down. The smaller the row size, the more rows
fit into a block. The fewer blocks, the less disk I/O. The less disk I/O,
the better the performance.

Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
frequently because the indexes grow and vacuuming them doesn't remove
everything; sometimes a REINDEX or a drop/recreate is the only way to get
performance back. So if you wait too long between vacuums, your indexes
grow and spread across more disk blocks than they should and thus use
more disk I/O to search and/or shared memory to cache.



Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Thu, 2006-06-22 at 14:57 -0300, Diogo Biazus wrote:
 Agree, the project must choose one path as the starting point. But the
 two options can be given in the long run.

I'm acting as Diogo's mentor for the SoC, so I'm trying to let Diogo
discuss his ideas in the community manner without too much steering.

Diogo's ideas are interesting - they aren't the way I would have done it
either, but that doesn't mean we shouldn't consider this alternative
approach.

 I still think that as a starting point the functions inside the
 database are a good option.

Yes, if we use SRF functions for this, ISTM they are the best place for
them. 

 The reasons are: 
 - using SQL to agregate and transform data in any way from the logs.

That is a major point here. If xlogdump is purely a stand-alone
program then it will be much less functionally rich, and as Tom mentions,
there are other reasons for having access to a server.

 - it's easier for the DBA in the other use cases where the cluster is
 still active. 

Good point.

 - give more flexibility for managing the xlogs remotely

Not sure what you mean.

 - I think it's faster to implement and to have a working and usable
 tool.

Why do you think that? It sounds like you've got more work since you
effectively need to rewrite the _desc routines.

 And there is one option to minimize the problem in the failed cluster
 case: the wrapper program could give the option to initdb a temporary
 area when no connection is given, creating a backend just to analyze a
 set of xlogs. 

It seems a reasonable assumption that someone reading PostgreSQL logs
would have access to another PostgreSQL cluster. It obviously needs to
work when the server that originated the logs is unavailable, but that
does not mean that all PostgreSQL systems are unavailable. There's no
need to try to wrap initdb - just note that people would have to have
access to a PostgreSQL system.

 Other option is to start by the standalone tool and create a wrapper
 function inside postgresql that would just call this external program
 and extract data from the xlogs using this program's output (with some
 option to output all data in a CSV format). 

I think this idea is a good one, but we must also consider whether it
can be done effectively within the time available. Is this a "can do now"
or a "want to do in future"?

The alternative of reinforcing xlogdump needs to be considered more
fully now and quickly, so coding can begin as soon as possible. 
- Diogo: what additional things can you make xlogdump do?
- Tom: can you say more about what you'd like to see from a tool, to
help Diogo determine the best way forward. What value can he add if you
have already written the tool?


Some other considerations:
The biggest difficulty is finding loser transactions - ones that have
not yet committed by the end of the log. You need to do this in both
cases if you want to allow transaction state to be determined precisely
for 100% of transactions; otherwise you might have to have an Unknown
transaction state in addition to the others.

What nobody has mentioned is that connecting to a db to lookup table
names from OIDs is only possible if that db knows about the set of
tables the log files refer to. How would we be certain that the
OID-to-tablename match would be a reliable one?
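
(For reference, the lookup being discussed is just a catalog probe in
whatever database you happen to connect to; the identifier below is made
up. Which is exactly why it is only trustworthy when that database really
is the one the records came from.)

SELECT relname FROM pg_class WHERE oid = 16384;
-- or WHERE relfilenode = ..., depending on which identifier the xlog record carries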

--
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com




Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 Let me ask a question, you have this hundred million row table. OK, how
 much of that table is read/write? Would it be posible to divide the
 table into two (or more) tables where one is basically static, only
 infrequent inserts and deletes, and the other is highly updated?

Well, all of it is read-write... some of the data might be updated less
frequently, but there's no way I would know which part of the data is
that. Logically it is just the same type of data... so unless I find a way
to continuously move the data back and forth between an archive table
and the live table, based on how active the groups are, I can't imagine
any other way of partitioning it. And that would also mean some quite
big load given the pretty high dynamics of the groups.

 The big thing in performance is the amount of disk I/O, if you have a
 smaller active table with only a single index, then you may be able to cut
 your disk I/O time really down. The smaller the row size, the more rows
 fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
 the bbetter the performance.

I agree, but it is quite hard to achieve that when the data set is both
big AND the partitioning criterion is highly dynamic. Not to mention that
deleting from that table is also a PITA performance-wise, so I wonder
how well the continuous back and forth between the active and inactive
tables would do.

 Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
 frequently because the indexes grow and vacuuming them doesnt remove
 everything, sometimes a REINDEX or a drop/recreate is the only way to get
 performance back. So if you wait too long between vacuums, your indexes
 grow  and spread across more disk blocks than they should and thus use
 more disk I/O to search and/or shared memory to cache.

This is nice in theory, but kills performance. I vacuum the big tables
only overnight, otherwise the server is sluggish.

Cheers,
Csaba.





Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Andrew Dunstan



Dave Page wrote:



Is there any real reason to continue to support Cygwin? We've always
said it's not a first class port, and now we have the native port which
is it seems somewhat pointless expending further effort on it.


 



Some people still use it for development, I believe. Similar arguments 
were used against our having the Win32 port in the first place. 
Personally I am inclined to support whatever platforms we reasonably 
can, and leave the choice to users rather than make it for them.


cheers

andrew



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Zeugswetter Andreas DCP SD

 back and forth the data between an archive table and the live 
 table, based on how active the groups are, I can't imagine 
 any other way of partitioning it. And that would also mean 
 some quite big load given the pretty high dynamics of the groups.

You said the activity comes in bursts per group, so the obvious
partitioning would be per group.
If you have too many groups to have one partition per group you could
try to find some modulo or other rule to spread them into separate
partitions.
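
A rough sketch of that kind of split, using the inheritance-based
partitioning available in 8.1 (all names and boundaries below are
invented; range CHECK constraints are used because constraint exclusion
can prove those, whereas a modulo rule would need the same expression
repeated in every query):

CREATE TABLE memberships (
    user_id  integer NOT NULL,
    group_id integer NOT NULL
);
CREATE TABLE memberships_p1 (CHECK (group_id BETWEEN 1 AND 250000))
    INHERITS (memberships);
CREATE TABLE memberships_p2 (CHECK (group_id BETWEEN 250001 AND 500000))
    INHERITS (memberships);
-- ... further partitions, each with its own index on group_id ...

SET constraint_exclusion = on;
-- Only the partition whose CHECK constraint can contain this group_id is scanned:
SELECT * FROM memberships WHERE group_id = 123456;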

Andreas



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 Let me ask a question, you have this hundred million row table. OK, how
 much of that table is read/write? Would it be posible to divide the
 table into two (or more) tables where one is basically static, only
 infrequent inserts and deletes, and the other is highly updated?

 Well, all of it is read write... some of the data might be updated less
 frequently, but there's no way I would know which part of the data is
 that. Logically is just the same type of data... so unless I find a way
 to continuously move back and forth the data between an archive table
 and the live table, based on how active the groups are, I can't imagine
 any other way of partitioning it. And that would also mean some quite
 big load given the pretty high dynamics of the groups.

 The big thing in performance is the amount of disk I/O, if you have a
 smaller active table with only a single index, then you may be able to
 cut
 your disk I/O time really down. The smaller the row size, the more rows
 fit into a block. The fewer blocks the less dissk I/O. The less disk I/O
 the bbetter the performance.

 I agree, but it is quite hard to achieve that when the data set is both
 big AND the partitioning criteria is highly dynamic. Not to mention that
 deleting from that table is also a PITA performance-wise, so I wonder
 how well the continuous back and forth between the active and inactive
 table would do.

 Also, and anyone listening correct me if I'm wrong, you NEED to vacuum
 frequently because the indexes grow and vacuuming them doesnt remove
 everything, sometimes a REINDEX or a drop/recreate is the only way to
 get
 performance back. So if you wait too long between vacuums, your indexes
 grow  and spread across more disk blocks than they should and thus use
 more disk I/O to search and/or shared memory to cache.

 This is nice in theory, but kills performance. I vacuum the big tables
 only overnight, otherwise the server is sluggish.

Well, the only thing left is to cluster the database. There are a couple of
ways to do this: one is to switch to a platform that supports clustering, or
create an API to wrap multiple databases. If your queries are simple and
limited, you could create an HTTP/XML service that wraps a number of
PostgreSQL databases, issues a query across all databases, merges the
query sets, and returns one homogeneous stream.

Inserts would be handled by hashing to a machine, weighted by the number of
records on each machine.

Updates and deletes would have two keys, machine and ID.

It sounds like you have a big problem and you need a big solution.



[HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
Hi,
I think there is a bug/miscalculation with a rare kind of query I am using.

Suppose we only query one specific relation R.

R contains indices, but not on all attributes or not on
all ordered subsets of keys.

Query example:
(SELECT * FROM R
WHERE a=3, b=6,. ...)
UNION
(SELECT * FROM R
WHERE b=5, d=2,. ...)
UNION

And lots of unions.

When doing explain analyze I see that some nodes in the plan use an index
and some use a sequential scan (where the WHERE clause made it impossible
to use an index).
As you can see, having even one sequential scan should collapse the whole plan
to a single sequential scan node.
Currently, the planner does not seem to understand that.
I circumvent the problem by doing a recursive tree search of the plan
and checking if there is a sequential scan node (and redefining the query
as a sequential scan), but obviously it is preferable that the planner do
this.


P.S.:
I am currently just writing the query as a string and opening a cursor.
Is there a simple way to use Datums instead of converting the attributes to
strings to create a plan for SPI?
10x.

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 You said the activity comes in bursts per group, so the obvious
 partitioning would be per group.
 If you have too many groups to have one partition per group you could
 try to find some modulo or other rule to spread them into separate
 partitions.

This could be a solution... but then I'm not sure how well queries which
need the first 10 records based on some criteria which does not include
the group id would do. I guess limit queries across the union of
the partitions don't work too well for now, and we do have such queries.
I'm pretty sure we could work this out, but it would need some big
refactoring of our current code, which is not that simple... and it must
work well with Oracle too; we do have systems on Oracle.

Cheers,
Csaba.





Re: [HACKERS] Planning without reason.

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
 R contains indices but not on all attributes or not on 
 all ordered subset of keys.
 
 Query example:
 (SELECT * FROM R
 WHERE a=3, b=6,. ...)
 UNION
 (SELECT * FROM R
 WHERE b=5, d=2,. ...)
 UNION
 
 And lots of unions.

Do you need UNION, or do you actually mean UNION ALL?

Also, couldn't you just do:

SELECT * FROM R 
WHERE (a=3, b=6, ...)
OR (b=5, d=2, ...)
etc

 I am currently just writing the query as a string and open a cursor.
 Is there a simple way to use Datums instead of converting the attributes to 
 strings to create a plan for SPI.
 10x.

I imagine SPI_prepare() and SPI_execp() would be used for this.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Overhead for stats_command_string et al, take 2

2006-06-23 Thread Alvaro Herrera
Tom Lane wrote:
 I redid my previous measurements after finishing up the weekend's
 hacking.  The numbers shown below are elapsed time in seconds for
 
   time psql -f testfile.sql postgres /dev/null

Average of 5 runs, for the first two cases, on the x86 machine that
shows high overhead in gettimeofday.

I used only 30k SELECT 1 queries instead of 100k.

30k SELECT 1;
                        HEAD    8.1
no overhead             21.9    23.1
stats_command_string=1  22.4    36.6

BEGIN; 30k SELECT 1; COMMIT;
                        HEAD    8.1
no overhead             19.1    20.3
stats_command_string=1  19.4    30.3

It can be observed that HEAD in the no overhead case is actually faster
than 8.1 on this machine.   And while stats_command_string adds some
overhead, it still is faster than the no overhead case in 8.1.  (These
results took me by surprise actually.)

For the curious, here are medians and averages for each run (file3 is
the plain SELECT, while file4 is BEGIN-30k * SELECT-COMMIT.  caso1
is no overhead, caso2 is stats_command_string=1).  I couldn't find a
quick'n dirty way to calculate stddev in shell but if anyone knows one
let me know.

median  average
8.1-file3-caso1: 23.098 23.145
8.1-file3-caso2: 36.595 36.607
8.1-file4-caso1: 20.304 20.269
8.1-file4-caso2: 30.169 30.256
head-file3-caso1: 21.890 21.931
head-file3-caso2: 22.509 22.390
head-file4-caso1: 19.142 19.126
head-file4-caso2: 19.488 19.442
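
(One quick alternative to shell arithmetic, since a server is already at
hand, is to load the timings into a scratch table and let the stddev()
aggregate do the work; the file and column names below are made up.)

CREATE TEMP TABLE timings (test text, seconds numeric);
COPY timings FROM '/tmp/timings.dat';   -- tab-separated "test<TAB>seconds"; or \copy from psql
SELECT test, avg(seconds), stddev(seconds) FROM timings GROUP BY test;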

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
 Well, the only thing left is to cluster the database. There are a couple
 ways to do this, one switch to a platform that supports clustering or
 create an API to wrap multiple databases. If your queries are simple and
 limited, you could create an HTTP/XML service that wraps a number of
 postgresql databases, issues a query across all databases, merges multiple
 query sets, and returns one homoginous stream.
 
Our queries are neither simple nor limited :-)

We have a big variety of users, with a big variety of needs and talented
user story writers who have imaginative minds... so the same code must
work in quite a few ways, and some of the resulting queries are
dynamically created. It's a tough job to optimize all the queries we
have.

 Inserts would be handled by hash to machine weighted by number of records
 on each machine.
 
 Updates and deletes would have two keys, machine and ID.
 
Such a setup might work for us but I fear it would be a major PITA to
make it reliable and to maintain it. It's not like I can say let's
allot a month of work for trying out a clustering solution, but I'm not
sure if it will work fine at the end. We still have enough features to
develop, the DB is something to solve part of the problem, not to keep
us busy... the Oracle systems were there first, the application works
more or less fine on them (with occasional need to optimize here and
there). Supporting Postgres was a side-project to see if it works, and
it works decently, so we deployed some systems on it. Both of the DBs
have their quirks, and I can voice here the ones I don't like in
Postgres... and then some developer might do something about it or not,
and I find that OK. If my mind didn't refuse so categorically to learn
C-style programming (tried that and gave up), I would probably scratch
my itches. I did it for smaller-scale ones, like truncating
timestamp(0) instead of rounding so that it is consistent with what
Oracle does, but that was just a one-file modification... I simply don't
find it fun to browse C code, compared to how easy it is to understand Java
code which I work with here. So unless somebody ports Postgres to Java,
I'll further need to voice my itches here in the hope that they'll be
solved by others... sorry for the long rant.

 It sounds like you have a big problem and you need a big solution.

Well, Postgres does a decent job as it is. The problem is that under peak
load, sometimes it gets bogged down and the usual things like vacuum
will not help immediately. I think a few more features like the dead
space map for quick vacuum, and even something like the original post's
proposition, would make Postgres fly under heavy load too...

Cheers,
Csaba.







Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
On Thursday, 2006-06-22 at 12:41, Mark Woodward wrote:

  Depending on exact details and optimisations done, this can be either
  slower or faster than postgresql's way, but they still need to do
  something to get transactional visibility rules implemented.
 
 I think they have a different strategy. I think they maintain the notion
 of current version of a row, and hunt for previous versions when needed,
 at least that's how I suspect Oracle does it with redo logs.

Not current but last :)

And one side effect of redo logs is that it is practically impossible to
do large deletes on production databases. So you design around that,
like you have to design around limitations of MVCC.

There has to be a more linear way of handling this scenario.
  
   So vacuum the table often.
 
  It's easy to say VACUUM often... but I'd bet that vacuuming is going
  to lessen the throughput in his tests even more; no matter how it's
  tuned.
 
  Running VACUUM often/continuously will likely keep his update rate
  fluctuatons within a corridor of maybe 5-10%, at the cost of 1-2% extra
  load. At least if vacuum is configured right and the server is not
  already running at 100% IO saturation, in which case it will be worse.
 
 Assuming the table is a reasonable size, the I/O required for vacuum
 doesn't kill everything else!

I have solved the problem of unnecessary IO by keeping active and
finished rows in separate tables, with the finish() function moving the
row between tables.
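
A sketch of what such a function could look like (the table and column
names are hypothetical, not Hannu's actual schema); because the INSERT
and DELETE run in one transaction, the row is never in both tables or in
neither:

CREATE OR REPLACE FUNCTION finish(p_id integer) RETURNS void AS $$
BEGIN
    -- copy the row to the archive table, then drop it from the hot table
    INSERT INTO rows_finished SELECT * FROM rows_active WHERE id = p_id;
    DELETE FROM rows_active WHERE id = p_id;
END;
$$ LANGUAGE plpgsql;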

When the number of actively modified rows is only in the tens or
low hundreds of thousands (i.e. the modified set fits in memory),
the continuous vacuum process shows up as just another backend,
not really taking an order of magnitude more resources. It mainly generates
WAL traffic, as modified pages are already in memory/cache and are
mostly synced by the background writer and/or checkpoint.

Of course you have to adjust vacuum_cost_* variables so as to not
saturate IO.

  The max throughput figure is not something you actually need very often
  in production.
 
 No, but you need to have some degree of certainty and predictability in
 the system you are developing.

Yup. You have to design it so it has.

  What is interesting is setting up the server so that you
  can service your loads comfortably. Running the server at 100% load is
  not anything you want to do on a production server. There will be things
  you need to do anyway and you need some headroom for that.
 
 Of course, you design it so peaks are easily managed, but unless you run
 vacuum continuously, and that has its own set of problems, you run into
 this problem, and it can get really really bad.

Usually it gets really bad if you *don't* run vacuum continuously, maybe
hoping to do it in slower times at night. For a high-update db you have
to run it continuously, maybe having some 5-15 sec pauses between runs.


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread John DeSoi


On Jun 23, 2006, at 3:10 AM, Dave Page wrote:


Is there any real reason to continue to support Cygwin? We've always
said it's not a first class port, and now we have the native port which
is it seems somewhat pointless expending further effort on it.



Are all the tools needed to compile from source on Win32 freely  
available?



John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL




Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 03:30:45PM +0200, Csaba Nagy wrote:
  It sounds like you have a big problem and you need a big solution.
 
 Well, Postgres does a decent job as it is. The problem is under peek
 load, sometimes it gets bogged down and the usual things like vacuum
 will not help immediately. I think a few more features like the dead
 space map for quick vacuum and even something like the original post's
 proposition would make postgres fly under heavy load too...

I know there have been a number of suggestions in the past to deal
with this thing. Some I don't remember being mentioned in this thread
are:

- Once a tuple has been determined to be invisible to everyone,
truncate it to just the header. This would probably work wonders for
frequently updated wide tables. However, this requires keeping track of
the oldest active xact; I'm not sure how that works at the moment.

- Have the bgwriter do cleanup work before writing out a block. It
could probably do the truncation bit above, but totally removing old
tuples requires cleaning out the indexes too, which AIUI is the hard
part of vacuuming.

One totally whacked out idea I just thought of: Instead of just
truncating tuples when they're invisible, mark them dying and make
the data section store an array of CTIDs pointing to the index tuples
pointing to it. Lookups that find the tuple via an index could store
the CTID of the index tuple before continuing. If the bgwriter sees it
has a full set, it can efficiently remove the tuple straight away.

There are of course drawbacks to this approach; you'd probably need
something like the half-dirty pages to avoid a large increase in write
load, if it's even beneficial at all given concurrency issues.

Still, VACUUM has gotten faster in CVS so the issues are slowly being
addressed...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Tom Lane
Michael Meskes [EMAIL PROTECTED] writes:
 On Thu, Jun 22, 2006 at 11:37:08AM -0400, Tom Lane wrote:
 item #3: Carsten Wolff copyright in informix.c file
 The file informix.c contains a copyright from Carsten Wolff.  Did Carsten 
 directly contribute this file to the PostgreSQL project?
 
 This code was added by Michael Meskes in informix.c rev 1.6
 (2003-05-06).  Michael, any info on the exact copyright status?

 Yes. In fact the copyright belongs to credativ GmbH the company that
 paid Carsten for his work. As you may or may not know I'm the CEO of
 that company and can assure you that his work was contributed to the
 PostgreSQL project.

That sounds fine --- could you add a note in the source code to this
effect?  "Contributed under the PostgreSQL License" or something like
that after the copyright notice would be sufficient.

regards, tom lane



Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Diogo Biazus
On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs remotely
 
 Not sure what you mean.
 
  - I think it's faster to implement and to have a working and usable
  tool.
 
 Why do you think that? It sounds like you've got more work since you
 effectively need to rewrite the _desc routines.

Yes, but I don't need to worry about program output, and I have the
backend's memory management and error handling.

  And there is one option to minimize the problem in the failed cluster
  case: the wrapper program could give the option to initdb a temporary
  area when no connection is given, creating a backend just to analyze a
  set of xlogs.
 
 It seems a reasonable assumption that someone reading PostgreSQL logs
 would have access to another PostgreSQL cluster. It obviously needs to
 work when the server that originated the logs is unavailable, but that
 does not mean that all PostgreSQL systems are unavailable. There's no
 need to try to wrap initdb - just note that people would have to have
 access to a PostgreSQL system.

Yes, that's what I thought; wrapping initdb isn't needed but would make
things easier for a newbie.

  Other option is to start by the standalone tool and create a wrapper
  function inside postgresql that would just call this external program
  and extract data from the xlogs using this program's output (with some
  option to output all data in a CSV format).
 
 I think this idea is a good one, but we must also consider whether it
 can be done effectively within the time available. Is this: can do now
 or want to do in future?

I think that could be done; I have some code to call external programs
within the database ready. It would be one of the enhancements in the
case we choose the standalone path.

 The alternative of reinforcing xlogdump needs to be considered more
 fully now and quickly, so coding can begin as soon as possible.
 - Diogo: what additional things can you make xlogdump do?

I could add options to display the data of the bkp blocks, add the
missing rmids: RM_HASH_ID, RM_GIST_ID, RM_SEQ_ID. Make an option to
query only the transaction info xids status. And make a contrib module
that calls the xlogdump and parses the output.

 - Tom: can you say more about what you'd like to see from a tool, to
 help Diogo determine the best way forward. What value can he add if you
 have already written the tool?
 
 Some other considerations:
 The biggest difficulty is finding loser transactions - ones that have
 not yet committed by the end of the log. You need to do this in both
 cases if you want to allow transaction state to be determined precisely
 for 100% of transactions; otherwise you might have to have an Unknown
 transaction state in addition to the others.

Yes, this is one thing we have to do in any case.

 What nobody has mentioned is that connecting to a db to lookup table
 names from OIDs is only possible if that db knows about the set of
 tables the log files refer to. How would we be certain that the
 OID-to-tablename match would be a reliable one?

Good question. It seems to me that the only case where this has a
trivial answer is if you're inside the backend querying the current xlog
directory. I'm still thinking about the solution for the other cases
(inside or outside the backend).

-- 
Diogo Biazus - [EMAIL PROTECTED]
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Alvaro Herrera
Csaba Nagy wrote:

 This could be a solution... but then I'm not sure how well would do
 queries which need the first 10 records based on some criteria which
 does not include the group id. I guess limit queries across the union of
 the partitions don't work too well for now, and we do have such queries.
 I'm pretty sure we could work this out, but it would need some big
 refactoring of our current code which is not that simple... and it must
 work well with oracle too. We do have systems on Oracle too.

No, it wouldn't, because the partitioning logic can live in the
database.  You don't need to touch the application.

I agree that those queries not using the group criteria could put you in
some sort of trouble.  But if you exchange one big indexscan by several
smaller indexscans, maybe it's not that bad.  Even less if you stop
scanning (because of the LIMIT) before you used all the partitions.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Diogo Biazus
On 6/23/06, Diogo Biazus [EMAIL PROTECTED] wrote:
 On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
   - give more flexibility for managing the xlogs remotely
  
  Not sure what you mean.

I can connect to the server if I want to query xlogs on a remote machine.
If I depend on a standalone tool that reads the filesystem I'll need some
kind of ssh access. Sometimes the DBA needs the info in the xlogs but
doesn't have/want access to the OS. It's also easier to create a script
that collects statistics about xlogs in machine clusters.

-- 
Diogo Biazus - [EMAIL PROTECTED]
Móvel Consultoria
http://www.movelinfo.com.br
http://www.postgresql.org.br


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jonah H. Harris

On 6/22/06, Alvaro Herrera [EMAIL PROTECTED] wrote:

 Not in all systems.  A few now perform in-memory UNDO and only write
 it to disk if and when it is required.

How does that work?  If the last transaction is not finished after it
wrote the tuple when the power goes out, and the UNDO is not written to
disk either, how do you reconstruct the tuple at all?


It still follows normal transaction semantics... on commit, the REDO
record is written to disk.  If the block was not written to disk and
the database crashes, the on-disk representation of the block remained
unchanged and recovery would pick up the REDO record from the log and
reapply it.

However, say someone updates a record which moves the old tuple value
to an in-memory UNDO location and changes the in-memory block
representation... if a checkpoint occurs before the user commits, the
block must be written to disk in changed form which means that the
UNDO record must also be written to the log (before the block is
written to disk).  In the event of a crash, recovery would see that
the transaction never committed and would apply the UNDO to the block.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [HACKERS] Full Disjunction

2006-06-23 Thread Jonah H. Harris

On 6/23/06, Christopher Kings-Lynne [EMAIL PROTECTED] wrote:

What IS this full disjunction business?


Tzahi Fadida is working on a Summer of Code project to implement a
contrib module which can perform full disjunctions within PostgreSQL.
It's pretty cool.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
On Friday 23 June 2006 16:14, Martijn van Oosterhout wrote:
 On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
  R contains indices but not on all attributes or not on
  all ordered subset of keys.
 
  Query example:
  (SELECT * FROM R
  WHERE a=3, b=6,. ...)
  UNION
  (SELECT * FROM R
  WHERE b=5, d=2,. ...)
  UNION
  
  And lots of unions.

 Do you need UNION, or do you actually mean UNION ALL?

I am using UNION ALL, but it doesn't matter;
I am actually doing:
(SELECT * FROM R
WHERE a=3, b=6,. ... LIMIT 1)
UNION ALL
(SELECT * FROM R
WHERE b=5, d=2,. ... LIMIT 1)
UNION ALL


with LIMIT 1.
My initial reasoning was to avoid extra sorts, but I guess that the planner 
just doesn't get the LIMIT 1. I see now that UNION should be better 
for the planner to understand (not performance-wise). 
However, UNION alone doesn't seem to cut it. 
Following is an example. t7 has 2 attributes and a non-unique index on one
attribute. Here is a printout:
explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from 
t7 where a2=139 LIMIT 1);
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=23.18..23.19 rows=2 width=8) (actual time=0.149..0.165 rows=1 loops=1)
   ->  Sort  (cost=23.18..23.18 rows=2 width=8) (actual time=0.142..0.148 rows=2 loops=1)
         Sort Key: a4, a2
         ->  Append  (cost=0.00..23.17 rows=2 width=8) (actual time=0.052..0.106 rows=2 loops=1)
               ->  Limit  (cost=0.00..5.65 rows=1 width=8) (actual time=0.046..0.049 rows=1 loops=1)
                     ->  Index Scan using indext7 on t7  (cost=0.00..5.65 rows=1 width=8) (actual time=0.038..0.038 rows=1 loops=1)
                           Index Cond: (a4 = 113)
               ->  Limit  (cost=0.00..17.50 rows=1 width=8) (actual time=0.035..0.038 rows=1 loops=1)
                     ->  Seq Scan on t7  (cost=0.00..17.50 rows=1 width=8) (actual time=0.029..0.029 rows=1 loops=1)
                           Filter: (a2 = 139)
 Total runtime: 0.334 ms
(11 rows)



 Also, couldn't you just do:

 SELECT * FROM R
 WHERE (a=3, b=6, ...)
 OR (b=5, d=2, ...)
 etc

No, a filtering action is not enough, since my goal is to only use indices
when retrieving single tuples each time; thus, if I use OR I cannot
control the number of tuples returned by each OR clause.


  I am currently just writing the query as a string and open a cursor.
  Is there a simple way to use Datums instead of converting the attributes
  to strings to create a plan for SPI.
  10x.

 I imagine SPI_prepare() and SPI_execp() would be used for this.

I am already using SPI_prepare, but it takes a query in the form of a char
string, which I need to build and which is quite long. I.e. if I have 100 tuples
I wish to retrieve, it can be very wasteful to build the string in memory
and use SPI_prepare to prepare and later execute it. It would be better
to use the Datums directly (which I already have deformed from
previous operations).
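
The SQL-level analogue of what this avoids is a prepared statement with
parameters; at the SPI level, SPI_prepare() takes the argument type OIDs
and SPI_execp() then takes the values as Datums, so nothing has to be
rendered to text. A made-up illustration:

PREPARE fetch_r (integer, integer) AS
    SELECT * FROM r WHERE a = $1 AND b = $2 LIMIT 1;
EXECUTE fetch_r(3, 6);
EXECUTE fetch_r(5, 2);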


 Have a nice day,

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html



Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 On Thursday, 2006-06-22 at 12:41, Mark Woodward wrote:

  Depending on exact details and optimisations done, this can be either
  slower or faster than postgresql's way, but they still need to do
  something to get transactional visibility rules implemented.

 I think they have a different strategy. I think they maintain the notion
 of current version of a row, and hunt for previous versions when
 needed,
 at least that's how I suspect Oracle does it with redo logs.

 Not current but last :)

True

 And one side effect of redo logs is that it is practically impossible to
 do large deletes on production databases. So you design around that,
 like you have to design around limitations of MVCC.

Think that's bad, try doing an update in PostgreSQL on a table with 20
million rows and a few indexes. I had to write a script to chunk up the
block update into segments and vacuum between each.


There has to be a more linear way of handling this scenario.
  
   So vacuum the table often.
 
  It's easy to say VACUUM often... but I'd bet that vacuuming is going
  to lessen the throughput in his tests even more; no matter how it's
  tuned.
 
  Running VACUUM often/continuously will likely keep his update rate
  fluctuatons within a corridor of maybe 5-10%, at the cost of 1-2%
 extra
  load. At least if vacuum is configured right and the server is not
  already running at 100% IO saturation, in which case it will be worse.

 Assuming the table is a reasonable size, the I/O required for vacuum
 doesn't kill everything else!

 I have solved the problem of unneccessary IO by keeping active and
 finished rows in separate tables, with the finish() function moving the
 row between tables.

Sorry, an RDBMS is a relational database management system; if you are
doing the database management, it isn't a very good RDBMS.


 In case of the number of actively modified rows being in only tens or
 low hundreds of thousands of rows, (i.e. the modified set fits in
 memory) the continuous vacuum process shows up as just another backend,
 not really taking order of magnitude more resources. It mainly generates
 WAL traffic, as modified pages are already in memory/cache and are
 mostly synced by background writer and/or checkpoint.

 Of course you have to adjust vacuum_cost_* variables so as to not
 saturate IO.

These sorts of solutions, IMHO, don't show how good PostgreSQL is, but show
where it is very lacking.


  The max throughput figure is not something you actually need very
 often
  in production.

 No, but you need to have some degree of certainty and predictability in
 the system you are developing.

 Yup. You have to design it so it has.

I was referring to the system as a whole and the individual components.
PostgreSQL's performance under some pathological conditions is not very
predictable or reliable.


  What is interesting is setting up the server so that you
  can service your loads comfortably. Running the server at 100% load is
  not anything you want to do on a production server. There will be things
  you need to do anyway and you need some headroom for that.

 Of course, you design it so peaks are easily managed, but unless you run
 vacuum continuously, and that has its own set of problems, you run into
 this problem, and it can get really really bad.

 Usually it gets really bad if you *don't* run vacuum continuously, maybe
 hoping to do it in slower times at night. For a high-update db you have
 to run it continuously, maybe having some 5-15 sec pauses between runs.

And how much I/O does this take?

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Dave Page
 

 -Original Message-
 From: John DeSoi [mailto:[EMAIL PROTECTED] 
 Sent: 23 June 2006 14:56
 To: Dave Page
 Cc: Andrew Dunstan; Tom Lane; Bort, Paul; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [CORE] GPL Source and Copyright Questions
 
 
 On Jun 23, 2006, at 3:10 AM, Dave Page wrote:
 
  Is there any real reason to continue to support Cygwin? We've always
  said it's not a first class port, and now we have the native port  
  which
  is it seems somewhat pointless expending further effort on it.
 
 
 Are all the tools needed to compile from source on Win32 freely  
 available?

Yes. Even when/if we add a VC++ build they will be.

Regards, Dave.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Tom Lane
Dave Page dpage@vale-housing.co.uk writes:
 Is there any real reason to continue to support Cygwin? We've always
 said it's not a first class port, and now we have the native port which
 is it seems somewhat pointless expending further effort on it.

I think the day will come when there's a good reason to decommission the
Cygwin port (ie, some problem that seems unreasonably hard to solve),
and then I'll be the first in line voting to do so.  But accidental
breakage due to someone being sloppy with #ifdefs for a different port
isn't a good reason.

Actually, my gripe about this one is that it wasn't detected promptly.
That patch went in two weeks ago; we should have known about the problem
within a couple days at most.  Seems like the Windows members of the
buildfarm don't run often enough.  The whole point of the buildfarm is
to spot problems while the code is still fresh in mind, no?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Dave Page
 

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: 23 June 2006 15:15
 To: Dave Page
 Cc: Andrew Dunstan; Bort, Paul; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [CORE] GPL Source and Copyright Questions 
 
 Actually, my gripe about this one is that it wasn't detected promptly.
 That patch went in two weeks ago; we should have known about 
 the problem
 within a couple days at most.  Seems like the Windows members of the
 buildfarm don't run often enough.  The whole point of the buildfarm is
 to spot problems while the code is still fresh in mind, no?

I think that says something about the current usage of the Cygwin port. Snake runs
native builds daily, but like Magnus and his dev box there's no way I'm
letting Cygwin anywhere near it. ISTR that the only vaguely active
Cygwin member is Andrew's laptop.

Regards, Dave.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Fri, 2006-06-23 at 11:03 -0300, Diogo Biazus wrote:
 On 6/23/06, Diogo Biazus [EMAIL PROTECTED] wrote:


 On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs
 remotely
 
 Not sure what you mean.
 
 I can connect to the server if I want to query xlogs in a remote
 machine.
 If I depend on a standalone tool that reads the filesystem I'll need
 some kind of ssh access.
 Sometimes the DBA needs the info in the xlogs but doesn't have/want
 access to the OS.
 It's also easier to create a script that collects statistics about
 xlogs in machine clusters.

OK, security is an interesting one.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Alvaro Herrera
Mark Woodward wrote:

  In case of the number of actively modified rows being in only tens or
  low hundreds of thousands of rows, (i.e. the modified set fits in
  memory) the continuous vacuum process shows up as just another backend,
  not really taking order of magnitude more resources. It mainly generates
  WAL traffic, as modified pages are already in memory/cache and are
  mostly synced by background writer and/or checkpoint.
 
  Of course you have to adjust vacuum_cost_* variables so as to not
  saturate IO.
 
 These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
 where it is very lacking.

We all know Postgres is lacking; some of us try to improve it (some with
more success than others).  People who know the current limitations but
like the capabilities, try to find workarounds to the problems. What
surprises me is that, if you have such a low opinion of Postgres, you
still use it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL Source and Copyright Questions)

2006-06-23 Thread Tom Lane
Dave Page dpage@vale-housing.co.uk writes:
 Actually, my gripe about this one is that it wasn't detected promptly.
 That patch went in two weeks ago; we should have known about 
 the problem
 within a couple days at most.  Seems like the Windows members of the
 buildfarm don't run often enough.  The whole point of the buildfarm is
 to spot problems while the code is still fresh in mind, no?

 I think that speaks for the current usage of the cygwin port. Snake runs
 native builds daily, but like Magnus and his dev box there's no way I'm
 letting Cygwin anywhere near it. Istr that the only vaguely active
 Cygwin member is Andrew's laptop.

Well, lack of interest is certainly adequate reason to decommission a
port.  If we can't find anyone who cares enough about Cygwin to host a
regularly-scheduled buildfarm member, I'm for blowing it off.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] xlog viewer proposal

2006-06-23 Thread Simon Riggs
On Fri, 2006-06-23 at 10:59 -0300, Diogo Biazus wrote:

 On 6/23/06, Simon Riggs [EMAIL PROTECTED] wrote:
  - give more flexibility for managing the xlogs remotely
 
 Not sure what you mean.
 
  - I think it's faster to implement and to have a working and
 usable
  tool.
 
 Why do you think that? It sounds like you've got more work
 since you 
 effectively need to rewrite the _desc routines.
 
 Yes, but I don't need to worry about program output, and I have the
 backend's memory management and error handling.

I'd suggest doing a quick prototype to allow us to evaluate which
architecture would be preferable.

I'm torn between the good-idea and the safe-minimal-but-definitely in
8.2 option.
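Just to make the comparison concrete, here is a purely hypothetical sketch of
what the backend-function interface might look like once prototyped (the
function name and columns are invented for illustration, not an existing API):

    -- A set-returning function that decodes one WAL segment file and
    -- lets the DBA filter it with ordinary SQL, no shell access required.
    SELECT rmname, xid, info
    FROM xlogviewer_file('000000010000000000000003')
    WHERE rmname = 'Transaction';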

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread A.M.
On Fri, June 23, 2006 9:56 am, Martijn van Oosterhout wrote:
 On Fri, Jun 23, 2006 at 03:30:45PM +0200, Csaba Nagy wrote:

 It sounds like you have a big problem and you need a big
 solution.

 Well, Postgres does a decent job as it is. The problem is under peak
 load, sometimes it gets bogged down and the usual things like vacuum will
 not help immediately. I think a few more features like the dead space
 map for quick vacuum and even something like the original post's
 proposition would make postgres fly under heavy load too...

 I know there have a been a number of suggestions in the past to deal
 with this thing. Some I don't remember being mentioned in this thread are:


 - Once a tuple has been determined to be invisible to everyone,
 truncate it to just the header. This would probably work wonders for
 frequently updated wide tables. However, this requires keeping track of
 the oldest active xact; I'm not sure how that works at the moment.

 - Have the bgwriter do cleanup work before writing out a block. It
 could probably do the truncation bit above, but totally removing old tuples
 requires cleaning out the indexes too, which AIUI is the hard part of
 vacuuming.

 One totally whacked out idea I just thought of: Instead of just
 truncating tuples when they're invisible, mark them dying and make the
 data section store an array of CTIDs pointing to the index tuples pointing
 to it. Lookups that find the tuple via an index could store the CTID of
 the index tuple before continuing. If the bgwriter sees it has a full set,
 it can efficiently remove the tuple straight away.

 There are of course drawbacks to this approach; you'd probably need
 something like the half-dirty pages to avoid a large increase in write
 load. If it's even beneficial at all given concurrency issues.

A lot of these recommendations sound like garbage collection ideas found
in modern programming languages. Perhaps it would be worth considering
allowing sessions to keep track of which pages they alter and spawn a
separate process per connection to sweep up slowly along the way.

Also, it's nice that vacuum now has slow-down settings, but why isn't
there an option to autovacuum during periods of idleness and pause when
busy?

-M



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL Source and Copyright Questions)

2006-06-23 Thread Larry Rosenman
Tom Lane wrote:
 Dave Page dpage@vale-housing.co.uk writes:
 Actually, my gripe about this one is that it wasn't detected
 promptly. That patch went in two weeks ago; we should have known
 about 
 the problem
 within a couple days at most.  Seems like the Windows members of the
 buildfarm don't run often enough.  The whole point of the buildfarm
 is to spot problems while the code is still fresh in mind, no?
 
 I think that speaks for the current usage of the cygwin port. Snake
 runs native builds daily, but like Magnus and his dev box there's no
 way I'm letting Cygwin anywhere near it. Istr that the only vaguely
 active Cygwin member is Andrew's laptop.
 
 Well, lack of interest is certainly adequate reason to decommission
 a port.  If we can't find anyone who cares enough about Cygwin to
 host a regularly-scheduled buildfarm member, I'm for blowing it off.
 
What all's needed on the host for this?

I might be able to use either my house machine or my work desktop 
here @pervasive, or one of my test boxes here @pervasive.



-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 Mark Woodward wrote:

  In case of the number of actively modified rows being in only tens or
  low hundreds of thousands of rows, (i.e. the modified set fits in
  memory) the continuous vacuum process shows up as just another
 backend,
  not really taking order of magnitude more resources. It mainly
 generates
  WAL traffic, as modified pages are already in memory/cache and are
  mostly synced by background writer and/or checkpoint.
 
  Of course you have to adjust vacuum_cost_* variables so as to not
  saturate IO.

 These sort of solutions, IMHO, don't show how good PostgreSQL is, but
 show
 where it is very lacking.

 We all know Postgres is lacking; some of us try to improve it (some with
 more success than others).  People who know the current limitations but
 like the capabilities, try to find workarounds to the problems. What
 surprises me is that, if you have such a low opinion of Postgres, you
 still use it.

Actually, I love PostgreSQL; I've been using it for about 10 years on a lot
of projects. There are some serious issues with it, however, and it is
important to expose them, discuss them, and resolve them. Workarounds are
great, but in the end, they are workarounds.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
  These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
  where it is very lacking.
 
 We all know Postgres is lacking; some of us try to improve it (some with
 more success than others).  People who know the current limitations but
 like the capabilities, try to find workarounds to the problems. What
 surprises me is that, if you have such a low opinion of Postgres, you
 still use it.

Alvaro, I understand your reaction; you're on the developer side... but
please try to understand us mortals who can't write good C code too:
some of us like what postgres already offers, but we would also like to
not need those workarounds. And the only thing we can do is make big
noise so somebody from the other side (sour side) will notice it and at
one point do something about it... the noise here by no means means we
have a low opinion of postgres. On the contrary, we appreciate postgres
and its developers enough to ask for what we would like to see, and
we are sure there are some gifted developers out there who can program
those features we scream about...

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Zeugswetter Andreas DCP SD
  This could be a solution... but then I'm not sure how well queries which
  need the first 10 records, based on some criteria which does not include
  the group id, would do. I guess limit queries across the union of the
  partitions don't work too well for now, and we do have such queries.

You would query the parent (no union). Do you need order by's ?
Without order by it is currently no problem.

Do we push the limit down to the separate tables when we have an
appropriate
index for the order by (that could be a TODO item)?
(You need a max of limit rows per child in the outer order) 

Or we would need to implement an efficient index merge node
for order by queries on parent (and union all's) with low limits
and an appropriate index.

Selecting the oldest x rows from a time partitioned table is a frequent
problem
we need to work around here too (Informix db).
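A hand-written workaround for that case looks roughly like this (illustrative
only; assuming a parent table t with children t_2006_05 and t_2006_06, each
with an index on its ts column):

    -- Let each child produce its own ordered LIMIT using its ts index,
    -- then merge the small result sets and apply the limit again.
    SELECT * FROM (
        (SELECT * FROM t_2006_05 ORDER BY ts LIMIT 10)
        UNION ALL
        (SELECT * FROM t_2006_06 ORDER BY ts LIMIT 10)
    ) AS u
    ORDER BY ts
    LIMIT 10;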

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
 My initial reasoning was to avoid extra sorts but I guess that the planner
 just doesn't get the LIMIT 1. I see now that UNION should be better
 for the planner to understand (not performance-wise).
 However, UNION alone, doesn't seem to cut it. 
 Following is an example. t7 has 2 attributes and a non-unique index on one
 attribute. here is a printout:
 explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select * from 
 t7 where a2=139 LIMIT 1);

What are the indexes on?  If you only have an index on a4, the latter
query has to be an index scan and there's no way to optimise it away.

  Also, couldn't you just do:
 
  SELECT * FROM R
  WHERE (a=3, b=6, ...)
  OR (b=5, d=2, ...)
  etc
 
 No, a filtering action is not enough since my goal is to use indices only
 when retrieving single tuples each time; thus, if I use OR, I cannot
 control the number of tuples returned by each OR clause.

I must admit, this is a really strange way of doing it. For example, if
multiple rows match, the tuples eventually returned will be a random
selection of the rows that matched. Especially with the limit 1
there's no way the optimiser could combine the individual scans.

If you really need the LIMIT 1 and you don't have full index coverage
then you're quite limited as to how it can be optimised.

   I am currently just writing the query as a string and open a cursor.
   Is there a simple way to use Datums instead of converting the attributes
   to strings to create a plan for SPI.
   10x.
 
  I imagine SPI_prepare() and SPI_execp() would be used for this.
 
 I am already using SPI_prepare but it uses a query of the form of a char 
 string, which i need to prepare and is quite long. I.e. if i have 100 tuples
 i wish to retrieve it can be very wasteful to prepare the string in memory
 and use SPI_prepare to prepare and later execute it.
 better to use directly the datums (which i already have deformed from 
 previous operations).

I'm confused here too. I thought the datums you're talking about were
arguments, thus you could push them straight to SPI_execp(). But you
seem to be suggesting parts of the actual query are in datum form also?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [CORE] GPL Source and Copyright Questions

2006-06-23 Thread Michael Meskes
On Fri, Jun 23, 2006 at 09:58:42AM -0400, Tom Lane wrote:
 That sounds fine --- could you add a note in the source code to this
 effect?  Contributed under the PostgreSQL License or something like
 that after the copyright notice would be sufficient.

No problem. Just committed it.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
On Thursday, 2006-06-22 at 13:49, Mark Woodward wrote:
  Christopher Browne [EMAIL PROTECTED] writes:

  Basically there's no free lunch: if you want the benefits of MVCC it's
  going to cost you somewhere.  In the Postgres design you pay by having
  to do VACUUM pretty often for heavily-updated tables.  I don't think
  that decision is fundamentally wrong --- the attractive thing about it
  is that the overhead is pushed out of the foreground query-processing
  code paths.
 
 Under certain circumstances, it is a very poor design. Think of a single
 row table that keeps a scoreboard or a session table that keeps a limited
 number of rows that are updated very frequently.

A single row table that every session updates is a really bad design on
any database, as it is a sure point of lock contention and thus removes
any possibility of concurrency.

But except for locking problems, it will perform really well when you
vacuum often enough :)
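To make the contention point concrete, the pattern being criticized is
essentially this (hypothetical table, of course):

    -- Every session funnels through the same physical row; each UPDATE
    -- must wait for the previous writer's transaction to commit or abort
    -- before it can take the row lock, so the updates serialize.
    UPDATE scoreboard SET hits = hits + 1 WHERE id = 1;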

  We still have lots of work to do in making autovacuum
  smarter, avoiding vacuuming parts of relations that have not changed,
  and so on.  But I have no desire to go over to an Oracle-style solution
  instead.  We can't beat them by trying to be like them, and we run no
  small risk of falling foul of some of their patents if we do.
 
 I proposed having a key row entry for each logical row. The key row
 entry points to the latest version of the row. There, each row entry is a
 linked list, in descending order, of previous row versions. 

Do I understand right that you are proposing a redesign of how indexing
works, by updating indexes in-place?

How would older rows be found then by transactions needing to see
them ? 

Do you suggest reverting to seqscan when we see _any_ newer
transactions ?

Or if you want to have index pointing to latest row with each value in
indexed field, how would you find the last time this value was used ?

Don't tell me that you plan to trace the full update-chain on each
update.

Or would this new indexing mechanism be used only for non-changing key
fields ? How would you check for that ?

 The vast majority of the time, the latest version will be the first version. 

Not in a web scenario. In my experience more complicated web-pages tend
to produce lots of concurrent accesses.

 It is
 only when you have a previously started long running or concurrent
 transaction will you ever look at previous versions.

 I'm not saying it is an easy slam dunk, as I can think of a few
 difficulties off the top of my head, but it would solve the steady
 degradation of performance between vacuums and, to a possibly lesser
 extent, the cost of updating a row in a heavily indexed table.

VACUUMing often also solves the problem of steady degradation of
performance between vacuums :)

No need to be afraid of vacuum. Vacuum is your friend! Just learn to use
it right.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Webcluster session storage, was vacuum, performance, and MVCC

2006-06-23 Thread James Robinson

Verging on offtopic, but ...

Regarding the best place for session data, memcached isn't really the  
answer, for opposite reasons as to why it isn't so great to store it  
in the central DB for a big web farm.


Folks on the memcached lists propose this [ I keep all my session  
data in memcached, it works great! ] from time to time, and always  
get smacked down with "Don't do that -- memcached is just for caching
stuff that is retrievable from some stable / slower source."
Memcached's LRU policy could conceivably evict that session data at  
any time and you'll lose.


So, it works fine under low / moderate load [ just as it does in
PG when periodic vacuuming can keep up ], but under higher user  
volume than anticipated your memcaches could well decide to evict  
otherwise good sessions. Especially if all your session data objects  
are about the same size, but you also store things either smaller or  
bigger in memcache -- its current slab allocator subdivides total  
heap space into slabs for like-sized objects on powers of two I  
believe, so even though you gave memcache a half a gig or whatever to  
play with, the available size for 1K objects will not be that half a  
gig if you're also storing 20 byte and 10K values.


Therefore I would not recommend memcached as the sole container for  
session data. It's a well designed champ as a fast cache, but it  
oughta be used just as a cache, not as a datastore. Hence the name.


For folks who say that the stuff should be in RAM at the appserver  
side -- well -- that does imply sticky session load balancing which I  
believe most folks agree ought to be avoided if at all possible. And  
in our specific case, our appserver code is fastcgi / multiprocess on  
each webserver, so there's no central RAM to store it in, save for  
swallowing the shared memory bullet. Some sort of distributed shared  
memory [ which seems to be rather how Mohawksoft's MCache seems to be  
designed ] probably also performs best with sticky sessions --  
otherwise the session pages thrash between appservers on writes. I've  
not read the docs with fine-tooth comb to really infer how it is  
designed, so please forgive and educate me if MCache has an elegant  
solution for write-heavy non-sticky session use.


I'm squarely in the camp of just suck up and live with serialization  
-- storing session data in RAM on appservers in appserver processes  
really really binds your hands from a deployment and administration  
angle. And you don't even really realize it until you've tasted the  
freedom of having be outside. It lets you, oh, say, do code updates  
in the middle of the day -- not at 5AM. It lets you cluster  
horizontally in a much more simple manner -- no needing to deal with  
any broadcast-writes voodoo. It lets you not have to have sticky  
sessions. It lets you be much more, well, stateless, in your  
appserver code. Our site used to be single-JVM JBoss-based with  
sessions in RAM, and after moving away to multiprocess fastcgi model  
-- the grass is way greener on this side.


In rewriting we're also going completely stateless -- no illusion of  
scratch session data store [ we're so read-mostly that it is easily  
possible ]. But if we 'had' to have session support, I'd like to use  
a system which worked something like:


	1) Deployed centrally parallel to memcached and DB servers for the  
webcluster.

2) Performs update-in-place.
3) Will not arbitrarily evict data.
	4) Stores arbitrary volumes of info -- i.e. spills to disk if it has  
to, but doesn't have to worry about fsyncing all day long -- be they  
tx logs or datafiles or what have you.
	5) Fast as hell for reading / writing 'hot' session data. Does not  
have to worry about fast concurrent access to session data -- we're  
probably handling at most one single HTTP request per this session at  
a time.


	6) Clusterable in memcached model -- client-side-code hashes the  
session key to the backend session store. If said backend session  
server becomes unresponsive, flag him as temporarily dead [ try again  
in 30 seconds? ] and remove from possible candidates. rehash and re- 
query.


	7) Migrateable: If I need to bring down session server N, it stops  
answering requests and re-hashes what it has stored and transmits to  
its live peers. If hashing algorithm and list of servers were common  
across all servers + clients, and if perhaps a server response for a  
fetch / store could contain a "redirect this to other server" type
response, then perhaps as the server going down streams the data to  
its peers and they ack reception, then this could possibly work w/o  
loss of data and would minimize delay at topology change time.


I could live with 'if one of my session servers croaks unplanned, I
lose those sessions'. Wishlist 6 and 7 items remind me more and more
of AFS or NFS4 client / server interaction. Not trivial unfortunately.


I hate to say it, but would mysql / 

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
First of all, I base my assumptions on what I recall to have read on
this list, as I didn't try yet partitioning using inheritance. It's not
trivial to set up and I didn't have the time to play with it yet. So I
wouldn't know for sure that it won't work fine with our application, and
that will only change when I'll get a few days to experiment. The
experimentation will include the migration of existing data to the
partitioned schema, which will be probably the most difficult part of it
due to the size of the tables which need partitioning...

 You would query the parent (no union). Do you need order by's ?
 Without order by it is currently no problem.

It's clear to me that partitioning by inheritance is transparent to the
application, what worries me is that our application likely has a few
queries which will be equivalent to a union when planning, and I fear
bad performance there.

An I need order by on all queries with limit. The few exceptions where I
wouldn't need order by are when I want to delete/update chunk-wise, but
that's not supported right now... another feature I made noise about ;-)

[snip]

Cheers,
Csaba.



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote:
 On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
  My initial reasoning was to avoid extra sorts but i guess that the
  planner just doesn't get the LIMIT 1. I see now that UNION should be
  better for the planner to undestand (not performance wise).
  However, UNION alone, doesn't seem to cut it.
  Following is an example. t7 has 2 attributes and a non-unique index on
  one attribute. here is a printout:
  explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select *
  from t7 where a2=139 LIMIT 1);

 What are the indexes on?  If you only have an index on a4, the latter
 query has to be an index scan and there's no way to optimise it way.

That's my point: it should have used only a sequential scan and not also
done an index scan.
In other words, it should have consolidated the two nodes of index scan and 
sequence scan into a single plan node where you only scan sequentially the
relation and choose a tuple for each UNION clause.


   Also, couldn't you just do:
  
   SELECT * FROM R
   WHERE (a=3, b=6, ...)
   OR (b=5, d=2, ...)
   etc
 
  No, a filtering action is not enough since my goal is to only use indices
  when retrieving single tuples each time thus, if i will use OR i cannot
  control the number of tuples returned by each Or clause.

 I must admit, this is a really strange way of doing it. For example, if
 multiple rows match, the tuples eventually returned will be a random
 selection of the rows that matched. Especially with the limit 1
 there's no way the optimiser could combine the individual scans.

It is a query I use for full disjunction, which is a part of the algorithm.
I am doing it manually, so I don't see why it can't do it itself.
I.e.: scan R sequentially; for each UNION clause find a matching tuple;
the end.


 If you really need the LIMIT 1 and you don't have full index coverage
 then you're quite limited as to how it can be optimised.

You misunderstood me; I wish the planner to use only a sequential scan in the
event where even one node is a sequential scan.


I am currently just writing the query as a string and open a cursor.
Is there a simple way to use Datums instead of converting the
attributes to strings to create a plan for SPI.
10x.
  
   I imagine SPI_prepare() and SPI_execp() would be used for this.
 
  I am already using SPI_prepare but it uses a query of the form of a char
  string, which i need to prepare and is quite long. I.e. if i have 100
  tuples i wish to retrieve it can be very wasteful to prepare the string
  in memory and use SPI_prepare to prepare and later execute it.
  better to use directly the datums (which i already have deformed from
  previous operations).

 I'm confused here too. I thought the datums you're talking about were
 arguments, thus you could push them straight to SPI_execp(). But you
 seem to be suggesting parts of the actual query are in datum form also?

Example: I have a tuple T I am searching for.
T contains attribute1, attribute2. I have T in a
heap_deformtuple(T) manner, i.e., I have T->v and T->n (for nulls).
Currently I am doing (loosely):
(SELECT * FROM R where attribute1= + convertDatumToCharString(T-v[0])+
 AND attribute2= + convertDatumToCharString(T-v[1]) + LIMIT 1)
+ UNION
... as above.

I can use prepare without conversions but I still have to construct the long
query each time. I can't do prepare just once because the where clause
structures are always changing. Thus, I was wondering if I can
also construct the part in the plan where I request to SELECT * FROM R...
I.e. not use strings at all. The structure of the query is the same all the
time, i.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1
nodes with UNION ALL between SELECTs.


 Have a nice day,

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Webcluster session storage, was vacuum, performance, and MVCC

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 10:31:22AM -0400, James Robinson wrote:
 Regarding the best place for session data, memcached isn't really the  
 answer, for opposite reasons as to why it isn't so great to store it  
 in the central DB for a bug web farm.

The thought that occurred to me while reading this is that you don't
want a database at all, you just want a quick tuplestore. In that case,
why not just create a wrapper around something like Berkeley DB?

select * from bdb_get_key('session1234') as mysessiontype;
... session data ...

select bdb_put_key('session1234', ROW(... session data...))

select bdb_del_key('session1234');

select bdb_dump();
select bdb_destroy();

No pesky transactions, no vacuuming, (supposedly) high speed access.
Recent versions are supposed to be able to work with shared disk
storage between servers. AFAIK BDB is BSD licenced, so you could even
make a custom version that used the Postgres buffer cache and file
management.

Another possibility is something like tdb.

Basically, I'm not sure what all of this has to do with the core goal
of Postgres, which is to be an SQL compliant database.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread mark
On Fri, Jun 23, 2006 at 10:24:06AM -0400, Alvaro Herrera wrote:
 Mark Woodward wrote:
   In case of the number of actively modified rows being in only tens or
   low hundreds of thousands of rows, (i.e. the modified set fits in
   memory) the continuous vacuum process shows up as just another backend,
   not really taking order of magnitude more resources. It mainly generates
   WAL traffic, as modified pages are already in memory/cache and are
   mostly synced by background writer and/or checkpoint.
   Of course you have to adjust vacuum_cost_* variables so as to not
   saturate IO.
  These sort of solutions, IMHO, don't show how good PostgreSQL is, but show
  where it is very lacking.
 We all know Postgres is lacking; some of us try to improve it (some with
 more success than others).  People who know the current limitations but
 like the capabilities, try to find workarounds to the problems. What
 surprises me is that, if you have such a low opinion of Postgres, you
 still use it.

If everybody had good opinions, where would the development come from?

It's the parts that suck that need fixing the most... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 06:10:33PM +0300, Tzahi Fadida wrote:
 On Friday 23 June 2006 17:47, Martijn van Oosterhout wrote:
  On Fri, Jun 23, 2006 at 05:12:14PM +0300, Tzahi Fadida wrote:
   My initial reasoning was to avoid extra sorts but i guess that the
   planner just doesn't get the LIMIT 1. I see now that UNION should be
   better for the planner to undestand (not performance wise).
   However, UNION alone, doesn't seem to cut it.
   Following is an example. t7 has 2 attributes and a non-unique index on
   one attribute. here is a printout:
   explain analyze (select * from t7 where a4=113 LIMIT 1) UNION (select *
   from t7 where a2=139 LIMIT 1);
 
  What are the indexes on?  If you only have an index on a4, the latter
  query has to be an index scan and there's no way to optimise it way.
 
 That's my point, it should have only used a sequence scan and not also
 do an index scan.
 In other words, it should have consolidated the two nodes of index scan and 
 sequence scan into a single plan node where you only scan sequentially the
 relation and choose a tuple for each UNION clause.

I see what you're saying, but that's not necessarily faster. If you
consider the case where the tuple found by the seq scan is near the
beginning of the table and the tuple by the index near the end, it
could be worse to fold them. If you did only a single seq scan it would
have to scan the whole table, whereas now it only has to scan the
beginning.

Just because it says Seq Scan doesn't mean it actually scans the
whole table. There isn't a way in postgres to specify the plan you
want. LIMIT only works on the output of entire nodes, you can't say
output one tuple matching A, one tuple matching B in a single node.

 Example. i have a tuple T i am searching for. 
 T contains attribute1, attribute2. I have T in a
 heap_deformtuple(T) manner, i.e., i have T-v and T-n (for nulls).
 Currently i am doing (loosely):
 (SELECT * FROM R where attribute1= + convertDatumToCharString(T-v[0])+
  AND attribute2= + convertDatumToCharString(T-v[1]) + LIMIT 1)
 + UNION
 ... as above.
 
 I can use prepare without conversions but i still have to construct the long 
 query each time. I can't do prepare just once because the where clauses
 structures are always changing. Thus, i was wondering if i can
 also construct the part in the plan where i request to SELECT * FROM R...
 I.e. not to use strings at all. The structure of the query is the same all 
 the 
 time. I.e. there is the SELECT * FROM R and the WHERE clause with LIMIT 1 
 nodes with UNION  ALL between SELECTS.

I see. You could do:

(SELECT * FROM R WHERE attribute1=$1 and attribute2=$2 LIMIT 1) ... etc ...

That saves the conversions to cstrings, but you're looking for something
more abstract than that. I'm not sure that exists.
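At the SQL level the same idea can be written with PREPARE/EXECUTE -- shown
here only to illustrate the shape of it, with made-up column types; in C the
deformed datums would be passed to SPI_execp as parameter values instead of
being printed into the query string:

    -- Prepared once; only the parameter values change between executions.
    PREPARE fetch_r (int, int) AS
        SELECT * FROM R WHERE attribute1 = $1 AND attribute2 = $2 LIMIT 1;

    EXECUTE fetch_r(113, 139);
    EXECUTE fetch_r(7, 42);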

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL

2006-06-23 Thread Andrew Dunstan

Tom Lane wrote:

Dave Page dpage@vale-housing.co.uk writes:
  

Actually, my gripe about this one is that it wasn't detected promptly.
That patch went in two weeks ago; we should have known about 
the problem

within a couple days at most.  Seems like the Windows members of the
buildfarm don't run often enough.  The whole point of the buildfarm is
to spot problems while the code is still fresh in mind, no?
  


  

I think that speaks for the current usage of the cygwin port. Snake runs
native builds daily, but like Magnus and his dev box there's no way I'm
letting Cygwin anywhere near it. Istr that the only vaguely active
Cygwin member is Andrew's laptop.



Well, lack of interest is certainly adequate reason to decommission a
port.  If we can't find anyone who cares enough about Cygwin to host a
regularly-scheduled buildfarm member, I'm for blowing it off.

  


We used to have a couple of Cygwin boxes doing regular buildfarm runs. I 
don't recall why Jim Buttafuoco stopped running ferret.


I have a shiny new set of components just waiting for me to put them 
together in a machine. Sudden trips to Australia  and bouts of ill 
health have delayed this process far beyond what I wanted. My intention 
is to put a couple of VMs on this box, one of which will be Windows, and 
will run buildfarm regularly. Of course, if someone wanted to donate a 
nice machine, either hosted by me or somewhere else, that would 
shortcircuit things :-)


Anyway, the lack of daily Cygwin builds is not permanent.

There are several supported platforms not represented on the buildfarm - 
e.g. the one HPUX member has never actually reported any results.


cheers

andrew



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
On Friday, 2006-06-23 at 10:30, Mark Woodward wrote:

   What is interesting is setting up the server so that you
   can service your loads comfortably. Running the server at 100% lead is
   not anything you want to do on production server. There will be things
   you need to do anyway and you need some headroom for that.
 
  Of course, you design it so peaks are easily managed, but unless you run
  vacuum continuously, and that has its own set of problems, you run into
  this problem, and it can get really really bad.
 
  Usually it gets really bad if you *don't* run vacuum continuously, maybe
  hopeing to do it in slower times at night. For high-update db you have
  to run it continuously, maybe having some 5-15 sec pauses between runs.
 
 And how much I/O does this take?

Surprisingly, it's mostly WAL traffic; the heap/index pages themselves are
often not yet synced to disk by the time of vacuum, so no additional traffic
there. If you had made 5 updates per page and then vacuum it, you make
effectively 1 extra WAL write on top of the 5 from the updates, meaning a
20% increase in WAL traffic.

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Csaba Nagy
   Usually it gets really bad if you *don't* run vacuum continuously, maybe
   hopeing to do it in slower times at night. For high-update db you have
   to run it continuously, maybe having some 5-15 sec pauses between runs.
  
  And how much I/O does this take?
 
 Surprisingly its mostly WAL traffic, the heap/index pages themselves are
 often not yet synced to disk by time of vacuum, so no additional traffic
 there. If you had made 5 updates per page and then vacuum it, then you
 make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 

Does this also hold for read traffic? I thought vacuum makes a
full table scan... for big tables a full table scan always hurts
the performance of the box. If the full table scan could be
avoided, then I wouldn't mind running vacuum in a loop...

In fact I think that it would make sense to replace the whole current
vacuum stuff with a background thread which does that continuously using
a dead space map. That could be a heap sorted by tuple deletion time,
and always cleaned up up to the oldest running transaction's start
time... there would be no need for any other autovacuum then.

Cheers,
Csaba.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
 (SELECT * FROM R
 WHERE a=3, b=6,. ...)
 UNION
 (SELECT * FROM R
 WHERE b=5, d=2,. ...)
 UNION
 
 And lots of unions.

 Do you need UNION, or do you actually mean UNION ALL?

 Also, couldn't you just do:

 SELECT * FROM R 
 WHERE (a=3, b=6, ...)
 OR (b=5, d=2, ...)
 etc

That seems to be what Tzahi wants the system to do for him.  But the OR
format is not in general logically equivalent to either UNION or UNION
ALL, because UNION would cause duplicate output rows to be suppressed
whereas UNION ALL could allow the same table row to be emitted multiple
times (if two different WHERE conditions could select the same row).
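A tiny illustration of the difference, assuming a hypothetical table R in
which one particular row happens to satisfy both predicates:

    SELECT * FROM R WHERE a = 3 OR b = 5;   -- emits that row once

    (SELECT * FROM R WHERE a = 3)
    UNION ALL
    (SELECT * FROM R WHERE b = 5);          -- emits that same row twice

    (SELECT * FROM R WHERE a = 3)
    UNION
    (SELECT * FROM R WHERE b = 5);          -- removes duplicate output rows,
                                            -- even ones coming from distinct
                                            -- table rows with equal values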

It's conceivable that the planner could prove that neither effect is
possible in a particular query and then make the transformation
automatically, but I'm not about to expend that kind of planning effort
on such an odd case --- checking for it would waste entirely too many
cycles in most cases.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Overhead for stats_command_string et al, take 2

2006-06-23 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Average of 5 runs, for the first two cases, on the x86 machine that
 shows high overhead in gettimeofday.

 I used only 3 SELECT 1 queries instead of 100k.

 3 SELECT 1;
                          HEAD    8.1
 no overhead              21.9    23.1
 stats_command_string=1   22.4    36.6

 BEGIN; 3 SELECT 1; COMMIT;
                          HEAD    8.1
 no overhead              19.1    20.3
 stats_command_string=1   19.4    30.3

 It can be observed that HEAD in the no overhead case is actually faster
 than 8.1 on this machine.

That's more or less what I would have hoped to find, because we're
always finding ways to squeeze out bits of overhead here and there.
I wonder why your results are different from what I got on my older
machine?  I'll have to break out oprofile again and try to see what's
happening there.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL Source and Copyright Questions)

2006-06-23 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Anyway, the lack of daily Cygwin builds is not permanent.

 There are several supported platforms not represented on the buildfarm - 
 e.g. the one HPUX member has never actually reported any results.

Yeah, and this is not a good thing.  Eventually I'd like to get to a
point where every platform we consider supported has regular buildfarm
reports.  No more calls for port reports during beta periods --- beta
work should focus on functionality testing, not getting it to build.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] checking on buildfarm member thrush

2006-06-23 Thread Gaetano Mendola
Tom Lane wrote:

 Could you get a gdb stack trace from that crash?  If the buildfarm
 run is under a suitable ulimit, it should be leaving a core file
 in the test PGDATA directory.

Unfortunately the core size for the user pgfarm is 0:

$ulimit -c
0

However, I did a configure, make and make check on the regression tests
and I got two cores; maybe I'm doing something wrong, but I'm not able
to get any information from them.

Configured with:

./configure --prefix=/home/pgfarm/HEAD/inst --enable-cassert --enable-debug 
--enable-nls --enable-integer-datetimes --with-perl --with-python --with-tcl


$ gdb tmp_check/install/home/pgfarm/HEAD/inst/bin/postgres 
./tmp_check/data/core.6516
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-redhat-linux-gnu...Using host libthread_db 
library /lib/tls/libthread_db.so.1.


warning: core file may not match specified executable file.
Core was generated by `postgres: pgfarm regression [local] COMMENT  
 '.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
#0  0x0005 in ?? ()
(gdb) bt
#0  0x0005 in ?? ()
#1  0x4000 in ?? ()




anything else I can do ?




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Tom Lane
Csaba Nagy [EMAIL PROTECTED] writes:
 Surprisingly its mostly WAL traffic, the heap/index pages themselves are
 often not yet synced to disk by time of vacuum, so no additional traffic
 there. If you had made 5 updates per page and then vacuum it, then you
 make effectively 1 extra WAL write meaning 20% increase in WAL traffic. 

 Is this also holding about read traffic ? I thought vacuum will make a
 full table scan... for big tables a full table scan is always badly
 influencing the performance of the box. If the full table scan would be
 avoided, then I wouldn't mind running vacuum in a loop... 

If you're doing heavy updates of a big table then it's likely to end up
visiting most of the table anyway, no?  There is talk of keeping a map
of dirty pages, but I think it'd be a win for infrequently-updated
tables, not ones that need constant vacuuming.

I think a lot of our problems in this area could be solved with fairly
straightforward tuning efforts on the existing autovacuum
infrastructure.  In particular, someone should be looking into
recommendable default vacuum-cost-delay settings so that a background
vacuum doesn't affect performance too much.  Another problem with the
current autovac infrastructure is that it doesn't respond very well to
the case where there are individual tables that need constant attention
as well as many that don't.  If you have N databases then you can visit
a particular table at most once every N*autovacuum_naptime seconds, and
*every* table in the entire cluster gets reconsidered at that same rate.
I'm not sure if we need the ability to have multiple autovac daemons
running at the same time, but we definitely could use something with a
more flexible table-visiting pattern.  Perhaps it would be enough to
look through the per-table stats for each database before selecting the
database to autovacuum in each cycle, instead of going by least
recently autovacuumed.
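For reference, the knobs in question look like this at the SQL level (the
values are only illustrative, not recommendations, and the table name is
hypothetical):

    -- Throttle a manually started background vacuum so it yields I/O
    -- to foreground queries instead of running flat out.
    SET vacuum_cost_delay = 10;    -- sleep 10 ms each time the cost limit is hit
    SET vacuum_cost_limit = 200;   -- accumulated page cost that triggers the sleep
    VACUUM ANALYZE busy_table;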

Bottom line: there's still lots of low-hanging fruit.  Why are people
feeling that we need to abandon or massively complicate our basic
architecture to make progress?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Tzahi Fadida
Perhaps it is over the top just for my specific query.
Basically, I wish not to do something the system should do
because, as I already noticed, when versions change the
database can break your code if you don't keep up.

I guess I can make a map of attributes participating in an index
of a relation.
Also, I would have to take into account the type of index used.
For example, a btree should have the capability to do prefix key
searches while hash indices probably can't.
Then check whether each target tuple can use an index.
All this before constructing the query for the planner.

However, I already played with this in the past. I reached
the conclusion that it is better to let the planner decide what is
best since it is too complex for me to say things about cost
estimates or index-changing capabilities.

On Friday 23 June 2006 19:28, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  On Fri, Jun 23, 2006 at 03:57:19PM +0300, Tzahi Fadida wrote:
  (SELECT * FROM R
  WHERE a=3, b=6,. ...)
  UNION
  (SELECT * FROM R
  WHERE b=5, d=2,. ...)
  UNION
  
  And lots of unions.
 
  Do you need UNION, or do you actually mean UNION ALL?
 
  Also, couldn't you just do:
 
  SELECT * FROM R
  WHERE (a=3, b=6, ...)
  OR (b=5, d=2, ...)
  etc

 That seems to be what Tzahi wants the system to do for him.  But the OR
 format is not in general logically equivalent to either UNION or UNION
 ALL, because UNION would cause duplicate output rows to be suppressed
 whereas UNION ALL could allow the same table row to be emitted multiple
 times (if two different WHERE conditions could select the same row).

 It's conceivable that the planner could prove that neither effect is
 possible in a particular query and then make the transformation
 automatically, but I'm not about to expend that kind of planning effort
 on such an odd case --- checking for it would waste entirely too many
 cycles in most cases.

   regards, tom lane

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward

 Bottom line: there's still lots of low-hanging fruit.  Why are people
 feeling that we need to abandon or massively complicate our basic
 architecture to make progress?

   regards, tom lane

I, for one, see a particularly nasty unscalable behavior in the
implementation of MVCC with regard to updates.

For each update to a row, additional work needs to be done to access that
row. Surely a better strategy can be found, especially considering that the
problem being solved is a brief one.

The only reason why you need previous versions of a row is for
transactions that started before or during the transaction that seeks to
modify the row. After that, the previous versions continue to affect
performance and take up space even though they are of no value.
(Caveats for rollback, etc., but the point is still valid.)

This is a very pessimistic behavior and penalizes the more common and
optimistic operations. Now, if a tool were to be created that could roll
back an entire database to some arbitrary transaction ID between vacuums,
then I could see the usefulness of the older versions.

I still think an in-place indirection to the current row could fix the
problem and speed up the database; there are some sticky situations that
need to be considered, but it shouldn't break much.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread David Fetter
On Fri, Jun 23, 2006 at 02:30:29PM -0400, Mark Woodward wrote:
 
  Bottom line: there's still lots of low-hanging fruit.  Why are
  people feeling that we need to abandon or massively complicate our
  basic architecture to make progress?
 
  regards, tom lane
 
 I, for one, see a particularly nasty unscalable behavior in the
 implementation  of MVCC with regards to updates.

You're not answering the question Tom asked.  Why not?

 For each update to a row additional work needs to be done to access
 that row. Surely a better strategy can be done, especially
 considering that the problem being solved is a brief one.
 
 The only reason why you need previous versions of a row is for
 transactions that started before or during the transaction that
 seeks to modify a row. After which time, the previous versions
 continue to affect performance and take up space even though they
 are of no value.  (Caveats for rollback, etc. but the point is still
 valid).

I wouldn't be so quick to dismiss those as parenthetical caveats.

 This is a very pessimistic behavior and penalizes the more common
 and optimistic operations.  Now, if a tool were to be created that
 could roll back an entire database to some arbitrary transaction ID
 between vacuums, then I can see the usefulnes of the older versions.

There was one called time travel.  Somebody might put it back in some
day :)

 I still think an in-place indirection to the current row could fix
 the problem and speed up the database, there are some sticky
 situations that need to be considered, but it shouldn't break much.

We're eagerly awaiting your patch.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jonah H. Harris

On 6/23/06, Mark Woodward [EMAIL PROTECTED] wrote:

I, for one, see a particularly nasty unscalable behavior in the
implementation  of MVCC with regards to updates.


I think this is fairly commonly accepted.  The overhead required to
perform an UPDATE in PostgreSQL is pretty heavy.  Actually, it's not
really PostgreSQL's implementation, but anything that employs basic
multi-version timestamp ordering (MVTO) style MVCC.  Basically,
MVTO-style systems require additional work to be done in an UPDATE so
that queries can find the most current row more quickly.


This is a very pessimistic behavior


Yes, and that's basically the point of MVTO in general.  The nice
thing about MVTO-style MVCC is that it isn't super complicated.  No
big UNDO strategy is needed because the old versions are always there
and just have to satisfy a snapshot.
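
To make the "satisfy a snapshot" point concrete, a small two-session sketch
(the table and values are hypothetical):

-- Session 1
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT data FROM sessions WHERE id = 1;   -- establishes the snapshot

-- Session 2, concurrently
UPDATE sessions SET data = 'new' WHERE id = 1;   -- creates a new row version

-- Session 1, still inside its transaction
SELECT data FROM sessions WHERE id = 1;   -- still sees the old version
COMMIT;

-- The old version is kept around purely to satisfy snapshots like session
-- 1's; once no open transaction can see it, only vacuum reclaims the space.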


I still think an in-place indirection to the current row could fix the
problem and speed up the database, there are some sticky situations that
need to be considered, but it shouldn't break much.


I agree, but we should make clear that moving to an in-place update isn't
a quick-fix; it will require a good amount of design and planning.

What I find in these discussions is that we always talk about over
complicating vacuum in order to fix the poor behavior in MVCC.  Fixing
autovacuum does not eliminate the overhead required to add index
entries and everything associated with performing an UPDATE... it's
just cleaning up the mess after the fact.  As I see it, fixing the
root problem by moving to update-in-place may add a little more
complication to the core, but will eliminate a lot of the headaches we
have in overhead, performance, and manageability.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL

2006-06-23 Thread Andrew Dunstan

Tom Lane wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:
  
There are several supported platforms not represented on the buildfarm - 
e.g. the one HPUX member has never actually reported any results.



Yeah, and this is not a good thing.  Eventually I'd like to get to a
point where every platform we consider supported has regular buildfarm
reports.  No more calls for port reports during beta periods --- beta
work should focus on functionality testing, not getting it to build.


  


Then people who have access to people who own or can provide access to 
machines in classes not covered need to do a bit of begging ;-)


The requirements are (deliberately) very modest:

OS and toolset required to build postgres from CVS
A modern perl installation (>= 5.6 is adequate)
Anonymous read access to a CVS repository - either the one at 
postgresql.org or a replica

Outbound HTTP port 80 access to www.pgbuildfarm.org, possibly via a proxy.

Once it is set up it is close to hands free - you just set up the cron 
job(s) or equivalent.


cheers

andrew




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jochem van Dieten

On 6/23/06, Mark Woodward wrote:


For each update to a row additional work needs to be done to access that
row. Surely a better strategy can be done, especially considering that the
problem being solved is a brief one.

The only reason why you need previous versions of a row is for
transactions that started before or during the transaction that seeks to
modify a row. After which time, the previous versions continue to affect
performance and take up space even though they are of no value.
 (Caveats for rollback, etc. but the point is still valid).

This is a very pessimistic behavior and penalizes the more common and
optimistic operations.


Are you sure about that? ISTM that for the most common cases the TID
returned by an indexscan is that of the last version, and only if that
version is too new is a second TID tried, and so on.

Jochem

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Bruce Momjian
Tom Lane wrote:
 If you're doing heavy updates of a big table then it's likely to end up
 visiting most of the table anyway, no?  There is talk of keeping a map
 of dirty pages, but I think it'd be a win for infrequently-updated
 tables, not ones that need constant vacuuming.
 
 I think a lot of our problems in this area could be solved with fairly
 straightforward tuning efforts on the existing autovacuum
 infrastructure.  In particular, someone should be looking into
 recommendable default vacuum-cost-delay settings so that a background
 vacuum doesn't affect performance too much.  Another problem with the
 current autovac infrastructure is that it doesn't respond very well to
 the case where there are individual tables that need constant attention
 as well as many that don't.  If you have N databases then you can visit
 a particular table at most once every N*autovacuum_naptime seconds, and
 *every* table in the entire cluster gets reconsidered at that same rate.
 I'm not sure if we need the ability to have multiple autovac daemons
 running at the same time, but we definitely could use something with a
 more flexible table-visiting pattern.  Perhaps it would be enough to
 look through the per-table stats for each database before selecting the
 database to autovacuum in each cycle, instead of going by least
 recently autovacuumed.
 
 Bottom line: there's still lots of low-hanging fruit.  Why are people
 feeling that we need to abandon or massively complicate our basic
 architecture to make progress?

I think at some point we have to admit that _polling_ the tables, which
is what autovacuum does, just isn't going to work well, no matter how
much it is tweaked, and another approach should be considered for
certain workload cases.

At some point, the autovacuum approach starts to look like a car with
fifty bumper stickers.  The first few were fine, but at some point, the
tweaks (bumper stickers) start to overwhelm the car, and it is time to
look for a new car.

I think particularly for the UPDATE with no index key changes, a new
approach must be considered.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: Anyone still care about Cygwin? (was Re: [HACKERS] [CORE] GPL

2006-06-23 Thread Adrian Maier

On 23/06/06, Andrew Dunstan [EMAIL PROTECTED] wrote:

Tom Lane wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:

 There are several supported platforms not represented on the buildfarm -
 e.g. the one HPUX member has never actually reported any results.

 Yeah, and this is not a good thing.  Eventually I'd like to get to a
 point where every platform we consider supported has regular buildfarm
 reports.  No more calls for port reports during beta periods --- beta
 work should focus on functionality testing, not getting it to build.


Then people who have access to people who own or can provide access to
machines in classes not covered need to do a bit of begging ;-)

The requirements are (deliberately) very modest:

Once it is set up it is close to hands free - you just set up the cron
job(s) or equivalent.


Hello,

I'll try to set up a buildfarm installation for Cygwin on my computer at work.
But I'm taking the next week off,  so this will have to wait until my return.


Cheers,
Adrian Maier

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Florian Weimer
* Gavin Sherry:

 Tom Lane's MVCC talk:
 http://www.postgresql.org/files/developer/transactions.pdf

Is this still up-to-date with regard to partial page writes?
I hope that has been fixed (like the fsync issue).

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 On 6/23/06, Mark Woodward [EMAIL PROTECTED] wrote:
 I, for one, see a particularly nasty unscalable behavior in the
 implementation  of MVCC with regards to updates.

 I think this is fairly commonly accepted.  The overhead required to
 perform an UPDATE in PostgreSQL is pretty heavy.  Actually, it's not
 really PostgreSQL's implementation, but anything that employs basic
 multi-version timestamp ordering (MVTO) style MVCC.  Basically,
 MVTO-style systems require additional work to be done in an UPDATE so
 that queries can find the most current row more quickly.

 This is a very pessimistic behavior

 Yes, and that's basically the point of MVTO in general.  The nice
 thing about MVTO-style MVCC is that it isn't super complicated.  No
 big UNDO strategy is needed because the old versions are always there
 and just have to satisfy a snapshot.

 I still think an in-place indirection to the current row could fix the
 problem and speed up the database, there are some sticky situations that
 need to be considered, but it shouldn't break much.

 I agree, but we should make clear that moving to an in-place update isn't
 a quick-fix; it will require a good amount of design and planning.

This is NOT an in-place update. The whole MVCC strategy of keeping old
versions around doesn't change. The only thing that does change is one
level of indirection. Rather than keep references to all versions of all
rows in indexes, keep only a reference to the first or key row of each
row, and have the first version of a row form the head of a linked list to
subsequent versions of each row. The list will be in descending order.

In the vast majority of cases, the overhead of this action will be
trivial. In an unmodified row, you're there. In a modified row, you have
one extra lookup. In extreme cases, you may have to go back a few
versions, but I don't see that as a common behavior.

On a heavily updated row, you are never more than one jump away, and the
indexes shouldn't grow very much.


 What I find in these discussions is that we always talk about over
 complicating vacuum in order to fix the poor behavior in MVCC.  Fixing
 autovacuum does not eliminate the overhead required to add index
 entries and everything associated with performing an UPDATE... it's
 just cleaning up the mess after the fact.  As I see it, fixing the
 root problem by moving to update-in-place may add a little more
 complication to the core, but will eliminate a lot of the headaches we
 have in overhead, performance, and manageability.

Vacuum is a tool for removing old versions. I think there is an overly
eager tendency to have it fix other problems.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes:
 I think at some point we have to admit that _polling_ the tables, which
 is what autovacuum does, just isn't going to work well, no matter how
 much it is tweaked, and another approach should be considered for
 certain workload cases.

Autovacuum polls in its current, first-generation implementation;
what I said upthread was it needs to be smarter than that.  I am not
sure how you get from that to the conclusion that the very next step
is to abandon the vacuuming approach altogether.

What I see in this discussion is a huge amount of the grass must be
greener on the other side syndrome, and hardly any recognition that
every technique has its downsides and complications.  Furthermore,
I do not believe that this project has the ability to support multiple
fundamental storage models, as a number of people seem to be blithely
suggesting.  We're having a hard enough time debugging and optimizing
*one* storage model.  I think the correct path forward is to stick with
the same basic storage model and vacuuming concept, and address the
known performance issues with better-optimized vacuuming.  No, it will
never be perfect for every scenario, but we can certainly make it much
better than it is now, without risking killing the project by
introducing undebuggable, unmaintainable complexity.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  I think at some point we have to admit that _polling_ the tables, which
  is what autovacuum does, just isn't going to work well, no matter how
  much it is tweaked, and another approach should be considered for
  certain workload cases.
 
 Autovacuum polls in its current, first-generation implementation;
 what I said upthread was it needs to be smarter than that.  I am not
 sure how you get from that to the conclusion that the very next step
 is to abandon the vacuuming approach altogether.

I am not ready to abandon autovacuum, but as I stated later the UPDATE
with no key change case is common enough that it could be handled
better without involving autovacuum and its limitations.

As I remember, most databases have problems with DELETE/INSERT cycles,
but we seem to be hit by UPDATE performance more than most, and more
than is wise.

 What I see in this discussion is a huge amount of the grass must be
 greener on the other side syndrome, and hardly any recognition that
 every technique has its downsides and complications.  Furthermore,
 I do not believe that this project has the ability to support multiple
 fundamental storage models, as a number of people seem to be blithely
 suggesting.  We're having a hard enough time debugging and optimizing
 *one* storage model.  I think the correct path forward is to stick with
 the same basic storage model and vacuuming concept, and address the
 known performance issues with better-optimized vacuuming.  No, it will
 never be perfect for every scenario, but we can certainly make it much
 better than it is now, without risking killing the project by
 introducing undebuggable, unmaintainable complexity.

Well, are you suggesting we just stop improving the database?  I am sure
not.  But, your suggestion is that we can't do better without incurring
more complexity (true), and that complexity will not be worth it.  I
don't agree with that until I see some proposals, and shutting down
discussion because they will add complexity or are fruitless seems
unwise.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Hannu Krosing
On Friday, 2006-06-23 at 18:05, Csaba Nagy wrote:
Usually it gets really bad if you *don't* run vacuum continuously, maybe
hoping to do it in slower times at night. For a high-update db you have
to run it continuously, maybe having some 5-15 sec pauses between runs.
   
   And how much I/O does this take?
  
  Surprisingly it's mostly WAL traffic; the heap/index pages themselves are
  often not yet synced to disk by the time of the vacuum, so no additional
  traffic there. If you had made 5 updates per page and then vacuum it, you
  make effectively 1 extra WAL write, meaning a 20% increase in WAL traffic.
 
 Does this also hold for read traffic? I thought vacuum will do a
 full table scan... for big tables a full table scan always badly
 affects the performance of the box. If the full table scan could be
 avoided, then I wouldn't mind running vacuum in a loop...

I was referring to a design that keeps frequently updated tuples in a
separate table.

 In fact I think that it would make sense to replace the whole current
 vacuum stuff with a background thread which does that continuously using
 a dead space map. That could be a heap sorted by tuple deletion time,
 and always cleaned up up to the oldest running transaction's start
 time... there would be no need for any other autovacuum then.

This has been on todo list for some time already.
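
As a concrete sketch of the "run vacuum continuously with short pauses"
approach quoted above (the table name and settings are only illustrative):

-- Run every 10-15 seconds from cron or a small driver loop, against only
-- the hot table rather than the whole database:
SET vacuum_cost_delay = 10;    -- sleep 10 ms each time the cost limit is hit,
SET vacuum_cost_limit = 200;   -- so the vacuum trickles out its I/O
VACUUM ANALYZE sessions;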

 
-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jonah H. Harris

On 6/23/06, Mark Woodward [EMAIL PROTECTED] wrote:

Rather than keep references to all versions of all
rows in indexes, keep only a reference to the first or key row of each
row, and have the first version of a row form the head of a linked list to
subsequent versions of each row. The list will be in descending order.


By all means, please go ahead and try it because it's not quite that
easy.  You're going to run into serious locking and contention issues
this way.  In the end, it's not much better than running a sequential
scan to query a row that's been updated several thousand times on a
table that hasn't been vacuumed... follow that pointer :)

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 Tom Lane wrote:
 If you're doing heavy updates of a big table then it's likely to end up
 visiting most of the table anyway, no?  There is talk of keeping a map
 of dirty pages, but I think it'd be a win for infrequently-updated
 tables, not ones that need constant vacuuming.

 I think a lot of our problems in this area could be solved with fairly
 straightforward tuning efforts on the existing autovacuum
 infrastructure.  In particular, someone should be looking into
 recommendable default vacuum-cost-delay settings so that a background
 vacuum doesn't affect performance too much.  Another problem with the
 current autovac infrastructure is that it doesn't respond very well to
 the case where there are individual tables that need constant attention
 as well as many that don't.  If you have N databases then you can visit
 a particular table at most once every N*autovacuum_naptime seconds, and
 *every* table in the entire cluster gets reconsidered at that same rate.
 I'm not sure if we need the ability to have multiple autovac daemons
 running at the same time, but we definitely could use something with a
 more flexible table-visiting pattern.  Perhaps it would be enough to
 look through the per-table stats for each database before selecting the
 database to autovacuum in each cycle, instead of going by least
 recently autovacuumed.

 Bottom line: there's still lots of low-hanging fruit.  Why are people
 feeling that we need to abandon or massively complicate our basic
 architecture to make progress?

 I think at some point we have to admit that _polling_ the tables, which
 is what autovacuum does, just isn't going to work well, no matter how
 much it is tweaked, and another approach should be considered for
 certain workload cases.

Thank you, that is *exactly* the analogy I have been unable to formulate.
It was on my mind but I could not put my finger on it.

Vacuum is fundamentally inefficient: it does not know what has changed
and must go through the entire table to find the specific changes each time.
Going through the whole table each time is messed up and wasteful.


 At some point, the autovacuum approach starts to look like a car with
 fifty bumper stickers.  The first few were fine, but at some point, the
 tweaks (bumper stickers) start to overwhelm the car, and it is time to
 look for a new car.

 I think particularly for the UPDATE with no index key changes, a new
 approach must be considered.


I have been ranting about a first row strategy, one where the first
version of a row is the top of a linked list of versions.

(1) The indexes point to the first key row.
(2) When a row is updated, it is looked up in the various indexes; if the key
row currently exists in the index, no changes to the index are made. If it
is not found, the old version of the row is orphaned and behaves as
PostgreSQL always behaves.
(3) If the row is not orphaned, its last version reference is updated.

For the most part, this should only affect updates where the index entries
don't change. If the index value always changes, PostgreSQL will behave
as it currently does. If the index values do not change, updates will be
faster to do and won't impact queries.
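
For clarity, a SQL sketch of the distinction being drawn (hypothetical table):

CREATE TABLE sessions (sid text PRIMARY KEY, last_seen timestamptz, data text);

-- Index key (sid) unchanged: the case the proposal targets; the existing
-- index entry could keep pointing at the key row.
UPDATE sessions SET last_seen = now(), data = '...' WHERE sid = 'abc';

-- Index key changed: falls back to the current behavior, with a new index
-- entry for the new version.
UPDATE sessions SET sid = 'def' WHERE sid = 'abc';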


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jonah H. Harris

On 6/23/06, Mark Woodward [EMAIL PROTECTED] wrote:

Vacuum is fundamentally inefficient


The theory of database cleanup (VACUUM) is good, but has nothing to do
with the overhead we currently encounter in performing an update.


I have been ranting about a first row strategy, one where the first
version of a row is the top of a linked list of versions.



(1) The indexes point to the first key row.


I'm with ya.


(2) When a row is updated, [...] If it is not found [in the index], the
old version of the row is orphaned and behaves as PostgreSQL
always behaves.


aka current bad update behavior.


(3) If the row is not orphaned, its last version reference is updated.


The *last version* reference?  Where is this, another entry on the
tuple header (extra size overhead on every tuple) or updating the
index (additional locking+WAL)?  Following something like the ctid?
How is that really any different than performing a sequential scan
from the index as a starting point?  That could still encounter
multiple physical I/Os to get to the current row and is potentially
very slow.  Sure, we can add extra free space to each block to limit
tuple version migration across blocks... combined with constant
vacuuming and FSM updates we'd get some great performance.

Of course, we could just avoid the overcomplication and locking issues
of having to tune block-level freespace, vacuum, and tweaking FSM for
each type of application, but where's the fun in that? (yes, that was
a badly placed sarcastic joke)


For the most part, this should only affect updates where the index entries
don't change. If the index value always changes, PostgreSQL will behave
as it currently does. If the index values do not change, updates will be
faster to do and won't impact queries.


I like the idea of a new strategy when index entries do not change.
But, like Tom said, there is no free lunch; in your example you're
moving the high cost of an UPDATE to even more costly SELECTs.

This is something that requires a good amount of planning and design,
and I'm really glad to see some ideas being thrown into this
discussion.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Stefan Kaltenbrunner
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
 I think at some point we have to admit that _polling_ the tables, which
 is what autovacuum does, just isn't going to work well, no matter how
 much it is tweeked, and another approach should be considered for
 certain workload cases.
 
 Autovacuum polls in its current, first-generation implementation;
 what I said upthread was it needs to be smarter than that.  I am not
 sure how you get from that to the conclusion that the very next step
 is to abandon the vacuuming approach altogether.

Yeah, autovacuum can still be improved quite a lot, but as always this
can be done on a step-by-step basis.

 
 What I see in this discussion is a huge amount of the grass must be
 greener on the other side syndrome, and hardly any recognition that
 every technique has its downsides and complications.  Furthermore,
 I do not believe that this project has the ability to support multiple
 fundamental storage models, as a number of people seem to be blithely
 suggesting.  We're having a hard enough time debugging and optimizing
 *one* storage model.  I think the correct path forward is to stick with
 the same basic storage model and vacuuming concept, and address the
 known performance issues with better-optimized vacuuming.  No, it will
 never be perfect for every scenario, but we can certainly make it much
 better than it is now, without risking killing the project by
 introducing undebuggable, unmaintainable complexity.

While I'm not an expert on MVCC, it certainly seems that sticking to
the current storage model and continuing to improve on it (especially
wrt vacuum performance) gradually over time (as it has happened for the
last years) is a much better and safer approach than trying to do
something revolutionary which in theory might (or might not) be better
than the current approach for this or that workload.

PostgreSQL got a _LOT_ faster with each of the last releases, and by my
testing -HEAD is already significantly (20-30%) faster for some of our
apps than 8.1, and all that was achieved without radically redesigning a
proven (reliability-wise) storage engine.
Maybe, and only maybe, one day (or when somebody comes up with a usable
patch, as always) we will be at the point where we really need to think
about doing that, but for now there seems to be still a lot of low-hanging
fruit left to improve for months and years to come.


Stefan

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Jonah H. Harris

On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote:

What I see in this discussion is a huge amount of the grass must be
greener on the other side syndrome, and hardly any recognition that
every technique has its downsides and complications.


I'm being totally objective.  I don't think we should abandon
PostgreSQL's overall design at all, because we do perform INSERTs and
DELETEs much better than most systems.  However, I've looked at many
systems and how they implement UPDATE so that it is a scalable
operation.  Sure, there are costs and benefits to each implementation,
but I think we have some pretty brilliant people in this community and
can come up with an elegant design for scalable UPDATEs.


Furthermore, I do not believe that this project has the ability to
support multiple fundamental storage models, as a number of
people seem to be blithely suggesting.


The fundamental storage model was not designed to be scalable.  Many
improvements have been made to get it where it is today.  The
improvements are certainly good, but the model itself isn't perfect
and sometimes things need to be redesigned a bit every now and again.
To be clear though, I'm in no way suggesting that we trash and rewrite
PostgreSQL's storage architecture.


we can certainly make it much
better than it is now, without risking killing the project by
introducing undebuggable, unmaintainable complexity.


I'm not saying we rewrite the entire system in C++ (as Firebird
did)... that was a risky and possibly project-killing move.  I see us
changing the way a single operation works through community-oriented
planning and design.  This approach seems to have worked for years...
and includes the MVCC implementation itself.

--
Jonah H. Harris, Software Architect | phone: 732.331.1300
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 2nd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 It's conceivable that the planner could prove that neither effect is
 possible in a particular query and then make the transformation
 automatically, but I'm not about to expend that kind of planning effort
 on such an odd case --- checking for it would waste entirely too many
 cycles in most cases.

Fwiw these aren't really very rare cases. Usually it goes the other direction
though. I seem to recall Oracle did in fact support a plan where it converted
OR expressions into a kind of union plan node.

But I think Postgres's bitmap index scan satisfies much of the same need. I
think the most useful case where the union plan was beneficial was precisely
when you had something like WHERE index_col1=1 OR indexed_col2=2.

Going from a UNION plan to an OR plan would be somewhat strange. Programmers
don't usually write queries as UNION in place of the more natural OR unless
they have a reason to.
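
For reference, the two forms being compared look roughly like this
(hypothetical table, column names from the example above):

-- OR form: can be served by a bitmap index scan combining both indexes.
SELECT * FROM t WHERE indexed_col1 = 1 OR indexed_col2 = 2;

-- Hand-written union form; plain UNION (not UNION ALL) is needed so a row
-- matching both conditions comes back only once.
SELECT * FROM t WHERE indexed_col1 = 1
UNION
SELECT * FROM t WHERE indexed_col2 = 2;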

-- 
greg


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Planning without reason.

2006-06-23 Thread Martijn van Oosterhout
On Fri, Jun 23, 2006 at 08:14:07PM +0300, Tzahi Fadida wrote:
 I guess I can make a map of attributes participating in an index
 of a relation.
 Also, I would have to take into account the type of index used.
 For example, a btree should have the capability to do prefix key
 searches while hash indices probably can't.
 Then check, for each target tuple, whether it can use an index.
 All this before constructing the query for the planner.

At the end of the day it comes down to the fact that Postgres currently has
no way to express the plan you want, so we're not talking about small
planner or optimiser changes; we're talking about creating a completely new
node type which could be used for this purpose.

It would be a node that sat on top of a seq scan and had a number of
conditions. Each tuple would be matched against each condition. Once a
condition is matched, that tuple is returned. Once a condition has
matched N times it is disabled. Once all conditions are disabled,
you're done.

Seems terribly special purpose, yet I don't see how you could do it any
other way. If it weren't for the LIMIT clauses the whole operation would
be trivial.
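
For reference, the query shape under discussion, with one LIMIT per
condition, could be spelled today as a union of limited subselects
(hypothetical table and conditions):

-- The node described above would answer all branches in a single pass over
-- t, disabling each condition once its LIMIT has been satisfied.
(SELECT * FROM t WHERE a = 1 LIMIT 10)
UNION ALL
(SELECT * FROM t WHERE b = 2 LIMIT 10);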

  It's conceivable that the planner could prove that neither effect is
  possible in a particular query and then make the transformation
  automatically, but I'm not about to expend that kind of planning effort
  on such an odd case --- checking for it would waste entirely too many
  cycles in most cases.

I think in the case we have here, the transformation wouldn't apply
anyway (otherwise it could be done by hand).

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Bruce Momjian
Jonah H. Harris wrote:
 On 6/23/06, Tom Lane [EMAIL PROTECTED] wrote:
  What I see in this discussion is a huge amount of the grass must be
  greener on the other side syndrome, and hardly any recognition that
  every technique has its downsides and complications.
 
 I'm being totally objective.  I don't think we should abandon
 PostgreSQL's overall design at all, because we do perform INSERTs and
 DELETEs much better than most systems.  However, I've looked at many
 systems and how they implement UPDATE so that it is a scalable
 operation.  Sure, there are costs and benefits to each implementation,
 but I think we have some pretty brilliant people in this community and
 can come up with an elegant design for scalable UPDATEs.

I think the UPDATE case is similar to the bitmap index scan or perhaps
bitmap indexes on disk --- there are cases we know can not be handled
well by our existing code, so we have added (or might add) these
features to try to address those difficult cases.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Buffers to Nest Loop Join and him pages

2006-06-23 Thread Daniel Xavier de Sousa
Hi all,

Please, I have two questions. Can somebody help me?

First question: I want to know how many pages Postgres uses when it executes
a nested-loop join for a query. I have used "explain analyze", but it does
not give the number of pages.

Second question: Normally, when a DBMS executes the nested-loop join
algorithm, there is a difference between the amount of memory (buffer space)
given to the outer table and to the inner table. I want to know where (in
what file?) Postgres defines the size of these buffers, and whether I can
change it.

I have asked on other forums, but nobody has answered me. So please, somebody
help me!

Daniel
		 

Re: [HACKERS] [GENERAL] Out of memory error in 8.1.0 Win32

2006-06-23 Thread Tom Lane
Ah-hah.  I made a table similar to yours (with a lot of dummy data) and
went trawling through the backend memory to try to see where the space
was going.  I found two significant inefficiencies in HashAggregate's
space usage:

* It stores a representative tuple for each input group, containing
the grouping fields needed to identify the group.  Or at least that's
the theory.  What I saw in this example (HashAggregate over a direct
table SeqScan) is that what was getting stored was raw disk tuples
including *all* the table columns not only the needed ones.  This is
doubtless because the optimization that skips a projection step when
not needed at a table-scan node is firing inappropriately.  This was
only costing one extra integer field in my cut-down example, but it
might've accounted for significant overhead in your case, and in the
general case it could be horribly bad.  Even if the projection were
being done properly, I think we'd be storing copies of the input
columns used to compute the aggregates, not only the grouping columns.
So there's probably an easy fix here that could be back-patched into
existing releases, and a tenser fix that will save more space.

* dynahash.c is allocating new hashtable entries 32 at a time, without
any awareness for the fact that palloc() rounds small requests up to the
next power-of-2 size.  In the example I was looking at, it was asking
for 1280 bytes at a time, resulting in almost 40% of the space used
being completely wasted.  This is pretty trivial to fix.

Aside from being just plain inefficient, neither of these effects are
being accounted for in the planner's estimate of space needed for a hash
aggregation, and thus they could be contributing to the problem of
underestimated table size leading to out-of-memory failures.
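
For context, the kind of plan at issue is a hashed aggregation over a plain
sequential scan, for example (hypothetical table and column):

-- One hash entry is kept per group; the two effects above make each entry
-- larger than the planner assumed when it chose HashAggregate.
EXPLAIN SELECT grouping_col, count(*)
FROM big_table
GROUP BY grouping_col;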

I'm taking off for the evening but will look into fixing these soon.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger
Just out of curiosity Mark, didn't you write your session daemon so  
that you don't have to put sessions in postgres anymore?  Or are you  
just giving that as an example of a very wide, very heavily updated  
table?  My session tables have been an extreme case of this problem,  
but no other table that I have is so adversely affected by this  
behavior.  My decision was not to pull postgres out entirely, just  
using other session handlers.


Rick

On Jun 22, 2006, at 7:59 AM, Mark Woodward wrote:


After a long battle with technology, [EMAIL PROTECTED] (Mark
Woodward), an earthling, wrote:
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward)  
mumbled into

her beard:

[snip]


1.  The index points to all the versions, until they get  
vacuumed out.


It can't point to all versions, it points to the last current
version
as  updated by vacuum, or the first version of the row.


No, it points to *all* the versions.

Suppose I take a table with two rows:

INFO:  analyzing public.test
INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Then, over and over, I remove and insert one entry with the same PK:

sample=# delete from test where id = 2;insert into test (id)  
values (2);

DELETE 1


[snip]


Now, I vacuum it.

sample=# vacuum verbose analyze test;
INFO:  vacuuming public.test
INFO:  index test_id_key now contains 2 row versions in 2 pages
DETAIL:  10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  test: removed 10 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  test: found 10 removable, 2 nonremovable row versions in  
1 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.test
INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Notice that the index contained 10 versions of that one row.

It pointed to *ALL* the versions.


Hmm, OK, then the problem is more serious than I suspected.
This means that every index on a row has to be updated on every
transaction that modifies that row. Is that correct?

I am attaching some code that shows the problem with regard to
applications such as web server session management, when run, each  
second
the system can handle fewer and fewer connections. Here is a brief  
output:


[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3

1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27

1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38

1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple table
lost about 10% of its performance, and I've seen worse based on update
frequency.  Before you say this is an obscure problem, I can tell  
you it
isn't. I have worked with more than a few projects that had to  
switch away

from PostgreSQL because of this behavior.

Obviously this is not a problem with small sites, but this is a real
problem with an enterprise level web site with millions of visitors  
and

actions a day. Quite frankly it is a classic example of something that
does not scale. The more and more updates there are, the higher the  
load

becomes. You can see it on top as the footest program runs.

There has to be a more linear way of handling this scenario.
footest.c

---(end of  
broadcast)---

TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger


On Jun 22, 2006, at 10:33 AM, Mark Woodward wrote:

On Thursday, 2006-06-22 at 09:59, Mark Woodward wrote:

After a long battle with technology, [EMAIL PROTECTED] (Mark
Woodward), an earthling, wrote:
Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward)  
mumbled

into

It pointed to *ALL* the versions.


Hmm, OK, then the problem is more serious than I suspected.
This means that every index on a row has to be updated on every
transaction that modifies that row. Is that correct?


Yes.


I am attaching some code that shows the problem with regard to
applications such as web server session management, when run, each
second
the system can handle fewer and fewer connections. Here is a brief
output:

[EMAIL PROTECTED]:~/pgfoo$ ./footest
1307 sessions per second, elapsed: 1
1292 sessions per second, elapsed: 2
1287 sessions per second, elapsed: 3

1216 sessions per second, elapsed: 25
1213 sessions per second, elapsed: 26
1208 sessions per second, elapsed: 27

1192 sessions per second, elapsed: 36
1184 sessions per second, elapsed: 37
1183 sessions per second, elapsed: 38

1164 sessions per second, elapsed: 58
1170 sessions per second, elapsed: 59
1168 sessions per second, elapsed: 60

As you can see, in about a minute at high load, this very simple  
table
lost about 10% of its performance, and I've seen worse based on  
update
frequency.  Before you say this is an obscure problem, I can tell  
you it
isn't. I have worked with more than a few projects that had to  
switch

away
from PostgreSQL because of this behavior.


You mean systems that are designed so exactly, that they can't  
take 10%

performance change ?


No, that's not really the point, performance degrades over time, in  
one

minute it degraded 10%.

The update to session ratio has a HUGE impact on PostgreSQL. If you  
have a

thousand active sessions, it may take a minute to degrade 10% assuming
some level of active vs operations per session per action.


How big are your sessions?  Running with about 1000 sessions, running  
vacuum on just the session table is so fast it is barely noticeable.   
Vacuuming my session table every 5 minutes keeps them very, very  
small and easy to vacuum and performance degradation is not an  
issue.  I could probably do it every minute if I had to and it would  
be fine.  But my sessions are only about 5k on average.


What is a bigger concern for me is the massive amount of writes to  
the disk that happen in postgres to make sure the data is safe.  It's  
just a waste of disk bandwidth for data for data that is transient  
anyway.


To me postgres (and RDBMSs in general) is just not good for  
handling session data for web apps.  Once again, isn't that why you  
wrote mcache?




If an active user causes a session update once a second, that is  
not too
bad, but if an active user updates a session more often, then it is  
worse.


Generally speaking, sessions aren't updated when they change, they are
usually updated per HTTP request. The data in a session may not  
change,
but the session handling code doesn't know this and simply updates  
anyway.


This problem is more or less specific to PHP, no?  Because it reads  
the whole session on session open and writes the whole thing on  
close.  Because of this I am looking into smarter ways of handling  
sessions than this.  Because yes, the session data RARELY changes.  I  
am looking into ways of only doing updates when the data changes.  In  
fact for a very similar problem, where I had tons of tiny requests  
coming in that would NEVER alter the sessions I skipped session_start  
and used my own session_touch function to update the timestamp on the  
session because that's all I needed to do.  It saved TONS of wasted  
overhead.
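
The touch-only update described here can be as small as a single-column
UPDATE (hypothetical table); it still creates a new row version under MVCC,
but it avoids reading and rewriting the whole serialized session on every
request:

UPDATE sessions SET last_seen = now() WHERE sid = 'abc';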


I don't mean to get off topic but it seems like these sorts of  
problems are better solved outside of postgres.  I think your session  
daemon is in fact the right approach here.  If you have other tables  
with similar problems that is one thing but if it is just php session  
tables then I think we need to look for a better use case to look  
into this.




In a heavily AJAX site, you may have many smaller HTTP requests  
returning
items in a page. So, a single page may consist of multiple HTTP  
requests.
Worse yet, as a user drags an image around, there are lots of  
background
requests being made. Each request typically means a session lookup  
and a
session update. This is compounded by the number of active users.  
Since

the object of a site is to have many active users, this is always a
problem. It is less intrusive now that non-locking vacuum is there,  
but

that doesn't mean it isn't a problem.


Once again I think to run an Enterprise app (and by that I mean it  
scales well) you need Enterprise class session management.  The php  
model is not good for this and using postgres is not good for this.   
It's just not the right tool for this job in my opinion.  I would  
think you could 

Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Rick Gigger


On Jun 22, 2006, at 2:36 PM, Mark Woodward wrote:



What you seem not to grasp at this point is a large web-farm,  
about 10

or
more servers running PHP, Java, ASP, or even perl. The database is
usually
the most convenient and, aside from the particular issue we are  
talking

about, best suited.


	The answer is sticky sessions : each user is assigned to one and  
only one
webserver in the cluster and his session is maintained locally, in  
RAM. No

locks, no need to manage distributed session...

I actually have a good number of years of experience in this  
topic, and
memcached or file system files are NOT the best solutions for a  
server

farm.


	If sessions are distributed, certainly, but if sessions are  
sticky to

their own server ?


And what if a particular server goes down? or gets too high a  
percentage of

the load?


Yes, I don't think that sticky sessions are the answer.  But PHP's  
session handling behavior could be greatly improved on.


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread mark
On Fri, Jun 23, 2006 at 03:08:34PM -0400, Bruce Momjian wrote:
 Tom Lane wrote:
  ...
  suggesting.  We're having a hard enough time debugging and optimizing
  *one* storage model.  I think the correct path forward is to stick with
  the same basic storage model and vacuuming concept, and address the
  known performance issues with better-optimized vacuuming.  No, it will
  never be perfect for every scenario, but we can certainly make it much
  better than it is now, without risking killing the project by
  introducing undebuggable, unmaintainable complexity.
 Well, are you suggesting we just stop improving the database?  I am sure
 not.  But, your suggestion is that we can't do better without incurring
 more complexity (true), and that complexity will not be worth it.  I
 don't agree with that until I see some proposals, and shutting down
 discussion because they will add complexity or are fruitless seems
 unwise.

It sounds like everybody agrees that things need to be fixed, and genuinely
motivated people are trying to offer what they have to the table.

Tom already has enough on his plate, as do most others here - so unless
a competent champion can take up the challenge, discussion is all we have.

I'm not liking the we should do it this way, no, we should do it that way arguments.
My read of the situation is that both may be useful, and that both should
be pursued. But one set of people can't pursue both.

Is anyone who is able, able to take up this challenge? Perhaps more than one,
from both major directions? (vacuum on one side, and improved storage on
the other) Somebody with the time and skill, who can work through the
design discussions on one of the aspects?

I want to contribute soon, and this is the sort of thing that interests me -
but I still don't have time yet, and there would be no guarantee that I
succeeded. Somebody else? :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] vacuum, performance, and MVCC

2006-06-23 Thread Mark Woodward
 Just out of curiosity Mark, didn't you write your session daemon so
 that you don't have to put sessions in postgres anymore?

The original project started as a shared key/value system for a beowulf
cluster in the late 90s, but got reworked to be a session handler for PHP
when I worked with Stig, MSession.

 Or are you
 just giving that as an example of a very wide, very heavily updated
 table?  My session tables have been an extreme case of this problem,
 but no other table that I have is so adversely affected by this
 behavior.  My decision was not to pull postgres out entirely, just
 using other session handlers.

I have been working as a consultant since 2001, and prior to that, as CTO
at a dot.com startup. MSession (the previous name) was used to
circumvent shortcomings in PostgreSQL, specifically the problem we are
talking about.

As a consultant, I have to convince the customer that all is well. My
MCache system does not guarantee that no session data is lost, nor does
memcached or any other non-ACID system.

The technical arguments we can make, no matter how correct, leave us on
the defensive when asked What if the server crashes, do you lose data?
Of course the answer is yes. Then we get drawn into a conversation about
transient and unimportant data vs persistent and valuable data. At which
point you've lost the customer. A solid SQL database is the de facto
standard, perhaps not the best choice, but unavoidable.

The update behavior of PostgreSQL is probably the *last* serious issue.
Debate all you want: vacuum mitigates the problem to varying degrees, but
fixing it outright will be a huge win. If the update behavior gets fixed,
I can't think of a single issue with postgresql that would be a show
stopper.



 Rick

 On Jun 22, 2006, at 7:59 AM, Mark Woodward wrote:

 After a long battle with technology, [EMAIL PROTECTED] (Mark
 Woodward), an earthling, wrote:
 Clinging to sanity, [EMAIL PROTECTED] (Mark Woodward)
 mumbled into
 her beard:
 [snip]

 1.  The index points to all the versions, until they get
 vacuumed out.

 It can't point to all versions, it points to the last current
 version
 as  updated by vacuum, or the first version of the row.

 No, it points to *all* the versions.

 Suppose I take a table with two rows:

 INFO:  analyzing public.test
 INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
 VACUUM

 Then, over and over, I remove and insert one entry with the same PK:

 sample=# delete from test where id = 2;insert into test (id)
 values (2);
 DELETE 1

 [snip]

 Now, I vacuum it.

 sample=# vacuum verbose analyze test;
 INFO:  vacuuming public.test
 INFO:  index test_id_key now contains 2 row versions in 2 pages
 DETAIL:  10 index row versions were removed.
 0 index pages have been deleted, 0 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  test: removed 10 row versions in 1 pages
 DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  test: found 10 removable, 2 nonremovable row versions in
 1 pages
 DETAIL:  0 dead row versions cannot be removed yet.
 There were 0 unused item pointers.
 0 pages are entirely empty.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.
 INFO:  analyzing public.test
 INFO:  test: 1 pages, 2 rows sampled, 2 estimated total rows
 VACUUM

 Notice that the index contained 10 versions of that one row.

 It pointed to *ALL* the versions.

 Hmm, OK, then the problem is more serious than I suspected.
 This means that every index on a row has to be updated on every
 transaction that modifies that row. Is that correct?

 I am attaching some code that shows the problem with regard to
 applications such as web server session management, when run, each
 second
 the system can handle fewer and fewer connections. Here is a brief
 output:

 [EMAIL PROTECTED]:~/pgfoo$ ./footest
 1307 sessions per second, elapsed: 1
 1292 sessions per second, elapsed: 2
 1287 sessions per second, elapsed: 3
 
 1216 sessions per second, elapsed: 25
 1213 sessions per second, elapsed: 26
 1208 sessions per second, elapsed: 27
 
 1192 sessions per second, elapsed: 36
 1184 sessions per second, elapsed: 37
 1183 sessions per second, elapsed: 38
 
 1164 sessions per second, elapsed: 58
 1170 sessions per second, elapsed: 59
 1168 sessions per second, elapsed: 60

 As you can see, in about a minute at high load, this very simple table
 lost about 10% of its performance, and I've seen worse based on update
 frequency.  Before you say this is an obscure problem, I can tell
 you it
 isn't. I have worked with more than a few projects that had to
 switch away
 from PostgreSQL because of this behavior.

 Obviously this is not a problem with small sites, but this is a real
 problem with an enterprise level web site with millions of visitors
 and
 actions a day. Quite frankly it is a classic example of something that
 does not scale. The more and more updates there are, the higher the
 load
 becomes. You can see it on top as the footest program runs.

 There has to be a more