Re: [HACKERS] WIP patch: Collation support

2008-09-18 Thread Martijn van Oosterhout
On Thu, Sep 18, 2008 at 05:41:16PM +0300, Heikki Linnakangas wrote:
> Attached is an updated version of the stripped-down patch. I've cleaned 
> it up a bit, and added more sanity checks. Documentation is still 
> missing and I haven't test it much.

FWIW, I disagree with the stripping down and think we should aim for
the whole patch as submitted. Historically, the hardest part of getting
collation support into postgres has been the catalog changes and easier
the support for something other than OS locales. I supported the patch as
it was precisely because it finally did that, and the stripping down
takes us back to square one.

Implementing the COLLATE keyword is much easier once you have the underlying
support and woldn't involve any backward incompatabilities. The
stripped down version gets us to a state which will only be supported
(hopefully) for one release.

For anyone counting, Firebird added support for ICU more than three
years ago.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.


signature.asc
Description: Digital signature


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Alvaro Herrera
I committed this to HEAD.  If anything breaks, I request that someone
else does the cleanup, as it looks like we're headed for the hospital
first thing tomorrow morning to see what newborns look like.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 8.3.1 autovacuum stopped doing anything months ago

2008-09-18 Thread Jeffrey Baker
I have an 8.3.1 instance on Linux and since June 29th the autovacuum process
has claimed to be working on the same three tables.  That's OK, I am a very
patient man, and these are very large tables.  Today I started to get
transaction wraparound warnings, so I go and check it out.  Turns out the
autovacuum processes are all just doing nothing.  When I strace them, they
are all three blocked on syscalls.

So I restart the database and run a vacuum.  Of course, once the wraparound
warning is reached, there's no way to disable the autovac, so now my vacuum
maintenance job is competing with three invulnerable autovacuum processes.
I am thinking of sending them SIGSTOP.

Anyway, I have some issues.  One, of course, is that the autovacuum should
not have been deadlocked or otherwise stalled like that.  Perhaps it needs a
watchdog of some kind.  Has anyone else experienced an issue like that in
8.3.1?  The only thing I can see in the release notes that indicates this
problem may have been fixed is the following:

"Repair two places where SIGTERM exit of a backend could leave corrupted
state in shared memory (Tom)"

However I don't know who or what would have sent SIGTERM to the autovacuum
children.

Secondly, there really does need to be an autovacuum=off,really,thanks so
that my maintenance can proceed without competition for i/o resources.  Is
there any way to make that happen?  Is my SIGSTOP idea dangerous?

-jwb


Re: [HACKERS] Proposal of SE-PostgreSQL patches (for CommitFest:Sep)

2008-09-18 Thread KaiGai Kohei

At the CommitFest:Sep, I got several comments about SE-PostgreSQL from Peter.
(Thanks for your comments.)
He asked me several questions about its concept, then I replied for them.
However, it seems to me there is a difference in our opinions.

In my opinion, it is quite natural that different security mechanisms can
have differences in its decision making, its gulanuality and its scope.
But he concerned that SE-PostgreSQL provides fine-grained access control
feature, though the native PostgreSQL does not provide it yet.

> *) Row-level security, column-level security and so on should in my mind
> first exist as a native SQL-level feature.  It would be hard to explain
> that PostgreSQL does not have column-level GRANT privileges but that you
> can achieve the same if you go through SELinux.

I don't know his current opinion.
But I think it is not worthful to stop making a progress due to
differences in opinions. So, I think there are the following three
options to solve the issue.


[1] Make a consensus that different security mechanisms have differences
in its decision making, its gulanuality and its scope

I think it is the most straightforward answer.
As operating system doing, DAC and MAC based access controls should be
independently applied on accesses from users, and this model is widely
accepted.
These facilities can also have different results, gulanualities and scopes.


[2] Make a new implementation of OS-independent fine grained access control

If it is really really necessary, I may try to implement a new separated
fine-grained access control mechanism due to the CommitFest:Nov.
However, we don't have enough days to develop one more new feature from
the scratch by the deadline.


[3] Restrict functionalities of SE-PostgreSQL

It is the most laughable idea.
If SE-PostgreSQL lose its fine-grained access control feature, both of
access control features have same gulanualities at least.
But it makes unmotivate me so much. I believe no one agree this option.


I want any comments on this topic.
Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-18 Thread Tom Lane
Tatsuo Ishii <[EMAIL PROTECTED]> writes:
>> Why does set_recursion_pathlist think that the subquery might have
>> useful pathkeys?  We know it must always be a UNION ALL, no?

> Right. But someday we might implement "UNION" (without ALL) then we
> have useful pathkeys...

Good point.  Might as well leave it in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to Host Project

2008-09-18 Thread Alvaro Herrera
David E. Wheeler wrote:

> So I'm wondering, given the various discussions of PostgreSQL module  
> hosting in the past, where would be a good place to put a PostgreSQL  
> module project? The things I would like to have are:
>
>   * SVN or git hosting (I've not used git, but would try it)
>   * Ability to hand out commit bits to other folks
>   * A project home page and/or wiki
>   * Good search results rankings in Google et al.
>   * Mail lists
>   * Bug tracking
>   * Release management

Why not host the code on (say) GitHub, and the rest of the stuff on
pgFoundry?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to Host Project

2008-09-18 Thread Robert Treat
On Thursday 18 September 2008 14:22:14 David E. Wheeler wrote:
> Howdy,
>
> Not *exactly* hackers-related, but I wanted to get a feel for this
> from those who are likely to use project hosting, and to minimize the
> chances of a flame war.
>
> Right now I have pgTAP on pgFoundry, which is okay, though it appears
> to be largely unmaintained. PostgreSQL module projects seem to mainly
> just flounder there.
>
> So I'm wondering, given the various discussions of PostgreSQL module
> hosting in the past, where would be a good place to put a PostgreSQL
> module project? The things I would like to have are:
>
>* SVN or git hosting (I've not used git, but would try it)
>* Ability to hand out commit bits to other folks
>* A project home page and/or wiki
>* Good search results rankings in Google et al.
>* Mail lists
>* Bug tracking
>* Release management
>
> Overall, it should be easy to find my project, and easy to download it
> and build it for PostgreSQL. I've had the following suggestions for
> places to try, in addition to pgFoundry:
>

my .02, since i have used most of these... 

>* github

does not offer mailing lists or bug tracking, and the release management is 
odd

>* Google Code

does not offer mailing lists

>* LaunchPad

does not offer svn or git, and i think they dont offer a home page service

>* WebFaction
>

dont really know anything about these guys, but i thought they did web 
hosting, not project hosting. 

Just for the record, you have overlooked SourceForge. While it appears to 
fallen out of favor with the open source crowd, it is the one service that 
does provide everything you wanted. 

> I've not used any of these. So my question is, what do you prefer for
> third-party PostgreSQL modules. Where is it that the the PostgreSQL
> community is likely to aggregate with its modules?
>

I've been saying for some time now we need to get out of the project hosting 
service, and get into the project directory service. What we really want is 
to make it easy for people to find postgresql related projects, regardless of 
where they are. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Common Table Expressions (WITH RECURSIVE) patch

2008-09-18 Thread Tatsuo Ishii
> Why does set_recursion_pathlist think that the subquery might have
> useful pathkeys?  We know it must always be a UNION ALL, no?

Right. But someday we might implement "UNION" (without ALL) then we
have useful pathkeys...

Or shall I completely remove the step to generate patheys and do not
pass pathkeys to create_recursion_path?

pathkeys = convert_subquery_pathkeys(root, rel, 
subroot->query_pathkeys);
--
Tatsuo Ishii
SRA OSS, Inc. Japan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Joshua Drake
On Thu, 18 Sep 2008 16:57:19 -0700
Ron Mayer <[EMAIL PROTECTED]> wrote:

> (c) are secretly praying for an excuse
> to upgrade anyway.

heh

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Ron Mayer

Steve Crawford wrote:

Tom Lane wrote:


Yeah.  What this is about is how long the *community* supports 7.4...


Is there any way to poll the community and see how much people
in the community care about 7.4 community support?

It seems possible that most people with large important 7.4 systems
either (a) have commercial support contracts anyway or (b) are capable
of supporting it in-house, or (c) are secretly praying for an excuse
to upgrade anyway.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous Log Shipping Replication

2008-09-18 Thread Bruce Momjian
Simon Riggs wrote:
> Why not:
> 1. Same procedure as Warm Standby now
> a) WAL archiving to standby starts
> b) base backup
> 
> 2. Startup standby, with additional option to stream WAL. WALReceiver
> starts, connects to Primary. Primary issues log switch. Archiver turns
> itself off after sending that last file. WALSender starts streaming
> current WAL immediately after log switch.
> 
> 3. Startup process on standby begins reading WAL from point mentioned by
> backup_label. When it gets to last logfile shipped by primary's
> archiver, it switches to reading WAL files written by WALReceiver.
> 
> So all automatic. Uses existing code. Synchronous replication starts
> immediately. Also has the advantage that we do not get WAL bloat on
> primary. Configuration is almost identical to current Warm Standby, so
> little change for existing Postgres sysadmins.

I totally agree.  Requiring the master to be down for a significant time
to add a slave isn't going to keep people happy very long.  We have the
technology now to allow warm standby slaves by using PITR, and it seems
a similar system can be used to setup slaves, and for cases when the
slave drops off and has to rejoin.  The slave can use the existing
'restore_command' command to pull all WAL files it needs, and then the
slave needs to connect to the master and say it is ready for WAL files. 
The master is going to need to send perhaps everything from the start of
the existing WAL file so the slave is sure to get all changes during the
switch from 'restore_command' to network-passed WAL info.

I can imagine the slave going in and out of network connectivity as long
as the required PITR files are still available.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to Host Project

2008-09-18 Thread Josh Berkus
David,

> Right now I have pgTAP on pgFoundry, which is okay, though it appears
> to be largely unmaintained. PostgreSQL module projects seem to mainly
> just flounder there.

Yeah.  We had huge plans for pgFoundry, but got burned out by a combination 
of GForge problems and Hub.org problems -- I know I put in over 150 hours 
just making it run.  At this point, nobody wants to deal with it anymore.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to Host Project

2008-09-18 Thread David E. Wheeler

On Sep 18, 2008, at 11:27, Joshua Drake wrote:


  * LaunchPad



Is backed by PostgreSQL. It is the only logical choice :). Seriously
though it is a good service.


Looks pretty nice, though it doesn't have project home pages. Having  
just created one for pgTAP on pgFoundry, I'd like to keep it. :-)


I posted a question about this to see if it's in the plans:

  https://answers.launchpad.net/launchpad/+question/45640

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Tom Lane
I did some editorializing on the FSM README file, in the line of
familiarizing myself with the contents.  Attached is an updated version.

Here are a couple of other random comments I jotted in the process:

search_avail makes me nervous: in the presence of a corrupt tree
I think it could index off the end of the page.  There needs to be
protection against trying to access a leaf node that doesn't exist.
(The search upward is okay because it cannot "miss" the root, and
you already checked the root is big enough; but a comment about
that wouldn't hurt.)  Also, I think it shouldn't throw a hard error
if the tree is corrupt, but just elog(LOG) and take corrective action.

I'd be happier if "next" were declared as int, as that makes it more
likely to be atomically fetchable/storable than if it's int16.  (On some
platforms a partial-word store is done by read, modify, write of a full
word.)  And could we name it something less generic than "next"?

The functions in fsmpage.c have names that are far too generic to be
exposed as global symbols.  If you don't want to fold that file into
freespace.c and make them static, then you need to rename them.

regards, tom lane

$PostgreSQL$

Free Space Map
--

The purpose of the free space map is to quickly locate a page with enough
free space to hold a tuple to be stored; or to determine that no such page
exists and the relation must be extended by one page.  As of PostgreSQL 8.4
each relation has its own, extensible free space map stored in a separate
"fork" of its relation.  This eliminates the disadvantages of the former
fixed-size FSM.

It is important to keep the map small so that it can be searched rapidly.
Therefore, we don't attempt to record the exact free space on a page.
We allocate one map byte to each page, allowing us to record free space
at a granularity of 1/256th of a page.  Another way to say it is that
the stored value is the free space divided by BLCKSZ/256 (rounding down).
We assume that the free space must always be less than BLCKSZ, since
all pages have some overhead; so the maximum map value is 255.

To assist in fast searching, the map isn't simply an array of per-page
entries, but has a tree structure above those entries.  There is a tree
structure of pages, and a tree structure within each page, as described
below.

FSM page structure
--

Within each FSM page, we use a binary tree structure where leaf nodes store
the amount of free space on heap pages (or lower level FSM pages, see
"Higher-level structure" below), with one leaf node per heap page. A non-leaf
node stores the max amount of free space on any of its children.

For example:

4
 4 2
3 4   0 2<- This level represents heap pages

We need two basic operations: search and update.

To search for a page with X amount of free space, traverse down the tree
along a path where n >= X, until you hit the bottom. If both children of a
node satisfy the condition, you can pick either one arbitrarily.

To update the amount of free space on a page to X, first update the leaf node
corresponding to the heap page, then "bubble up" the change to upper nodes,
by walking up to each parent and recomputing its value as the max of its
two children.  Repeat until reaching the root or a parent whose value
doesn't change.

This data structure has a couple of nice properties:
- to discover that there is no page with X bytes of free space, you only
  need to look at the root node
- by varying which child to traverse to in the search algorithm, when you have
  a choice, we can implement various strategies, like preferring pages closer
  to a given page, or spreading the load across the table.

Higher-level routines that use FSM pages access them through the set_avail()
and search_avail() functions. The interface to those functions hides the
page's internal tree structure, treating the FSM page as a black box that has
a certain number of "slots" for storing free space information.  (However,
the higher routines have to be aware of the tree structure of the whole map.)

The binary tree is stored on each FSM page as an array. Because the page
header takes some space on a page, the binary tree isn't perfect. That is,
a few right-most leaf nodes are missing, and there are some useless non-leaf
nodes at the right. So the tree looks something like this:

   0
   1   2
 3   4   5   6
7 8 9 A B

where the numbers denote each node's position in the array.  Note that the
tree is guaranteed complete above the leaf level; only some leaf nodes are
missing.  This is reflected in the number of usable "slots" per page not
being an exact power of 2.

A FSM page also has a "next" pointer that determines where to start the next
search for free space within that page.  The reason for that is to spread out
the pages that are returned by FSM searches.  When several backends are
concurrently inserting into a relation, contention can be avoided by having
them insert int

Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Steve Crawford

Tom Lane wrote:


Yeah.  What this is about is how long the *community* supports 7.4...
  
Perhaps the discussion should be more global (and ultimately save time 
on having this discussion again in the future). Decide on the policy, 
make official and make it obvious. The time I usually hear tossed around 
is 5 years. This is the same support period that Ubuntu uses for the 
long-term-support releases of their server version - the longest support 
period they offer. As a user, 5 years seems a reasonable support period 
for a core infrastructure component.


Whatever time-period is chosen, I would make it obvious in a variety of 
places:


The versioning policy (add something like "Major releases are supported 
through minor-release updates for a period of five years following 
initial release." to http://www.postgresql.org/support/versioning).


The FAQ (add an end-of-life FAQ): 
http://www.postgresql.org/docs/faqs.FAQ.html


All release notes: I.e. for 7.4: "Release date: 2003-11-17  End-of-life 
date: 2008-11-17", for 7.4.21: "Release date: 2008-06-12 End-of-life 
date 2008-11-17"


Perhaps even as a comment at the start of the "installation" sections of 
the manual: "It is recommended to use the most recent release... Major 
releases are supported for..."


Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> ... but we still haven't actually 
> established that the WAL-logging is causing the performance degradation 
> Zdenek observed.

Yeah, that's a good point.  I did some simple performance testing on
bulk inserts and updates, and found that while they indeed tended to be
WALInsertLock heavy, the FSM traffic seemed to be only a small part of
it.  Here are some xlog record type counts from a bulk update test:

 686555 XLogInsert: rm 10 info 20   HEAP_UPDATE
  89117 XLogInsert: rm 10 info 29   HEAP_UPDATE + bkp blk + removable
  24526 XLogInsert: rm 10 info 25   HEAP_UPDATE + bkp blk + removable
   3199 XLogInsert: rm 10 info 2d   HEAP_UPDATE + 2 bkp blks + removable
  27676 XLogInsert: rm 7 info 00FSM_SET_AVAIL
 35 XLogInsert: rm 7 info 09SET_AVAIL + bkp blk + removable

So either by record count or by volume, the FSM traffic doesn't seem to
be much.  I wonder whether Zdenek knows what the xlog traffic is like
for his test in an unpatched database ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Heikki Linnakangas

Tom Lane wrote:

Zdenek Kotala <[EMAIL PROTECTED]> writes:
1) remove WAL logging. I think that FSM record should be recovered 
during processing of others WAL records (like insert, update). 


Why are we WAL-logging FSM operations at all?  It's only a hint.


- to ensure self-consistency of the tree, so that if an upper-level page 
says there's no free space on pages in range X-Z, there really isn't

- to avoid in-page corruption from torn pages
- to have the FSM useful immediately after recovery (warm standby, mainly)


I think we could give serious consideration to not WAL-logging FSM,
with maybe a tweak here or there to improve the odds of self-repair.


Yeah, I'm starting to lean towards that option too.

Hmm, we could have a vacuum pass over the FSM, fixing any corruption 
from the torn-page problem, as well as updating the upper-level pages. 
That leaves us just the problem of propagating the FSM information to 
the standby, and that we could handle by updating the FSM in the redo 
functions of heap and indexes.


That sounds like a good idea anyway, but we still haven't actually 
established that the WAL-logging is causing the performance degradation 
Zdenek observed.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Zdenek Kotala wrote:
My conclusion is that new implementation is about 8% slower in OLTP 
workload.



Thanks. That's very disappointing :-(


One thing that jumped out at me is that you call FreeSpaceMapExtendRel
every time a rel is extended by even one block.  I admit I've not
studied the data structure in any detail yet, but surely most such calls
end up being a no-op?  Seems like some attention to making a fast path
for that case would be helpful.


Yes, most of those calls end up being no-op. Which is exactly why I 
would be surprised if those made any difference. It does call 
smgrnblocks(), though, which isn't completely free...


It is not a problem. It is really strange. I'm using DTrace to count number of 
calls and number of calls is really small (I monitor only one backend). I have 
removed WAL logging and it does not help too.


Zdenek, can you say off the top of your head whether the test was I/O 
bound or CPU bound? What was the CPU utilization % during the test?


CPU is not problem it is mostly in idle time.

-bash-3.00# iostat 5
   ttysd1   ssd0  ssd1  nfs1   cpu
 tin tout kps tps serv  kps tps serv  kps tps serv  kps tps serv   us sy wt id
   01   0   019   1   920   000   000  0  0 100
   0   47   0   00  894 11170   000   002  1  0 97
   0   16   0   00  949 11860   000   002  2  0 97
   0   16   0   00  965 12060   000   002  1  0 97
   0   16   0   00  981 12270   000   002  2  0 96
   0   16   0   00  944 11860   000   002  1  0 97
   0   16   0   00  1202 14970   000   003  2  0 95
   0   16   0   00  1261 15790   000   003  2  0 95
   0   16   0   00  1357 168   140   000   003  2  0 95
   0   16   0   00  1631 201   330   000   002  2  0 96
   0   16   0   00  1973 246   480   000   002  2  0 96
   0   16   0   00  2008 251   500   000   002  2  0 97
   0   16   0   00  1956 241   450   000   002  2  0 97
   0   16   0   00  2003 250   490   000   002  2  0 97

-bash-3.00# vmstat 1
 kthr  memorypagedisk  faults  cpu
 r b w   swap  free  re  mf pi po fr de sr s1 sd sd --   in   sy   cs us sy id
 0 0 0 28091000 31640552 3 4 0  0  0  0  0  0  1  0  0  359   72  206  0  0 100
 0 0 0 27363144 27614576 3 28 0 16 16 0  0  0 60  0  0 1216 1134 1072  1  1 99
 0 0 0 27363144 27614568 8 0 0 16 16  0  0  0 52  0  0 1099 1029  964  0  1 98
 0 0 0 27363144 27614560 9 0 0  8  8  0  0  0 53  0  0 1143  896 1009  1  1 98
 0 0 0 27363144 27614544 1 241 0 16 16 0 0  0 46  0  0 1042 1105  895  0  1 98
 0 0 0 27363144 27614544 0 0 0 16 16  0  0  0 50  0  0 1078  860  924  0  0 99
 0 0 0 27363144 27614552 10 0 0 16 16 0  0  0 56  0  0 1177  914 1033  1  1 98
 0 0 0 27363144 27614536 0 0 0  8  8  0  0  0 25  0  0  726  554  603  0  0 99
 0 0 0 27363144 27614528 1 0 0 16 16  0  0  0 65  0  0 1206 1159 1081  1  1 98
 0 0 0 27363144 27614512 13 0 0 16 16 0  0  0 63  0  0 1256 1088 1094  1  1 99
 0 0 0 27363144 27614512 0 0 0  8  8  0  0  0 37  0  0  920  797  779  0  1 99
 0 0 0 27363144 27614504 6 0 0 16 16  0  0  0 58  0  0 1218 1074 1078  1  0 99
 0 0 0 27363144 27614488 85 91 0 16 16 0 0  0 45  0  0  973 1344  833  1  1 99
 0 0 0 27363144 27614488 2 0 0 16 16  0  0  0 57  0  0 1164 1023 1036  1  1 99
 0 0 0 27363144 27614472 4 0 0  8  8  0  0  0 47  0  0 1133  937  957  0  1 99


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Heikki Linnakangas

Tom Lane wrote:

I did a bit of testing and immediately got an Assert failure:

...

The scary part of that is that it gets through the regression tests ---
doesn't leave one with a warm feeling about how much of VACUUM gets
exercised by regression.


Ouch..


I take it the comment at the top of indexfsm.c about using one bit per
page should be recast as a possible future improvement?


Yep. I also noted that the code in GetIndexFreeSpace() didn't match the 
comment above it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Here's a new patch, updated per your comments.


I did a read-through of the portions of this patch that change the rest
of the system (ie, not the guts of the new FSM itself).  Mostly it looks
pretty nice, but I have a few gripes:


Thanks. It's probably not worthwhile to dig too deep into the FSM 
internals, until the performance problem is solved.



Does smgrimmedsync at the bottom of nbtsort.c need to cover FSM too?
Maybe not, since index's FSM should be empty, but in that case you
still should add a comment saying so.


Right, the FSM should be empty at that point, it's extended in btbuild 
after that. nbtsort.c isn't concerned about FSM at all. I can add a 
comment on that.



Likewise for smgrimmedsync in tablecmds.c's copy_relation_data


Well, no, copy_relation_data() copies and syncs only one fork at a time. 
Hmm, perhaps it should be renamed to reflect that, copy_fork() might be 
more appropriate.



Grepping for P_NEW suggests that you missed some places where
FreeSpaceMapExtendRel or IndexFreeSpaceMapExtend calls should be added.
In particular GiST/GIN. 


Hmm, actually I think there's a problem with this approach to extending. 
If we crash after extending the file, but before the FSM extension has 
been WAL-logged, the next time the relation is vacuumed, vacuum will try 
to mark the page that FSM doesn't know about as free, and 
RecordPageWithFreeSpace doesn't like that.


I think we'll have to chance that rule so that the  FSM is extended 
automatically when RecordPageWithFreeSpace() is called on a page that's 
not in the FSM yet. That way we also won't need to remember to extend 
the FSM whenever the relation is extended.


That's easy to do by always calling FreeSpaceMapExtendRel from 
RecordPageWithFreeSpace(), but I'm afraid of the performance implication 
of that. Perhaps we could store the previously observed size of the FSM 
in RelationData, and only try to extend it when we get a request for a 
page beyond that. I think we'll then need locking to avoid extending the 
FSM from two backends at the same time, though, I'm relying on the 
extension lock of the main relation at the moment.



The change in catalog/heap.c invalidates the comment immediately
above it.


Thanks. I'm actually a bit unhappy about creating the FSM fork there.


In vacuumlazy.c, the num_free_pages, max_free_pages, and tot_free_pages
members of LVRelStats are now useless, as is the comment above them.
You need to take out the reporting of tot_free_pages if you are not
going to track it.


I took them all out now, though it would be nice to keep tracking it.


Does smgr.c still need to include storage/freespace.h at all?
Again, I think it would be a modularity violation to have it
calling FSM, now that FSM lives on top of storage.


Hmm, seems to work fine without it.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
>>> 1) remove WAL logging. I think that FSM record should be recovered 
>>> during processing of others WAL records (like insert, update). 

Why are we WAL-logging FSM operations at all?  It's only a hint.

>> I think we'd still need to WAL log operations that decrease the amount 
>> of free space on page. Otherwise, after we have partial vacuum, we might 
>> never revisit a page, and update the FSM, even though there's usable 
>> space on the page, leaving the space forgotten forever.

Well, it'd be recovered eventually since sooner or later we'd have to
visit the page for tuple freezing purposes.  I'm not that worried about
losing space on individual pages anyway.  A more serious issue would be
if corruption of an upper-level FSM page caused a large swath of the
table to be effectively "forgotten", because the upper page had too
small a value in its entry.  But wouldn't this be more or less
self-repairing?  Assuming that the underlying table is active (if it
isn't you probably haven't got free space in it anyway) then once VACUUM
records free space on any page in the lost range, that would bubble up.

I think we could give serious consideration to not WAL-logging FSM,
with maybe a tweak here or there to improve the odds of self-repair.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> One thing that jumped out at me is that you call FreeSpaceMapExtendRel
>> every time a rel is extended by even one block.

> Yes, most of those calls end up being no-op. Which is exactly why I 
> would be surprised if those made any difference. It does call 
> smgrnblocks(), though, which isn't completely free...

No, it's a kernel call (at least one) which makes it pretty expensive.

I wonder whether it's necessary to do FreeSpaceMapExtendRel at this
point at all?  Why not lazily extend the map when you are told to store
a nonzero space category for a page that's off the end of the map?
Whether or not this saved many cycles overall, it'd push most of the map
extension work to VACUUM instead of having it happen in foreground.

A further refinement would be to extend the map only for a space
category "significantly" greater than zero --- maybe a quarter page or
so.  For an insert-only table that would probably result in the map
never growing at all, which might be nice.  However it would go back to
the concept of FSM being lossy; I forget whether you were hoping to get
away from that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where to Host Project

2008-09-18 Thread Joshua Drake
On Thu, 18 Sep 2008 11:22:14 -0700
"David E. Wheeler" <[EMAIL PROTECTED]> wrote:

>* LaunchPad 


Is backed by PostgreSQL. It is the only logical choice :). Seriously
though it is a good service.

Joshua D. Drake


-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Where to Host Project

2008-09-18 Thread David E . Wheeler

Howdy,

Not *exactly* hackers-related, but I wanted to get a feel for this  
from those who are likely to use project hosting, and to minimize the  
chances of a flame war.


Right now I have pgTAP on pgFoundry, which is okay, though it appears  
to be largely unmaintained. PostgreSQL module projects seem to mainly  
just flounder there.


So I'm wondering, given the various discussions of PostgreSQL module  
hosting in the past, where would be a good place to put a PostgreSQL  
module project? The things I would like to have are:


  * SVN or git hosting (I've not used git, but would try it)
  * Ability to hand out commit bits to other folks
  * A project home page and/or wiki
  * Good search results rankings in Google et al.
  * Mail lists
  * Bug tracking
  * Release management

Overall, it should be easy to find my project, and easy to download it  
and build it for PostgreSQL. I've had the following suggestions for  
places to try, in addition to pgFoundry:


  * github
  * Google Code
  * LaunchPad
  * WebFaction

I've not used any of these. So my question is, what do you prefer for  
third-party PostgreSQL modules. Where is it that the the PostgreSQL  
community is likely to aggregate with its modules?


Thanks!

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Gregory Stark
"Greg Sabino Mullane" <[EMAIL PROTECTED]> writes:

> From a business perspective, there has been no reason to go through the
> pain and downtime of an upgrade, as long as the PG project is releasing
> point revisions to the 7.4 branch. As I said, I'm all for getting people
> off 7.4, but it needs to be done with a definite date, and December is
> way too soon.

I don't understand this, as soon as we released 8.0 you could take that as
advance warning that 7.4 was going to be desupported someday. So in that sense
they've had four years warning that this time was coming. The fact that the
date wasn't set in stone doesn't change their decision-making process.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Zdenek Kotala wrote:
My conclusion is that new implementation is about 8% slower in OLTP 
workload.



Thanks. That's very disappointing :-(


One thing that jumped out at me is that you call FreeSpaceMapExtendRel
every time a rel is extended by even one block.  I admit I've not
studied the data structure in any detail yet, but surely most such calls
end up being a no-op?  Seems like some attention to making a fast path
for that case would be helpful.


Yes, most of those calls end up being no-op. Which is exactly why I 
would be surprised if those made any difference. It does call 
smgrnblocks(), though, which isn't completely free...


Zdenek, can you say off the top of your head whether the test was I/O 
bound or CPU bound? What was the CPU utilization % during the test?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> No, FANOUT^4 doesn't fit in int, good catch. Actually, FANOUTPOWERS 
> table doesn't need to go that far, so that's just a leftover. It only 
> needs to have DEPTH elements. However, we have the same problem if 
> DEPTH==3, FANOUT^4 will not fit into int. I put a comment there. 
> Ideally, the 4th element would be #iffed out, but I couldn't immediately 
> figure out how to do that.

This is a "must fix" IMHO --- I don't plan to tolerate a scary compiler
warning ...

BTW, the comment about and computation of DEPTH are wrong anyway.
We support up to 2^32-1 pages, so I think the cutoff should be 1626.

I did a bit of testing and immediately got an Assert failure:

regression=# create table foo as select x from generate_series(1,10) x;
SELECT
regression=# create index fooi on foo(x);
CREATE INDEX
regression=# delete from foo;
DELETE 10
regression=# vacuum foo;
VACUUM
regression=# vacuum foo;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

The reason is that the Assert in FSM_CATEGORY_AVAIL is failing:
TRAP: FailedAssertion("!(x < 8192)", File: "freespace.c", Line: 46)
LOG:  server process (PID 17691) was terminated by signal 6: Aborted

because RecordFreeIndexPage passes in BLCKSZ which is an illegal
value.  Maybe use BLCKSZ-1 instead?

The scary part of that is that it gets through the regression tests ---
doesn't leave one with a warm feeling about how much of VACUUM gets
exercised by regression.

I take it the comment at the top of indexfsm.c about using one bit per
page should be recast as a possible future improvement?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [ADMIN] Regaining superuser access

2008-09-18 Thread Alvaro Herrera
Bernt Drange escribió:

> After a lot of fiddling with being able to enter single user mode on a
> windows machine (I had to figure out how to run the command line as
> the correct user, then for some reason -D didn't work, but SET
> PGDATA=xxx worked), I finally managed to fix my problem.

Hmm, the -D thing not working should probably be studied -- perhaps
we're missing escaping something somewhere.  Does the PGDATA path
contain spaces or weird chars?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Greg Sabino Mullane wrote:
>> - From a business perspective, there has been no reason to go through the
>> pain and downtime of an upgrade, as long as the PG project is releasing
>> point revisions to the 7.4 branch. As I said, I'm all for getting people
>> off 7.4, but it needs to be done with a definite date, and December is
>> way too soon.

> Specifying an EOL date does not stop people from continuing to run 7.4. 
> They can set their own time lines to get off the release. They can also 
> pay someone to update the back branch if it is that important to them.

Yeah.  What this is about is how long the *community* supports 7.4
(for free, and at the cost of time that might be better spent on
development).

Looking at the CVS logs makes it pretty obvious that we've been
effectively partially desupporting 7.4, and even 8.0 and 8.1 to some
extent, for awhile now.  There have been numerous patches that were not
carried all the way back because they would have needed major revisions
for the older branches --- not because the problem didn't exist in some
form or other back then.  The community hasn't got the interest or
resources to create such patches, much less to QA them to the level
where a person too conservative to move off an old branch would think
the patches were safe.

It's really past time to make it clear to all concerned that if they
want continued bug fixes for 7.4, they'd better start paying somebody
to do it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread David E. Wheeler

On Sep 18, 2008, at 07:38, Tom Lane wrote:


I wasn't intending to start a discussion about how/when to EOL 7.4,
but since the thread has gone in that direction: my vote would be to
announce now (say, with the announcement of this set of releases) that
7.4 will be EOL'd with our first set of updates in 2009.  That would
probably be the next update after this one, maybe two updates away
if we find any really serious bugs in the next month or two.


+1

Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Devrim GÃœNDÃœZ
Hi,

On Thu, 2008-09-18 at 09:20 -0400, Tom Lane wrote:
> So I'm thinking that generating a 7.4.x tarball now would be mostly a
> waste of server space, and we should leave these changes for the next
> update cycle.

How much server space or CPU cycles are we talking about? I bet it is
less than the bytes we spent during this discussion. Only I will build
binaries for 7.4, and you'll push an update for RHEL.

My vote is announcing all releases together, including 7.4 .
-- 
Devrim GÃœNDÃœZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
   http://www.gunduz.org


signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Zdenek Kotala wrote:
>> My conclusion is that new implementation is about 8% slower in OLTP 
>> workload.

> Thanks. That's very disappointing :-(

One thing that jumped out at me is that you call FreeSpaceMapExtendRel
every time a rel is extended by even one block.  I admit I've not
studied the data structure in any detail yet, but surely most such calls
end up being a no-op?  Seems like some attention to making a fast path
for that case would be helpful.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Joshua D. Drake

Greg Sabino Mullane wrote:


- From a business perspective, there has been no reason to go through the
pain and downtime of an upgrade, as long as the PG project is releasing
point revisions to the 7.4 branch. As I said, I'm all for getting people
off 7.4, but it needs to be done with a definite date, and December is
way too soon.


Specifying an EOL date does not stop people from continuing to run 7.4. 
They can set their own time lines to get off the release. They can also 
pay someone to update the back branch if it is that important to them.


December 31st is plenty of time. This isn't closed source where with the 
big O says no more updates, you are completely hosed.


Sincerely,

Joshua D. Drake

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Don't temp tables use this kind of temp file?  I admit I didn't check; I
> just assumed they did.

No, temp tables go through localbuf.c, which sits atop regular smgr.
I don't think fd.c knows any difference from regular tables.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


>> Frankly, the whole pg_dump mess is what keeps many people on older versions,
>> somtimes including 7.4.

> Sure but that was fixed, four years ago. At some point you recognize
> laziness and ineptness over caution and responsibility.

I think you misunderstand my "mess". I'm referring to the fact that the only
way to upgrade between major versions is with pg_dump and reload, which
really, really sucks for large databases.

- From a business perspective, there has been no reason to go through the
pain and downtime of an upgrade, as long as the PG project is releasing
point revisions to the 7.4 branch. As I said, I'm all for getting people
off 7.4, but it needs to be done with a definite date, and December is
way too soon.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200809181205
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkjSfRYACgkQvJuQZxSWSsgcAgCeJi5t23JhHIOBHDRqXMYneJaW
pKoAoPflQaE6G6HR4H0OAsCC1BWiMt9g
=Tz0+
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Here's a new patch, updated per your comments.

I did a read-through of the portions of this patch that change the rest
of the system (ie, not the guts of the new FSM itself).  Mostly it looks
pretty nice, but I have a few gripes:


Does smgrimmedsync at the bottom of nbtsort.c need to cover FSM too?
Maybe not, since index's FSM should be empty, but in that case you
still should add a comment saying so.

Likewise for smgrimmedsync in tablecmds.c's copy_relation_data

Grepping for P_NEW suggests that you missed some places where
FreeSpaceMapExtendRel or IndexFreeSpaceMapExtend calls should be added.
In particular GiST/GIN.  (I assume hash indexes still don't use FSM.
I wonder whether it'd be a good idea to get rid of hash bitmap pages
in favor of using FSM?  TODO item, not material for this patch.)

The change in catalog/heap.c invalidates the comment immediately
above it.

In vacuum.c's vac_update_fsm, the outPages counter is now useless.

In vacuumlazy.c, the num_free_pages, max_free_pages, and tot_free_pages
members of LVRelStats are now useless, as is the comment above them.
You need to take out the reporting of tot_free_pages if you are not
going to track it.

I think it's a modularity violation for bufmgr.c to be calling FSM.
Furthermore, it's pretty useless to have RelationTruncate doing
the fixup only for heaps and not indexes.  Please move that out
to the callers instead.

Does smgr.c still need to include storage/freespace.h at all?
Again, I think it would be a modularity violation to have it
calling FSM, now that FSM lives on top of storage.

RESOURCES_FSM needs to be removed from utils/guc_tables.h

The NOTE in the enum ForkNumber declaration was wrong before and
still is.

GetFreeSpaceOnPage() seems a bit misleadingly named; it's not obvious
from the name that it's not giving you the *true* free space on the page
but rather what FSM thinks it is.  Maybe call it something like
GetRecordedFreeSpace().  Also, please do not use the declaration style
that omits parameter names; I think those are important for
documentation/readability.

Doc updates are missing, but you knew that.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Andrew Sullivan
On Thu, Sep 18, 2008 at 03:25:10PM -, Greg Sabino Mullane wrote:
> Frankly, the whole pg_dump mess is what keeps many people on older versions,
> somtimes including 7.4.

This isn't my experience.  The reasons people stay on older releases
are manifold.

A
-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Joshua D. Drake

Greg Sabino Mullane wrote:


No, but if we are going to stop releasing revisions with critical bugfixes,
it is important that people know well in advance and can plan a migration
to a supported version.

Frankly, the whole pg_dump mess is what keeps many people on older versions,
somtimes including 7.4.


Sure but that was fixed, four years ago. At some point you recognize 
laziness and ineptness over caution and responsibility.


Sincerely,


Joshua D. Drake

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


> The handwriting has been on the wall for 7.4 ever since we agreed that
> 7.3 would be EOL'd at five years...

"Handwriting on the wall" is entirely unrelated to an offical,
published end of life date.

> It's not like people have to stop using it the moment we do our
> last release.

No, but if we are going to stop releasing revisions with critical bugfixes,
it is important that people know well in advance and can plan a migration
to a supported version.

Frankly, the whole pg_dump mess is what keeps many people on older versions,
somtimes including 7.4.

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200809181123
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkjScrkACgkQvJuQZxSWSshnKACg1+QxldUo8fHX/ULeDkWwclCh
SpkAoKkN6tOclSqWl3YIGTpfDRMoINNh
=Anux
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Alvaro Herrera
Tom Lane wrote:
> Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> [ blink... ]  Doesn't look like that should happen.  What is your
> >> test case?

This was simply a CREATE TEMP TABLE ... ON COMMIT DROP.  The file stays
in place until checkpoint (either a manually invoked one, or a
shutdown's)

Don't temp tables use this kind of temp file?  I admit I didn't check; I
just assumed they did.

> > Hmph, must be because of the patch from last winter to prevent 
> > relfilenode reuse until next checkpoint.
> 
> Ah.  I had misunderstood Alvaro to say that temp files (the kind under
> discussion up to now) were not unlinked immediately; which would be
> pretty strange given that fd.c is underneath md.c.

The test case where I actually verify that fd.c was used was a WITH
SCROLL cursor.  The file does go away like I expected in that case, as
soon as the cursor is destroyed (or as soon as the backend is closed).

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Subtransaction commits and Hot Standby

2008-09-18 Thread Simon Riggs

On Tue, 2008-09-16 at 10:11 -0400, Alvaro Herrera wrote:

> I wonder if the improved clog API required to mark multiple
> transactions as committed at once would be also useful to
> TransactionIdCommitTree which is used in regular transaction commit.

I've hacked together this concept patch (WIP).

Not fully tested yet, but it gives a flavour of the API rearrangements
required for atomic clog updates. It passes make check, but that's not
saying enough for a serious review yet. I expect to pick this up again
next week.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Index: src/backend/access/transam/clog.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/clog.c,v
retrieving revision 1.47
diff -c -r1.47 clog.c
*** src/backend/access/transam/clog.c	1 Aug 2008 13:16:08 -	1.47
--- src/backend/access/transam/clog.c	17 Sep 2008 20:06:15 -
***
*** 80,89 
  static bool CLOGPagePrecedes(int page1, int page2);
  static void WriteZeroPageXlogRec(int pageno);
  static void WriteTruncateXlogRec(int pageno);
! 
! 
! /*
!  * Record the final state of a transaction in the commit log.
   *
   * lsn must be the WAL location of the commit record when recording an async
   * commit.	For a synchronous commit it can be InvalidXLogRecPtr, since the
--- 80,105 
  static bool CLOGPagePrecedes(int page1, int page2);
  static void WriteZeroPageXlogRec(int pageno);
  static void WriteTruncateXlogRec(int pageno);
! static void TransactionIdSetPageStatus(TransactionId xid, int nsubxids, 
! 	TransactionId *subxids, XidStatus status, XLogRecPtr lsn, int pageno, bool subcommitted);
! static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status, 
! 	XLogRecPtr lsn, int slotno);
! 
! /*
!  * TransactionIdSetTreeStatus
!  *
!  * Record the final state of transaction entries in the commit log for
!  * a transaction and its subtransaction tree. Take care to ensure this is
!  * both atomic and efficient. Prior to 8.4, this capability was provided
!  * by the non-atomic TransactionIdSetStatus, which is replaced by this
!  * new atomic version.
!  *
!  * xid is a single xid to set status for. This will typically be
!  * the top level transactionid for a top level commit or abort. It can
!  * also be a subtransaction when we record transaction aborts.
!  *
!  * subxids is an array of xids of length nsubxids, representing subtransactions
!  * in the tree of xid. In various cases nsubxids may be zero.
   *
   * lsn must be the WAL location of the commit record when recording an async
   * commit.	For a synchronous commit it can be InvalidXLogRecPtr, since the
***
*** 91,107 
   * should be InvalidXLogRecPtr for abort cases, too.
   *
   * NB: this is a low-level routine and is NOT the preferred entry point
!  * for most uses; TransactionLogUpdate() in transam.c is the intended caller.
   */
  void
! TransactionIdSetStatus(TransactionId xid, XidStatus status, XLogRecPtr lsn)
  {
- 	int			pageno = TransactionIdToPage(xid);
- 	int			byteno = TransactionIdToByte(xid);
- 	int			bshift = TransactionIdToBIndex(xid) * CLOG_BITS_PER_XACT;
  	int			slotno;
! 	char	   *byteptr;
! 	char		byteval;
  
  	Assert(status == TRANSACTION_STATUS_COMMITTED ||
  		   status == TRANSACTION_STATUS_ABORTED ||
--- 107,221 
   * should be InvalidXLogRecPtr for abort cases, too.
   *
   * NB: this is a low-level routine and is NOT the preferred entry point
!  * for most uses; functions in in transam.c are the intended callers.
!  *
!  * Note that no lock requests are made at this level, only lower functions.
!  *
!  * XXX Think about issuing FADVISE_WILLNEED on pages that we will need,
!  * but aren't yet in cache, as well as hinting pages not to fall out of
!  * cache yet.
   */
  void
! TransactionIdSetTreeStatus(TransactionId xid, int nsubxids, 
! TransactionId *subxids, XidStatus status, XLogRecPtr lsn)
! {
! 	int		pageno = TransactionIdToPage(xid); /* get page of parent */
! 	int 	i;
! 	bool	subcommitted = false;
! 
! 	/*
! 	 * See how many subxids, if any, are on the same page as the parent, if any.
! 	 */
! 	for (i = 0; i < nsubxids; i++)
! 	{
! 		if (TransactionIdToPage(subxids[i]) != pageno)
! 			break;
! 	}
! 
! 	/*
! 	 * Do all items fit on a single page?
! 	 */
! 	if (i == nsubxids)
! 	{
! 		/*
! 		 * Set the parent and any subtransactions on same page as it
! 		 */
! 		TransactionIdSetPageStatus(xid, nsubxids, subxids, status, lsn, 
! 		pageno, subcommitted);
! 	}
! 	else
! 	{
! 		int		start_subxid = 0;
! 		int		num_on_page = 1;
! 
! 		if (status == TRANSACTION_STATUS_COMMITTED)
! 		{
! 			/* 
! 			 * If this is a commit then we care about doing this atomically.
! 			 * By here, we know we're updating more than one page of clog,
! 			 * so we must mark entries that are not on the first page so
! 			 * that they show as subcommitted before we then return to 
! 			

Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Magnus Hagander
Tom Lane wrote:
> "Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Greg Sabino Mullane wrote:
>>> I'm all for killing 7.4, but that's a rather short time frame, especially as
>>> this is a busy time of year for many businesses. How about we make it 
>>> further
>>> in the future (perhaps 2009-07-01, six months into the next year), and 
>>> announce
>>> the change far and wide?
> 
>> I believe a single quarter gap is plenty in consideration of the age. It 
>> isn't like 7.4 isn't already really old. If this was 8.0 I would agree.
> 
> The handwriting has been on the wall for 7.4 ever since we agreed that
> 7.3 would be EOL'd at five years...
> 
> I wasn't intending to start a discussion about how/when to EOL 7.4,
> but since the thread has gone in that direction: my vote would be to
> announce now (say, with the announcement of this set of releases) that
> 7.4 will be EOL'd with our first set of updates in 2009.  That would
> probably be the next update after this one, maybe two updates away
> if we find any really serious bugs in the next month or two.

While I agree with the principle, I think the people who care about such
things would need a date, and not just "whenever we do an update". This
could be Jan 2nd or Dec 29th for all they know.

I think it's better to set a date, and then we can push out a final 7.4
release on that day if there are any accumulated changes.

//Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP patch: Collation support

2008-09-18 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:


Well, I proposed disallowing using a different collation than the source
database, except for using template0 as the source. That's pretty limited, but
is trivial to implement and still let's you have databases with different
collations in the same cluster.


+   if (strcmp(dbtemplate, "template0") != 0 &&
+   (strcmp(lc_collate, src_collation) || strcmp(lc_ctype, 
src_ctype)))
+   ereport(NOTICE,
+   (errmsg("database \"%s\" needs to be reindexed 
manually (REINDEX DATABASE)",
+   dbname)));  
+ 


This isn't what you described but I think I prefer it this way as just a
warning not an error.


Well, I'd prefer to make it an error, but I'm willing to listen if 
others feel otherwise. I don't think the inconvenience of having to use 
template0 is that big, compared to the potential of strange behavior 
people would run into if they ignore the advice to reindex.


One weakness with a straight strcmp comparison is that it won't 
recognize aliases of the same locale. For example, "fi_FI.UTF8" and 
"fi_FI.UTF-8".



AFAIK we can't easily connect to the new database and do some fiddling with
it, can we? If we could we could check if there are any non-empty indexes
which depend on the collation and only print the warning if we find any (and
even mark them invalid).


I don't see that happening, unfortunately..

Attached is an updated version of the stripped-down patch. I've cleaned 
it up a bit, and added more sanity checks. Documentation is still 
missing and I haven't test it much.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** doc/src/sgml/ref/create_database.sgml
--- doc/src/sgml/ref/create_database.sgml
***
*** 24,29  CREATE DATABASE name
--- 24,31 
  [ [ WITH ] [ OWNER [=] dbowner ]
 [ TEMPLATE [=] template ]
 [ ENCODING [=] encoding ]
+[ COLLATE [=] collation ]
+[ CTYPE [=] ctype ]
 [ TABLESPACE [=] tablespace ]
 [ CONNECTION LIMIT [=] connlimit ] ]
  
***
*** 113,118  CREATE DATABASE name
--- 115,136 

   
   
+   collation
+   
+
+ LC_COLLATE setting to use in the new database.  XXX
+
+   
+  
+  
+   ctype
+   
+
+ LC_CTYPE setting to use in the new database.  XXX
+
+   
+  
+  
tablespace

 
*** src/backend/access/transam/xlog.c
--- src/backend/access/transam/xlog.c
***
*** 3847,3853  WriteControlFile(void)
  {
  	int			fd;
  	char		buffer[PG_CONTROL_SIZE];		/* need not be aligned */
- 	char	   *localeptr;
  
  	/*
  	 * Initialize version and compatibility-check fields
--- 3847,3852 
***
*** 3876,3893  WriteControlFile(void)
  	ControlFile->float4ByVal = FLOAT4PASSBYVAL;
  	ControlFile->float8ByVal = FLOAT8PASSBYVAL;
  
- 	ControlFile->localeBuflen = LOCALE_NAME_BUFLEN;
- 	localeptr = setlocale(LC_COLLATE, NULL);
- 	if (!localeptr)
- 		ereport(PANIC,
- (errmsg("invalid LC_COLLATE setting")));
- 	StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN);
- 	localeptr = setlocale(LC_CTYPE, NULL);
- 	if (!localeptr)
- 		ereport(PANIC,
- (errmsg("invalid LC_CTYPE setting")));
- 	StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN);
- 
  	/* Contents are protected with a CRC */
  	INIT_CRC32(ControlFile->crc);
  	COMP_CRC32(ControlFile->crc,
--- 3875,3880 
***
*** 4126,4159  ReadControlFile(void)
  		   " but the server was compiled without USE_FLOAT8_BYVAL."),
   errhint("It looks like you need to recompile or initdb.")));
  #endif
- 
- 	if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN)
- 		ereport(FATAL,
- (errmsg("database files are incompatible with server"),
-  errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d,"
-   " but the server was compiled with LOCALE_NAME_BUFLEN %d.",
- 		   ControlFile->localeBuflen, LOCALE_NAME_BUFLEN),
-  errhint("It looks like you need to recompile or initdb.")));
- 	if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL)
- 		ereport(FATAL,
- 			(errmsg("database files are incompatible with operating system"),
- 			 errdetail("The database cluster was initialized with LC_COLLATE \"%s\","
- 	   " which is not recognized by setlocale().",
- 	   ControlFile->lc_collate),
- 			 errhint("It looks like you need to initdb or install locale support.")));
- 	if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
- 		ereport(FATAL,
- 			(errmsg("database files are incompatible with operating system"),
- 		errdetail("The database cluster was initialized with LC_CTYPE \"%s\","
-   " which is not recognized by setlocale().",
-   ControlFile->lc_ctype),
- 			 errhint("It 

Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
> Greg Sabino Mullane wrote:
>> I'm all for killing 7.4, but that's a rather short time frame, especially as
>> this is a busy time of year for many businesses. How about we make it further
>> in the future (perhaps 2009-07-01, six months into the next year), and 
>> announce
>> the change far and wide?

> I believe a single quarter gap is plenty in consideration of the age. It 
> isn't like 7.4 isn't already really old. If this was 8.0 I would agree.

The handwriting has been on the wall for 7.4 ever since we agreed that
7.3 would be EOL'd at five years...

I wasn't intending to start a discussion about how/when to EOL 7.4,
but since the thread has gone in that direction: my vote would be to
announce now (say, with the announcement of this set of releases) that
7.4 will be EOL'd with our first set of updates in 2009.  That would
probably be the next update after this one, maybe two updates away
if we find any really serious bugs in the next month or two.

It's not like people have to stop using it the moment we do our
last release.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 10:09 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> >> Do we really need a checkpoint there at all?
> 
> > "Timelines only change at shutdown checkpoints".
> 
> Hmm.  I *think* that that is just a debugging crosscheck rather than a
> critical property.  But yeah, it would take some close investigation,
> which maybe isn't warranted if you have a less-invasive solution.

The important thing is that everybody uses the new timelineid.

AFAICS we actually write new timelineids into the WAL file if the
previous timelineid, so it can't be that risky.

Ignore v5 then and look for v6 on Monday. Hopefully the final one :-)

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:57 -0400, Tom Lane wrote:
> I wrote:
> > Simon Riggs <[EMAIL PROTECTED]> writes:
> >> In some cases, but my wish is also to minimise WAL volume as much as
> >> possible.
> 
> > I'm with Greg on this one: baroque bit-squeezing schemes are a bad idea.
> 
> Wait a minute ... why are we even having this conversation?  XLogRecord
> has at least two entirely-wasted bytes right now, due to alignment.
> It is entirely not sane to consider messing with xl_prev in preference
> to using that space.

OK, two bytes it is then. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
>> Do we really need a checkpoint there at all?

> "Timelines only change at shutdown checkpoints".

Hmm.  I *think* that that is just a debugging crosscheck rather than a
critical property.  But yeah, it would take some close investigation,
which maybe isn't warranted if you have a less-invasive solution.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 08:38 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Thu, 2008-09-18 at 12:40 +0100, Greg Stark wrote:
> >> Why bit just add a new bitfield for flags if we need them? I'm usually  
> >> the one worried about data density so perhaps I should be on the other  
> >> side of the fence here but I'm not sure. The conventional wisdom is  
> >> that wal bandwidth is not a major issue.
> 
> > In some cases, but my wish is also to minimise WAL volume as much as
> > possible.
> 
> I'm with Greg on this one: baroque bit-squeezing schemes are a bad idea.

I'm easy, just trying to save bytes.

> You still haven't answered the question of what you need four more bits
> for (and why four more is all that anyone will ever need --- unless you
> can prove that, we might as well just add another flag field).

Just trying to avoid having same info on multiple threads.

I would like to use 2 flag bits for Hot Standby:

* flag1 to indicate the first WAL record for an xid. This allows Hot
Standby to efficiently manage a recovery snapshot.
* flag2 to indicate the first WAL record for a subxid. This will
indicate that there is another 4 byte xid at end of struc and before
backup blocks that holds parentxid for this subxid. That allows us to
maintain subtrans during Hot Standby.

If we do this the suggested way it will add X bytes for flag bits (2?)
to every WAL record, plus 4 bytes to every initial WAL record in a
subtransaction. 

There will also be occasional WAL records to avoid race conditions, as
described in the thread on transaction snapshots and hot standby, though
those are rare. There are also some other WAL records that may need to
be written in other less common paths also, which will be individually
justified when I come to them.


The two bits I need will be set zero in xl_prev until we have written
4x10^18 bytes of WAL, or 4000 Petabytes. So we're not really ever like
to witness them set in our lifetimes for diagnostic purposes.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Joshua D. Drake

Greg Sabino Mullane wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160



Comments?



IMO, we release 7.4.22 with the rest and it is also announced that as of
12-31-08 7.4.x is no more.


I'm all for killing 7.4, but that's a rather short time frame, especially as
this is a busy time of year for many businesses. How about we make it further
in the future (perhaps 2009-07-01, six months into the next year), and announce
the change far and wide?


I believe a single quarter gap is plenty in consideration of the age. It 
isn't like 7.4 isn't already really old. If this was 8.0 I would agree.


Joshua D. Drake

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Tom Lane
I wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
>> In some cases, but my wish is also to minimise WAL volume as much as
>> possible.

> I'm with Greg on this one: baroque bit-squeezing schemes are a bad idea.

Wait a minute ... why are we even having this conversation?  XLogRecord
has at least two entirely-wasted bytes right now, due to alignment.
It is entirely not sane to consider messing with xl_prev in preference
to using that space.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:06 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Having some trouble trying to get a clean state change from recovery to
> > normal mode.
> 
> > Startup needs to be able to write WAL at the end of recovery so it can
> > write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> > that. Other services are still not fully up yet. So every other process
> > apart from Startup musn't write WAL until Startup has fully completed
> > its actions, which is sometime later.
> > ...
> > We *might* solve this by making the final checkpoint that Startup writes
> > into an online checkpoint.
> 
> Do we really need a checkpoint there at all?  I can't recall right now
> if there was a good reason why Vadim made it do that.  I have a feeling
> that it might have had to do with an assumption that the recovery
> environment was completely distinct from live environment; which is a
> statement that's certainly not going to be true in a query-answering
> slave.

Hmm, a question I hadn't considered.

We definitely don't want to do PreallocXlogFiles().

"Timelines only change at shutdown checkpoints". But its just as easy to
write a short timeline change record rather than the checkpoint itself,
so we can sort that out.

It should be sufficient to request bgwriter to perform an immediate
checkpoint, rather than have startup process perform it. That way
startup can exit and we can change to normal processing faster.

If we crash before the next checkpoint then we would revert to archive
recovery or crash recovery. The last restartpoint might well be
somewhere else. In streaming mode we would presumably have that data
locally, so not really a problem. But we musn't mark control file in
production yet, but we can do so following the next checkpoint.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Greg Sabino Mullane

-BEGIN PGP SIGNED MESSAGE-
Hash: RIPEMD160


>> Comments?

> IMO, we release 7.4.22 with the rest and it is also announced that as of
> 12-31-08 7.4.x is no more.

I'm all for killing 7.4, but that's a rather short time frame, especially as
this is a busy time of year for many businesses. How about we make it further
in the future (perhaps 2009-07-01, six months into the next year), and announce
the change far and wide?

- --
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200809180939
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-BEGIN PGP SIGNATURE-

iEYEAREDAAYFAkjSWjAACgkQvJuQZxSWSsg4OACggkx2PNYoWxC2Cmxx6wUy0X2b
BjsAn3wMtU6nEklw5exl7DM0wsTNF6Rp
=2Zak
-END PGP SIGNATURE-



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Dave Page
On Thu, Sep 18, 2008 at 2:20 PM, Tom Lane <[EMAIL PROTECTED]> wrote:

> One thing that ties into this is whether there ever will *be* another
> 7.4.x release.  We haven't formally discussed an EOL date for 7.4,
> but its fifth birthday will be 2008-11-17.  I imagine we'd want to make
> its final update release be after that date.  If we do a release now,
> the final update might be even skimpier than this one.

I'd be in favour of EOL'ing it after a single final release late
November/December (or whenever we do 8.3.5).


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Joshua D. Drake

Tom Lane wrote:

I went through the CVS logs to draft release notes, and found that the
list of patches applied to REL7_4_STABLE is a bit skimpy:
http://developer.postgresql.org/pgdocs/postgres/release-7-4-22.html



One thing that ties into this is whether there ever will *be* another
7.4.x release.  We haven't formally discussed an EOL date for 7.4,
but its fifth birthday will be 2008-11-17.  I imagine we'd want to make
its final update release be after that date.  If we do a release now,
the final update might be even skimpier than this one.

Comments?


IMO, we release 7.4.22 with the rest and it is also announced that as of 
12-31-08 7.4.x is no more.


Joshua D. Drake



regards, tom lane




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Do we really need a 7.4.22 release now?

2008-09-18 Thread Tom Lane
I went through the CVS logs to draft release notes, and found that the
list of patches applied to REL7_4_STABLE is a bit skimpy:
http://developer.postgresql.org/pgdocs/postgres/release-7-4-22.html

I'm wondering if we should leave 7.4 out of the current set of update
releases.  If I were a DBA conservative enough to still be runnng 7.4.x,
I doubt I'd find anything there compelling enough to justify an upgrade.
So I'm thinking that generating a 7.4.x tarball now would be mostly a
waste of server space, and we should leave these changes for the next
update cycle.

The main counter-argument I can think of is that it's too confusing
to release the same fixes in different branches at different times.
We did that this spring, and it was confusing, at least when it came
time to make the release notes for the next set of updates.  But I
dunno if any users noticed particularly.

One thing that ties into this is whether there ever will *be* another
7.4.x release.  We haven't formally discussed an EOL date for 7.4,
but its fifth birthday will be 2008-11-17.  I imagine we'd want to make
its final update release be after that date.  If we do a release now,
the final update might be even skimpier than this one.

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> Having some trouble trying to get a clean state change from recovery to
> normal mode.

> Startup needs to be able to write WAL at the end of recovery so it can
> write a ShutdownCheckpoint, yet must not be allowed to write WAL before
> that. Other services are still not fully up yet. So every other process
> apart from Startup musn't write WAL until Startup has fully completed
> its actions, which is sometime later.
> ...
> We *might* solve this by making the final checkpoint that Startup writes
> into an online checkpoint.

Do we really need a checkpoint there at all?  I can't recall right now
if there was a good reason why Vadim made it do that.  I have a feeling
that it might have had to do with an assumption that the recovery
environment was completely distinct from live environment; which is a
statement that's certainly not going to be true in a query-answering
slave.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:
My conclusion is that new implementation is about 8% slower in OLTP 
workload.


Can you do some analysis of why that is?


I'll try something but I do not guarantee result.

Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Or, we could store only the delta between current record and the 
> previous one. Assuming we know what the current record is, that wouldn't 
> lose any precision. That way xl_prev only needs to be as big as the 
> biggest possible WAL record we can have.

The trouble with either approach is that it discards forensic
intelligence in the name of bit squeezing.  The high bits of xl_prev are
the only direct evidence *within* a WAL record of where it thinks it is
in the WAL sequence, and the back-comparison against where we thought
the previous record was is correspondingly the only really strong
protection against a torn page problem within a WAL page, should the
sector boundary happen to fall exactly at a WAL record boundary.

I fear that a delta would be completely unacceptable for that check,
because it's entirely possible that a lot of different WAL records would
be the same size (consider bulk load into a fixed-width table for an
example).  Simon's scheme merely removes some of the protection, not all
of it ;-).

But I don't really like removing any of it.  If we need more bits in WAL
headers, then so be it --- they'd likely still be smaller than they were
a couple releases ago.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Thu, 2008-09-18 at 12:40 +0100, Greg Stark wrote:
>> Why bit just add a new bitfield for flags if we need them? I'm usually  
>> the one worried about data density so perhaps I should be on the other  
>> side of the fence here but I'm not sure. The conventional wisdom is  
>> that wal bandwidth is not a major issue.

> In some cases, but my wish is also to minimise WAL volume as much as
> possible.

I'm with Greg on this one: baroque bit-squeezing schemes are a bad idea.

You still haven't answered the question of what you need four more bits
for (and why four more is all that anyone will ever need --- unless you
can prove that, we might as well just add another flag field).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:05 +0100, Simon Riggs wrote:

> Feels like I should shutdown the bgwriter after recovery and then
> allow it to be cranked up again after normal processing starts, and do
> all of this through postmaster state changes. That way bgwriter
> doesn't need to do a dynamic state change.

This approach appears to be working nicely so far. Some ugly bits of
former patch removed.

Patch passes basic tests and changes state cleanly. 

Restarting test cycle on this patch now, confirm tomorrow.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Index: src/backend/access/transam/xlog.c
===
RCS file: /home/sriggs/pg/REPOSITORY/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.317
diff -c -r1.317 xlog.c
*** src/backend/access/transam/xlog.c	11 Aug 2008 11:05:10 -	1.317
--- src/backend/access/transam/xlog.c	18 Sep 2008 11:39:07 -
***
*** 119,124 
--- 119,126 
  
  /* Are we doing recovery from XLOG? */
  bool		InRecovery = false;
+ bool		reachedSafeStopPoint = false;
+ bool		StartupCanWriteWAL = false;
  
  /* Are we recovering using offline XLOG archives? */
  static bool InArchiveRecovery = false;
***
*** 131,137 
  static bool recoveryTarget = false;
  static bool recoveryTargetExact = false;
  static bool recoveryTargetInclusive = true;
- static bool recoveryLogRestartpoints = false;
  static TransactionId recoveryTargetXid;
  static TimestampTz recoveryTargetTime;
  static TimestampTz recoveryLastXTime = 0;
--- 133,138 
***
*** 286,295 
--- 287,298 
  /*
   * Total shared-memory state for XLOG.
   */
+ #define	XLOGCTL_BUFFER_SPACING	128
  typedef struct XLogCtlData
  {
  	/* Protected by WALInsertLock: */
  	XLogCtlInsert Insert;
+ 	char	InsertPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlInsert)];
  
  	/* Protected by info_lck: */
  	XLogwrtRqst LogwrtRqst;
***
*** 297,305 
--- 300,315 
  	uint32		ckptXidEpoch;	/* nextXID & epoch of latest checkpoint */
  	TransactionId ckptXid;
  	XLogRecPtr	asyncCommitLSN; /* LSN of newest async commit */
+ 	/* add data structure padding for above info_lck declarations */
+ 	char	InfoPadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogwrtRqst) 
+ - sizeof(XLogwrtResult)
+ - sizeof(uint32)
+ - sizeof(TransactionId)
+ - sizeof(XLogRecPtr)];
  
  	/* Protected by WALWriteLock: */
  	XLogCtlWrite Write;
+ 	char	WritePadding[XLOGCTL_BUFFER_SPACING - sizeof(XLogCtlWrite)];
  
  	/*
  	 * These values do not change after startup, although the pointed-to pages
***
*** 311,316 
--- 321,337 
  	int			XLogCacheBlck;	/* highest allocated xlog buffer index */
  	TimeLineID	ThisTimeLineID;
  
+ 	/*
+ 	 * canWriteWAL changes at the end of recovery only and is only ever set
+ 	 * by the Startup process. We assume that changes to it are atomic,
+ 	 * so accesses to it is never locked. When it does change bgwriter
+ 	 * must immediately begin using it, since this helps it decide whether
+ 	 * to flush WAL or not when it writes dirty blocks. If bgwriter does
+ 	 * it too soon, we will write invalid WAL records and if it reflects the
+ 	 * change too late it could skip flushing WAL for a data block change.
+ 	 */
+ 	bool		canWriteWAL;
+ 
  	slock_t		info_lck;		/* locks shared variables shown above */
  } XLogCtlData;
  
***
*** 480,485 
--- 501,510 
  	bool		doPageWrites;
  	bool		isLogSwitch = (rmid == RM_XLOG_ID && info == XLOG_SWITCH);
  
+ 	/* cross-check on whether we should be here or not */
+ 	if (!(canWriteWAL() || (InRecovery && StartupCanWriteWAL)))
+ 		elog(FATAL, "cannot make new WAL entries during recovery");
+ 
  	/* info's high bits are reserved for use by me */
  	if (info & XLR_INFO_MASK)
  		elog(PANIC, "invalid xlog info mask %02X", info);
***
*** 1677,1684 
  	XLogRecPtr	WriteRqstPtr;
  	XLogwrtRqst WriteRqst;
  
! 	/* Disabled during REDO */
! 	if (InRedo)
  		return;
  
  	/* Quick exit if already known flushed */
--- 1702,1709 
  	XLogRecPtr	WriteRqstPtr;
  	XLogwrtRqst WriteRqst;
  
! 	/* Disabled during StartupXLog */
! 	if (!canWriteWAL())
  		return;
  
  	/* Quick exit if already known flushed */
***
*** 1766,1774 
  	 * the bad page is encountered again during recovery then we would be
  	 * unable to restart the database at all!  (This scenario has actually
  	 * happened in the field several times with 7.1 releases. Note that we
! 	 * cannot get here while InRedo is true, but if the bad page is brought in
! 	 * and marked dirty during recovery then CreateCheckPoint will try to
! 	 * flush it at the end of recovery.)
  	 *
  	 * The current approach is to ERROR under normal conditions, but only
  	 * WARNING during recovery, so that the system can be brought up even if
--- 1791,1799 
  	 * the bad page is encountered again 

Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
Looks like we didn't make an 
exception for temporary tables. Although it's harmless, we could put an 
isTempOrToastNamespace() test in there:


Bad, bad idea to have md.c doing any catalog access.


isTempOrToastNamespace() doesn't access catalogs, it's just checking the 
argument against two global variables. I guess you could argue that it's 
a modularity violation.


>  As already noted downthread, it wouldn't buy much anyway.

Yeah. *shrug*

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 12:40 +0100, Greg Stark wrote:

> Why bit just add a new bitfield for flags if we need them? I'm usually  
> the one worried about data density so perhaps I should be on the other  
> side of the fence here but I'm not sure. The conventional wisdom is  
> that wal bandwidth is not a major issue.

In some cases, but my wish is also to minimise WAL volume as much as
possible.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Heikki Linnakangas

Zdenek Kotala wrote:
My conclusion is that new implementation is about 8% slower in OLTP 
workload.


Can you do some analysis of why that is?

Looks like I need to blow the dust off my DBT-2 test rig and try to 
reproduce that as well.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Tom Lane
Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> [ blink... ]  Doesn't look like that should happen.  What is your
>> test case?

> Hmph, must be because of the patch from last winter to prevent 
> relfilenode reuse until next checkpoint.

Ah.  I had misunderstood Alvaro to say that temp files (the kind under
discussion up to now) were not unlinked immediately; which would be
pretty strange given that fd.c is underneath md.c.

> Looks like we didn't make an 
> exception for temporary tables. Although it's harmless, we could put an 
> isTempOrToastNamespace() test in there:

Bad, bad idea to have md.c doing any catalog access.  As already noted
downthread, it wouldn't buy much anyway.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New FSM patch

2008-09-18 Thread Zdenek Kotala

Heikki Linnakangas napsal(a):

Zdenek Kotala wrote:



Suggestions:

1) remove WAL logging. I think that FSM record should be recovered 
during processing of others WAL records (like insert, update). 
Probably only we need full page write on first modification after 
checkpoint.


Hmm, we don't have the infrastructure to do that, but I guess it's 
doable. In case of a crash, the FSM information after recovery wouldn't 
obviously be up-to-date. And the FSM information in a warm standby would 
also lag behind.


One option would be to put RecordPageWithFreeSpace() calls into 
heap_redo, so that the FSM would be updated based on the WAL records we 
write anyway.


Yes, it seems to be a good place.

I think we'd still need to WAL log operations that decrease the amount 
of free space on page. Otherwise, after we have partial vacuum, we might 
never revisit a page, and update the FSM, even though there's usable 
space on the page, leaving the space forgotten forever.


I think, if you update actual free space on each page which is recorded in the 
WAL then you should have actual FSM. Something like this:


RecordPageWithFreeSpace(PageGetFreeSpace(..))





Other important test cases are various bulk insert kind of tests.


Parallel bulkload could be problem. Another problem could be CREATE TEMP TABLE 
command which updates catalog, which is usually small.


Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] FSM patch - performance test

2008-09-18 Thread Heikki Linnakangas

Zdenek Kotala wrote:
My conclusion is that new implementation is about 8% slower in OLTP 
workload.


Thanks. That's very disappointing :-(

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] FSM patch - performance test

2008-09-18 Thread Zdenek Kotala

Hi Heikki,

I finally performed iGen test. I used two v490 servers with 4 dual core SPARC 
CPUs and 32GB RAM. I have only one disk and I did not performed any disk I/O 
optimization. I tested 105 parallel connection and think time was 200ms.

See the result:

Original:
-
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1458.76 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 3122.44 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2626.70 tpm


TRANSACTION MIX

Total number of transactions = 438133
TYPETX. COUNT   MIX
-   ---
Light:  72938   16.65%
Medium: 156122  35.63%
DSS:48516   11.07%
Heavy:  131335  29.98%
Connection: 29222   6.67%


RESPONSE TIMES  AVG.MAX.90TH

Light   0.541   3.692   0.800
Medium  0.542   3.702   0.800
DSS 0.539   3.510   0.040
Heavy   0.539   3.742   4.000
Connections 0.545   3.663   0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 64.851454


New FSM implementation:
---
Actual run/snap-shot time: 3004 sec

MQThL (Maximum Qualified Throughput LIGHT): 1351.20 tpm
MQThM (Maximum Qualified Throughput MEDIUM): 2888.74 tpm
MQThH (Maximum Qualified Throughput HEAVY): 2428.90 tpm


TRANSACTION MIX

Total number of transactions = 405502
TYPETX. COUNT   MIX
-   ---
Light:  67560   16.66%
Medium: 144437  35.62%
DSS:45028   11.10%
Heavy:  121445  29.95%
Connection: 27032   6.67%


RESPONSE TIMES  AVG.MAX.90TH

Light   0.596   3.735   0.800
Medium  0.601   3.748   0.800
DSS 0.601   3.695   0.040
Heavy   0.597   3.725   4.000
Connections 0.599   3.445   0.800
Number of users = 105
Sum of Avg. RT * TPS for all Tx. Types = 66.419466




My conclusion is that new implementation is about 8% slower in OLTP workload.

Zdenek


--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Greg Stark
Why bit just add a new bitfield for flags if we need them? I'm usually  
the one worried about data density so perhaps I should be on the other  
side of the fence here but I'm not sure. The conventional wisdom is  
that wal bandwidth is not a major issue.


greg

On 18 Sep 2008, at 12:15, Simon Riggs <[EMAIL PROTECTED]> wrote:



On Thu, 2008-09-18 at 13:56 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I'd like to add some new flag bits to XLogRecord. (xlog.h)

Where? xl_prev.


I'm more curious about "What?" and "Why?", actually ;-),


Just trying to solve the egg/chicken problem of "I want to add a  
flag";

"but there are no flags available". I'm sure there's a few uses coming
up in synch replication also.

I want two flags for Hot Standby, but lets justify them on another  
post.



So I would like to propose that we ignore the top 4 bits in
xl_prev.xlogid when comparing values, rather than using all 32  
bits for
comparison. That then frees up 4 new flag bits on XLogRecords.  
Changing
xl_prev handling is only required in 3 places, all in xlog.c, plus  
some

log outputs.


Or, we could store only the delta between current record and the
previous one. Assuming we know what the current record is, that  
wouldn't

lose any precision. That way xl_prev only needs to be as big as the
biggest possible WAL record we can have.

Not that I think the precision in your scheme isn't enough, but I  
find

the delta easier to comprehend.


That can't work, I know, that was my first thought.

The files are reused, so xl_prev protects against reusing a file and
then having a perfectly valid WAL record *after* the correct end of  
WAL

that makes it look like WAL continues. So a delta could be valid data
even though the record was invalid.

We don't want to zero the files cause that costs too much, so we  
have to

allow for seemingly correct data as well as correct data in WAL.

I think the xl_prev field could be removed completely when streaming,
except the new flags of course.

--
Simon Riggs   www.2ndQuadrant.com
PostgreSQL Training, Services and Support


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 13:56 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > I'd like to add some new flag bits to XLogRecord. (xlog.h)
> > 
> > Where? xl_prev.
> 
> I'm more curious about "What?" and "Why?", actually ;-),

Just trying to solve the egg/chicken problem of "I want to add a flag";
"but there are no flags available". I'm sure there's a few uses coming
up in synch replication also.

I want two flags for Hot Standby, but lets justify them on another post.

> > So I would like to propose that we ignore the top 4 bits in
> > xl_prev.xlogid when comparing values, rather than using all 32 bits for
> > comparison. That then frees up 4 new flag bits on XLogRecords. Changing
> > xl_prev handling is only required in 3 places, all in xlog.c, plus some
> > log outputs.
> 
> Or, we could store only the delta between current record and the 
> previous one. Assuming we know what the current record is, that wouldn't 
> lose any precision. That way xl_prev only needs to be as big as the 
> biggest possible WAL record we can have.
> 
> Not that I think the precision in your scheme isn't enough, but I find 
> the delta easier to comprehend.

That can't work, I know, that was my first thought.

The files are reused, so xl_prev protects against reusing a file and
then having a perfectly valid WAL record *after* the correct end of WAL
that makes it look like WAL continues. So a delta could be valid data
even though the record was invalid.

We don't want to zero the files cause that costs too much, so we have to
allow for seemingly correct data as well as correct data in WAL.

I think the xl_prev field could be removed completely when streaming,
except the new flags of course.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [BUGS] [HACKERS] 0x1A in control file on Windows

2008-09-18 Thread Magnus Hagander
Heikki Linnakangas wrote:
> ITAGAKI Takahiro wrote:
>> Tom Lane <[EMAIL PROTECTED]> wrote:
>>
>>> ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:
 We probably need to add PG_BINARY when we open control files
 because 0x1A is an end-of-file marker on Windows.
>>> Well, why is that a bug?  If the platform is so silly as to define text
>>> files that way, who are we to argue?
>>
>> Google says it is for for backward compatibility with CP/M
>> http://en.wikipedia.org/wiki/End-of-file
>> and adding O_BINARY is the answer.
>>
>> http://codenewbie.com/forum/standard-c-c/1208-binary-i-o-file-reading-0x1a-trouble.html
>>
> 
> Yes, apparently that's exactly why we have PG_BINARY, see c.h:
> 
>> /*
>>  *NOTE:  this is also used for opening text files.
>>  *WIN32 treats Control-Z as EOF in files opened in text mode.
>>  *Therefore, we open files in binary mode on Win32 so we can read
>>  *literal control-Z.The other affect is that we see CRLF, but
>>  *that is OK because we can already handle those cleanly.
>>  */
>> #if defined(WIN32) || defined(__CYGWIN__)
>> #define PG_BINARYO_BINARY
>> #define PG_BINARY_A "ab"
>> #define PG_BINARY_R "rb"
>> #define PG_BINARY_W "wb"
>> #else
>> #define PG_BINARY0
>> #define PG_BINARY_A "a"
>> #define PG_BINARY_R "r"
>> #define PG_BINARY_W "w"
>> #endif
> 
> I don't see anything wrong with the patch, but I wonder if there's more
> open() calls that need the same treatment? Like the one in
> pg_resetxlog.c/ReadControlFile().

Agreed, and I think that one would also need it - and pg_resetxlog
already does this when it writes the file. A quick look doesn't show any
other places, but I may have missed some?

/Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [BUGS] [HACKERS] 0x1A in control file on Windows

2008-09-18 Thread Heikki Linnakangas

ITAGAKI Takahiro wrote:

Tom Lane <[EMAIL PROTECTED]> wrote:


ITAGAKI Takahiro <[EMAIL PROTECTED]> writes:

We probably need to add PG_BINARY when we open control files
because 0x1A is an end-of-file marker on Windows.

Well, why is that a bug?  If the platform is so silly as to define text
files that way, who are we to argue?


Google says it is for for backward compatibility with CP/M
http://en.wikipedia.org/wiki/End-of-file
and adding O_BINARY is the answer.

http://codenewbie.com/forum/standard-c-c/1208-binary-i-o-file-reading-0x1a-trouble.html


Yes, apparently that's exactly why we have PG_BINARY, see c.h:


/*
 *  NOTE:  this is also used for opening text files.
 *  WIN32 treats Control-Z as EOF in files opened in text mode.
 *  Therefore, we open files in binary mode on Win32 so we can read
 *  literal control-Z.  The other affect is that we see CRLF, but
 *  that is OK because we can already handle those cleanly.
 */
#if defined(WIN32) || defined(__CYGWIN__)
#define PG_BINARY   O_BINARY
#define PG_BINARY_A "ab"
#define PG_BINARY_R "rb"
#define PG_BINARY_W "wb"
#else
#define PG_BINARY   0
#define PG_BINARY_A "a"
#define PG_BINARY_R "r"
#define PG_BINARY_W "w"
#endif


I don't see anything wrong with the patch, but I wonder if there's more 
open() calls that need the same treatment? Like the one in 
pg_resetxlog.c/ReadControlFile().


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Heikki Linnakangas

Simon Riggs wrote:

I'd like to add some new flag bits to XLogRecord. (xlog.h)

Where? xl_prev.


I'm more curious about "What?" and "Why?", actually ;-),


So I would like to propose that we ignore the top 4 bits in
xl_prev.xlogid when comparing values, rather than using all 32 bits for
comparison. That then frees up 4 new flag bits on XLogRecords. Changing
xl_prev handling is only required in 3 places, all in xlog.c, plus some
log outputs.


Or, we could store only the delta between current record and the 
previous one. Assuming we know what the current record is, that wouldn't 
lose any precision. That way xl_prev only needs to be as big as the 
biggest possible WAL record we can have.


Not that I think the precision in your scheme isn't enough, but I find 
the delta easier to comprehend.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Adding new flags to XLogRecord

2008-09-18 Thread Simon Riggs
I'd like to add some new flag bits to XLogRecord. (xlog.h)

Where? xl_prev.

xl_prev is an XLogRecPtr which points backwards to the immediately
preceeding WAL record. All of the bits are currently used, but I have
some observations and a proposal to change that.

We currently compare the whole xl_prev value against the whole
XLogRecPtr of the last WAL record.

When we are reading back WAL, if a WAL record is valid the xlogid
portion of the value seldom differs by more than +1 from pointer of the
current record, since that would imply an xlog record of more than 4GB.
If it is incorrect, it will either be garbage or occasionally be a
previously valid value but from two prior checkpoints back before this
file was reused.

So we probably don't need to compare the whole of xl_prev against the
whole of the last WAL record pointer, we can probably avoid comparing
some of the high bits, since the range of valid values is so limited.
How many bits?

checkpoint_segments is limited to INT_MAX, which means the xlogid
increase of a single checkpoint is always at most INT_MAX/255. That
means that the xl_prev value cannot differ by more than 2* INT_MAX/255
across two checkpoints. (I make that 134 Petabytes). Alternatively, the
checkpoint_timeout is one hour. So we're OK until systems can write WAL
at 67 Petabytes/hour. 

Which means if
* we never get WAL records of more than 67 Petabytes in size *and* 
* the lowest 25 bits of xl_prev do not match the position of the last
WAL record 
then the XLogRecord is invalid, no matter what the value of the highest
7 bits of xl_prev. 

So I would like to propose that we ignore the top 4 bits in
xl_prev.xlogid when comparing values, rather than using all 32 bits for
comparison. That then frees up 4 new flag bits on XLogRecords. Changing
xl_prev handling is only required in 3 places, all in xlog.c, plus some
log outputs.

I would simply document the limitation of WAL record sizes. Putting code
in for that would be pointless since the test would last years on
current systems. (We wouldn't need dtrace to measure the WALInsertLock
hold time, we could use tree rings.:-)

These values would vary if we allow XLOG_SEG_SIZE higher than 16MB, but
we should probably limit checkpoint_segments according to the setting of
XLOG_SEG_SIZE anyhow.

Thoughts?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Prototype: In-place upgrade v02

2008-09-18 Thread Zdenek Kotala

Abbas napsal(a):

Hi,
I downloaded latest postgresql source code from
 git clone git://git.postgresql.org/git/postgresql.git
and tried to apply the patch 
 http://archives.postgresql.org/pgsql-hackers/2008-09/gza1fGXLvf3L.gz


It does not apply cleanly, see the failures in attached file.


It clash with hash index patch which was committed four days ago. Try to use 
little bit older revision from git (without hash index modification).


Zdenek



--
Zdenek Kotala  Sun Microsystems
Prague, Czech Republic http://sun.com/postgresql


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Heikki Linnakangas

Simon Riggs wrote:

On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

An unfortunate choice of words! Harmless is not how your average DBA
would describe it when their disk fills and they are apparently unable
to reduce space consumption. So there is still a problem there even if
we fix the temp files portion of it.
The files *are* truncated to zero bytes immediately. They're left 
hanging as empty files until next checkpoint.


Ah, cool. So won't actually unlinking temp files be slower than just
leaving them for checkpointer to clear up offline? i.e. do we really
need the patch you just posted?


Dunno, maybe. I doubt it's significant either way.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] EncodeDateTime performance

2008-09-18 Thread Heikki Linnakangas

George McCollister wrote:

oprofile is showing that memset (via dopr) is using about 60% of the CPU. I 
traced back further and noticed most of the usage was coming from 
EncodeDateTime.

I'm not quite sure why oprofile is showing that memset is hogging so much CPU. 
Regardless, I found way to eliminate most of the sprintf calls that were taking 
place in my situation.

I made some modifications to EncodeDateTime and have attached them as a patch. These changes alone reduced the query time of 

the "select *  from archivetbl;" from 79 seconds to just 35 seconds.

I remember I saw a similar effect some time ago, in a test case of 
dumping a table with lots of timestamp columns. It was on 32-bit Linux, 
which uses the glibc sprintf, instead of the one in src/port, and the 
time wasn't spent in MemSet but in the glibc sprintf. However, the fix I 
came up with was exactly the same: replace sprintf with functions like 
that, which takes removes the overhead of parsing the format string on 
every invocation.


It then occurred to me that the compiler could do the same for us, and 
posted that idea on the gcc mailing list:


http://gcc.gnu.org/ml/gcc/2007-10/msg00073.html

I got busy with other stuff, and never got the chance to follow up with 
that for gcc, or with a patch to PostgreSQL.


It does seem like it would be worthwhile to do something about this.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [REVIEW] Prototype: In-place upgrade v02

2008-09-18 Thread Abbas
Hi,
I downloaded latest postgresql source code from
 git clone git://git.postgresql.org/git/postgresql.git
and tried to apply the patch 
 http://archives.postgresql.org/pgsql-hackers/2008-09/gza1fGXLvf3L.gz

It does not apply cleanly, see the failures in attached file.

Regards
Abbas
www.enterprisedb.com

patch -p1 
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCHES] [HACKERS] Infrastructure changes for recovery

2008-09-18 Thread Simon Riggs

On Tue, 2008-09-16 at 15:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> wrote:
> > Testing takes a while on this, I probably won't complete it until
> > Friday. So enclosed patch is for eyeballs only at this stage.
> 
> What's the status on that patch?

Having some trouble trying to get a clean state change from recovery to
normal mode.

Startup needs to be able to write WAL at the end of recovery so it can
write a ShutdownCheckpoint, yet must not be allowed to write WAL before
that. Other services are still not fully up yet. So every other process
apart from Startup musn't write WAL until Startup has fully completed
its actions, which is sometime later.

bgwriter needs to know about the impending state change so it can finish
off any checkpoint its currently working on. But then can't start doing
normal processing yet either. So it must have a third state that is
between recovery and normal processing. When normal processing is
reached, it *must* write WAL immediately from that point onwards, yet
using the new timeline that startup decides upon.

So the idea of a single database-wide boolean state seems impossible. We
need a local canInsertWAL flag that is set at different times in
different processes.

Global states changes are now

not started
started - postmaster process then startup process
safestoppingpoint - bgwriter starts
startup_does_shutdown_checkpoint - bgwriter finishes up, just waits,
   startup is now allowed to insert WAL for checkpoint record
startup completes StartupXLog - bgwriter begins normal processing
startup exits - postmaster in normal state

We *might* solve this by making the final checkpoint that Startup writes
into an online checkpoint. That would then allow a straight and safe
transition for bgwriter from just one state to the other. But that
leaves some other ugliness that I don't want to deal with, 'cos there's
other fish frying.

Feels like I should shutdown the bgwriter after recovery and then allow
it to be cranked up again after normal processing starts, and do all of
this through postmaster state changes. That way bgwriter doesn't need to
do a dynamic state change.

Comments anyone?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 10:19 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > 
> > An unfortunate choice of words! Harmless is not how your average DBA
> > would describe it when their disk fills and they are apparently unable
> > to reduce space consumption. So there is still a problem there even if
> > we fix the temp files portion of it.
> 
> The files *are* truncated to zero bytes immediately. They're left 
> hanging as empty files until next checkpoint.

Ah, cool. So won't actually unlinking temp files be slower than just
leaving them for checkpointer to clear up offline? i.e. do we really
need the patch you just posted?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Heikki Linnakangas

Simon Riggs wrote:

On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote:

Tom Lane wrote:

Alvaro Herrera <[EMAIL PROTECTED]> writes:

BTW in testing this patch I was surprised by the fact that temp tables
files are removed at checkpoint time,

[ blink... ]  Doesn't look like that should happen.  What is your
test case?
Hmph, must be because of the patch from last winter to prevent 
relfilenode reuse until next checkpoint. Looks like we didn't make an 
exception for temporary tables. Although it's harmless...


An unfortunate choice of words! Harmless is not how your average DBA
would describe it when their disk fills and they are apparently unable
to reduce space consumption. So there is still a problem there even if
we fix the temp files portion of it.


The files *are* truncated to zero bytes immediately. They're left 
hanging as empty files until next checkpoint.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] optimizing CleanupTempFiles

2008-09-18 Thread Simon Riggs

On Thu, 2008-09-18 at 09:23 +0300, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> >> BTW in testing this patch I was surprised by the fact that temp tables
> >> files are removed at checkpoint time,
> > 
> > [ blink... ]  Doesn't look like that should happen.  What is your
> > test case?
> 
> Hmph, must be because of the patch from last winter to prevent 
> relfilenode reuse until next checkpoint. Looks like we didn't make an 
> exception for temporary tables. Although it's harmless...

An unfortunate choice of words! Harmless is not how your average DBA
would describe it when their disk fills and they are apparently unable
to reduce space consumption. So there is still a problem there even if
we fix the temp files portion of it.

Seems like a complete fix must have some kind of pressure relief valve
for when things get full. Disk overflow is a much more likely problem
than a relfilenode recycling problem.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers