Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Heikki Linnakangas

Fujii Masao wrote:

On Tue, Jun 2, 2009 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:

Fujii Masao masao.fu...@gmail.com writes:

Yes, the old xlog itself is not used again. But, the *old file* might
be recycled and used later. The case that I'm looking at is that the
symlink to a temporary area is recycled. Am I missing something?

Actually, I think the right fix for that would be to add defenses to
xlog.c to not try to recycle a file that is a symlink.


OK, I tweaked Aidan's patch. Thanks Aidan!
http://archives.postgresql.org/message-id/20090601152736.gl15...@yugib.highrise.ca

Changes are:
- use lstat instead of stat
- add #if HAVE_WORKING_LINK and #endif code


Committed. I left out the #ifdef HAVE_WORKING_LINK and used S_ISREG() 
instead of S_ISLNK. We use lstat + S_ISREG elsewhere too, so there 
should be no portability issues.


I backpatched to 8.3, since that's when pg_standby was added. Arguably 
earlier versions should've been changed too, as pg_standby works with 
earlier versions, but I decided to not rock the boat as this only 
affects the pg_standby -l mode.
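The committed guard can be sketched as follows. This is an illustrative Python rendering of the lstat + S_ISREG idea, not the actual xlog.c code; the helper name is made up for the example.

```python
import os
import stat
import tempfile

def safe_to_recycle(path):
    """Recycle only plain regular files.

    os.lstat(), unlike os.stat(), does not follow symlinks, so a symlink
    (e.g. one pointing into pg_standby's temporary area) is reported as a
    link rather than as its target's type, and is refused.
    """
    try:
        st = os.lstat(path)
    except OSError:
        return False
    return stat.S_ISREG(st.st_mode)

# Quick demonstration: a regular file passes, a symlink to it does not.
d = tempfile.mkdtemp()
segment = os.path.join(d, "segment")
link = os.path.join(d, "link")
open(segment, "w").close()
os.symlink(segment, link)
assert safe_to_recycle(segment)
assert not safe_to_recycle(link)
```

With stat() instead of lstat(), both checks above would report a regular file, which is exactly the hazard being closed.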


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
  a newish conversion with cvs2git is available to check here:

   git://www.bluegap.ch/

  (it's not incremental and will only stay for a few days)

+1 for the idea of replacing CVS usernames with full names.

The knowledge of CVS usernames will become increasingly obscure.

Also worth mentioning is that there is no need to assign absolutely
up-to-date email addresses; it's enough if they uniquely identify the
person.

  Aidan Van Dyk wrote:
   Yes, but the point is you want an exact replica of CVS, right?  Your
   git repo should have $PostgreSQL$ and the cvs export/checkout (you do
   use -kk, right?) should also have $PostgreSQL$.


 No, I'm testing against cvs checkout, as that's what everybody is used to.


   But it's important, because on *some* files you *do* want expanded
   keywords (like the $OpenBSD ... Exp $).  One of the reasons pg CVS went
   to the $PostgreSQL$ keyword (I'm guessing) was so they could explicitly
   de-couple them from other keywords that they didn't want munging on.


 I don't care half as much about the keyword expansion stuff - that's
  doomed to disappear anyway.

But this is one aspect we need to get right for the conversion.

So preferably we test it sooner not later.

I think Aidan got it right - expand $PostgreSQL$ and others that are
actually expanded on current repo, but not $OpenBSD$ and others
coming from external sources.

  What I'm much more interested in is correctness WRT historic contents,
  i.e. that git log, git blame, etc. deliver correct results. That's
  certainly harder to check.

  In my experience, cvs2svn (or cvs2git) does a pretty decent job at that,
  even in case of some corruptions. Plus it offers lots of options to fine
  tune the conversion, see the attached configuration I've used.


   So, I wouldn't consider any conversion good unless it had all these:
  

  As well as stuff like:
 parsecvs-master:src/backend/access/index/genam.c: *   $PostgreSQL$


 I disagree here and find it more convenient for the git repository to
  keep the old RCS versions - as in the source tarballs that got (and
  still get) shipped. Just before switching over to git one can (and
  should, IMO) remove these tags to avoid confusion.

I'd prefer we immediately test the full conversion and not leave some
steps to the last moment.

-- 
marko



Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Fujii Masao
Hi,

On Tue, Jun 2, 2009 at 3:40 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:

 On Tue, Jun 2, 2009 at 10:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Fujii Masao masao.fu...@gmail.com writes:

 Yes, the old xlog itself is not used again. But, the *old file* might
 be recycled and used later. The case that I'm looking at is that the
 symlink to a temporary area is recycled. Am I missing something?

 Actually, I think the right fix for that would be to add defenses to
 xlog.c to not try to recycle a file that is a symlink.

 OK, I tweaked Aidan's patch. Thanks Aidan!

 http://archives.postgresql.org/message-id/20090601152736.gl15...@yugib.highrise.ca

 Changes are:
 - use lstat instead of stat
 - add #if HAVE_WORKING_LINK and #endif code

 Committed. I left out the #ifdef HAVE_WORKING_LINK and used S_ISREG()
 instead of S_ISLNK. We use lstat + S_ISREG elsewhere too, so there should be
 no portability issues.

Thanks a lot!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



[HACKERS] 8.4b2 tsearch2 strange error

2009-06-02 Thread Tatsuo Ishii
Hi,

I have encountered strange errors while testing PostgreSQL 8.4 beta2.

SELECT msg_sid FROM msginfo WHERE plainto_tsquery(E'test') @@
body_index;
or
SELECT msg_sid FROM msginfo WHERE to_tsquery(E'test') @@ body_index;

produces the following errors:

ERROR:  tuple offset out of range: 0
(occasionally ERROR:  tuple offset out of range: 459)

Here is the table definition:

CREATE TABLE msginfo (
msg_sid BIGSERIAL PRIMARY KEY,
file_size INTEGER,
file_mtime TIMESTAMP,
msg_date TIMESTAMP,
flags INTEGER,
hdr_from TEXT,
hdr_to TEXT,
hdr_cc TEXT,
hdr_newsgroups TEXT,
hdr_subject TEXT,
hdr_msgid TEXT UNIQUE NOT NULL,
hdr_inreplyto TEXT,
hdr_references TEXT,
body_text TEXT,
body_index TSVECTOR
);
CREATE INDEX msginfo_msg_date_index ON msginfo (msg_date);
CREATE INDEX msginfo_body_index ON msginfo USING gin (body_index);

and other info:

Ubuntu 8.04
./configure --prefix=/usr/local/pgsql84
initdb -E UTF-8 --no-locale /path/to/database

sylph=# EXPLAIN SELECT msg_sid FROM msginfo WHERE to_tsquery('test') @@ 
body_index;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on msginfo  (cost=4.59..8.61 rows=1 width=8)
   Recheck Cond: (to_tsquery('test'::text) @@ body_index)
   ->  Bitmap Index Scan on msginfo_body_index  (cost=0.00..4.59 rows=1 width=0)
         Index Cond: (to_tsquery('test'::text) @@ body_index)
(4 rows)
--
Tatsuo Ishii
SRA OSS, Inc. Japan



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Marko Kreen mark...@gmail.com wrote:
 On 6/1/09, Markus Wanner mar...@bluegap.ch wrote:
a newish conversion with cvs2git is available to check here:
  
 git://www.bluegap.ch/
  
(it's not incremental and will only stay for a few days)

Btw this conversion seems broken as it contains random merge commits.

parsecvs managed to do it without them.

-- 
marko



Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Markus Wanner

Hi,

Quoting Greg Stark st...@enterprisedb.com:

No, I'm not. I'm questioning whether a serializable transaction
isolation level that makes no guarantee that it won't fire spuriously
is useful.


It would certainly be an improvement compared to our status quo, where  
truly serializable transactions aren't supported at all. And it seems  
more promising than heading for a perfect *and* scalable implementation.



Heikki proposed a list of requirements which included a requirement
that you not get spurious serialization failures


That requirement is questionable. If we get truly serializable  
transactions (i.e. no false negatives) with reasonably good  
performance, that's more than enough and a good step ahead.


Why care about a few false positives (which don't seem to matter
performance-wise)? We can probably reduce or eliminate them later on.
But eliminating false negatives is certainly more important to start
with.


What I'm more concerned about is the requirement of the proposed
algorithm to keep track of the set of tuples read by any transaction
and to keep that set until some time well after the transaction has
committed (as questioned by Neil [1]). That doesn't sound like a
negligible overhead.


Maybe the proposed algorithm has to be applied to pages instead of  
tuples, as they did it in the paper for Berkeley DB. Just to keep that  
overhead reasonably low.


Regards

Markus Wanner

[1]: Neil Conway's blog, Serializable Snapshot Isolation:
http://everythingisdata.wordpress.com/2009/02/25/february-25-2009/



Re: [HACKERS] Suggested TODO: allow ALTERing of typemods without heap/index rebuild

2009-06-02 Thread Dimitri Fontaine
Hi,

Josh Berkus j...@agliodbs.com writes:
 The stumbling block has been to identify a reasonably clean way
 of determining which datatype changes don't require a scan.

 Yep.  One possibility I'm thinking is supplying a function for each type
 which takes two typemods (old and new) and returns a value (none, check,
 rebuild) which defines what we need to do: nothing, check but not rebuild,
 or rebuild.  Default would be rebuild.  Then the logic is simple for each
 data type.

That seems like a good idea; I don't see how the current infrastructure
could otherwise provide enough information to skip the rebuild. Add in
there whether a reindex is needed, too, in the accepted return values
(maybe a mask is needed, such as NOREWRITE|REINDEX).
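The per-type decision function proposed above can be sketched for varchar. This is illustrative Python, with invented names and a simplified typemod (the raw declared length, ignoring PostgreSQL's header offset); it is not the proposed catalog interface.

```python
# Hypothetical decision function: given old and new typemods, report what
# ALTER TABLE would have to do. Widening a varchar is free (every stored
# value still fits); narrowing needs a check scan but no rewrite.
NONE, CHECK, REBUILD = "none", "check", "rebuild"

def varchar_typemod_change(old_typmod, new_typmod):
    if new_typmod == -1:
        return NONE          # new type is unconstrained varchar
    if old_typmod != -1 and new_typmod >= old_typmod:
        return NONE          # widening: every value still fits
    return CHECK             # narrowing: scan existing values, no rewrite

assert varchar_typemod_change(20, 40) == NONE
assert varchar_typemod_change(20, -1) == NONE
assert varchar_typemod_change(40, 20) == CHECK
```

The default for a type that supplies no such function would be REBUILD, preserving today's behavior.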

 Note that this doesn't deal with the special case of VARCHAR->TEXT, but
 just with changing typemods.  Are there other cases of data *type*
 conversions where no check or rebuild is required?  Otherwise we might just
 special-case VARCHAR->TEXT.

It seems there's some new stuff for this in 8.4, around the notions of
binary coercibility and type categories, which allow user defined types
to be declared IO compatible with built-in types, e.g. citext/text.

Maybe the case is not so special anymore?

  
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=22ff6d46991447bffaff343f4e333dcee188094d
  
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=4a3be7e52d7e87d2c05ecc59bc4e7d20f0bc9b17

 Oh, here's a general case: changing DOMAINs on the same base type should
 only be a check, and changing from a DOMAIN to its own base type should be a
 none.

DOMAINs and CASTs are still on the todo list IIRC, so I'm not sure the
current infrastructure around DOMAINs would be flexible (or complete)
enough for the system to determine when the domain A to domain B type
change is binary coercible. It has no CAST information to begin with, I
guess.

As far as reindexing is concerned, talking with RhodiumToad (Andrew
Gierth) on IRC gave insights, as usual. Standard PostgreSQL supports two
data type changes without needing a reindex: varchar to text and cidr to
inet. In both cases the types share the indexing infrastructure: the same
PROCEDUREs are in use in the OPERATORs that the index is using.

Could it be that we already have the information we need in order to
dynamically decide whether a heap rewrite and a reindex are necessary,
even in case of user defined type conversions?

Regards,
-- 
dim



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Marko Kreen mark...@gmail.com:

I don't care half as much about the keyword expansion stuff - that's
 doomed to disappear anyway.


But this is one aspect we need to get right for the conversion.


What's your definition of "right"? I personally prefer the keyword
expansion to match a cvs checkout as closely as possible.



So preferably we test it sooner not later.


I actually *am* testing against that. As mentioned, the only  
differences are insignificant, IMO. For example having 1.1.1.1  
instead of 1.1 (or vice versa, I don't remember).



I think Aidan got it right - expand $PostgreSQL$ and others that are
actually expanded on current repo, but not $OpenBSD$ and others
coming from external sources.


AFAIU Aidan proposed the exact opposite.

I'm proposing to leave both expanded, as in a CVS checkout and as  
shipped in the source release tarballs.



I'd prefer we immediately test the full conversion and not leave some
steps to the last moment.


IMO that would amount to changing history, so that a checkout from git
doesn't match a released tarball as closely as possible.


What you call "leaving some steps to the last moment" is IMO not part
of the conversion. It's rather a conscious decision to drop these
keywords as soon as we switch to git. This step should be represented
in history as a separate commit, IMO.


What do others think?

Regards

Markus Wanner




Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Greg Stark
On Tue, Jun 2, 2009 at 1:13 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Greg Stark st...@enterprisedb.com wrote:

 Just as carefully written SQL code can be written to avoid deadlocks
 I would expect to be able to look at SQL code and know it's safe
 from serialization failures, or at least know where they might
 occur.

 This is the crux of our disagreement, I guess.  I consider existing
 techniques fine for situations where that's possible.

a) When is that possible? Afaict it's always possible; you can never
know, and when it might happen could change at any time.

b) What existing techniques, explicit locking?

 But, could you
 give me an estimate of how much time it would take you, up front and
 ongoing, to do that review in our environment?  About 8,700 queries
 undergoing frequent modification, by 21 programmers, for enhancements
 in our three-month release cycle.  Plus various ad hoc queries.  We
 have one full-time person to run ad hoc data fixes and reports
 requested by the legislature and various outside agencies, like
 universities doing research.

Even in your environment I could easily imagine, say, a monthly job to
delete all records older than 3 months. That job could take hours or
even days. It would be pretty awful for it to end up needing to be
retried. All I'm saying is that if you establish a policy -- perhaps
enforced using views -- that no queries are allowed to access records
older than 3 months you shouldn't have to worry that you'll get a
spurious serialization failure working with those records.
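The usual alternative to auditing every query for serialization safety is to wrap serializable transactions in a retry loop. A minimal sketch, using a stand-in exception class rather than any actual driver's error type:

```python
class SerializationFailure(Exception):
    """Stand-in for a driver's serialization-failure error (SQLSTATE 40001);
    the class name and helper below are invented for this sketch."""

def run_with_retry(txn, max_attempts=5):
    """Run a serializable transaction, retrying on serialization failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise            # give up after max_attempts tries

# Demonstration: a "transaction" that aborts twice, then commits.
attempts = {"n": 0}

def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise SerializationFailure("could not serialize access")
    return "committed"

assert run_with_retry(flaky_txn) == "committed"
assert attempts["n"] == 3
```

Greg's point stands for long-running jobs: a retry loop is little comfort when each attempt takes hours, which is why he wants to know which statements can fail at all.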


-- 
greg



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
   I don't care half as much about the keyword expansion stuff - that's
doomed to disappear anyway.
  
 
  But this is one aspect we need to get right for the conversion.
 

  What's your definition of "right"? I personally prefer the keyword
 expansion to match a cvs checkout as closely as possible.

This is Definitely Wrong (tm).  You seem to be thinking that comparing
a GIT checkout to a random parallel CVS checkout (e.g. from a .tgz) is
the main use-case.  It is not.  Browsing history and looking at diffs
between versions is.  And expanded CVS keywords would be a total PITA
for that.

  So preferably we test it sooner not later.
 

  I actually *am* testing against that. As mentioned, the only differences
 are insignificant, IMO. For example having 1.1.1.1 instead of 1.1 (or
 vice versa, I don't remember).

Why have those at all...

  I think Aidan got it right - expand $PostgreSQL$ and others that are
  actually expanded on current repo, but not $OpenBSD$ and others
  coming from external sources.
 

  AFAIU Aidan proposed the exact opposite.

Ah, sorry, my thinko.  s/expanded/stripped/.  Take Aidan's description
as authoritative.. :)

  I'm proposing to leave both expanded, as in a CVS checkout and as shipped
 in the source release tarballs.

No, the noise they add to history would seriously hurt usability.

  I'd prefer we immediately test the full conversion and not leave some
  steps to the last moment.
 

  IMO that would amount to changing history, so that a checkout from git
 doesn't match a released tarball as closely as possible.

We need to compare against tarballs only when checking the conversion,
and only then.  Writing a few scripts for that should not be a problem.

  What you call "leaving some steps to the last moment" is IMO not part of the
 conversion. It's rather a conscious decision to drop these keywords as soon
 as we switch to git. This step should be represented in history as a
 separate commit, IMO.

The question is how they should appear in historical commits.

I have no strong opinion on whether to edit them out or not in the future.
Doing it during the periodic reindent would be a good moment, though.

-- 
marko



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Marko Kreen mark...@gmail.com:

Btw this conversion seems broken as it contains random merge commits.


Well, that's a feature, not a bug ;-)

When a commit adds a file to the master *and* then to the branch as  
well, cvs2git prefers to represent this as a merge from the master  
branch, instead of adding the file twice, once on the master and once  
on the branch.


This way the target VCS knows it's the *same* file, originating from  
one single commit. This may be important for later merges - otherwise  
you may suddenly end up with duplicated files after a merge, because  
the VCS doesn't know they are in fact the same.


(Okay, git assumes two files have the same origin/history as long
as they have the same filename. But just rename one of the two, and
you have the same troubles again.)


Also note that these situations occur rather frequently in the  
Postgres CVS repository. Every back-patch which adds files ends up as  
a merge. (One could even argue that in the perfect conversion *all*  
back-patches should be represented as merges, rather than as separate  
commits).



parsecvs managed to do it without them.


Now, I'm not calling it broken, but cvs2git's output is arguably  
better in that regard.


As you certainly see by now, conversion from CVS is neither simple nor  
unambiguous.


Regards

Markus Wanner



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Marko Kreen mark...@gmail.com:

This is Definitely Wrong (tm).  You seem to be thinking that comparing
a GIT checkout to a random parallel CVS checkout (e.g. from a .tgz) is
the main use-case.  It is not.  Browsing history and looking at diffs
between versions is.  And expanded CVS keywords would be a total PITA
for that.


That's an argument. Point taken. I'll check if cvs2git supports that as well.

Regards

Markus Wanner





Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
  Btw this conversion seems broken as it contains random merge commits.
 

  Well, that's a feature, not a bug ;-)

  When a commit adds a file to the master *and* then to the branch as well,
 cvs2git prefers to represent this as a merge from the master branch, instead
 of adding the file twice, once on the master and once on the branch.

  This way the target VCS knows it's the *same* file, originating from one
 single commit. This may be important for later merges - otherwise you may
 suddenly end up with duplicated files after a merge, because the VCS doesn't
 know they are in fact the same.

  (Okay, git assumes two files have the same origin/history as long as
 they have the same filename. But just rename one of the two, and you
 have the same troubles again.)

Not a problem for git, I think - it assumes they are the same if they
have the same contents...

  Also note that these situations occur rather frequently in the Postgres CVS
 repository. Every back-patch which adds files ends up as a merge. (One could
 even argue that in the perfect conversion *all* back-patches should be
 represented as merges, rather than as separate commits).

Well, such behaviour may be a feature for some repo with complex CVS
usage, but currently we should aim for simple and clear conversion.

The question is - do such merges make any sense to human looking at
history - and the answer is no, as no VCS level merge was happening,
just some copying around (if your description is correct).  And
we don't need to add noise for the benefit of GIT as it works fine
without any fake merges.

Our target should be each branch having a simple linear history,
without any fake merges.  This will result in minimal confusion
for both humans looking at history and for GIT itself.

So please turn the merge logic off.  If this cannot be turned off,
cvs2git is not usable for conversion.

  parsecvs managed to do it without them.
 

  Now, I'm not calling it broken, but cvs2git's output is arguably better in
 that regard.

It seems to contain more complex logic to handle more complex CVS usage
cases, but that seems like overkill for us if it creates a mess of history.

  As you certainly see by now, conversion from CVS is neither simple nor
 unambiguous.

I know, that's why I'm discussing the tradeoffs.  Simple+clear vs.
complex+messy. :)

-- 
marko



Re: [HACKERS] dot to be considered as a word delimiter?

2009-06-02 Thread Kenneth Marshall
On Mon, Jun 01, 2009 at 08:22:23PM -0500, Kevin Grittner wrote:
 Sushant Sinha sushant...@gmail.com wrote: 
  
  I think that dot should be considered as a word delimiter because
  when dot is not followed by a space, most of the time it is an error
  in typing. Besides, there are not many valid English words that have
  a dot in the middle.
  
 It's not treating it as an English word, but as a host name.
  
 select ts_debug('english', 'Mr.J.Sai Deepak');
                                  ts_debug
 ---------------------------------------------------------------------------
  (host,Host,Mr.J.Sai,{simple},simple,{mr.j.sai})
  (blank,"Space symbols"," ",{},,)
  (asciiword,"Word, all ASCII",Deepak,{english_stem},english_stem,{deepak})
 (3 rows)
  
 You could run it through a dictionary which would deal with host
 tokens differently.  Just be aware of what you'll be doing to
 www.google.com if you run into it.
  
 I hope this helps.
  
 -Kevin
 

In our uses for full text indexing, it is much more important to
be able to find host name and URLs than to find mistyped names.
My two cents.

Cheers,
Ken



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk
* Markus Wanner mar...@bluegap.ch [090602 07:08]:
 Hi,

 Quoting Marko Kreen mark...@gmail.com:
 I don't care half as much about the keyword expansion stuff - that's
  doomed to disappear anyway.

 But this is one aspect we need to get right for the conversion.

 What's your definition of right? I personally prefer the keyword  
 expansion to match a cvs checkout as closely as possible.

 AFAIU Aidan proposed the exact opposite.

 I'm proposing to leave both expanded, as in a CVS checkout and as  
 shipped in the source release tarballs.

Well, since I have -kk set in my .cvsrc, mine matches the CVS
checkout exactly ;-)

Basically, I want the git to be identical to the cvs checkout.  If you
use -kk, that means the PostgreSQL CVS repository keywords *aren't*
expanded.  If you like -kv, that means they are.

Pick your poison (after all, it's CVS), either way, I think the 2 of
*us* are going to disagree which is best here ;-)

But, which ever way (exact to -kk or exact to -kv), the conversion
should be exact, and there should be no reason to filter out
keyword-like stuff in the diffs.

 What you call "leaving some steps to the last moment" is IMO not part of
 the conversion. It's rather a conscious decision to drop these keywords
 as soon as we switch to git. This step should be represented in history
 as a separate commit, IMO.

 What do others think?

I'm assuming they will get removed from the source eventually too - but
that step is *outside* the conversion.  Somebody could do it now in CVS
before the conversion, or afterwards, but it's still outside the
conversion.
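Such a one-off cleanup could be as simple as a script run over the tree just before (or after) the switch. A hypothetical sketch; the regex and file contents below are illustrative, not a vetted tool:

```python
import re

# Collapse an expanded CVS keyword such as
#   "$PostgreSQL: path/to/file.c,v 1.70 2009/01/01 17:23:35 momjian Exp $"
# back to the bare "$PostgreSQL$" form.
KEYWORD = re.compile(r"\$PostgreSQL:[^$]*\$")

def strip_expanded_keywords(text):
    return KEYWORD.sub("$PostgreSQL$", text)

before = (" * $PostgreSQL: src/backend/access/index/genam.c,v "
          "1.70 2009/01/01 17:23:35 momjian Exp $\n")
assert strip_expanded_keywords(before) == " * $PostgreSQL$\n"
# Text without keywords passes through untouched.
assert strip_expanded_keywords("no keywords here\n") == "no keywords here\n"
```

Running it as its own commit, as Markus suggests, keeps the removal visible in history and separate from the mechanical conversion.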


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Kevin Grittner
Markus Wanner mar...@bluegap.ch wrote: 
 
 What I'm more concerned about is the requirement of the proposed algorithm
 to keep track of the set of tuples read by any transaction and keep
 that set until some time well after the transaction has committed (as
 questioned by Neil). That doesn't sound like a negligible overhead.
 
Quick summary for those who haven't read the paper: with this
non-blocking technique, every serializable transaction which
successfully commits must have its read locks tracked until all
serializable transactions which are active at the commit also
complete.
 
In the prototype implementation, I think they periodically scanned to
drop old transactions, and also did a final check right before
deciding there is a conflict which requires rollback, cleaning up the
transaction if it had terminated after the last scan but in time to
prevent a problem.
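The retention rule summarized above can be modeled in a few lines. This is a toy sketch with invented names, not the prototype's implementation: a committed transaction's read set is kept until every serializable transaction that was running at its commit has also completed.

```python
class ReadSetTracker:
    """Toy model: retain a committed transaction's read set while any
    transaction concurrent with its commit is still running."""

    def __init__(self):
        self.active = set()    # serializable transactions currently running
        self.retained = {}     # committed xid -> (read_set, concurrent xids)

    def begin(self, xid):
        self.active.add(xid)

    def commit(self, xid, read_set):
        self.active.discard(xid)
        if self.active:
            # remember which transactions were still running at commit time
            self.retained[xid] = (read_set, set(self.active))

    def finish(self, xid):
        """Any transaction ending (commit or abort) may release read sets."""
        self.active.discard(xid)
        for committed in list(self.retained):
            read_set, waiting = self.retained[committed]
            waiting.discard(xid)
            if not waiting:
                del self.retained[committed]

trk = ReadSetTracker()
trk.begin("t1"); trk.begin("t2")
trk.commit("t1", {"tuple A"})      # t2 still running: read set retained
assert "t1" in trk.retained
trk.finish("t2")                   # last overlapping xact ends
assert "t1" not in trk.retained
```

The periodic scan Kevin describes corresponds to running the release step in batches rather than on every transaction end.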
 
-Kevin



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Tom Lane
Jeremy Kerr j...@ozlabs.org writes:
 The following patch changes pqsecure_write to be more like pqsecure_read -
 it only alters the signal mask if the connection is over SSL. It's only
 an RFC, as I'm not entirely sure about the reasoning behind blocking
 SIGPIPE for the non-SSL case - there may be other considerations here.

The consideration is that the application fails completely on server
disconnect (because it gets SIGPIPE'd).  This was long ago deemed
unacceptable, and we aren't likely to change our opinion on that.

What disturbs me about your report is the suggestion that there are
paths through that code that fail to protect against SIGPIPE.  If so,
we need to fix that.
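The failure mode Tom describes is easy to demonstrate outside libpq: with SIGPIPE at its default disposition, writing to a dead peer kills the process, while ignoring (or blocking) the signal turns the same write into an EPIPE error the caller can handle. An illustrative Python sketch, not libpq code:

```python
import errno
import os
import signal

# With SIGPIPE ignored (as libpq arranges around its writes), writing to a
# pipe whose read end is gone yields an EPIPE error instead of killing
# the process.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

r, w = os.pipe()
os.close(r)                      # simulate the server going away

try:
    os.write(w, b"query")
    failed = None                # would mean the write "succeeded"
except OSError as e:             # BrokenPipeError is a subclass of OSError
    failed = e.errno

assert failed == errno.EPIPE     # a survivable error, not process death
os.close(w)
```

Any code path that performs the write without this protection in place reproduces the "application dies on disconnect" behavior the thread is worried about.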

regards, tom lane



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

Pick your poison (after all, it's CVS), either way, I think the 2 of
*us* are going to disagree which is best here ;-)


Marko already convinced me of -kk, I'm trying that with cvs2git.


But, which ever way (exact to -kk or exact to -kv), the conversion
should be exact, and there should be no reason to filter out
keyword-like stuff in the diffs.


I just really didn't want to care about keyword expansion. Besides  
lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
;-)


I'll let you know how cvs2git behaves WRT -kk.

Regards

Markus Wanner





Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Simon Riggs

On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote:

 What I'd find strange about 6.67 rows in your example is more that on
 the estimated rows side, it seems to imply an unrealistically precise estimate
 in the same way that 667 rows would seem unrealistically precise to me.
 Maybe rounding to 2 significant digits would reduce confusion?

You're right that the number of significant digits already exceeds the
true accuracy of the computation. I think what Robert wants to see is
the exact value used in the calc, so the estimates can be checked more
thoroughly than is currently possible.
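Ron's two-significant-digit suggestion amounts to a small rounding helper. A sketch of the display rule under discussion (illustrative; not EXPLAIN's actual formatting code):

```python
from math import floor, log10

def round_sig(x, sig=2):
    """Round x to `sig` significant digits."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

# 6.6667 estimated rows would display as 6.7, and 666.67 as 670,
# avoiding the appearance of spurious precision.
assert round_sig(6.6667) == 6.7
assert round_sig(666.67) == 670.0
assert round_sig(0.0123) == 0.012
```

Simon's counterpoint is that for checking the planner's arithmetic one wants the exact value used in the calculation, which rounding would hide.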

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] 8.4b2 tsearch2 strange error

2009-06-02 Thread Tom Lane
Tatsuo Ishii is...@postgresql.org writes:
 I have encountered strange errors while testing PostgreSQL 8.4 beta2.

 ERROR:  tuple offset out of range: 0
 (occasionally ERROR:  tuple offset out of range: 459)

This is evidently coming from tbm_add_tuples, indicating that it's being
passed bogus TID values from the GIN index.  We'll probably have to get
Teodor to look at it --- can you provide a self-contained test case?

regards, tom lane



Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Kevin Grittner
Greg Stark st...@enterprisedb.com wrote:
 
 On Tue, Jun 2, 2009 at 1:13 AM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Greg Stark st...@enterprisedb.com wrote:

 Just as carefully written SQL code can be written to avoid
deadlocks
 I would expect to be able to look at SQL code and know it's safe
 from serialization failures, or at least know where they might
 occur.

 This is the crux of our disagreement, I guess.  I consider existing
 techniques fine for situations where that's possible.
 
 a) When is that possible? Afaict it's always possible; you can never
 know, and when it might happen could change at any time.
 
Sorry that I wasn't more clear -- I meant I consider existing
techniques fine where it's possible to look at all the SQL code and
know what's safe from serialization failures or at least know where
they might occur.  I don't believe that's possible in an environment
with 8,700 queries in the application software, under constant
modification, with ad hoc queries run every day.
 
 b) What existing techniques, explicit locking?
 
Whichever techniques you would use right now, today, in PostgreSQL
which you feel are adequate to your needs.  You pick.
 
 But, could you
 give me an estimate of how much time it would take you, up front and
 ongoing, to do that review in our environment?  About 8,700 queries
 undergoing frequent modification, by 21 programmers, for enhancements
 in our three-month release cycle.  Plus various ad hoc queries.  We
 have one full-time person to run ad hoc data fixes and reports
 requested by the legislature and various outside agencies, like
 universities doing research.
 
 Even in your environment I could easily imagine, say, a monthly job to
 delete all records older than 3 months. That job could take hours or
 even days. It would be pretty awful for it to end up needing to be
 retried. All I'm saying is that if you establish a policy -- perhaps
 enforced using views -- that no queries are allowed to access records
 older than 3 months you shouldn't have to worry that you'll get a
 spurious serialization failure working with those records.
 
You have totally lost me.  We have next to nothing which can be
deleted after three months.  We have next to nothing which we get to
decide is deletable.  The elected Clerk of Court in each county is the
custodian of the records for that county, we facilitate their
record-keeping.  Some counties back-loaded data for some case types
(for example, probate) back to the beginning, in the mid-1800s, and
that information is not likely to go away any time soon.  Since
they've been using the software for about 20 years now, enough cases
are purgeable under Supreme Court records retention rules that we're
just now getting around to writing purge functions, but you don't even
*want* to know how complex the rules around that are
 
The three month cycle I mentioned was how often we issue a major
release of the application software.  Such a release generally
involves a lot of schema changes, and changes to hundreds of queries,
but no deletion of data.
 
-Kevin



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Jeremy Kerr
Tom,

 The consideration is that the application fails completely on server
 disconnect (because it gets SIGPIPE'd).  This was long ago deemed
 unacceptable, and we aren't likely to change our opinion on that.

OK, understood. I'm guessing MSG_NOSIGNAL on the send() isn't portable 
enough here?

 What disturbs me about your report is the suggestion that there are
 paths through that code that fail to protect against SIGPIPE.  If so,
 we need to fix that.

I just missed the comment that pqsecure_read may end up writing to the 
socket in the SSL case, so looks like all is fine here. We shouldn't see 
a SIGPIPE from the recv() alone.

Cheers,


Jeremy



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk
* Markus Wanner mar...@bluegap.ch [090602 09:37]:

 Marko already convinced me of -kk, I'm trying that with cvs2git.

Good ;-)

 I just really didn't want to care about keyword expansion. Besides  
 lacking consistency, it's one of the worst misfeatures of CVS, IMNSHO.  
 ;-)

Absolutely...  And one of the reasons I've had -kk in my .cvsrc for
years, even before I started with git.

 I'll let you know how cvs2git behaves WRT -kk.

Cool..

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




[HACKERS] pg_migrator and making columns invisible

2009-06-02 Thread Bruce Momjian
pg_migrator requires tables using tsvector data types to be rebuilt, and
there has been discussion of how to prevent people from accessing those
columns before they are rebuilt.  We discussed renaming the tables
(affects all columns) or columns, using rules (not fine-grained enough),
or using column permissions (doesn't affect super-users).

My new idea is to mark the columns as dropped and unmark them before
rebuilding the table.  That might be the best I can do.  Comments?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Robert Haas

On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:



On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote:

What I'd find strange about 6.67 rows in your example is more that on
the estimated rows side, it seems to imply an unrealistically precise
estimate in the same way that 667 rows would seem unrealistically
precise to me.

Maybe rounding to 2 significant digits would reduce confusion?


You're right that the number of significant digits already exceeds the
true accuracy of the computation. I think what Robert wants to see is
the exact value used in the calc, so the estimates can be checked more
thoroughly than is currently possible.


Bingo.

...Robert



Re: [HACKERS] pg_migrator and making columns invisible

2009-06-02 Thread Bruce Momjian
Bruce Momjian wrote:
 pg_migrator requires tables using tsvector data types to be rebuilt, and
 there has been discussion of how to prevent people from accessing those
 columns before they are rebuilt.  We discussed renaming the tables
 (affects all columns) or columns, using rules (not fine-grained enough),
 or using column permissions (doesn't affect super-users).
 
 My new idea is to mark the columns as dropped and unmark them before
 rebuilding the table.  That might be the best I can do.  Comments?

FYI, one big problem with this is that if they rebuild the table before
dropping the columns the data is lost.  It seems leaving the data around
as invalid might be safer.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeremy Kerr j...@ozlabs.org writes:
   The following patch changes psecure_write to be more like psecure_read -
   it only alters the signal mask if the connection is over SSL. It's only
   an RFC, as I'm not entirely sure about the reasoning behind blocking
   SIGPIPE for the non-SSL case - there may be other considerations here.


 The consideration is that the application fails completely on server
  disconnect (because it gets SIGPIPE'd).  This was long ago deemed
  unacceptable, and we aren't likely to change our opinion on that.

  What disturbs me about your report is the suggestion that there are
  paths through that code that fail to protect against SIGPIPE.  If so,
  we need to fix that.

Slightly OT, but why are we not using MSG_NOSIGNAL / SO_NOSIGPIPE
on OS'es that support them?  I guess a significant portion of the userbase
has at least one of them available...

Thus avoiding 2 syscalls per operation plus potential locking issues.

-- 
marko



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Tom Lane
Jeremy Kerr j...@ozlabs.org writes:
 The consideration is that the application fails completely on server
 disconnect (because it gets SIGPIPE'd).  This was long ago deemed
 unacceptable, and we aren't likely to change our opinion on that.

 OK, understood. I'm guessing MSG_NOSIGNAL on the send() isn't portable 
 enough here?

Well, it's certainly not 100% portable, but I wouldn't object to a patch
that tests for it and uses it where it works.

One question that might be a bit hard to answer is whether mere
existence of the #define is sufficient evidence that the feature works.
We've had problems before with userland headers not being in sync
with what the kernel knows.

regards, tom lane



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Jeremy Kerr j...@ozlabs.org writes:

  The consideration is that the application fails completely on server
   disconnect (because it gets SIGPIPE'd).  This was long ago deemed
   unacceptable, and we aren't likely to change our opinion on that.

   OK, understood. I'm guessing MSG_NOSIGNAL on the send() isn't portable
   enough here?


 Well, it's certainly not 100% portable, but I wouldn't object to a patch
  that tests for it and uses it where it works.

  One question that might be a bit hard to answer is whether mere
  existence of the #define is sufficient evidence that the feature works.
  We've had problems before with userland headers not being in sync
  with what the kernel knows.

Well, we could just test in configure perhaps?  A runtime test is also
possible (if the kernel gives an error on an unknown flag).  Safest would
be to enable it on known-good OSes, maybe with a version check?

-- 
marko



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Tom Lane
Marko Kreen mark...@gmail.com writes:
 On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 We've had problems before with userland headers not being in sync
 with what the kernel knows.

 Well, we could just test in configure perhaps?

The single most common way to get into that kind of trouble is to
compile on machine A then install the executables on machine B with
a different kernel.  So a configure test wouldn't give me any warm
feeling at all.

A feature that is exercised via setsockopt is probably fairly safe,
since you can check for failure of the setsockopt call and then do
it the old way.  MSG_NOSIGNAL is a recv() flag, no?  The question
is whether you could expect that the recv() would fail if it had
any unrecognized flags.  Not sure if I trust that.  SO_NOSIGPIPE
seems safer.

regards, tom lane



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Marko Kreen mark...@gmail.com:

Not a problem for git I think


Knowing that git doesn't track files as hard as monotone, I  
certainly doubt that.



- it assumes they are same if they have
same contents...


Why do you assume they have the same contents? Obviously these are  
different branches, where files can (and will!) have different contents.



Well, such behaviour may be a feature for some repo with complex CVS
usage, but currently we should aim for simple and clear conversion.


First of all, we should aim for a correct one.


The question is - do such merges make any sense to human looking at
history - and the answer is no, as no VCS level merge was happening,
just some copying around (if your description is correct).  And
we don't need to add noise for the benefit of GIT as it works fine
without any fake merges.


For low expectations of "it works", maybe yes. However if you don't
tell git, it has no chance of knowing that two (different) files  
should actually be the same.


Try the following:

 git init
 echo base > basefile
 git add basefile
 git commit -m "base commit"
 git checkout -b branch
 echo "hello, world" > testfile
 git add testfile
 git commit testfile -m "addition on branch"
 git checkout master
 echo "hello world" > testfile
 git add testfile
 git commit testfile -m "addition on master"

 # here we are at a point similar to after a lacking conversion, having two
 # distinct, i.e. historically independent files called testfile

 git mv testfile movedfile
 git commit -m "file moved"
 git checkout branch
 git merge master
 ls

 # Bang, you suddenly have 'testfile' and 'movedfile', go figure!


I leave it as an exercise for the reader to try the same with a single  
historic origin of the file, as cvs2git does the conversion.



Our target should be each branch having simple linear history,
without any fake merges.  This will result in minimal confusion
to both humans looking history and also GIT itself.


I don't consider the above a minimal confusion. And concerning  
humans... you get used to merge commits pretty quickly. I for one am  
more confused by a linear history which in fact is not.


As mentioned before, I'd personally favor *all* of the back-ports to  
actually be merges of some sort, because that's what they effectively  
are. However, that also bring up the question of how we are going to  
do back-patches in the future with git.



So please turn the merge logic off.  If this cannot be turned off,
cvs2git is not usable for conversion.


As far as I know, it cannot be turned off. Use parsecvs if you want to  
get silly side effects later on in history. ;-)



Seems it contains more complex logic to handle more complex CVS usage
cases, but seems like overkill for us if it creates a mess of history.


You consider it a mess, I consider it a better and more valid  
representation of the mess that CVS is.


Regards

Markus Wanner



Re: [HACKERS] pg_migrator and making columns invisible

2009-06-02 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 pg_migrator requires tables using tsvector data types to be rebuilt, and
 there has been discussion of how to prevent people from accessing those
 columns before they are rebuilt.  We discussed renaming the tables
 (affects all columns) or columns, using rules (not fine-grained enough),
 or using column permissions (doesn't affect super-users).

 My new idea is to mark the columns as dropped and unmark them before
 rebuilding the table.  That might be the best I can do.  Comments?

You're expending a lot of work on solving the wrong problem.  The right
solution is a temporary data type.

regards, tom lane



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Jeremy Kerr
Tom,

 A feature that is exercised via setsockopt is probably fairly safe,
 since you can check for failure of the setsockopt call and then do
 it the old way.  MSG_NOSIGNAL is a recv() flag, no?

It's a flag to send().

 The question is whether you could expect that the recv() would fail if
 it had any unrecognized flags.  Not sure if I trust that. SO_NOSIGPIPE
 seems safer.

Yep, a once-off test would be better. However, I don't seem to have a 
NOSIGPIPE sockopt here :(

Cheers,


Jeremy



Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
 You're right that the number of significant digits already exceeds the
 true accuracy of the computation. I think what Robert wants to see is
 the exact value used in the calc, so the estimates can be checked more
 thoroughly than is currently possible.

 Bingo.

Uh, the planner's estimate *is* an integer.  What was under discussion
(I thought) was showing some fractional digits in the case where EXPLAIN
ANALYZE is outputting a measured row count that is an average over
multiple loops, and therefore isn't necessarily an integer.  In that
case the measured value can be considered arbitrarily precise --- though
I think in practice one or two fractional digits would be plenty.

regards, tom lane



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Marko Kreen mark...@gmail.com writes:
   On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:

  We've had problems before with userland headers not being in sync
   with what the kernel knows.

   Well, we could just test in configure perhaps?


 The single most common way to get into that kind of trouble is to
  compile on machine A then install the executables on machine B with
  a different kernel.  So a configure test wouldn't give me any warm
  feeling at all.

Agreed.  Another problem would be cross-compilation.

  A feature that is exercised via setsockopt is probably fairly safe,
  since you can check for failure of the setsockopt call and then do
  it the old way.  MSG_NOSIGNAL is a recv() flag, no?  The question
  is whether you could expect that the recv() would fail if it had
  any unrecognized flags.  Not sure if I trust that.  SO_NOSIGPIPE
  seems safer.

send().  The question is whether the kernel would give an error (good)
or simply ignore it (bad).  I guess with MSG_NOSIGNAL the only safe
way is to hardcode known-working OSes.

Are there any OSes that have MSG_NOSIGNAL but not SO_NOSIGPIPE?

*grep*  Eh, seems like Linux is such an OS...  But I also see it existing
as of Linux 2.2.0 in working state, so it should be safe to use on Linux
regardless of kernel version.

-- 
marko



Re: [HACKERS] [RFC,PATCH] SIGPIPE masking in local socket connections

2009-06-02 Thread Tom Lane
Jeremy Kerr j...@ozlabs.org writes:
 MSG_NOSIGNAL is a recv() flag, no?

 It's a flag to send().

Doh, need more caffeine.

 The question is whether you could expect that the recv() would fail if
 it had any unrecognized flags.  Not sure if I trust that. SO_NOSIGPIPE
 seems safer.

 Yep, a once-off test would be better. However, I don't seem to have a 
 NOSIGPIPE sockopt here :(

On OS X I see SO_NOSIGPIPE but not MSG_NOSIGNAL.  Seems like we might
have to support both if we want this to work as widely as possible.

The SUS man page for send() does explicitly specify an error code for
unrecognized flags bits, so maybe it's safe to assume that we'll get
an error if we set MSG_NOSIGNAL but the kernel doesn't recognize it.

regards, tom lane



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Aidan Van Dyk
* Markus Wanner mar...@bluegap.ch [090602 10:23]:

  # Bang, you suddenly have 'testfile' and 'movedfile', go figure!

 I leave it as an exercise for the reader to try the same with a single  
 historic origin of the file, as cvs2git does the conversion.

Sure, and we can all construct examples where that move is both right and
wrong...  But the point is that in PostgreSQL, (and that may be mainly
because we're using CVS), merges *aren't* something that happens.
Patches are written against HEAD (master) and then back-patched...

If you want to turn PostgreSQL development on its head, then we can
switch this around, so that patches are always done on the oldest
branch, and fixes always merged forward...

I'm not going to be the one that pushes that though ;-)

 I don't consider the above a minimal confusion. And concerning  
 humans... you get used to merge commits pretty quickly. I for one am  
 more confused by a linear history which in fact is not.

But the fact is, everyone using CVS wants a linear history. All
they care about is cvs update...wait...cvs update ... time ... cvs
update .. Everything *was* linear to them.  Any merge type things
certainly weren't intentional in CVS...

 As mentioned before, I'd personally favor *all* of the back-ports to  
 actually be merges of some sort, because that's what they effectively  
 are. However, that also bring up the question of how we are going to do 
 back-patches in the future with git.

Well, if people get comfortable with it, I expect that backports won't
happen.  Bugs are fixed where they were introduced, and merged forward into
all affected later development based on the bugged area.

 As far as I know, it cannot be turned off. Use parsecvs if you want to  
 get silly side effects later on in history. ;-)

Ya, that's one of the reasons I considered parsecvs the leading
candidate...  And why I went through it, and showed that with the exception
of the one REL_8_0_0 tip, it *was* an exact copy of the current CVS
repository (minus the one messed-up tag in the repository).

 You consider it a mess, I consider it a better and more valid  
 representation of the mess that CVS is.

So much better that it makes the history as useless as CVS... I think
one of the reasons people want to move from CVS to git is that it
makes things *better*...  The exact history will *always* be
available, right in CVS if people need it.  I think the goal is to make
the git history as close to CVS as possible, such that it's useful.  I
mean, if we want it to be a more valid representation, then really, we
should be doing every file change in a single commit, and merging that
file commit into the branch *every* *single* *time*... I don't think
anybody wants our conversion to be that much better and more valid a
representation of the mess that CVS is...

It's a balance...  We're moving because we want *better* tools and
access, not the same mess that CVS is.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] explain analyze rows=%.0f

2009-06-02 Thread Robert Haas




On Jun 2, 2009, at 10:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:


Robert Haas robertmh...@gmail.com writes:
On Jun 2, 2009, at 9:41 AM, Simon Riggs si...@2ndquadrant.com wrote:
You're right that the number of significant digits already exceeds the
true accuracy of the computation. I think what Robert wants to see is
the exact value used in the calc, so the estimates can be checked more
thoroughly than is currently possible.

Bingo.

Uh, the planner's estimate *is* an integer.  What was under discussion
(I thought) was showing some fractional digits in the case where EXPLAIN
ANALYZE is outputting a measured row count that is an average over
multiple loops, and therefore isn't necessarily an integer.  In that
case the measured value can be considered arbitrarily precise --- though
I think in practice one or two fractional digits would be plenty.


We're in violent agreement here.

...Robert



Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Greg Stark
On Tue, Jun 2, 2009 at 2:44 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 Even in your environment I could easily imagine, say, a monthly job to
 delete all records older than 3 months. That job could take hours or
 even days. It would be pretty awful for it to end up needing to be
 retried. All I'm saying is that if you establish a policy -- perhaps
 enforced using views -- that no queries are allowed to access records
 older than 3 months you shouldn't have to worry that you'll get a
 spurious serialization failure working with those records.

 You have totally lost me.  We have next to nothing which can be
 deleted after three months.  We have next to nothing which we get to
 decide is deletable.

That's reassuring for a courts system.

But i said I could easily imagine. The point was that even in a big
complex system with thousands of queries being constantly modified by
hundreds of people, it's possible there might be some baseline rules.
Those rules can even be enforced using tools like views. So it's not
true that no programmer could ever expect that they've written their
code to ensure there's no risk of serialization failures.


-- 
greg



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Alvaro Herrera
Aidan Van Dyk escribió:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 
   # Bang, you suddenly have 'testfile' and 'movedfile', go figure!
 
  I leave it as an exercise for the reader to try the same with a single  
  historic origin of the file, as cvs2git does the conversion.
 
 Sure, and we can all construct examples where that move is both right and
 wrong...  But the point is that in PostgreSQL, (and that may be mainly
 because we're using CVS), merges *aren't* something that happens.
 Patches are written against HEAD (master) and then back-patched...
 
 If you want to turn PostgreSQL development on its head, then we can
 switch this around, so that patches are always done on the oldest
 branch, and fixes always merged forward...

The Monotone folk call this daggy fixes and it seems a clean way to
handle things.

http://www.monotone.ca/wiki/DaggyFixes/

However,

 I'm not going to be the one that pushes that though ;-)

I'm not either.  Maybe someday we'll be familiar enough with the tools
to make things this way, but I think just after the migration we'll
mainly want to be able to press on with development and not waste too
much time learning the new toys.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Tom Lane
Aidan Van Dyk ai...@highrise.ca writes:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 You consider it a mess, I consider it a better and more valid  
 representation of the mess that CVS is.

 So much better that it makes the history as useless as CVS... I think
 one of the reasons people want to move from CVS to git is that it
 makes things *better*...

FWIW, the tool that I customarily use (cvs2cl) considers commits on
different branches to be the same if they have the same commit message
and occur sufficiently close together (within a few minutes).  My
committing habits have been designed around that behavior for years,
and I believe other PG committers have been doing likewise.

I would consider a git conversion to be less useful to me, not more,
if it insists on showing me such cases as separate commits --- and if
it then adds useless merge messages on top of that, I'd start to get
seriously annoyed.

What we want here is a readable equivalent of the CVS history, not
necessarily something that is theoretically an exact equivalent.

regards, tom lane



Re: [HACKERS] Win32 link() function

2009-06-02 Thread Bruce Momjian
bruce wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Tom Lane wrote:
   (Come to think of it, --link can fail on Unix too, if the user tries to
   put the new database on a different filesystem.  Have you got guards in
   there to make sure this is discovered before the point of no return?)
  
   Of course:
   ...
   though you have to delete the new cluster directory and remove the _old
   suffixes to get your old cluster back.
  
  That wasn't what I had in mind by before the point of no return.
  You should be making some effort to detect obvious failure cases
  *before* the user has to do a lot of error-prone manual cleanup.
 
 That is something I will address during beta as I get problem reports.

I have implemented your suggestion:

Stopping postmaster servicing old cluster   ok
Starting postmaster to service new cluster
  waiting for postmaster to start   ok
Stopping postmaster servicing new cluster   ok

Could not create hard link between old and new data directories: 
Cross-device link
In link mode the old and new data directories must be on the same file
system volume.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] pg_migrator and making columns invisible

2009-06-02 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  pg_migrator requires tables using tsvector data types to be rebuilt, and
  there has been discussion of how to prevent people from accessing those
  columns before they are rebuilt.  We discussed renaming the tables
  (affects all columns) or columns, using rules (not fine-grained enough),
  or using column permissions (doesn't affect super-users).
 
  My new idea is to mark the columns as dropped and unmark them before
  rebuilding the table.  That might be the best I can do.  Comments?
 
 You're expending a lot of work on solving the wrong problem.  The right
 solution is a temporary data type.

How do I clean up all references to the tsvector data type?  I am afraid
there will be tons of tsvector references outside tables that I can't
clean up.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Aidan Van Dyk ai...@highrise.ca:

Sure, and we can all construct examples where that move is both right and
wrong...


Huh? The problem is the file duplication. The move is an action of a  
committer - it's neither right nor wrong in this example.


I cannot see any use case for seemingly random files popping up out of
nowhere, just because git doesn't know how to merge two files after a  
mv and a merge.



But the point is that in PostgreSQL, (and that may be mainly
because we're using CVS), merges *aren't* something that happens.
Patches are written against HEAD (master) and then back-patched...


..which can (and better should) be represented as a merge in git (for the
sake of comfortable automated merging).



If you want to turn PostgreSQL development on its head, then we can
switch this around, so that patches are always done on the oldest
branch, and fixes always merged forward...


I'd consider that good use of tools, yes. However, I realize that this  
probably is pipe-dreaming...



But the fact is, everyone using CVS wants a linear history. All
they care about is cvs update...wait...cvs update ... time ... cvs
update .. Everything *was* linear to them.  Any merge type things
certainly wasn't intentional in CVS...


..no, it just wasn't possible in CVS. Switching to git, people soon  
want merge type things. Heck, it's probably *the* reason for  
switching to git.



So much better that it makes the history as useless as CVS... I think
one of the reasons people are wanting to move from CVS to git is that it
makes things *better*...


Yes, especially merging. Please don't cripple that ability just  
because CVS once upon a time enforced a linear history.



The exact history will *always* be
available, right in CVS if people need it.


Agreed. Please note that I mostly talk about a more correct  
representation *of history*, as it happened. This has nothing to do  
with single commits per file.



It's a balance...  We're moving because we want *better* tools and
access, not the same mess that CVS is.


Agreed. And please cut as many of its burdens of the past as possible, like  
linearity. History is not linear and has never been. But I'm stopping  
now before getting overly philosophic...


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Greg Stark
On Tue, Jun 2, 2009 at 4:02 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:


 The Monotone folk call this daggy fixes and it seems a clean way to
 handle things.

 http://www.monotone.ca/wiki/DaggyFixes/

Is this like what git calls an octopus? I've been wondering what the
point of such things was.

Or maybe not. I thought an octopus was two patches with the same
parent -- ie, two patches that could independently be applied in any
order.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] faster version of AllocSetFreeIndex for x86 architecture

2009-06-02 Thread Atsushi Ogawa

Hi,
I made a faster version of AllocSetFreeIndex for x86 architecture.

Attached files are benchmark programs and patch file.

 alloc_test.pl: benchmark script
 alloc_test.c: benchmark program
 aset_free_index.patch: patch for util/mmgr/aset.c

This benchmark compares the original function with the faster version.
To try the benchmark, simply execute alloc_test.pl. The script compiles
alloc_test.c and runs the benchmark.

Results of benchmark script:
Xeon(Core architecture), RedHat EL4, gcc 3.4.6
 bytes   :     4     8    16    32    64   128   256   512  1024   mix
 original: 0.780 0.780 0.820 0.870 0.930 0.970 1.030 1.080 1.130 0.950
 patched : 0.380 0.170 0.170 0.170 0.170 0.180 0.170 0.180 0.180 0.280

Core2, Windows XP, gcc 3.4.4 (cygwin)
 bytes   :     4     8    16    32    64   128   256   512  1024   mix
 original: 0.249 0.249 0.515 0.452 0.577 0.671 0.796 0.890 0.999 1.577
 patched : 0.358 0.218 0.202 0.218 0.218 0.218 0.202 0.218 0.218 0.218

Xeon(Pentium4 architecture), RedHat EL4, gcc 3.4.6
 bytes   :     4     8    16    32    64   128   256   512  1024   mix
 original: 0.510 0.520 0.620 0.860 0.970 1.260 1.150 1.220 1.290 0.860
 patched : 0.620 0.530 0.530 0.540 0.540 0.530 0.540 0.530 0.530 0.490

The effect of the patch that I measured by oprofile is:
- test program: pgbench -c 1 -t 5 (fsync=off)

original:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events
with a unit mask of 0x01 (mandatory) count 10
samples  %        symbol name
66854 6.6725  AllocSetAlloc
47679 4.7587  base_yyparse
29058 2.9002  hash_search_with_hash_value
22053 2.2011  SearchCatCache
19264 1.9227  MemoryContextAllocZeroAligned
16223 1.6192  base_yylex
13819 1.3792  ScanKeywordLookup
13305 1.3279  expression_tree_walker
12144 1.2121  LWLockAcquire
11850 1.1827  XLogInsert
11817 1.1794  AllocSetFree

patched:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.55 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events
with a unit mask of 0x01 (mandatory) count 10
samples  %        symbol name
47610 4.9333  AllocSetAlloc
47441 4.9158  base_yyparse
28243 2.9265  hash_search_with_hash_value
22197 2.3000  SearchCatCache
18984 1.9671  MemoryContextAllocZeroAligned
15747 1.6317  base_yylex
13368 1.3852  ScanKeywordLookup
12889 1.3356  expression_tree_walker
12092 1.2530  LWLockAcquire
12078 1.2515  XLogInsert
(skip)
6248  0.6474  AllocSetFree

I think this patch improves AllocSetAlloc/AllocSetFree performance.

Best regards,

---
Atsushi Ogawa
a_og...@hi-ho.ne.jp




#!/usr/bin/perl

system "gcc -O2 -o alloc_test alloc_test.c";

my @test_bytes = (4,8,16,32,64,128,256,512,1024,
    '8 16 28 36 12 4 8 64 1024 8 24 12 8 64 16');
my $cnt = 1000;

my @old_result;
my @new_result;
my ($t0, $t1, $e);

foreach $e (@test_bytes) {
    $t0 = (times)[2];
    system "./alloc_test old $cnt $e";
    push @old_result, (times)[2] - $t0;

    $t0 = (times)[2];
    system "./alloc_test new $cnt $e";
    push @new_result, (times)[2] - $t0;
}

print " bytes   : ";
foreach $e (@test_bytes) {
    $e = 'mix' if ($e =~ /\d+ \d+/);
    printf("%5s ", $e);
}
print "\n";

print " original: ";
foreach $e (@old_result) { printf("%.3f ", $e); }
print "\n";

print " patched : ";
foreach $e (@new_result) { printf("%.3f ", $e); }
print "\n";

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define Assert(condition)

#define ALLOC_MINBITS 3
#define ALLOCSET_NUM_FREELISTS 11
typedef size_t Size;

/*
 * faster version of AllocSetFreeIndex for x86 architecture.
 * this function runs in O(1).
 */
static inline int
AllocSetFreeIndex_new(Size size)
{
    int idx;

    if (__builtin_expect(size < (1 << ALLOC_MINBITS), 0))
        size = (1 << ALLOC_MINBITS);

    /* bsr (Bit Scan Reverse): Search the most significant set bit */
    __asm__ ("bsr %1, %0" : "=r"(idx) : "g"(size - 1));

    return idx - (ALLOC_MINBITS - 1);
}

static inline int
AllocSetFreeIndex(Size size)
{
    int idx = 0;

    if (size > 0)
    {
        size = (size - 1) >> ALLOC_MINBITS;
        while (size != 0)
        {
            idx++;
            size >>= 1;
        }
        Assert(idx < ALLOCSET_NUM_FREELISTS);
    }

    return idx;
}

int main(int argc, char *argv[])
{
    int loop_cnt;
    int size[16];
    int i, j;
    int result = 0;

    if (argc < 4) {
        fprintf(stderr, "usage: asettest (new|old) loop_cnt size...\n");
        return 1;
    }

    loop_cnt = atoi(argv[2]);

    for (i = 0; i < 16; i++) {
        if (argc <= i + 3) {
            size[i] = size[0];
        } else {
            size[i] = atoi(argv[i + 3]);
        }
    }

    if (strcmp(argv[1], "new") == 0) {
        for (i = 0; i < loop_cnt; i++) {
 

Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Aidan Van Dyk ai...@highrise.ca writes:
   * Markus Wanner mar...@bluegap.ch [090602 10:23]:

  You consider it a mess, I consider it a better and more valid
   representation of the mess that CVS is.

   So much better that it makes the history as useless as CVS... I think
   one of the reasons people are wanting to move from CVS to git is that it
   makes things *better*...


 FWIW, the tool that I customarily use (cvs2cl) considers commits on
  different branches to be the same if they have the same commit message
  and occur sufficiently close together (within a few minutes).  My
  committing habits have been designed around that behavior for years,
  and I believe other PG committers have been doing likewise.

  I would consider a git conversion to be less useful to me, not more,
  if it insists on showing me such cases as separate commits --- and if
  it then adds useless merge messages on top of that, I'd start to get
  seriously annoyed.

They cannot be the same commit in git, as the resulting trees are different.
You could tie them together with some sort of merge commits, but I doubt
the result would be worth the noise.

Also, I doubt there is any tool grokking such commits anyway; the merge
discussion above was about full files with identical contents appearing
in several branches.

  What we want here is a readable equivalent of the CVS history, not
  necessarily something that is theoretically an exact equivalent.

I suggest setting the goal to be a simple and clear representation
of the CVS history that we can make sense of later, instead of revising
the CVS history to look like we used some better VCS system...

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
 [academic nitpicking]

Sorry, not going there.  Just look at the state of VCS systems
that have prioritized academic issues instead of practicality...
(arch/darcs/monotone/etc..)

  So please turn the merge logic off.  If this cannot be turned off,
  cvs2git is not usable for conversion.
 

  As far as I know, it cannot be turned off. Use parsecvs if you want to get
 silly side effects later on in history. ;-)

--no-cross-branch-commits seems sort of that direction?

And what silly side effects are you talking about?  I see only cvs2git
doing silly things...

(I'm talking about only in context of Postgres CVS repo, not in general.)

  Seems it contains more complex logic to handle more complex CVS usage
  cases, but seems like overkill for us if it creates a mess of history.
 

  You consider it a mess, I consider it a better and more valid
 representation of the mess that CVS is.

Note that a merge is not file-level but tree-level.  Also note that we
don't use branches for feature development but for major version maintenance.

So how can a single file appearing in 2 branches mean a merge of 2 trees?
How can that be valid?

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] faster version of AllocSetFreeIndex for x86 architecture

2009-06-02 Thread Jeremy Kerr
Hi,

 I made a faster version of AllocSetFreeIndex for x86 architecture.

Neat, I have a version for PowerPC too.

In order to prevent writing multiple copies of AllocSetFreeIndex, I 
propose that we add a fls() function (find last set); this can be 
defined in an architecture-independent manner (ie, shift, mask & test in 
a loop), and re-defined for arches that have faster ways of doing the 
same (ie, cntlz instruction on powerpc).

We can then change AllocSetFreeIndex to use fls().

Patches coming...



Jeremy

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Managing multiple branches in git

2009-06-02 Thread Tom Lane
[ it's way past time for a new subject thread ]

Marko Kreen mark...@gmail.com writes:
 They cannot be same commits in GIT as the resulting tree is different.

This brings up something that I've been wondering about: my limited
exposure to git hasn't shown me any sane way to work with multiple
release branches.

The way that I have things set up for CVS is that I have a checkout
of HEAD, and also sticky checkouts of the back branches:
pgsql/ ...
REL8_3/pgsql/ ... (made with -r REL8_3_STABLE)
REL8_2/pgsql/ ...
etc

Each of these is configured (using --prefix) to install into a separate
installation tree.  So I can switch my attention to one branch or
another by cd'ing to the right place and adjusting a few environment
variables such as PATH and PGDATA.

The way I prepare a patch that has to be back-patched is first to make
and test the fix in HEAD.  Then apply it (using diff/patch and perhaps
manual adjustments) to the first back branch, and test that.  Repeat for
each back branch as far as I want to go.  Almost always, there is a
certain amount of manual adjustment involved due to renamings,
historical changes of pgindent rules, etc.  Once I have all the versions
tested, I prepare a commit message and commit all the branches.  This
results in one commit message per branch in the pgsql-committers
archives, and just one commit in the cvs2cl representation of the
history --- which is what I want.

I don't see any even-approximately-sane way to handle similar cases
in git.  From what I've learned so far, you can have one checkout
at a time in a git working tree, which would mean N copies of the
entire repository if I want N working trees.  Not to mention the
impossibility of getting it to regard parallel commits as related
in any way whatsoever.

So how is this normally done with git?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 1/2] Add bit operations util header

2009-06-02 Thread Jeremy Kerr
Add a utility header for simple bit operations - bitops.h.

At present, just contains the fls() (find last set bit) function.

Signed-off-by: Jeremy Kerr j...@ozlabs.org

---
 src/include/utils/bitops.h |   52 +
 1 file changed, 52 insertions(+)

diff --git a/src/include/utils/bitops.h b/src/include/utils/bitops.h
new file mode 100644
index 000..de11624
--- /dev/null
+++ b/src/include/utils/bitops.h
@@ -0,0 +1,52 @@
+/*-
+ *
+ * bitops.h
+ *   Simple bit operations.
+ *
+ * Portions Copyright (c) 2009, PostgreSQL Global Development Group
+ *
+ * $PostgreSQL$
+ *
+ *-
+ */
+#ifndef BITOPS_H
+#define BITOPS_H
+
+#if defined(__ppc__) || defined(__powerpc__) || \
+   defined(__ppc64__) || defined (__powerpc64__)
+
+static inline int
+fls(unsigned int x)
+{
+   int lz;
+   asm("cntlz %0,%1" : "=r" (lz) : "r" (x));
+   return 32 - lz;
+}
+
+#else /* !powerpc */
+
+/* Architecture-independent implementations */
+
+/*
+ * fls: find last set bit.
+ *
+ * Returns the 1-based index of the most-significant bit in x. The MSB
+ * is bit number 32, the LSB is bit number 1. If x is zero, returns zero.
+ */
+static inline int
+fls(unsigned int x)
+{
+   int ls = 0;
+
+   while (x != 0)
+   {
+   ls++;
+   x >>= 1;
+   }
+
+   return ls;
+}
+
+#endif
+
+#endif /* BITOPS_H */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH 2/2] Use fls() to find chunk set

2009-06-02 Thread Jeremy Kerr
Results in a ~2% performance increase by using the powerpc fls()
implementation.

Signed-off-by: Jeremy Kerr j...@ozlabs.org

---
 src/backend/utils/mmgr/aset.c |8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/mmgr/aset.c b/src/backend/utils/mmgr/aset.c
index 0e2d4d5..762cf72 100644
--- a/src/backend/utils/mmgr/aset.c
+++ b/src/backend/utils/mmgr/aset.c
@@ -65,6 +65,7 @@
 #include "postgres.h"
 
 #include "utils/memutils.h"
+#include "utils/bitops.h"
 
 /* Define this to detail debug alloc information */
 /* #define HAVE_ALLOCINFO */
@@ -270,12 +271,7 @@ AllocSetFreeIndex(Size size)
 
 	if (size > 0)
 	{
-		size = (size - 1) >> ALLOC_MINBITS;
-		while (size != 0)
-		{
-			idx++;
-			size >>= 1;
-		}
+		idx = fls((size - 1) >> ALLOC_MINBITS);
 		Assert(idx < ALLOCSET_NUM_FREELISTS);
}
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Robert Haas
On Tue, Jun 2, 2009 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Aidan Van Dyk ai...@highrise.ca writes:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 You consider it a mess, I consider it a better and more valid
 representation of the mess that CVS is.

 So much better that it makes the history as useless as CVS... I think
 one of the reasons people are wanting to move from CVS to git is that it
 makes things *better*...

 FWIW, the tool that I customarily use (cvs2cl) considers commits on
 different branches to be the same if they have the same commit message
 and occur sufficiently close together (within a few minutes).  My
 committing habits have been designed around that behavior for years,
 and I believe other PG committers have been doing likewise.

Interesting.  I was wondering why all your commit messages always show
up simultaneously for all the back branches.

 I would consider a git conversion to be less useful to me, not more,
 if it insists on showing me such cases as separate commits --- and if
 it then adds useless merge messages on top of that, I'd start to get
 seriously annoyed.

There's no help for them being separate commits, but I agree that
useless merge commits are a bad thing.  There are plenty of ways to
avoid that, though; I've been using git cherry-pick a lot recently,
and I think git rebase --onto also has some potential.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Tom Lane t...@sss.pgh.pa.us:

FWIW, the tool that I customarily use (cvs2cl) considers commits on
different branches to be the same if they have the same commit message
and occur sufficiently close together (within a few minutes).  My
committing habits have been designed around that behavior for years,
and I believe other PG committers have been doing likewise.


Yeah, that's how I see things as well.


I would consider a git conversion to be less useful to me, not more,
if it insists on showing me such cases as separate commits --- and if
it then adds useless merge messages on top of that, I'd start to get
seriously annoyed.


Hm.. well, in git, there's no such thing as a commit that spans  
multiple branches. So it's impossible to fulfill both of your wishes  
here.


parsecvs creates multiple independent commits in such a case.

cvs2git creates a single commit and propagates this to the back  
branches with merge commits (however, only if new files are added,  
otherwise it does the same as parsecvs).



What we want here is a readable equivalent of the CVS history, not
necessarily something that is theoretically an exact equivalent.


Understood. However, readability depends on the user's habits. But  
failed merges due to a deficient conversion potentially hurt  
everybody who wants to merge.


Having used merging (in combination with renaming) often enough, I'd  
certainly be pretty annoyed if merges suddenly begin to bring up  
spurious file duplicates.


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread David E. Wheeler

On Jun 2, 2009, at 8:43 AM, Tom Lane wrote:

Each of these is configured (using --prefix) to install into a  
separate

installation tree.  So I can switch my attention to one branch or
another by cd'ing to the right place and adjusting a few environment
variables such as PATH and PGDATA.


Yeah, with git, rather than cd'ing to another directory, you'd just do  
`git checkout rel8_3` and work from the same directory.



So how is this normally done with git?


For better or for worse, because git is project-oriented rather than  
filesystem-oriented, you can't commit to all the branches at once. You  
have to commit to each one independently. You can push them all back  
to the canonical repository at once, and the canonical repository's  
commit hooks can trigger for all of the commits at once (or so I  
gather from getting emails from GitHub with a bunch of commits listed  
in a single message), but each commit is still independent.


It has to do with the fundamentally different way in which Git works:  
snapshots of your code rather than different directories.


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] User-facing aspects of serializable transactions

2009-06-02 Thread Kevin Grittner
Greg Stark st...@enterprisedb.com wrote:
 On Tue, Jun 2, 2009 at 2:44 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 
 We have next to nothing which can be deleted after three months.
 
 That's reassuring for a courts system.
 
  :-)
 
 But i said I could easily imagine. The point was that even in a
 big complex system with thousands of queries being constantly
 modified by hundreds of people, it's possible there might be some
 baseline rules.  Those rules can even be enforced using tools like
 views. So it's not true that no programmer could ever expect that
 they've written their code to ensure there's no risk of
 serialization failures.
 
Now I see what you're getting at.
 
I think we've beat this horse to death and then some.
 
Recap:
 
(1)  There is abstract, conceptual agreement that support for
serializable transactions would be A Good Thing.
 
(2)  There is doubt that an acceptably performant implementation is
possible in PostgreSQL.
 
(3)  Some, but not all, don't want to see an implementation which
produces false positive serialization faults with some causes, but
will accept them for other causes.
 
(4)  Nobody believes that an implementation with acceptable
performance is possible without the disputed false positives mentioned
in (3).
 
(5)  There is particular concern about how to handle repeated
rollbacks gracefully if we use the non-blocking technique.
 
(6)  There is particular concern about how to protect long-running
transactions from rollback.  (I'm not sure those concerns are confined
to the new technique.)
 
(7)  Some, but not all, feel that it would be beneficial to have a
correct implementation (no false negatives) even if it had significant
false positives, as it would allow iterative refinement of the locking
techniques.
 
(8)  One or two people feel that there would be benefit to an
implementation which reduces the false negatives, even if it doesn't
eliminate them entirely.  (Especially if this could be a step toward a
full implementation.)
 
Are any of those observations in dispute?
 
What did I miss?
 
Where do we go from here?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 Yeah, with git, rather than cd'ing to another directory, you'd just do  
 `git checkout rel8_3` and work from the same directory.

That's what I'd gathered, and frankly it is not an acceptable answer.
Sure, the checkout operation is remarkably fast, but it does nothing
for derived files.  What would really be involved here (if I wanted to
be sure of having a non-broken build) is
make maintainer-clean
git checkout rel8_3
configure
make
which takes long enough that I'll have plenty of time to consider
how much I hate git.  If there isn't a better way proposed, I'm
going to flip back to voting against this conversion.  I need tools
that work for me not against me.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Markus Wanner

Hi,

Quoting Marko Kreen mark...@gmail.com:

Sorry, not going there.  Just look at the state of VCS systems
that have prioritized academic issues insead of practicality...
(arch/darcs/monotone/etc..)


I already am there. And I don't want to go back, thanks. But my bias  
for monotone certainly shines through, yes ;-)



--no-cross-branch-commits seems sort of that direction?


Yes, that could lead to the same defect. Uhm.. thank you for pointing  
that out, I'm not gonna try it, sorry.



And what silly side effects are you talking about?


I'm talking about spurious file duplicates popping up after a rename  
and a merge, see my example in this thread.



 You consider it a mess, I consider it a better and more valid
representation of the mess that CVS is.


Note that a merge is not file-level but tree-level.


Depends on your point of view. Each file gets merged pretty  
individually, but the result ends up in a single commit, yes.



Also note we don't
use branches for feature development but for major version maintenance.


So? You think you are never going to merge?


So how can a single file appearing in 2 branches mean a merge of 2 trees?
How can that be valid?


I'm not sure what you are questioning here.

I find it perfectly reasonable to build something on top of  
REL8_3_STABLE and later on wanting to merge to REL8_4_STABLE. And I  
don't want to manually merge my changes, just because of a rename in  
8.4 and a bad decision during the migration to git.


(And no, I don't think any of the other git tools will help with this,  
due to the academic-nitpick-reasons above).


Regards

Markus Wanner

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 1/2] Add bit operations util header

2009-06-02 Thread Tom Lane
Jeremy Kerr j...@ozlabs.org writes:
 Add a utility header for simple bit operatios - bitops.h.

This will fail outright on any non-gcc compiler.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread David E. Wheeler

On Jun 2, 2009, at 9:03 AM, Tom Lane wrote:


David E. Wheeler da...@kineticode.com writes:
Yeah, with git, rather than cd'ing to another directory, you'd just  
do

`git checkout rel8_3` and work from the same directory.


That's what I'd gathered, and frankly it is not an acceptable answer.
Sure, the checkout operation is remarkably fast, but it does nothing
for derived files.  What would really be involved here (if I wanted to
be sure of having a non-broken build) is
make maintainer-clean
git checkout rel8_3
configure
make
which takes long enough that I'll have plenty of time to consider
how much I hate git.  If there isn't a better way proposed, I'm
going to flip back to voting against this conversion.  I need tools
that work for me not against me.


Well, you can have as many clones of a repository as you like. You can  
keep one with master checked out, another with rel8_3, another with  
rel8_2, etc. You'd just have to write a script to keep them in sync  
(shouldn't be too difficult, each just as all the others as an origin  
-- or maybe you have one that's canonical on your system).
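For concreteness, here is a hedged sketch of that arrangement using local clones that share their object store (`git clone -s`). Everything here — paths, file names, branch names — is invented for illustration; it builds a tiny stand-in repository rather than touching the real PostgreSQL one:

```shell
set -e
mkdir -p /tmp/gitdemo && cd /tmp/gitdemo
rm -rf canon REL8_3

# stand-in for the canonical local repository
git init -q canon
cd canon
git config user.email you@example.com && git config user.name You
echo head > f.c && git add f.c && git commit -q -m "head work"
git branch REL8_3_STABLE          # hypothetical back branch
cd ..

# a second working tree for the back branch; -s shares the object
# store with the source clone, so it is cheap in disk space
git clone -q -s -b REL8_3_STABLE canon REL8_3

cd REL8_3 && git rev-parse --abbrev-ref HEAD   # prints REL8_3_STABLE
```

Each directory then keeps its own configure/build products, which sidesteps the rebuild-on-checkout problem discussed downthread.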


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Simon Riggs

On Mon, 2009-06-01 at 14:47 +0900, Fujii Masao wrote:

 pg_standby can use ln command to restore an archived file,
 which might destroy the archived file as follows.
 
 1) pg_standby creates the symlink to the archived file '102'
 2) '102' is applied
 3) the next file '103' doesn't exist and the trigger file is created
 4) '102' is re-fetched
 5) at the end of recovery, the symlink to '102' is renamed to '202',
 but it still points to '102'
 6) after recovery, '202' is recycled (renamed to '208', which still
 points to '102')
 7) new xlog records are written over '208'
 -- the archived file '102' is destroyed!
 
 One simple solution to fix this problem...

err...I don't see *any* problem at all, since pg_standby does not do
step (1) in the way you say and therefore never does step (5). Any links
created are explicitly deleted in all cases at the end of recovery.

General comment on thread: What's going on with all these fixes?
Anybody reading the commit log and/or weekly news is going to get fairly
worried for no reason at all. For that reason I ask for longer
consideration and wider discussion before committing something - it
would certainly avoid lengthy post-commit discussion, as has occurred
twice recently. I see no reason for such haste on these fixes. If
there's a need for haste, ship them to your customers directly, please
don't scare other people's. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Alvaro Herrera
David E. Wheeler wrote:

 Well, you can have as many clones of a repository as you like. You can  
 keep one with master checked out, another with rel8_3, another with  
 rel8_2, etc. You'd just have to write a script to keep them in sync  
 (shouldn't be too difficult, each just has all the others as an origin -- 
 or maybe you have one that's canonical on your system).

Hmm, but is there a way to create those clones from a single local
database?

(I like the monotone model much better.  This mixing of working copies
and databases as if they were a single thing is silly and uncomfortable
to use.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Marko Kreen
On 6/2/09, Markus Wanner mar...@bluegap.ch wrote:
  Quoting Marko Kreen mark...@gmail.com:
  And what silly side effects are you talking about?
 

  I'm talking about spurious file duplicates popping up after a rename and a
 merge, see my example in this thread.

The example was not an actual case from the Postgres CVS history,
but a hypothetical situation, without checking whether it already
works with GIT.

  Also note we don't
  use branches for feature development but for major version maintenance.
 

  So? You think you are never going to merge?


  So how can a single file appearing in 2 branches mean a merge of 2 trees?
  How can that be valid?
 

  I'm not sure what you are questioning here.

  I find it perfectly reasonable to build something on top of REL8_3_STABLE
 and later on wanting to merge to REL8_4_STABLE. And I don't want to manually
 merge my changes, just because of a rename in 8.4 and a bad decision during
 the migration to git.

  (And no, I don't think any of the other git tools will help with this, due
 to the academic-nitpick-reasons above).

Merging between branches with GIT is a fine workflow for the future.

But we are currently discussing how to convert CVS history to GIT.
My point is that we should avoid fake merges, to avoid obfuscating
history.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Developer meeting minutes up

2009-06-02 Thread Ron Mayer
Aidan Van Dyk wrote:
 * Markus Wanner mar...@bluegap.ch [090602 10:23]:
 As mentioned before, I'd personally favor *all* of the back-ports to  
 actually be merges of some sort, because that's what they effectively  
are. However, that also brings up the question of how we are going to do 
 back-patches in the future with git.
 
 Well, if people get comfortable with it, I expect that backports don't
 happen... Bugs are fixed where they happen, and merged forward into
 all affected later development based on the bugged area.

I imagine the closest thing to existing practices would be that people
would be to use git-cherry-pick -x -n to backport only the commits they
wanted from the current branch into the back branches.

AFAICT, this doesn't record a merge in the GIT history, but looks a lot
like the linear history from CVS - with the exception that the comment
added by -x explicitly refers to the exact commit from the main branch.
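As a concrete sketch of that workflow (repository layout, branch, and file names invented for the demo):

```shell
# Toy demo of backporting one commit with cherry-pick; all names invented.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > file.c
git add file.c && git commit -q -m "initial"
git branch REL8_3_STABLE                  # pretend back branch
echo fix >> file.c
git commit -q -am "Fix bug in file.c"
fix=$(git rev-parse HEAD)
git checkout -q REL8_3_STABLE
# -x appends "(cherry picked from commit ...)" to the message;
# adding -n would stop before committing, for manual adjustment.
git cherry-pick -x "$fix"
git log -1 --pretty=%B
```

With `-n` the change is only staged, so renames and pgindent differences can be fixed up before committing, much as with diff/patch today.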





Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Greg Stark

Yeah I was annoyed by the issue with having to reconfigure as well.

There are various tricks you can do though with separate repositories.

You could have the older branch repositories be clones of the HEAD branch  
repository, so when you push from them the changes just go to that  
repository; then you can push all three branches together (not sure if  
you can do it all in one command, though)


You can also have the different repositories share data files which I  
think will mean you don't have to pull other people's commits  
repeatedly. (the default is to have local clones use hard links so  
they don't take a lot of space and they're quick to sync anyways.)
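For what it's worth, the hard-linking can be observed directly (paths invented; `stat -c` is the GNU form):

```shell
# Demo: local clones share object files via hard links.  Paths invented.
set -e
work=$(mktemp -d); cd "$work"
git init -q src
cd src
git config user.email demo@example.com
git config user.name Demo
echo hello > f && git add f && git commit -q -m "one"
cd ..
git clone -q src rel8_3          # a local path implies --local, hence hard links
obj=$(find rel8_3/.git/objects -type f | head -n 1)
stat -c %h "$obj"                # link count > 1: file is shared with src
```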


There's also an option to make a clone without the full history but  
for local clones they're fast enough to create anyways that there's  
probably no point.



Incidentally I use git-clean -x -d -f instead of make maintainer-clean.

--
Greg


On 2 Jun 2009, at 17:07, David E. Wheeler da...@kineticode.com  
wrote:



On Jun 2, 2009, at 9:03 AM, Tom Lane wrote:


David E. Wheeler da...@kineticode.com writes:
Yeah, with git, rather than cd'ing to another directory, you'd  
just do

`git checkout rel8_3` and work from the same directory.


That's what I'd gathered, and frankly it is not an acceptable answer.
Sure, the checkout operation is remarkably fast, but it does  
nothing
for derived files.  What would really be involved here (if I wanted  
to

be sure of having a non-broken build) is
   make maintainer-clean
   git checkout rel8_3
   configure
   make
which takes long enough that I'll have plenty of time to consider
how much I hate git.  If there isn't a better way proposed, I'm
going to flip back to voting against this conversion.  I need tools
that work for me not against me.


Well, you can have as many clones of a repository as you like. You  
can keep one with master checked out, another with rel8_3, another  
with rel8_2, etc. You'd just have to write a script to keep them in  
sync (shouldn't be too difficult, each just has all the others as an  
origin -- or maybe you have one that's canonical on your system).


Best,

David




Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread David E. Wheeler

On Jun 2, 2009, at 9:16 AM, Alvaro Herrera wrote:

Well, you can have as many clones of a repository as you like. You  
can

keep one with master checked out, another with rel8_3, another with
rel8_2, etc. You'd just have to write a script to keep them in sync
(shouldn't be too difficult, each just has all the others as an  
origin --

or maybe you have one that's canonical on your system).


Hmm, but is there a way to create those clones from a single local
database?


Yeah, that's what I meant by a canonical copy on your system.


(I like the monotone model much better.  This mixing of working copies
and databases as if they were a single thing is silly and  
uncomfortable

to use.)


Monotone?

Best,

David




Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Dave Page
On Tue, Jun 2, 2009 at 5:16 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 David E. Wheeler wrote:

 Well, you can have as many clones of a repository as you like. You can
 keep one with master checked out, another with rel8_3, another with
 rel8_2, etc. You'd just have to write a script to keep them in sync
 (shouldn't be too difficult, each just has all the others as an origin --
 or maybe you have one that's canonical on your system).

 Hmm, but is there a way to create those clones from a single local
 database?

Just barely paying attention here, but isn't 'git clone --local' what you need?


-- 
Dave Page
EnterpriseDB UK:   http://www.enterprisedb.com



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Aidan Van Dyk
* David E. Wheeler da...@kineticode.com [090602 11:56]:
 On Jun 2, 2009, at 8:43 AM, Tom Lane wrote:

 Each of these is configured (using --prefix) to install into a  
 separate
 installation tree.  So I can switch my attention to one branch or
 another by cd'ing to the right place and adjusting a few environment
 variables such as PATH and PGDATA.

 Yeah, with git, rather than cd'ing to another directory, you'd just do  
 `git checkout rel8_3` and work from the same directory.

But that loses his configured and compiled state...

But git isn't forcing him to change his workflow at all...

He *can* keep completely separate git repositories for each release
and work just as before.  This will carry with it a full separate
history in each repository, and I think that extra couple hundred MB is
what he's hoping to avoid.

But git has concepts of object alternates and reference
repositories.  To mimic your workflow, I would probably do something
like:

## Make my reference repository, cloned from offical where everyone 
pushes
moun...@pumpkin:~/projects/postgresql$ git clone --bare --mirror 
git://repo.or.cz/PostgreSQL.git PostgreSQL.git

## Make my local master development repository
moun...@pumpkin:~/projects/postgresql$ git clone --reference 
PostgreSQL.git git://repo.or.cz/PostgreSQL.git master
Initialized empty Git repository in 
/home/mountie/projects/postgresql/master/.git/

## Make my local REL8_3_STABLE development repository
moun...@pumpkin:~/projects/postgresql$ git clone --reference 
PostgreSQL.git git://repo.or.cz/PostgreSQL.git REL8_3_STABLE
Initialized empty Git repository in 
/home/mountie/projects/postgresql/REL8_3_STABLE/.git/
moun...@pumpkin:~/projects/postgresql$ cd REL8_3_STABLE/
moun...@pumpkin:~/projects/postgresql/REL8_3_STABLE$ git checkout 
--track -b REL8_3_STABLE origin/REL8_3_STABLE
Branch REL8_3_STABLE set up to track remote branch 
refs/remotes/origin/REL8_3_STABLE.
Switched to a new branch 'REL8_3_STABLE'



Now, the master/REL8_3_STABLE directories are both complete git
repositories, independent of each other, except that they both reference
the objects in the PostgreSQL.git repository.  They don't contain the
historical objects in their own object store.  And I would couple that
with a cronjob:

*/15 * * *  git --git-dir=$HOME/projects/postgresql/PostgreSQL.git 
fetch --quiet

which will keep my reference project up to date (a la rsync-the-CVSROOT,
or cvsup-a-mirror anybody currently has when working with CVS)...

Then Tom can keep working pretty much as he currently does.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 [ it's way past time for a new subject thread ]

  Marko Kreen mark...@gmail.com writes:
  They cannot be the same commits in GIT as the resulting tree is different.

  This brings up something that I've been wondering about: my limited
  exposure to git hasn't shown me any sane way to work with multiple
  release branches.

  The way that I have things set up for CVS is that I have a checkout
  of HEAD, and also sticky checkouts of the back branches:
 pgsql/ ...
 REL8_3/pgsql/ ... (made with -r REL8_3_STABLE)
 REL8_2/pgsql/ ...
 etc

  Each of these is configured (using --prefix) to install into a separate
  installation tree.  So I can switch my attention to one branch or
  another by cd'ing to the right place and adjusting a few environment
  variables such as PATH and PGDATA.

  The way I prepare a patch that has to be back-patched is first to make
  and test the fix in HEAD.  Then apply it (using diff/patch and perhaps
  manual adjustments) to the first back branch, and test that.  Repeat for
  each back branch as far as I want to go.  Almost always, there is a
  certain amount of manual adjustment involved due to renamings,
  historical changes of pgindent rules, etc.  Once I have all the versions
  tested, I prepare a commit message and commit all the branches.  This
  results in one commit message per branch in the pgsql-committers
  archives, and just one commit in the cvs2cl representation of the
  history --- which is what I want.

  I don't see any even-approximately-sane way to handle similar cases
  in git.  From what I've learned so far, you can have one checkout
  at a time in a git working tree, which would mean N copies of the
  entire repository if I want N working trees.  Not to mention the
  impossibility of getting it to regard parallel commits as related
  in any way whatsoever.

Whether you use several branches in one tree or several checked out
trees should be a personal preference, both ways are possible with GIT.

  So how is this normally done with git?

If you are talking about backbranch fixes, then the most-version
controlled-way to do would be to use lowest branch as base, commit
fix there and then merge it upwards.

Now whether it succeeds depends on merge points between branches,
as the VCS takes the nearest merge point as the base for its merge logic.

I think that is also the actual thing that Markus is concerned about.

But instead of having random merge points between branches that depend
on when some new file was added, we could simply import all branches
with linear history and later simply say to git that:

 * 7.4 is merged into 8.0
 ..
 * 8.2 is merged into 8.3
 * 8.3 is merged into HEAD

without any file changes.  Logically this would mean that any changes in
branch N-1 are already in N.

So afterwards, when working fully with GIT, any upwards merges
work without any fuss, as git does not need to consider the old history
imported from CVS at all.
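If I understand the proposal correctly, git can record exactly such a file-free merge with its "ours" strategy. A sketch with invented branch names:

```shell
# Record "7.4 is merged into 8.0" without changing any files: the "ours"
# strategy keeps the current tree but adds the other branch as a second
# parent of the merge commit.  All names invented for the demo.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo a > f && git add f && git commit -q -m "common ancestor"
git branch REL7_4_STABLE
echo b >> f && git commit -q -am "8.0-only work"
git checkout -q REL7_4_STABLE
echo "7.4 fix" > g && git add g && git commit -q -m "7.4-only fix"
git checkout -q -                       # back to the 8.0 line
before=$(git rev-parse 'HEAD^{tree}')
git merge -q -s ours -m "Mark 7.4 as merged into 8.0" REL7_4_STABLE
after=$(git rev-parse 'HEAD^{tree}')
test "$before" = "$after"               # tree untouched: no file changes
git merge-base --is-ancestor REL7_4_STABLE HEAD   # but history is now linked
```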

-- 
marko



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Andres Freund

On 06/02/2009 05:43 PM, Tom Lane wrote:

Marko Kreen mark...@gmail.com writes:

They cannot be the same commits in GIT as the resulting tree is different.

I don't see any even-approximately-sane way to handle similar cases
in git.  From what I've learned so far, you can have one checkout
at a time in a git working tree, which would mean N copies of the
entire repository if I want N working trees.  Not to mention the
impossibility of getting it to regard parallel commits as related
in any way whatsoever.
You can use the --reference option to git clone to refer to objects in 
another clone. That way most of the commits will only be stored in there 
- only the local commits will be in the local checkout.



Andres



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Hmm, but is there a way to create those clones from a single local
 database?

 (I like the monotone model much better.  This mixing of working copies
 and databases as if they were a single thing is silly and uncomfortable
 to use.)

I agree, .git as a subdirectory of the working directory doesn't make
much sense to me.

I wondered for a second about symlinking .git from several checkout
directories to a common master, but AFAICT .git stores both the
repository and status information about the current checkout, so
that's not gonna work.

In the one large project that I have a git tree for, .git seems to
eat only about as much disk space as the checkout (so apparently the
compression is pretty effective).  So it wouldn't be totally impractical
to have a separate repository for each branch, but it sure seems like
an ugly and klugy way to do it.  And we'd still end up with the same
commit on different branches appearing entirely unrelated.

At the same time, I don't really buy the theory that relating commits on
different branches via merges will work.  In my experience it is very
seldom the case that a patch applies to each back branch with no manual
effort whatever, which is what I gather the merge functionality could
help with.  So maybe there's not much help to be had on this ...

regards, tom lane



Re: [HACKERS] SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5

2009-06-02 Thread Bruce Momjian
Hans-Juergen Schoenig wrote:
 hello everybody,
 
 from my side the goal of this discussion is to extract a consensus so 
 that we can go ahead and implement this issue for 8.5.
 our customer here needs a solution to this problem and we have to come 
 up with something which can then make it into PostgreSQL core.
 how shall we proceed with the decision finding process here?
 i am fine with a GUC and with an grammar extension - i just need a 
 decision which stays unchanged.

Do we have answer for Hans-Juergen here?

I have added a vague TODO:

Consider a lock timeout parameter

* http://archives.postgresql.org/pgsql-hackers/2009-05/msg00485.php 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 err...I don't see *any* problem at all, since pg_standby does not do
 step (1) in the way you say and therefore never does step (5). Any links
 created are explicitly deleted in all cases at the end of recovery.

That's a good point; don't we recover files under names like
RECOVERYXLOG, not under names that could possibly conflict with regular
WAL files?

regards, tom lane



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Andres Freund

On 06/02/2009 06:33 PM, Tom Lane wrote:

At the same time, I don't really buy the theory that relating commits on
different branches via merges will work.  In my experience it is very
seldom the case that a patch applies to each back branch with no manual
effort whatever, which is what I gather the merge functionality could
help with.  So maybe there's not much help to be had on this ...
You can do a merge and change the commit during it - this way the merge 
tracking information is recorded correctly even though you adjusted the 
result, so further merge operations can consider the specific change to 
be applied on both/some/all branches.
This will happen by default if there is a merge conflict or can be 
forced by using the --no-commit option to merge.
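For illustration, a conflicting backport merged with --no-commit and adapted by hand might look like this (all names invented):

```shell
# Demo: merge with --no-commit, adapt the change, then commit, so the
# merge is still recorded for future merge tracking.  Names invented.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo v1 > file && git add file && git commit -q -m "base"
git branch fixbranch
echo v2 > file && git commit -q -am "trunk rework"
git checkout -q fixbranch
echo "v1 fixed" > file && git commit -q -am "bug fix"
git checkout -q -                        # back to the trunk line
git merge --no-commit fixbranch || true  # conflicts; stops before committing
echo "v2 fixed" > file                   # hand-adapt the fix for this branch
git add file
git commit -q -m "Merge bug fix, adapted for trunk"
git log --merges --oneline
```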


Andres



Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Heikki Linnakangas

Simon Riggs wrote:

On Mon, 2009-06-01 at 14:47 +0900, Fujii Masao wrote:


pg_standby can use ln command to restore an archived file,
which might destroy the archived file as follows.

1) pg_standby creates the symlink to the archived file '102'
2) '102' is applied
3) the next file '103' doesn't exist and the trigger file is created
4) '102' is re-fetched
5) at the end of recovery, the symlink to '102' is renamed to '202',
but it still points to '102'
6) after recovery, '202' is recycled (renamed to '208', which still
points to '102')
7) new xlog records are written over '208'
-- the archived file '102' is destroyed!

One simple solution to fix this problem...


err...I don't see *any* problem at all, since pg_standby does not do
step (1) in the way you say and therefore never does step (5). Any links
created are explicitly deleted in all cases at the end of recovery.


I don't know how you came to that conclusion, but Fujii-san's description 
seems accurate to me, and I can reproduce the behavior on my laptop.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Marko Kreen
On 6/2/09, Tom Lane t...@sss.pgh.pa.us wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
   Hmm, but is there a way to create those clones from a single local
   database?

   (I like the monotone model much better.  This mixing of working copies
   and databases as if they were a single thing is silly and uncomfortable
   to use.)


 I agree, .git as a subdirectory of the working directory doesn't make
  much sense to me.

  I wondered for a second about symlinking .git from several checkout
  directories to a common master, but AFAICT .git stores both the
  repository and status information about the current checkout, so
  that's not gonna work.

You cannot share .git, but you can share object directory (.git/objects).
Which contains the bulk data.  There are various ways to do it, symlink
should be one of them.
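Besides a symlink, git has a dedicated mechanism for this: an objects/info/alternates file naming another repository's object store. A sketch (paths invented):

```shell
# Demo: share an object store via .git/objects/info/alternates.
# Repository names and paths invented.
set -e
work=$(mktemp -d); cd "$work"
git init -q main
cd main
git config user.email demo@example.com
git config user.name Demo
echo x > f && git add f && git commit -q -m "one"
head=$(git rev-parse HEAD)
cd ..
git init -q rel8_3
# point the new repo's object lookup at main's object store
echo "$work/main/.git/objects" > rel8_3/.git/objects/info/alternates
cd rel8_3
git cat-file -t "$head"   # object is visible here without copying: "commit"
```

The caveat is the same as with symlinks: never prune objects from the shared store while a borrowing repository still needs them.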

  In the one large project that I have a git tree for, .git seems to
  eat only about as much disk space as the checkout (so apparently the
  compression is pretty effective).  So it wouldn't be totally impractical
  to have a separate repository for each branch, but it sure seems like
  an ugly and klugy way to do it.  And we'd still end up with the same
  commit on different branches appearing entirely unrelated.

  At the same time, I don't really buy the theory that relating commits on
  different branches via merges will work.  In my experience it is very
  seldom the case that a patch applies to each back branch with no manual
  effort whatever, which is what I gather the merge functionality could
  help with.  So maybe there's not much help to be had on this ...

Sure, if branches are different enough, the merge commit would
contain a lot of code changes.  But still - you would get a single main
commit with a log message, plus a bunch of merge commits, which may be
nicer than several duplicate commits.

-- 
marko



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread David E. Wheeler

On Jun 2, 2009, at 9:23 AM, Aidan Van Dyk wrote:

Yeah, with git, rather than cd'ing to another directory, you'd just  
do

`git checkout rel8_3` and work from the same directory.


But that loses his configured and compiled state...

But git isn't forcing him to change his workflow at all...


I defer to your clearly superior knowledge. Git is simple, but there  
is *so* much to learn!


David



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Ron Mayer
Tom Lane wrote:
 Marko Kreen mark...@gmail.com writes:
 They cannot be same commits in GIT as the resulting tree is different.
 The way I prepare a patch that has to be back-patched is first to make
 and test the fix in HEAD.  Then apply it (using diff/patch and perhaps
 manual adjustments) to the first back branch, and test that.  Repeat for
 each back branch as far as I want to go.  Almost always, there is a
 certain amount of manual adjustment involved due to renamings,
 historical changes of pgindent rules, etc.  Once I have all the versions
 tested, I prepare a commit message and commit all the branches.  This
 results in one commit message per branch in the pgsql-committers
 archives, and just one commit in the cvs2cl representation of the
 history --- which is what I want.

I think the closest equivalent to what you're doing here is:

  git cherry-pick -n -x the commit you want to pull

The git cherry-pick command does similar to the diff/patch work.
The -n prevents the automatic commit, to allow for manual adjustments.
The -x flag adds a note to the commit comment describing the relationship
between the commits.

It seems to me we could make a cvs2cl-like script that's aware
of the comments git-cherry-pick -x inserts and rolls them up
in a similar way that cvs2cl does.




 The way that I have things set up for CVS is that I have a checkout
 of HEAD, and also sticky checkouts of the back branches...
 Each of these is configured (using --prefix) to install into a separate
 installation tree. ...

I think the most similar thing here would be for you to have one
normal clone of the official repository, and then use
git-clone --local
when you set up the back branch directories.  The --local flag will
use hard-links to avoid wasting the space and time of maintaining multiple
copies of histories.

 I don't see any even-approximately-sane way to handle similar cases
 in git.  From what I've learned so far, you can have one checkout
 at a time in a git working tree, which would mean N copies of the
 entire repository if I want N working trees

git-clone --local avoids that.

 ... Not to mention the
 impossibility of getting it to regard parallel commits as related
 in any way whatsoever.

Well - related in any way whatsoever seems possible either
through the comments added by the -x flag of git-cherry-pick, or
with the other workflows people described where you fix the bug in
a new branch off some ancestor of all the releases (ideally near
where the bug occurred) and merge them into the branches.


 So how is this normally done with git?





Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Aidan Van Dyk
* Tom Lane t...@sss.pgh.pa.us [090602 12:35]:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Hmm, but is there a way to create those clones from a single local
  database?
 
  (I like the monotone model much better.  This mixing of working copies
  and databases as if they were a single thing is silly and uncomfortable
  to use.)
 
 I agree, .git as a subdirectory of the working directory doesn't make
 much sense to me.

The main reason why git uses this is that the index (git equivalent of
the CVS/*) resides in 1 place instead of in each directory.  So, if you
have multiple working directories sharing a single .git, you get them
tromping on each others index.

That said, you can symlink almost everything *inside* .git to other
repositories.

For instance, if you had the reference repository I showed last time,
instead of doing the git clone, you could do:

#Make a new REL8_2_STABLE working area
moun...@pumpkin:~/pg-work$ REF=$(pwd)/PostgreSQL.git
moun...@pumpkin:~/pg-work$ mkdir REL8_2_STABLE
moun...@pumpkin:~/pg-work$ cd REL8_2_STABLE/
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ git init

# And now make everything point back
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ mkdir .git/refs/remotes && ln 
-s $REF/refs/heads .git/refs/remotes/origin
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ rm -Rf .git/objects && ln -s 
$REF/objects .git/objects
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ rmdir .git/refs/tags && ln -s 
$REF/refs/tags .git/refs/tags
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ rm -Rf .git/info && ln -s 
$REF/info .git/info
moun...@pumpkin:~/pg-work/REL8_2_STABLE$ rm -Rf .git/hooks && ln -s 
$REF/hooks .git/hooks

This will leave you with an independent config, independent index,
independent heads, and independent reflogs, with shared remote-
tracking branches, a shared object store, shared tags, and shared 
hooks.

And make sure you don't purge any unused objects out of any of these
subdirs, because they don't know that the object might be in use in
another subdir...  This warning is the one reason why it's usually
recommended to just use a reference repository, and not have to worry..

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Mark Mielke

Tom Lane wrote:

I agree, .git as a subdirectory of the working directory doesn't make
much sense to me.

I wondered for a second about symlinking .git from several checkout
directories to a common master, but AFAICT .git stores both the
repository and status information about the current checkout, so
that's not gonna work.

In the one large project that I have a git tree for, .git seems to
eat only about as much disk space as the checkout (so apparently the
compression is pretty effective).  So it wouldn't be totally impractical
to have a separate repository for each branch, but it sure seems like
an ugly and klugy way to do it.  And we'd still end up with the same
commit on different branches appearing entirely unrelated.


I am curious about why an end user would really care? CVS and SVN both 
kept local workspace directories containing metadata. If anything, I 
find GIT the least intrusive of these three, as the .git is only in the 
top-level directory, whereas CVS and SVN like to pollute every directory.


Assuming you don't keep binaries under source control, the .git 
containing all history is very often smaller than the pristine copy 
kept by CVS or SVN in their metadata directories, so space isn't really 
the issue.


Maybe think of it more like a feature. GIT keeps a local cache of the 
entire repo, whereas SVN and CVS only keep a local cache of the commit 
you are based on. It's a feature that you can review history without 
network connectivity.


Cheers,
mark

--
Mark Mielke m...@mielke.cc




Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Heikki Linnakangas

Tom Lane wrote:

Simon Riggs si...@2ndquadrant.com writes:

err...I don't see *any* problem at all, since pg_standby does not do
step (1) in the way you say and therefore never does step (5). Any links
created are explicitly deleted in all cases at the end of recovery.


That's a good point; don't we recover files under names like
RECOVERYXLOG, not under names that could possibly conflict with regular
WAL files?


Yes. But we rename RECOVERYXLOG to 00010057 or similar 
at the end of recovery, in exitArchiveRecovery().


Thinking about this some more, I think we should've changed 
exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more 
robust if exitArchiveRecovery() always copied the last WAL file rather 
than just renamed it. It doesn't seem safe to rely on the file the 
symlink points to to be valid after recovery is finished, and we might 
write to it before it's recycled, so the current fix isn't complete.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Alvaro Herrera
Mark Mielke wrote:

 I am curious about why an end user would really care? CVS and SVN both  
 kept local workspace directories containing metadata. If anything, I  
 find GIT the least intrusive of these three, as the .git is only in the  
 top-level directory, whereas CVS and SVN like to pollute every directory.

That's not the problem.  The problem is that it is kept in the same
directory as the checked out copy.  It would be a lot more usable if it
was possible to store it elsewhere.

Yes, the .svn directories are a PITA.

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Marko Kreen
On 6/2/09, Alvaro Herrera alvhe...@commandprompt.com wrote:
 Mark Mielke wrote:

   I am curious about why an end user would really care? CVS and SVN both
   kept local workspace directories containing metadata. If anything, I
   find GIT the least intrusive of these three, as the .git is only in the
   top-level directory, whereas CVS and SVN like to pollute every directory.


 That's not the problem.  The problem is that it is kept in the same
  directory as the checked out copy.  It would be a lot more usable if it
  was possible to store it elsewhere.

export GIT_DIR=...
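A sketch of using GIT_DIR (together with GIT_WORK_TREE) to keep the repository out of the working copy; the paths are invented:

```shell
# Demo: keep the repository outside the working tree with GIT_DIR and
# GIT_WORK_TREE.  Paths invented.
set -e
base=$(mktemp -d)
mkdir "$base/worktree"
export GIT_DIR="$base/repo.git"
export GIT_WORK_TREE="$base/worktree"
git init -q                      # creates the repo at $GIT_DIR
cd "$base/worktree"
git config user.email demo@example.com
git config user.name Demo
echo hi > f
git add f
git commit -q -m "first"
ls -A "$base/worktree"           # just "f": no .git directory in the worktree
```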

-- 
marko



Re: [HACKERS] from_collapse_limit vs. geqo_threshold

2009-06-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann sel...@endpoint.com wrote:
 Suggested revision of Robert's prose:
 
 Because genetic query optimization may be triggered, increasing
 from_collapse_limit should be considered relative to <xref
 linkend="guc-geqo-threshold">.

 Here's my attempt.

I applied the attached, along with some other wordsmithing.

regards, tom lane

***************
*** 2252,2261 ****
        The planner will merge sub-queries into upper queries if the
        resulting <literal>FROM</literal> list would have no more than
        this many items.  Smaller values reduce planning time but might
!       yield inferior query plans.  The default is eight.  It is usually
!       wise to keep this less than <xref linkend="guc-geqo-threshold">.
        For more information see <xref linkend="explicit-joins">.
       </para>
      </listitem>
    </varlistentry>
  
--- 2261,2275 ----
        The planner will merge sub-queries into upper queries if the
        resulting <literal>FROM</literal> list would have no more than
        this many items.  Smaller values reduce planning time but might
!       yield inferior query plans.  The default is eight.
        For more information see <xref linkend="explicit-joins">.
       </para>
+ 
+      <para>
+       Setting this value to <xref linkend="guc-geqo-threshold"> or more
+       may trigger use of the GEQO planner, resulting in nondeterministic
+       plans.  See <xref linkend="runtime-config-query-geqo">.
+      </para>
      </listitem>
    </varlistentry>
  



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Heikki Linnakangas

Andres Freund wrote:

On 06/02/2009 06:33 PM, Tom Lane wrote:

At the same time, I don't really buy the theory that relating commits on
different branches via merges will work.  In my experience it is very
seldom the case that a patch applies to each back branch with no manual
effort whatever, which is what I gather the merge functionality could
help with.  So maybe there's not much help to be had on this ...
You can do a merge and change the commit during that - this way you get 
the merge tracking information correct although you did a merge so that 
further merge operations can consider the specific change to be applied 
on both/some/all branches.
This will happen by default if there is a merge conflict or can be 
forced by using the --no-commit option to merge.


Yeah, that should work fine.

However, handling fixes to multiple branches by merging the release 
branches to master seems awkward to me. A merge will merge *all* commits 
in the release branch. Including stamp 8.3.1 commits, and fixes for 
issues in release branches that are not present in master.


Cherry-picking seems like the best approach.
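To make the cherry-pick workflow concrete, here is a hedged sketch using a throwaway repository. The branch name, file, and commit messages are invented for illustration; this is not PostgreSQL's actual layout.

```shell
# Hypothetical sketch of backpatching one fix via cherry-pick; all names invented.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name Demo
echo base > file.c
git add file.c
git commit -qm "initial"
git branch REL8_3_STABLE              # back branch forks off here
echo fix >> file.c
git commit -qam "fix bug on master"
fixsha=$(git rev-parse HEAD)
git checkout -q REL8_3_STABLE
git cherry-pick "$fixsha"             # copy exactly this commit to the back branch
grep fix file.c
```

Unlike merging the release branch, this copies only the one commit, so stamp-8.3.x commits and release-branch-only fixes never ride along; conflicts are resolved by hand per branch, which matches the observation that back-branch patches rarely apply untouched.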

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] It's June 1; do you know where your release is?

2009-06-02 Thread Tom Lane
Kris Jurka bo...@ejurka.com writes:
 On Mon, 1 Jun 2009, Robert Haas wrote:
 tgl says: whether or not we think PL/Java is bulletproof, there are
 other problems, for instance this one
 http://archives.postgresql.org/message-id/87zlnwnvjg@news-spur.riddles.org.uk
 
 That's a pretty overwhelming argument for leaving it as-is.  I think
 we should remove this from the list of open items.

 Yes, that makes sense to me as the original requester of this open item. 
 I thought it had been taken off a while ago.

Removed now.

regards, tom lane



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Aidan Van Dyk
* Alvaro Herrera alvhe...@commandprompt.com [090602 13:25]:

 That's not the problem.  The problem is that it is kept in the same
 directory as the checked out copy.  It would be a lot more usable if it
 was possible to store it elsewhere.
 
 Yes, the .svn directories are a PITA.

You can export GIT_DIR to make the .git directory be somewhere else...
and you'll probably want a corresponding GIT_WORK_TREE (or core.worktree
config) set.

If you're careful (i.e. don't make a mistake), you can set GIT_DIR,
GIT_INDEX_FILE, and GIT_WORK_TREE, and use a single git repository
among multiple independent working directories.
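A minimal sketch of that environment-variable setup, with throwaway paths (nothing here is PostgreSQL-specific, and the path names are invented):

```shell
# Hypothetical sketch: repository metadata stored outside the working tree.
set -e
work=$(mktemp -d)                     # checked-out files live here
meta=$(mktemp -d)/pg.git              # the would-be .git lives over here
export GIT_DIR=$meta GIT_WORK_TREE=$work
git init -q                           # creates the repository at $GIT_DIR
git config user.email demo@example.com
git config user.name Demo
cd "$work"
echo hello > README
git add README
git commit -qm "first"
ls -A                                 # only README: no .git in the work tree
```

With both variables exported, every later git command finds the metadata without a `.git` directory cluttering (or being accidentally deleted with) the working tree.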

That said, is the carefulness needed to work that way worth the ~200KB
you save?

On a referenced style development repository:
moun...@pumpkin:~/pg-work/REL8_3_STABLE$ du -shc .git/*
4.0K    .git/branches
4.0K    .git/config
4.0K    .git/description
4.0K    .git/HEAD
48K     .git/hooks
328K    .git/index
8.0K    .git/info
36K     .git/logs
16K     .git/objects
4.0K    .git/packed-refs
32K     .git/refs
488K    total

488K total in the .git directory, 328K of that is the index.

a.

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Andrew Dunstan



Tom Lane wrote:

David E. Wheeler da...@kineticode.com writes:
  
Yeah, with git, rather than cd'ing to another directory, you'd just do  
`git checkout rel8_3` and work from the same directory.



That's what I'd gathered, and frankly it is not an acceptable answer.
Sure, the checkout operation is remarkably fast, but it does nothing
for derived files.  What would really be involved here (if I wanted to
be sure of having a non-broken build) is
make maintainer-clean
git checkout rel8_3
configure
make
which takes long enough that I'll have plenty of time to consider
how much I hate git.  If there isn't a better way proposed, I'm
going to flip back to voting against this conversion.  I need tools
that work for me not against me.





Hmm.  I confess that I never switch between CVS branches. Instead I keep 
a separate tree for each maintained branch.  And that's what the 
buildfarm does and will continue doing with git. Maybe that's not as 
efficient a way for a developer to work, I don't know.


Of course, your work rate gives you much more weight in this discussion 
than me ;-)


cheers

andrew



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Hmm.  I confess that I never switch between CVS branches. Instead I keep 
 a separate tree for each maintained branch.

Right, exactly, and that's the workflow I want to maintain with git.
Having to rebuild the derived files every time I look at a different
branch is too much overhead.

regards, tom lane



Re: [HACKERS] [COMMITTERS] pgsql: Fix LOCK TABLE to eliminate the race condition that could make it

2009-06-02 Thread Bruce Momjian
Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  If we're going to require cascaded permissions like this, would it make
  sense to make GRANT cascade down the inheritance tree also? 
 
 That's been discussed before.  I forget whether we decided it was a good
 idea or not, but in any case it looks like a new feature not a bug fix.

Is this a TODO?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] pg_standby -l might destory the archived file

2009-06-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Tom Lane wrote:
 That's a good point; don't we recover files under names like
 RECOVERYXLOG, not under names that could possibly conflict with regular
 WAL files?

 Yes. But we rename RECOVERYXLOG to 00010057 or similar 
 at the end of recovery, in exitArchiveRecovery().

 Thinking about this some more, I think we should've changed 
 exitArchiveRecovery() rather than RemoveOldXlogFiles(): it would be more 
 robust if exitArchiveRecovery() always copied the last WAL file rather 
 than just renamed it. It doesn't seem safe to rely on the file the 
 symlink points to to be valid after recovery is finished, and we might 
 write to it before it's recycled, so the current fix isn't complete.

Hmm.  I think really the reason it's coded that way is that we assumed
the recovery command would be physically copying the file from someplace
else.  pg_standby is violating the backend's expectations by using a
symlink.  And I really doubt that the technique is saving anything, since
the data has to be read in from the archive location anyway.

I'm leaning back to the position that pg_standby's -l option is simply a
bad idea and should be removed.

regards, tom lane



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Mark Mielke

Alvaro Herrera wrote:

Mark Mielke wrote:
  
I am curious about why an end user would really care? CVS and SVN both  
kept local workspace directories containing metadata. If anything, I  
find GIT the least intrusive of these three, as the .git is only in the  
top-level directory, whereas CVS and SVN like to pollute every directory.



That's not the problem.  The problem is that it is kept in the same
directory as the checked out copy.  It would be a lot more usable if it
was possible to store it elsewhere.
  


I'm not following. CVS and SVN both kept such directories in the 
checked out copy. Recall the CVS/*,v files?


As for storing it elsewhere - if you absolute must, you can. There is a 
--git-dir=GIT_DIR and --work-tree=GIT_WORK_TREE option to all git 
commands, and GIT_DIR / GIT_WORK_TREE environment variables.


I just don't understand why you care. If the CVS directories didn't bug 
you before, why does the single .git directory bug you now? I'm 
genuinely interested as I don't get it. :-)


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Alvaro Herrera
Mark Mielke wrote:

 I just don't understand why you care. If the CVS directories didn't bug  
 you before, why does the single .git directory bug you now? I'm  
 genuinely interested as I don't get it. :-)

It doesn't.  What bugs me is that the database (the pulled tree if you
will) is stored in it.  It has already been pointed out how to put it
elsewhere, so no need to explain that.

What *really* bugs me is that it's so difficult to have one pulled
tree and create a bunch of checked out copies from that.

(In the CVS world, I kept a single rsync'ed copy of the anoncvs
repository, and I could do multiple cvs checkout copies from there
using different branches.)

-- 
Alvaro Herrera        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Mark Mielke

Alvaro Herrera wrote:

Mark Mielke wrote:
  
I just don't understand why you care. If the CVS directories didn't bug  
you before, why does the single .git directory bug you now? I'm  
genuinely interested as I don't get it. :-)



It doesn't.  What bugs me is that the database (the pulled tree if you
will) is stored in it.  It has already been pointed out how to put it
elsewhere, so no need to explain that.

What *really* bugs me is that it's so difficult to have one pulled
tree and create a bunch of checked out copies from that.

(In the CVS world, I kept a single rsync'ed copy of the anoncvs
repository, and I could do multiple cvs checkout copies from there
using different branches.)
  


You say database, but unless you assume you know what is in it, .git 
isn't really different from CVS/ or .svn. It's workspace metadata. Size 
might concern you, except that it's generally smaller than CVS/ or .svn. 
Content might concern you, until you realize that being able to look 
through history without accessing the network is a feature, not a 
problem. Time to prepare the workspace might concern you, but I haven't 
seen people time the difference between building a cvs checkout vs a git 
clone.


You talk about avoiding downloads by rsync'ing the CVS repository. You 
can do nearly the exact same thing in GIT:


1) Create a 'git clone --bare' that is kept up-to-date with 'git fetch'. 
This is your equivalent to an rsync'ed copy of the anoncvs repository.
2) Use 'git clone' from your local bare repo, or from the remote using 
the local bare repo as a reference. Either hard links, or as a reference 
no links at all will keep your clone smaller than either a CVS or an SVN 
checkout.
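Those two steps can be sketched with purely local throwaway repositories (the upstream URL is stood in for by a local path; all names are invented), which also shows the object sharing at work:

```shell
# Hypothetical sketch of a bare mirror plus a --reference clone; names invented.
set -e
top=$(mktemp -d)
cd "$top"
git init -q upstream                  # stand-in for git://git.postgresql.org/...
git -C upstream -c user.email=d@example.com -c user.name=d \
    commit -q --allow-empty -m "seed"
git clone -q --bare upstream mirror.git                    # step 1: the rsync-equivalent
git clone -q --reference "$top/mirror.git" upstream work   # step 2: borrowing clone
cat work/.git/objects/info/alternates                      # points at the mirror's objects
```

Kept current with `git fetch` in mirror.git, later `--reference` clones download nothing the mirror already has, much like checking out repeatedly from one rsync'ed anoncvs copy.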


Mainly, I want to point out that the existence of .git is not a real 
problem - it's certainly no worse than before.


Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Andres Freund

On 06/02/2009 09:38 PM, Alvaro Herrera wrote:

Mark Mielke wrote:


I just don't understand why you care. If the CVS directories didn't bug
you before, why does the single .git directory bug you now? I'm
genuinely interested as I don't get it. :-)


It doesn't.  What bugs me is that the database (the pulled tree if you
will) is stored in it.  It has already been pointed out how to put it
elsewhere, so no need to explain that.

What *really* bugs me is that it's so difficult to have one pulled
tree and create a bunch of checked out copies from that.

I don't see where the difficulty resides.

#Setup a base repository
cd /../master
git clone [--bare] git://git.postgresql.org/whatever .


#Method 1
cd /../child1
git clone --reference /../master/ git://git.postgresql.org/whatever .
cd /../child2
git clone --reference /../master/ git://git.postgresql.org/whatever .

This way you can fetch from the git url without problem, but when a 
object is available locally it is not downloaded again.


#Method2
cd /../child3
git clone --shared /../postgresql/ child3
...
This way you only fetch from your pulled tree and never possibly from 
the upstream one.


Andres



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Andrew Dunstan



Tom Lane wrote:

Once I have all the versions
tested, I prepare a commit message and commit all the branches.  This
results in one commit message per branch in the pgsql-committers
archives, and just one commit in the cvs2cl representation of the
history --- which is what I want.


  


I think the 'just one commit' view is going to be the hard piece. Other 
than that, there will probably be some minor annoyances, but that's to 
be expected in any switch, I think.


Of course, it's open source so if someone wants to work on multibranch 
commit to make our life easier ... ;-)


cheers

andrew



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Tom Lane
Mark Mielke m...@mark.mielke.cc writes:
 Alvaro Herrera wrote:
 That's not the problem.  The problem is that it is kept in the same
 directory as the checked out copy.  It would be a lot more usable if it
 was possible to store it elsewhere.

 I'm not following. CVS and SVN both kept such directories in the 
 checked out copy. Recall the CVS/*,v files?

I can't speak to SVN, but that is *not* how CVS does it.  There's a
small CVS/ directory, but the repository (with all the ,v files)
is somewhere else.  In particular I can have N different checked-out
working copies without duplicating the repository.

 I just don't understand why you care. If the CVS directories didn't bug 
 you before, why does the single .git directory bug you now?

(1) size (ok, not a showstopper)
(2) potential for error

Blowing away your working directory shouldn't result in loss of your
entire project history.

regards, tom lane



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Robert Haas
On Tue, Jun 2, 2009 at 3:38 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 What *really* bugs me is that it's so difficult to have one pulled
 tree and create a bunch of checked out copies from that.

Yeah.  It basically doesn't work, hacks to the contrary on this thread
notwithstanding, and I'm sympathetic to Tom's pain as I spend a fair
amount of time switching branches, doing git-clean -dfx && configure
&& make check && make install.

Of course in my cases they are usually private branches rather than
back branches, but the problem is the same.

And, unfortunately, I'm not sure there's a good solution.  Tom could
create 1 local repository cloned from the origin and then N-1 copies
cloned with --local from that one, but this sort of defeats the
purpose of using git, because now if he commits a change to one of
them and then wants to apply that change to each back branch, he's got
to fetch that change on each one, cherry-pick it, make his changes,
commit, and then push it back to his main repository.  Some of this
could probably be automated using scripts and post-commit hooks, but
even so it's a nuisance, and if you ever want to reset or rebase
(before pushing to origin, of course) it gets even more annoying.

I wonder whether it would help with this problem if we had a way to
locate the build products outside the tree, and maybe fix things up so
that you can make the build products go to a different location
depending on which branch you're on.  I personally find it incredibly
convenient to be able to check out a different branch without losing
track of where I am in the tree.  So if I'm in
$HOME/pgsql-git/src/backend/commands and I switch to a new branch, I'm
still in that same directory, versus having to cd around.  So in
general I find the git way of doing things to be very convenient, but
needing to rebuild all the intermediates sucks.

...Robert



Re: [HACKERS] Managing multiple branches in git

2009-06-02 Thread Robert Haas
On Tue, Jun 2, 2009 at 3:58 PM, Andres Freund and...@anarazel.de wrote:
 On 06/02/2009 09:38 PM, Alvaro Herrera wrote:

 Mark Mielke wrote:

 I just don't understand why you care. If the CVS directories didn't bug
 you before, why does the single .git directory bug you now? I'm
 genuinely interested as I don't get it. :-)

 It doesn't.  What bugs me is that the database (the pulled tree if you
 will) is stored in it.  It has already been pointed out how to put it
 elsewhere, so no need to explain that.

 What *really* bugs me is that it's so difficult to have one pulled
 tree and create a bunch of checked out copies from that.

 I don't see where the difficulty resides.

 #Setup a base repository
 cd /../master
 git clone [--bare] git://git.postgresql.org/whatever .


 #Method 1
 cd /../child1
 git clone --reference /../master/ git://git.postgresql.org/whatever .
 cd /../child2
 git clone --reference /../master/ git://git.postgresql.org/whatever .

 This way you can fetch from the git url without problem, but when a object
 is available locally it is not downloaded again.

Yeah but now you have to push and pull commits between your numerous
local working copies.  Boo, hiss.

 #Method2
 cd /../child3
 git clone --shared /../postgresql/ child3
 ...
 This way you only fetch from your pulled tree and never possibly from the
 upstream one.

This is so unsafe it's not even worth talking about.  See git-clone(1).

...Robert



[HACKERS] Locks on temp table and PREPARE

2009-06-02 Thread Emmanuel Cecchet

Hi,

As we discussed during PGCon, we are using temp tables in 2PC 
transactions. The temp tables are dropped before PREPARE (or have an ON 
COMMIT DROP option) and never cross transaction boundaries.
In 8.3.1, a patch was introduced to disallow temp tables in 2PC 
transactions and we tried to provide a fix for it (see the long thread 
with Heikki on this list). I am still working on a cleaner patch to 
allow temp tables to be used in 2PC transactions but I did hit a new 
problem that I don't know how to solve cleanly.


Take PG 8.3.0 and try:
BEGIN;
CREATE TEMP TABLE foo (x int) ON COMMIT DROP;
PREPARE TRANSACTION 't1';
[BEGIN;] -- doesn't really matter if you start a new transaction or not
CREATE TEMP TABLE foo (x int); -- blocks until t1 commits

I have been tracking down the problem and it looks like 
PostPrepare_Locks is holding the locks on 'foo' for some reason I don't 
really get.


Any suggestion on what should be done differently for temp tables there?

Thanks,
Emmanuel

--
Emmanuel Cecchet
FTO @ Frog Thinker 
Open Source Development & Consulting

--
Web: http://www.frogthinker.org
email: m...@frogthinker.org
Skype: emmanuel_cecchet



