Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
On 19/08/10 04:46, Robert Haas wrote:
> At any rate, we should definitely NOT wait another month to start
> thinking about Sync Rep again.

Agreed. EnterpriseDB is interested in having that feature, so I'm on the
hook to spend time on it regardless of commitfests.

> I haven't actually looked at any of the Sync Rep code AT ALL but IIRC
> Heikki expressed the view that the biggest thing standing in the way of
> a halfway decent Sync Rep implementation was a number of polling loops
> that needed to be replaced with something that wouldn't introduce
> up-to-100ms delays.

Well, that's the only uncontroversial thing about it that doesn't require
any fighting over the UI or desired behavior. That's why I've focused on
that first, and also because it's useful regardless of synchronous
replication. But once that's done, we'll have to nail down how synchronous
replication is supposed to behave, and how to configure it.

> And so far we haven't seen a patch for that. Somebody write one. And
> then let's get it reviewed and committed RSN.

Fujii is on vacation, but I've started working on it. The two issues with
Fujii's latest patch are that it would not respond promptly on platforms
where signals don't interrupt sleep, and that it suffers from the classic
race condition that pselect() was invented to solve. I'm going to replace
pg_usleep() with select(), and use the so-called "self-pipe trick" to get
around the race condition. I have that written up, but I want to do some
testing and cleanup before posting the patch.

> It may seem like we're early in the release cycle yet, but for a
> feature of this magnitude we are not. We committed way too much big
> stuff at the very end of the last release cycle; Hot Standby was still
> being cleaned up in May after commit in November. We'll be lucky to
> commit sync rep that early.

Agreed. We need to decide the scope and minimum set of features real soon
to get something concrete finished.

BTW, on what platforms do signals not interrupt sleep? Although that issue
has been discussed many times before, I couldn't find any reference to a
real platform in the archives.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Magnus Hagander wrote:
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

The good news: (I just reminded myself/realized that) Max Bowsher has
already implemented pretty much exactly what you want in the cvs2svn trunk
version, including noting in the commit messages any cherry-picks that are
not reflected in the repo ancestry.

The bad news: it is broken [1]. But I don't think it should be too much
work to fix it.

Michael

[1] http://cvs2svn.tigris.org/ds/viewMessage.do?dsForumId=1670&dsMessageId=2624153
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
On Wed, Aug 18, 2010 at 7:46 PM, Greg Smith wrote:
> Kevin Grittner wrote:
>> I don't think I want to try to handle two in a row, and I think your
>> style is better suited than mine to the final CF for a release, but I
>> might be able to take on the 2010-11 CF if people want that.
>
> Ha, you just put yourself right back on the hook with that comment, and
> Robert does seem like the right guy for CF-4 @ 2011-01. Leaving the
> question of what's going to happen with CF-2 next month.

My reputation precedes me, apparently. Although I appreciate everyone so
far being willing to avoid mentioning exactly what that reputation might
be. :-)

> I think the crucial thing with the 2010-09 CF is that we have to get
> serious progress made sorting out all the sync rep ideas before/during
> that one. The review Yeb did and subsequent discussion was really
> helpful, but the scope on that needs to actually get nailed down to
> *something* concrete if it's going to get built early enough in the 9.1
> release to be properly reviewed and tested for more than one round.
> Parts of the design and scope still feel like they're expanding to me,
> and I think having someone heavily involved in the next CF who is
> willing to push on nailing down that particular area is pretty
> important. Will volunteer myself if I can stay on schedule to make it
> past the major time commitment sink I've had so far this year by then.

Sitting on Sync Rep is a job and a half by itself, without adding all the
other CF work on top of it. Maybe we should try to find two
vi^Holunteers: a CommitFest Manager (CFM) and a Major Feature Babysitter
(MBS).

At any rate, we should definitely NOT wait another month to start thinking
about Sync Rep again.

I haven't actually looked at any of the Sync Rep code AT ALL, but IIRC
Heikki expressed the view that the biggest thing standing in the way of a
halfway decent Sync Rep implementation was a number of polling loops that
needed to be replaced with something that wouldn't introduce up-to-100ms
delays. And so far we haven't seen a patch for that. Somebody write one.
And then let's get it reviewed and committed RSN.

It may seem like we're early in the release cycle yet, but for a feature
of this magnitude we are not. We committed way too much big stuff at the
very end of the last release cycle; Hot Standby was still being cleaned up
in May after commit in November. We'll be lucky to commit sync rep that
early.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] git: uh-oh
Alvaro Herrera wrote:
> Excerpts from Michael Haggerty's message of Wed Aug 18 12:00:44 -0400 2010:
>
>> 3. Run
>>
>>        git filter-branch
>>
>> This rewrites the commits using any parentage changes from the grafts
>> file. This changes most commits' SHA1 hashes. After this you can
>> discard the .git/info/grafts file. You would then want to remove the
>> original references, which were moved to "refs/original".
>
> Hmm. If I need to do two changes in the same branch, do I need to
> mention the new SHA1 for the second one (after filter-branch changes its
> SHA1), or the original one? If the former, then this is going to be a
> very painful process.

No, all SHA1s refer to the values for the *old* versions of the commits.

Michael
Re: [HACKERS] security label support, part.2
>>> How about an idea to add a new flag in RangeTblEntry which shows where
>>> the RangeTblEntry came from, instead of clearing requiredPerms?
>>> If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
>>> on the child tables.
>>
>> How about the external module just checks if the current object being
>> queried has parents, and if so, goes and checks the
>> labels/permissions/etc on those children? That way the query either
>> always fails or never fails for a given caller, rather than sometimes
>> working and sometimes not depending on the query.
>>
> Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the
> parent will give us a hint whether we also should check labels on its
> children.

http://code.google.com/p/sepgsql/source/browse/trunk/sepgsql/relation.c#293

At least, it seems to me this logic works as expected:

postgres=# CREATE TABLE tbl_p (a int, b text);
CREATE TABLE
postgres=# CREATE TABLE tbl_1 (check (a < 100)) inherits (tbl_p);
CREATE TABLE
postgres=# CREATE TABLE tbl_2 (check (a >= 100 and a < 200)) inherits (tbl_p);
CREATE TABLE
postgres=# CREATE TABLE tbl_3 (check (a >= 300)) inherits (tbl_p);
CREATE TABLE
postgres=# SECURITY LABEL on TABLE tbl_p IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# SECURITY LABEL on COLUMN tbl_p.a IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# SECURITY LABEL on COLUMN tbl_p.b IS 'system_u:object_r:sepgsql_table_t:s0';
SECURITY LABEL
postgres=# set sepgsql_debug_audit = on;
SET
postgres=# SELECT a FROM ONLY tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
STATEMENT:  SELECT a FROM ONLY tbl_p WHERE a = 150;
 a
---
(0 rows)

-> ONLY tbl_p was not expanded

postgres=# SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_p
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_p.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_1
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_1.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_2
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_2.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_table name=tbl_3
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
LOG:  SELinux: allowed { select } scontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tcontext=system_u:object_r:sepgsql_table_t:s0 tclass=db_column name=tbl_3.a
STATEMENT:  SELECT a FROM tbl_p WHERE a = 150;
 a
---
(0 rows)

-> tbl_p was expanded to tbl_1, tbl_2 and tbl_3

postgres=# set sepgsql_debug_audit = off;
SET
postgres=# EXPLAIN SELECT a FROM tbl_p WHERE a = 150;
                           QUERY PLAN
-----------------------------------------------------------------
 Result  (cost=0.00..50.75 rows=12 width=4)
   ->  Append  (cost=0.00..50.75 rows=12 width=4)
         ->  Seq Scan on tbl_p  (cost=0.00..25.38 rows=6 width=4)
               Filter: (a = 150)
         ->  Seq Scan on tbl_2 tbl_p  (cost=0.00..25.38 rows=6 width=4)
               Filter: (a = 150)
(6 rows)

-> Actually, it does not scan tbl_1 and tbl_3, due to the a = 150
   condition.

--
KaiGai Kohei
Re: [HACKERS] security label support, part.2
(2010/08/18 21:52), Stephen Frost wrote:
> * KaiGai Kohei (kai...@ak.jp.nec.com) wrote:
>> If rte->requiredPerms would not be cleared, the user of the hook will
>> be able to check access rights on the child tables, as they like.
>
> This would only be the case for those children which are being touched
> in the current query, which would depend on what conditionals are
> applied, what the current setting of check_constraints is, and possibly
> other factors. I do *not* like this approach.

Indeed, the planner might omit scans on the children which are not
obviously referenced, but I'm not certain whether their RangeTblEntry
would also be removed from the PlannedStmt->rtable, or not.

>> How about an idea to add a new flag in RangeTblEntry which shows where
>> the RangeTblEntry came from, instead of clearing requiredPerms?
>> If the flag is true, I think ExecCheckRTEPerms() can simply skip checks
>> on the child tables.
>
> How about the external module just checks if the current object being
> queried has parents, and if so, goes and checks the
> labels/permissions/etc on those children? That way the query either
> always fails or never fails for a given caller, rather than sometimes
> working and sometimes not depending on the query.

Hmm, this idea may be feasible. The RangeTblEntry->inh flag of the parent
will give us a hint whether we also should check labels on its children.

Thanks,
--
KaiGai Kohei
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Kevin Grittner wrote:
> I don't think I want to try to handle two in a row, and I think your
> style is better suited than mine to the final CF for a release, but I
> might be able to take on the 2010-11 CF if people want that.

Ha, you just put yourself right back on the hook with that comment, and
Robert does seem like the right guy for CF-4 @ 2011-01. That leaves the
question of what's going to happen with CF-2 next month.

I think the crucial thing with the 2010-09 CF is that we have to get
serious progress made sorting out all the sync rep ideas before/during
that one. The review Yeb did and the subsequent discussion were really
helpful, but the scope on that needs to actually get nailed down to
*something* concrete if it's going to get built early enough in the 9.1
release to be properly reviewed and tested for more than one round. Parts
of the design and scope still feel like they're expanding to me, and I
think having someone heavily involved in the next CF who is willing to
push on nailing down that particular area is pretty important. I will
volunteer myself if I can stay on schedule to make it past the major time
commitment sink I've had so far this year by then.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us
[HACKERS] CommitFest 2010-07 final report
At the close of the 2010-07 CommitFest, the numbers were:

 72 patches were submitted
  3 patches were withdrawn (deleted) by their authors
 14 patches were moved to CommitFest 2010-09
 --
 55 patches in CommitFest 2010-07
 --
  3 committed to 9.0
 --
 52 patches for 9.1
 --
  1 rejected
 20 returned with feedback
 31 committed for 9.1

When we hit the end of the allotted time, I moved the last two patches to
the next CF, for want of a better idea for disposition. One is "Ready for
Committer" with an author who is a committer. The other is my WIP patch
for serializable transactions -- there's a lot to review, and the reviewer
had unexpected demands on his time during the CF; he said he'll continue
work on that outside the CF.

-Kevin

At the end of week four:

> 72 patches were submitted
>  3 patches were withdrawn (deleted) by their authors
> 12 patches were moved to CommitFest 2010-09
> --
> 57 patches in CommitFest 2010-07
> --
>  3 committed to 9.0
> --
> 54 patches for 9.1
> --
>  1 rejected
> 18 returned with feedback
> 28 committed for 9.1
> --
> 47 disposed
> --
>  7 pending
>  2 ready for committer
> --
>  5 will still need reviewer attention
>  1 waiting on author to respond to review
> --
>  4 patches need review now and have a reviewer assigned
Re: [HACKERS] Progress indication prototype
On Aug 18, 2010, at 9:02 AM, Robert Haas wrote:
> On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark wrote:
>> On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote:
>>> Which is ideal for monitoring your own connection - having the info in
>>> pg_stat_activity is also valuable for monitoring and system
>>> administration. Both would be ideal :-)
>>
>> Hm, I think I've come around to the idea that having the info in
>> pg_stat_activity would be very nice. I can just picture sitting in
>> pgAdmin while a bunch of reports are running and seeing progress bars
>> for all of them...
>>
>> But progress bars alone aren't really the big prize. I would really
>> love to see the explain plans for running queries. This would improve
>> the DBA's view of what's going on in the system immensely. Currently
>> you have to grab the query and try to set up a similar environment for
>> it to run EXPLAIN on it. If ANALYZE has run since, or if the tables
>> have grown or shrunk, or if the query was run with some constants as
>> parameters, it can be awkward. If some of the tables in the query were
>> temporary tables, it can be impossible. You can never really be sure
>> you're looking at precisely the same plan that the other user's
>> session is running.
>>
>> But stuffing the whole JSON or XML explain plan into pg_stat_activity
>> seems like it doesn't really fit the same model that the existing
>> infrastructure is designed around. It could be quite large, and if we
>> want to support progress feedback it could change quite frequently.
>>
>> We do stuff the whole query there (up to a limited size), so maybe I'm
>> all wet and stuffing the explain plan in there would be fine?
>
> It seems to me that progress reporting could add quite a bit of
> overhead. For example, in the whole-database vacuum case, the most
> logical way to report progress would be to compute pages visited
> divided by pages to be visited. But the total number of pages to be
> visited is something that doesn't need to be computed in advance
> unless someone cares about progress. I don't think we want to incur
> that overhead in all cases just on the off chance someone might ask.
> We need to think about ways to structure this so that it only costs
> when someone's using it.

I wish that I could get EXPLAIN ANALYZE output step-by-step while running
a long query, instead of seeing it jump out at the end of execution. Some
queries "never" end, and it would be nice to see which step is spinning
(plain EXPLAIN can be a red herring). To me the "progress bar" is nice,
but I don't see how it would be reliable enough to draw any inferences
from it (such as execution time). If I could get the EXPLAIN ANALYZE
results *and* the actual query results, that would be a huge win, too.

Cheers,
M
Re: [HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Robert Haas wrote:
> I'd just like to take a minute to thank him publicly for his efforts.
> We started this CommitFest with something like 60 patches, which is
> definitely on the larger side for a CommitFest, and Kevin did a great
> job staying on top of what was going on with all of them and, I felt,
> really helped keep us on track. At the same time, I felt he did this
> with a very light touch that made the whole thing go very smoothly.
> So -- thanks, Kevin!

You're welcome. It was educational for me. I don't think I want to try to
handle two in a row, and I think your style is better suited than mine to
the final CF for a release, but I might be able to take on the 2010-11 CF
if people want that.

My hand was not always so light behind the scenes, though -- I sent or
received about 100 off-list emails to try to keep things moving.
Hopefully nobody was too offended by my nagging. :-)

Oh, and thanks for putting together the CF web application. Without that,
I couldn't have done half as well as I did.

> I also appreciate the efforts of all those who reviewed.

Yes, I'll second that! I've always been impressed with the PostgreSQL
community, and managing this CF gave me new insights and appreciation for
the intelligence, professionalism, and community spirit of its members --
authors, reviewers, and committers.

-Kevin
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> Most likely that's the libc implementation of the select()-based sleeps
>> for vacuum_cost_delay. I'm still suspicious that the writes are eating
>> more cost_delay points than you think.
>
> Tested that. It does look like if I increase vacuum_cost_limit to 1
> and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
> 2-3 before each pollsys. The math seems completely wrong on that,
> though -- it should be 50 and 30 pages, or similar.

I think there could be a lot of cost_delay points getting expended without
any effects visible at the level of strace. Maybe try fooling with
vacuum_cost_page_hit and vacuum_cost_page_miss, too.

regards, tom lane
Re: [HACKERS] Progress indication prototype
On ons, 2010-08-18 at 13:45 +0100, Greg Stark wrote:
> But progress bars alone aren't really the big prize. I would really
> love to see the explain plans for running queries.

The auto_explain module does that already.
Re: [HACKERS] Progress indication prototype
On tis, 2010-08-17 at 13:52 -0400, Stephen Frost wrote:
> I don't like how the backend would have to send something NOTICE-like.
> I had originally been thinking "gee, it'd be nice if psql could query
> pg_stat while doing something else", but that's not really possible...
> So, I guess NOTICE-like messages would work, if the backend could be
> taught to do it.

That should be doable; you'd just have to do some ereport(NOTICE) variant
inside pgstat_report_progress and have a switch to turn it on and off, and
have psql do something with it. The latter is really the interesting part;
the former is relatively easy once the general framework is in place.
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> Tested that. It does look like if I increase vacuum_cost_limit to 1
> and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes
> 2-3 before each pollsys. The math seems completely wrong on that,
> though -- it should be 50 and 30 pages, or similar. If I can, I'll test
> a vacuum without cost_delay and make sure the pollsys() are connected to
> the cost delay and not something else.

Hmmm. Looks like, at least in 8.3, running a manual vacuum on a table
doesn't prevent anti-wraparound vacuum from restarting. So I can't do any
further testing until we can restart the server.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Per-tuple memory leak in 9.0
Dean Rasheed writes:
> The problem is that the trigger code assumes that anything it allocates
> in the per-tuple memory context will be freed per tuple processed,
> which used to be the case because the loop in ExecutePlan() calls
> ResetPerTupleExprContext() once each time round the loop, and that used
> to correspond to once per tuple.
>
> However, with the refactoring of that code out to nodeModifyTable.c,
> this is no longer the case, because the ModifyTable node processes all
> the tuples from the subquery before returning, so I guess that the loop
> in ExecModifyTable() needs to call ResetPerTupleExprContext() each time
> round.

Hmmm ... it seems a bit unclean to be resetting the output-tuple
exprcontext at a level below the top of the plan. I agree that that's
probably the sanest fix at the moment, but I fear we may need to revisit
this in connection with writable CTEs. We might need a separate output
tuple context for each ModifyTable node, or something like that.

regards, tom lane
[HACKERS] CommitFest 2009-07: Yay, Kevin! Thanks, reviewers!
Kevin didn't send out an official gavel-banging announcement of the end of
CommitFest 2009-07 (possibly because I neglected until today to give him
privileges to actually change it in the web application), but I'd just
like to take a minute to thank him publicly for his efforts. We started
this CommitFest with something like 60 patches, which is definitely on the
larger side for a CommitFest, and Kevin did a great job staying on top of
what was going on with all of them and, I felt, really helped keep us on
track. At the same time, I felt he did this with a very light touch that
made the whole thing go very smoothly. So -- thanks, Kevin!

I also appreciate the efforts of all those who reviewed. Good reviews are
really critical to keep the burden from building up on committers, and I
appreciate the efforts of everyone who contributed, in many cases probably
on their own time. I'm particularly grateful to the people who were
vigilant about spelling, grammar, coding style, whitespace, and other
nitpicky little issues that are not much fun, but which at least for me
are a major time sink if they're still lingering when it comes time to do
the actual commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> That would explain all the writes, but it doesn't seem to explain why
> your two servers aren't behaving similarly.

Well, that's why I said "ostensibly identical". There may in fact be
differences, not just in the databases but in some OS libs as well. These
servers have been in production for quite a while, and the owner has a
messy deployment process.

> Most likely that's the libc implementation of the select()-based sleeps
> for vacuum_cost_delay. I'm still suspicious that the writes are eating
> more cost_delay points than you think.

Tested that. It does look like if I increase vacuum_cost_limit to 1
and lower vacuum_cost_page_dirty to 10, it reads 5-7 pages and writes 2-3
before each pollsys. The math seems completely wrong on that, though --
it should be 50 and 30 pages, or similar. If I can, I'll test a vacuum
without cost_delay and make sure the pollsys() calls are connected to the
cost delay and not something else.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] patch: utf8_to_unicode (trivial)
Robert Haas writes:
> Anyway, it's not really important enough to me to have a protracted
> argument about it. Let's wait and see if anyone else has an opinion,
> and perhaps a consensus will emerge.

Well, nobody else seems to care, so I went ahead and committed the shorter
form of the patch, i.e. just rename & export the function.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Tom Lane wrote:
> Josh Berkus writes:
>> This is an anti-wraparound vacuum, so it could have something to do
>> with the hint bits. Maybe it's setting the freeze bit on every page,
>> and writing them one page at a time?
>
> That would explain all the writes, but it doesn't seem to explain why
> your two servers aren't behaving similarly.

Was one bulk-loaded from the other, or were they bulk-loaded at different
times? Or did one have some other activity that boosted the xid count,
possibly in another database?

-Kevin
Re: [HACKERS] trace_recovery_messages
Fujii Masao writes:
> The explanation of trace_recovery_messages in the document is
> inconsistent with the definition of it in guc.c.

Setting the default to WARNING is confusing and useless, because there are
no trace_recovery calls with that debug level. IMO the default setting
should be LOG, which makes trace_recovery() a clear no-op (rather than not
clearly a no-op). There is circumstantial evidence in the code that this
was the original intention:

    int trace_recovery_messages = LOG;

The documentation of the parameter is about as clear as mud, too. We need
to explain what it does rather than just copy-and-paste a lot of text from
log_min_messages.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> Rather, what you need to be thinking about is how come vacuum seems to
>> be making lots of pages dirty on only one of these machines.
>
> This is an anti-wraparound vacuum, so it could have something to do
> with the hint bits. Maybe it's setting the freeze bit on every page,
> and writing them one page at a time?

That would explain all the writes, but it doesn't seem to explain why your
two servers aren't behaving similarly.

> Still don't understand the call to pollsys, even so, though.

Most likely that's the libc implementation of the select()-based sleeps
for vacuum_cost_delay. I'm still suspicious that the writes are eating
more cost_delay points than you think.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> On further reflection, though: since we put in the BufferAccessStrategy
> code, which was in 8.3, the background writer isn't *supposed* to be
> very much involved in writing pages that are dirtied by VACUUM. VACUUM
> runs in a small ring of buffers and is supposed to have to clean its
> own dirt most of the time. So it's wrong to blame this on the bgwriter
> not holding up its end. Rather, what you need to be thinking about is
> how come vacuum seems to be making lots of pages dirty on only one of
> these machines.

This is an anti-wraparound vacuum, so it could have something to do with
the hint bits. Maybe it's setting the freeze bit on every page, and
writing them one page at a time?

Still don't understand the call to pollsys, even so, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
Josh Berkus writes:
>> What I find interesting about that trace is the large proportion of
>> writes. That appears to me to indicate that it's *not* a matter of
>> vacuum delays, or at least not just a matter of that. The process
>> seems to be getting involved in having to dump dirty buffers to disk.
>> Perhaps the background writer is malfunctioning?
>
> You appear to be correct in that it's write-related. Will be testing
> what specifically is producing it.
>
> Note that this is one of two ostensibly duplicate servers, and the
> issue has never appeared on the other server.

On further reflection, though: since we put in the BufferAccessStrategy
code, which was in 8.3, the background writer isn't *supposed* to be very
much involved in writing pages that are dirtied by VACUUM. VACUUM runs in
a small ring of buffers and is supposed to have to clean its own dirt most
of the time. So it's wrong to blame this on the bgwriter not holding up
its end. Rather, what you need to be thinking about is how come vacuum
seems to be making lots of pages dirty on only one of these machines.

regards, tom lane
Re: [HACKERS] Return of the Solaris vacuum polling problem -- anyone remember this?
> What I find interesting about that trace is the large proportion of > writes. That appears to me to indicate that it's *not* a matter of > vacuum delays, or at least not just a matter of that. The process seems > to be getting involved in having to dump dirty buffers to disk. Perhaps > the background writer is malfunctioning? You appear to be correct in that it's write-related. Will be testing on what specifically is producing it. Note that this is one of two ostensibly duplicate servers, and the issue has never appeared on the other server. -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Re: [HACKERS] Per-column collation, proof of concept
On Wed, Aug 18, 2010 at 11:29 AM, Peter Eisentraut wrote: > On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote: >> >> creating collations ...FATAL: invalid byte sequence for encoding >> >> "UTF8": 0xe56c09 >> >> CONTEXT: COPY tmp_pg_collation, line 86 >> >> STATEMENT: COPY tmp_pg_collation FROM >> >> E'/usr/local/pgsql/9.1/share/locales.txt'; >> >> """ >> > >> > Hmm, what is in that file on that line? >> > >> > >> >> bokmål ISO-8859-1 > > Hey, that borders on genius: Use a non-ASCII letter in the name of a > locale whose purpose it is to configure how non-ASCII letters are > interpreted. :-/ > > Interestingly, I don't see this on a Debian system. Good thing to know > that this needs separate testing on different Linux variants. > > Yeah! And when installing CentOS 5 I don't get a chance to choose which locales I want; it just installs all of them. -- Jaime Casanova www.2ndQuadrant.com Soporte y capacitación de PostgreSQL
Re: [HACKERS] git: uh-oh
Excerpts from Robert Haas's message of mié ago 18 13:10:19 -0400 2010: > I think what is frustrating is that we have a mental image of what the > history looks like in CVS based on what we actually do, and it doesn't > look anything like the history that cvs2git created. You can do all > kinds of crazy things in CVS, like tag the whole tree and then move > the tags on half a dozen individual files forward or backward in time, > or delete the tags off them altogether. But we believe (perhaps > naively) that we haven't done those things, so we're expecting to get > a simple linear history without merges, and definitely without commits > from one branch jumping into the midst of other branches. In fact, we went to some lengths to remove some of the more problematic artifacts in our original CVS repository, so that a Git conversion wouldn't have a problem with them. It's disappointing that it ends up punting in this manner. I do welcome the offer of Michael's development time to solve our problems. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] Per-tuple memory leak in 9.0
While testing triggers, I came across the following memory leak. Here's a simple test case: CREATE TABLE foo(a int); CREATE OR REPLACE FUNCTION trig_fn() RETURNS trigger AS $$ BEGIN RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER ins_trig BEFORE INSERT ON foo FOR EACH ROW EXECUTE PROCEDURE trig_fn(); INSERT INTO foo SELECT g FROM generate_series(1, 500) AS g; Memory usage goes up by around 100 bytes per row for the duration of the query. The problem is that the trigger code assumes that anything it allocates in the per-tuple memory context will be freed per-tuple processed, which used to be the case because the loop in ExecutePlan() calls ResetPerTupleExprContext() once each time round the loop, and that used to correspond to once per tuple. However, with the refactoring of that code out to nodeModifyTable.c, this is no longer the case because the ModifyTable node processes all the tuples from the subquery before returning, so I guess that the loop in ExecModifyTable() needs to call ResetPerTupleExprContext() each time round. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
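The leak described above comes down to a reset that now happens once per statement instead of once per tuple. A toy model (Python, purely illustrative; the real code is the C executor's MemoryContext machinery, and the class and function names below are made up) shows why moving the reset inside the per-tuple loop bounds memory use:

```python
# Toy model of a per-tuple memory context: allocations made while
# processing one tuple should be freed before the next tuple, by
# resetting the context once per loop iteration (what the loop in
# ExecModifyTable would do by calling ResetPerTupleExprContext).

class MemoryContext:
    """Simplified stand-in for a palloc memory context."""
    def __init__(self):
        self.chunks = []
        self.peak = 0

    def alloc(self, nbytes):
        self.chunks.append(nbytes)
        self.peak = max(self.peak, sum(self.chunks))

    def reset(self):
        self.chunks.clear()

def process_rows(n_rows, reset_per_tuple):
    ctx = MemoryContext()
    for _ in range(n_rows):
        ctx.alloc(100)      # ~100 bytes of trigger-related allocations per row
        if reset_per_tuple:
            ctx.reset()     # free per-tuple garbage before the next row
    return ctx.peak

# Without the per-iteration reset, peak usage grows with the row count,
# as in the 500-row INSERT above; with it, usage stays bounded.
leaky = process_rows(500, reset_per_tuple=False)
fixed = process_rows(500, reset_per_tuple=True)
print(leaky, fixed)  # 50000 100
```

The point of the sketch is only the placement of the reset: per iteration of the node's row loop, not per call of the node.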
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 12:18 PM, Michael Haggerty wrote: > Tom Lane wrote: >> Michael Haggerty writes: >>> The "exclusive" possibility is to ignore the fact that some of the >>> content of B4 came from trunk and to pretend that FILE1 just appeared >>> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >> >>> T0 -- T1 -- T2 T3 -- T4 TRUNK >>> \ >>> B1 -- B2 -- B3 -- B4 BRANCH1 >> >>> This is also wrong, because it doesn't reflect the true lineage of FILE1. >> >> Maybe not, but that *is* how things appeared in the CVS history, [...] > > I forgot to point out that "the CVS history" looks nothing like this, > because the CVS history is only defined file by file. So the CVS > history of FILE0 might look like this: > > 1.0 - 1.1 -- 1.2 - 1.3 - 1.4 TRUNK > \ > 1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4 BRANCH1 > > whereas the history of FILE1 probably looks more like this: > > 1.1 - 1.2 - 1.3 TRUNK > \ > 1.2.2.1 -- 1.2.2.2 BRANCH1 > > (here I've tried to put corresponding commits in the same relative > location) and there might be a FILE2 that looks like this: > > 1.0 1.1 --- 1.2 TRUNK > \ > *no commit here* BRANCH1 > > Perhaps this makes it clearer why creating a single git history requires > some compromises. I think we all understand that the conversion process may create some artifacts. Also, since I think this has not yet been mentioned, I really appreciate you being willing to jump into this discussion and possibly try to write some code to help us get what we want. I think what is frustrating is that we have a mental image of what the history looks like in CVS based on what we actually do, and it doesn't look anything like the history that cvs2git created. You can do all kinds of crazy things in CVS, like tag the whole tree and then move the tags on half a dozen individual files forward or backward in time, or delete the tags off them altogether.
But we believe (perhaps naively) that we haven't done those things, so we're expecting to get a simple linear history without merges, and definitely without commits from one branch jumping into the midst of other branches. What was really alarming to me about what I found yesterday is that - even after reading your explanation - I can't understand why it did that. I think it's human nature to like it when good things happen to us and to dislike it when bad things happen to us, but we tend to hate the bad things a lot more when we feel like we didn't deserve it. If you're going 90 MPH and get a speeding ticket, you may be steamed, but at some level you know you deserved it. If you were going 50 MPH on a road where the speed limit is 55 MPH and the cop tickets you for 60 MPH, even the most mild-mannered driver may feel an urge to say something less polite than "thank you, officer". Hence our consternation. Perhaps there is some way to tilt your head so that these merge commits are the Right Thing To Do, but to me at least it feels extremely weird and inexplicable. If at some point, we had taken the majority of the deltas between 9.0 and 8.3 and put them into 8.3 and the converter said "oh, that's a merge", well, we might want an option to turn that behavior off, but at least it would be clear why it happened. But the merge commit that got fabricated here almost by definition has to be ignoring the vast bulk of the activity on one side, which just doesn't feel right. To what degree does your proposed solution (an "exclusive" option) resemble "don't ever create merge commits"? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
On Wed, 2010-08-18 at 12:26 -0400, Alvaro Herrera wrote: > Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010: > > On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > > > I previously proposed off-list an alternate solution to generate the git > > > repository which was turned down due to it not being able to handle > > > incremental updates. However, since we are now looking at a one-time > > > conversion, this method might come in handy. > > > > cvs2git *is* the tool we've been using now that it's a one-off > > conversion. It's the one that's causing the current problems. We had a lot of luck with cvs to svn conversion in the past. And supposedly the git-svn stuff is top notch. It may be worth a shot. JD > -- > Álvaro Herrera > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Excerpts from Michael Haggerty's message of mié ago 18 12:00:44 -0400 2010: > 3. Run > > git filter-branch > > This rewrites the commits using any parentage changes from the grafts > file. This changes most commits' SHA1 hashes. After this you can > discard the .git/info/grafts file. You would then want to remove the > original references, which were moved to "refs/original". Hmm. If I need to do two changes in the same branch, do I need to mention the new SHA1 for the second one (after filter-branch changes its SHA1), or the original one? If the former, then this is going to be a very painful process. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Per-column collation, proof of concept
On tis, 2010-08-17 at 01:16 -0500, Jaime Casanova wrote: > >> creating collations ...FATAL: invalid byte sequence for encoding > >> "UTF8": 0xe56c09 > >> CONTEXT: COPY tmp_pg_collation, line 86 > >> STATEMENT: COPY tmp_pg_collation FROM > >> E'/usr/local/pgsql/9.1/share/locales.txt'; > >> """ > > > > Hmm, what is in that file on that line? > > > > > > bokmål ISO-8859-1 Hey, that borders on genius: Use a non-ASCII letter in the name of a locale whose purpose it is to configure how non-ASCII letters are interpreted. :-/ Interestingly, I don't see this on a Debian system. Good thing to know that this needs separate testing on different Linux variants. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Excerpts from Magnus Hagander's message of mié ago 18 11:52:58 -0400 2010: > On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > > I previously proposed off-list an alternate solution to generate the git > > repository which was turned down due to it not being able to handle > > incremental updates. However, since we are now looking at a one-time > > conversion, this method might come in handy. > > cvs2git *is* the tool we've been using now that it's a one-off > conversion. It's the one that's causing the current problems. I think the point is to run the repo through cvsclone, which apparently changes the repo in some (undocumented) ways, removing "corruption". Not sure how essential this is to Khee Chin's proposal. The cited URL is no longer valid, however. The code can be found here: http://samba.org/ftp/tridge/rtc/cvsclone.l -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] git: uh-oh
Tom Lane wrote: > Michael Haggerty writes: >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >
>> T0 -- T1 -- T2    T3 -- T4    TRUNK
>>         \
>>          B1 -- B2 -- B3 -- B4    BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, [...] I forgot to point out that "the CVS history" looks nothing like this, because the CVS history is only defined file by file. So the CVS history of FILE0 might look like this:

1.0 - 1.1 -- 1.2 - 1.3 - 1.4    TRUNK
         \
          1.1.2.1 -- 1.1.2.2 -- 1.1.2.3 -- 1.1.2.4    BRANCH1

whereas the history of FILE1 probably looks more like this:

1.1 - 1.2 - 1.3    TRUNK
         \
          1.2.2.1 -- 1.2.2.2    BRANCH1

(here I've tried to put corresponding commits in the same relative location) and there might be a FILE2 that looks like this:

1.0    1.1 --- 1.2    TRUNK
   \
    *no commit here*    BRANCH1

Perhaps this makes it clearer why creating a single git history requires some compromises. Michael
Re: [HACKERS] git: uh-oh
Robert Haas wrote: > Exactly. IMHO, the way this should work is by starting at the > beginning of time and working forward. [...] What you are describing is more or less the algorithm that was used by cvs2svn version 1.x. It mostly works, but has nasty edge cases that are impossible to fix. cvs2svn version 2.x uses a better algorithm [1]. It can be changed to add an "exclusive" mode, it's a simple matter of programming. I will try to find some time to work on it. Michael [1] http://cvs2svn.tigris.org/source/browse/cvs2svn/trunk/doc/design-notes.txt?view=markup -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Alvaro Herrera wrote: > Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > >> [...] Alternatively, you >> could write a tool that would rewrite the ancestry information in the >> repository *after* the cvs2git conversion using .git/info/grafts (see >> git-filter-branch(1)). Such rewriting would have to occur before the >> repository is published, because the rewriting will change the hashes of >> most commits. > > AFAICT, graft points are not checked in[1], thus they don't propagate; are > you saying that we should run the migration, then manually inject the > graft points, then run some conversion tool that writes a different > repository with those graft points welded into the history? This sounds > like it needs some manual work (namely find out the appropriate graft > points for each branch), that can be prepared beforehand. Otherwise it > seems easier than reworking the cvs2git code for the "mostly-exclusive" > option. It is true that grafts are not propagated, but they can be baked into a repository (at the cost of rewriting the SHA1 hashes) using "git filter-branch". The procedure would be as follows:

1. Convert using cvs2git.

2. Create a file .git/info/grafts containing the changes that you want to make to the project's ancestry. The file has the format

       commit parent0 parent1 ...

   where each of the entries is a SHA1 hash from the existing repository. Only commits whose parentage should be changed need to be mentioned. This is the tricky step because it requires some logic to decide what needs changing. And it can only be done after the cvs2git conversion, because it requires the SHA1s resulting from the conversion.

3. Run

       git filter-branch

   This rewrites the commits using any parentage changes from the grafts file. This changes most commits' SHA1 hashes. After this you can discard the .git/info/grafts file. You would then want to remove the original references, which were moved to "refs/original".

4. Publish the repository.
As long as the repository is only published after the grafts have been baked in, there is no reason that anybody else would need the grafts file. Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
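Generating the grafts file in step 2 is mechanical once the desired parentage is known. A minimal sketch (Python; the `make_grafts` helper and the placeholder hashes are hypothetical - only the one-line-per-commit "commit parent0 parent1 ..." format comes from the message above):

```python
# Compose the contents of .git/info/grafts from a mapping of
# {commit_sha: [replacement_parent_shas]}. Only commits whose
# parentage should change are listed; each line is the commit's
# SHA1 followed by its new parents' SHA1s, space-separated.

def make_grafts(parentage):
    lines = []
    for commit, parents in parentage.items():
        lines.append(" ".join([commit] + list(parents)))
    return "\n".join(lines) + "\n"

# Hypothetical example: make commit B4 a child of B3 only, dropping
# a spurious trunk parent (the hex strings stand in for real SHA1s).
b4 = "b4" * 20
b3 = "b3" * 20
print(make_grafts({b4: [b3]}), end="")
```

The resulting text would be written to .git/info/grafts before running git filter-branch, as described in step 3.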
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 11:03 AM, Tom Lane wrote: > Michael Haggerty writes: >> So let's take the simplest example: a branch BRANCH1 is created from >> trunk commit T1, then some time later another FILE1 from trunk commit T3 >> is added to BRANCH1 in commit B4. How should this series of events be >> represented in a git repository? >> ... >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: > >> T0 -- T1 -- T2 T3 -- T4 TRUNK >> \ >> B1 -- B2 -- B3 -- B4 BRANCH1 > >> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, and > we'd rather have a git history that looks like the CVS history than > one that claims that boatloads of utterly unrelated commits are part > of a branch's history. Exactly. IMHO, the way this should work is by starting at the beginning of time and working forward. At each step, we examine the earliest revision of each file for which no git commit has yet been written. From among those, we select the one with the earliest timestamp. We then also select all other files whose most recent unprocessed revision is nearly contemporaneous and shares the same author and log message. From the results, we generate a commit. Then we repeat. When we arrive at a branch point, the branch gets processed separately from the trunk. If there is no trunk rev which has every file at the rev where it starts on the branch, then we use some sane algorithm to pick the best one (perhaps, the one that has the right revs of the most files) and then insert a fixup commit on the branch to remove the deltas and carry on as before. > The "inclusive" possibility might be tolerable if it restricted itself > to mentioning commits that actually touched FILE1 in between its > addition to TRUNK and its addition to BRANCH1. 
So far as I can see, > though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 > ... not even between T3 and B4, but back to the branch point. How can > you possibly justify that as either sane or useful? git can't do that. It's finding those commits by following parent pointers from the merge commits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
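For what it's worth, the forward-in-time grouping Robert describes can be sketched in a few lines. This is only an illustration of the idea, not cvs2svn/cvs2git's actual algorithm; the Rev fields, the 60-second window, and the sample data are all assumptions made for the example:

```python
# Group per-file CVS revisions into changesets: walk revisions in
# timestamp order, and batch together revisions that share an author
# and log message and are nearly contemporaneous.

from collections import namedtuple

Rev = namedtuple("Rev", "file revnum time author log")

def group_changesets(revs, window=60):
    """Group file revisions into commits; window is seconds of slack."""
    commits = []
    for rev in sorted(revs, key=lambda r: r.time):
        last = commits[-1] if commits else None
        if (last is not None
                and last[0].author == rev.author
                and last[0].log == rev.log
                and rev.time - last[-1].time <= window
                # one changeset cannot touch the same file twice
                and all(r.file != rev.file for r in last)):
            last.append(rev)
        else:
            commits.append([rev])
    return commits

revs = [
    Rev("a.c", "1.2", 100, "tgl", "fix bug"),
    Rev("b.c", "1.5", 101, "tgl", "fix bug"),
    Rev("a.c", "1.3", 500, "momjian", "pgindent"),
]
print(len(group_changesets(revs)))  # 2: the first two revisions fuse
```

The hard part Robert alludes to (picking a base revision at branch points and emitting fixup commits) is exactly what this simple model leaves out, and it is where the edge cases live.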
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 17:33, Khee Chin wrote: > I previously proposed off-list an alternate solution to generate the git > repository which was turned down due to it not being able to handle > incremental updates. However, since we are now looking at a one-time > conversion, this method might come in handy. cvs2git *is* the tool we've been using now that it's a one-off conversion. It's the one that's causing the current problems. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] git: uh-oh
Tom Lane wrote: > Michael Haggerty writes: >> So let's take the simplest example: a branch BRANCH1 is created from >> trunk commit T1, then some time later another FILE1 from trunk commit T3 >> is added to BRANCH1 in commit B4. How should this series of events be >> represented in a git repository? >> ... >> The "exclusive" possibility is to ignore the fact that some of the >> content of B4 came from trunk and to pretend that FILE1 just appeared >> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >
>> T0 -- T1 -- T2    T3 -- T4    TRUNK
>>         \
>>          B1 -- B2 -- B3 -- B4    BRANCH1
>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1. > > Maybe not, but that *is* how things appeared in the CVS history, and > we'd rather have a git history that looks like the CVS history than > one that claims that boatloads of utterly unrelated commits are part > of a branch's history. > > The "inclusive" possibility might be tolerable if it restricted itself > to mentioning commits that actually touched FILE1 in between its > addition to TRUNK and its addition to BRANCH1. So far as I can see, > though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 > ... not even between T3 and B4, but back to the branch point. How can > you possibly justify that as either sane or useful? There is no way, in git, to claim that (say) T3 was incorporated into B4 but that T2 was not. If T3 is listed as a parent of B4, then it is implied that all ancestors of T3 are also incorporated into B4. This is a crucial simplification that helps DVCSs merge reliably. So an "exclusive" option is definitely the way to go for the postgresql project. [By the way, it *is* possible to list the commits that touched FILE1:

    git log BRANCH1 -- FILE1

The user would first have to find out that FILE1 is the file that is the subject of merge B4, which could be done using "git diff B3..B4".
But I am not arguing that this is the preferred solution, given your project's practice to do cherry-picks and never full merges.] Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > Pavel Stehule writes: >> 2010/8/18 Tom Lane : >>> There would be plenty of scope to re-use the machinery without any >>> SQL-level extensions. All you need is a polymorphic aggregate >>> transition function that maintains a tuplestore or whatever. > >> Do we have to use a transition function? If we implement median as >> a special variant of aggregate - because we need to push a sort - then >> we can skip the transition function and call the final >> function directly. > > Well, that would require a whole bunch of *other* mechanisms, which you > weren't saying anything about in your original proposal. But driving > it off the transtype declaration would be quite inappropriate anyway IMO. > I'll test both variants first. Maybe there is no significant difference between them. Now nodeAgg can build and fill a tuplesort, so I think it is natural to use it. It needs only one thing - skipping the call to the transition function and directly calling the final function with the external tuplesort. At minimum, you don't need the same code twice. Regards Pavel Stehule > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
Pavel Stehule writes: > 2010/8/18 Tom Lane : >> There would be plenty of scope to re-use the machinery without any >> SQL-level extensions. All you need is a polymorphic aggregate >> transition function that maintains a tuplestore or whatever. > Do we have to use a transition function? If we implement median as > a special variant of aggregate - because we need to push a sort - then > we can skip the transition function and call the final > function directly. Well, that would require a whole bunch of *other* mechanisms, which you weren't saying anything about in your original proposal. But driving it off the transtype declaration would be quite inappropriate anyway IMO. regards, tom lane
Re: [HACKERS] git: uh-oh
I previously proposed off-list an alternate solution to generate the git repository which was turned down due to it not being able to handle incremental updates. However, since we are now looking at a one-time conversion, this method might come in handy.

---
Caveat: cvs2git apparently requires CVSROOT somewhere in the path for it to work. I did a symbolic link of the current directory $PWD with CVSROOT to bypass the quirk cvs2git requires.

mkdir work
cd work
wget http://ftp.netbsd.se/pkgsrc/distfiles/cvsclone-0.00/cvsclone.l
flex cvsclone.l && gcc -Wall -O2 lex.yy.c -o cvsclone
cvsclone -d :pserver:anon...@anoncvs.postgresql.org:/projects/cvsroot pgsql
ln -s $PWD CVSROOT
cvs2git --blobfile=blobfile --dumpfile=dumpfile --username pgdude --encoding=UTF8 --fallback-encoding=UTF8 CVSROOT/pgsql > cvs2git.log
mkdir git && cd git && git init .
cat ../blobfile ../dumpfile | git fast-import
git reset --hard
cd ..
---

Regards, Khee Chin. On Wed, Aug 18, 2010 at 11:14 PM, Alvaro Herrera wrote: > Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > > > cvs2git doesn't currently have this option. I'm not sure how much work > > it would be to implement; probably a few days'. Alternatively, you > > could write a tool that would rewrite the ancestry information in the > > repository *after* the cvs2git conversion using .git/info/grafts (see > > git-filter-branch(1)). Such rewriting would have to occur before the > > repository is published, because the rewriting will change the hashes of > > most commits. > > AFAICT, graft points are not checked in[1], thus they don't propagate; are > you saying that we should run the migration, then manually inject the > graft points, then run some conversion tool that writes a different > repository with those graft points welded into the history? This sounds > like it needs some manual work (namely find out the appropriate graft > points for each branch), that can be prepared beforehand.
Otherwise it > seems easier than reworking the cvs2git code for the "mostly-exclusive" > option. > > I am sort of assuming that this "conversion tool" already exists, but > maybe this is not the case? > > [1] > http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor > > -- > Álvaro Herrera > The PostgreSQL Company - Command Prompt, Inc. > PostgreSQL Replication, Consulting, Custom Development, 24x7 support > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] git: uh-oh
Excerpts from Michael Haggerty's message of mié ago 18 05:01:29 -0400 2010: > cvs2git doesn't currently have this option. I'm not sure how much work > it would be to implement; probably a few days'. Alternatively, you > could write a tool that would rewrite the ancestry information in the > repository *after* the cvs2git conversion using .git/info/grafts (see > git-filter-branch(1)). Such rewriting would have to occur before the > repository is published, because the rewriting will change the hashes of > most commits. AFAICT, graft points are not checked in[1], thus they don't propagate; are you saying that we should run the migration, then manually inject the graft points, then run some conversion tool that writes a different repository with those graft points welded into the history? This sounds like it needs some manual work (namely find out the appropriate graft points for each branch), that can be prepared beforehand. Otherwise it seems easier than reworking the cvs2git code for the "mostly-exclusive" option. I am sort of assuming that this "conversion tool" already exists, but maybe this is not the case? [1] http://stackoverflow.com/questions/1488753/how-to-merge-two-branches-without-a-common-ancestor -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote: > 2010/8/18 Tom Lane : > > David Fetter writes: > >> Apart from the medians, which "median-like" aggregates do you > >> have in mind to start with? If you can provide examples of > >> "median-like" aggregates that people might need to implement as > >> user-defined aggregates, or other places where people would use > >> this machinery, it will make your case stronger for this > >> refactoring. > > > > There would be plenty of scope to re-use the machinery without any > > SQL-level extensions. All you need is a polymorphic aggregate > > transition function that maintains a tuplestore or whatever. I > > don't see that extra syntax in CREATE AGGREGATE is really buying > > much of anything. > > > > Do we have to use a transition function? If we implement median as > a special variant of aggregate - because we need to push a sort - then > we can skip the transition function and call the final > function directly. This mechanism is used for aggregates with ORDER BY now. > So there can be a special path for a direct call of the final func. It > is useless to call the transition function. Just a wacky idea here. Could we make a special state transition function called IDENTITY or some such that would turn into a noop? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] git: uh-oh
Michael Haggerty writes: > So let's take the simplest example: a branch BRANCH1 is created from > trunk commit T1, then some time later another FILE1 from trunk commit T3 > is added to BRANCH1 in commit B4. How should this series of events be > represented in a git repository? > ... > The "exclusive" possibility is to ignore the fact that some of the > content of B4 came from trunk and to pretend that FILE1 just appeared > out of nowhere in commit B4 independent of the FILE1 in TRUNK:
> T0 -- T1 -- T2    T3 -- T4    TRUNK
>         \
>          B1 -- B2 -- B3 -- B4    BRANCH1
> This is also wrong, because it doesn't reflect the true lineage of FILE1. Maybe not, but that *is* how things appeared in the CVS history, and we'd rather have a git history that looks like the CVS history than one that claims that boatloads of utterly unrelated commits are part of a branch's history. The "inclusive" possibility might be tolerable if it restricted itself to mentioning commits that actually touched FILE1 in between its addition to TRUNK and its addition to BRANCH1. So far as I can see, though, cvs2git is mentioning *every* commit on TRUNK between T1 and B4 ... not even between T3 and B4, but back to the branch point. How can you possibly justify that as either sane or useful? regards, tom lane
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
David Fetter writes: > On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote: >> There would be plenty of scope to re-use the machinery without any >> SQL-level extensions. All you need is a polymorphic aggregate >> transition function that maintains a tuplestore or whatever. >> I don't see that extra syntax in CREATE AGGREGATE is really buying >> much of anything. > Thanks for clarifying. Might this help out with things like GROUPING > SETS or wCTEs? Don't see how --- this is just about what you can do within an aggregate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > David Fetter writes: >> Apart from the medians, which "median-like" aggregates do you have in >> mind to start with? If you can provide examples of "median-like" >> aggregates that people might need to implement as user-defined >> aggregates, or other places where people would use this machinery, it >> will make your case stronger for this refactoring. > > There would be plenty of scope to re-use the machinery without any > SQL-level extensions. All you need is a polymorphic aggregate > transition function that maintains a tuplestore or whatever. > I don't see that extra syntax in CREATE AGGREGATE is really buying > much of anything. > Do we have to use a transition function at all? If we implement median as a special variant of aggregate - because we need to push down a sort - then we can skip the transition function and call the final function directly. This mechanism is already used for aggregates with ORDER BY, so there could be a special path for a direct call of the final function; calling a transition function there would be useless. Regards Pavel > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote: > David Fetter writes: > > Apart from the medians, which "median-like" aggregates do you have in > > mind to start with? If you can provide examples of "median-like" > > aggregates that people might need to implement as user-defined > > aggregates, or other places where people would use this machinery, it > > will make your case stronger for this refactoring. > > There would be plenty of scope to re-use the machinery without any > SQL-level extensions. All you need is a polymorphic aggregate > transition function that maintains a tuplestore or whatever. > I don't see that extra syntax in CREATE AGGREGATE is really buying > much of anything. Thanks for clarifying. Might this help out with things like GROUPING SETS or wCTEs? Cheers, David (a little slow today). -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 David Fetter : > On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote: >> 2010/8/18 David Fetter : >> > Which median do you plan to implement? Or do you plan to implement >> > several different medians, each with distinguishing names? >> >> my proposal enabled implementation of any "median like" function. But >> if we implement median as special case of aggregate, then some basic >> "median" will be implemented. > > Apart from the medians, which "median-like" aggregates do you have in > mind to start with? If you can provide examples of "median-like" > aggregates that people might need to implement as user-defined > aggregates, or other places where people would use this machinery, it > will make your case stronger for this refactoring. > I wasn't thinking of any particular special median - this proposal is just about aggregates with a large transition state, where access to a tuplestore can be very useful. > Otherwise, it seems like a more reasonable thing to make the medians > special case code. Yes, at least for the moment. Regards Pavel > > Cheers, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fet...@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
David Fetter writes: > Apart from the medians, which "median-like" aggregates do you have in > mind to start with? If you can provide examples of "median-like" > aggregates that people might need to implement as user-defined > aggregates, or other places where people would use this machinery, it > will make your case stronger for this refactoring. There would be plenty of scope to re-use the machinery without any SQL-level extensions. All you need is a polymorphic aggregate transition function that maintains a tuplestore or whatever. I don't see that extra syntax in CREATE AGGREGATE is really buying much of anything. regards, tom lane
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote: > 2010/8/18 David Fetter : > > Which median do you plan to implement? Or do you plan to implement > > several different medians, each with distinguishing names? > > my proposal enabled implementation of any "median like" function. But > if we implement median as special case of aggregate, then some basic > "median" will be implemented. Apart from the medians, which "median-like" aggregates do you have in mind to start with? If you can provide examples of "median-like" aggregates that people might need to implement as user-defined aggregates, or other places where people would use this machinery, it will make your case stronger for this refactoring. Otherwise, it seems like a more reasonable thing to make the medians special case code. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 David Fetter : > On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote: >> 2010/8/18 Tom Lane : >> > Pavel Stehule writes: >> >> I still thinking about a "median" type functions. My idea is to >> >> introduce a new syntax for stype definition - like >> > >> >> stype = type, or >> >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or >> >> stype = TUPLESTORE OF type, or >> >> stype = TUPLESORT OF type [ DESC | ASC ] >> > >> > This seems like a fairly enormous amount of conceptual (and code) >> > infrastructure just to make it possible to build median() out of >> > spare parts. It's also exposing some implementation details that >> > I'd just as soon not expose in SQL. I'd rather just implement >> > median as a special-purpose aggregate. >> >> yes, it is little bit strange - but when we talked last time about >> this topic, I understand, so you dislike any special solution for >> this functionality. So I searched different more general way. On the >> other hand, I agree so special purpose aggregate (with a few changes >> in nodeAgg) can be enough. The median (and additional forms) is >> really special and there are not wide used use case. > > Which median do you plan to implement? Or do you plan to implement > several different medians, each with distinguishing names? My proposal would enable implementation of any "median-like" function. But if we implement median as a special case of aggregate, then only some basic "median" will be implemented. Regards Pavel > > Cheers, > David. > -- > David Fetter http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fet...@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote: > 2010/8/18 Tom Lane : > > Pavel Stehule writes: > >> I still thinking about a "median" type functions. My idea is to > >> introduce a new syntax for stype definition - like > > > >> stype = type, or > >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or > >> stype = TUPLESTORE OF type, or > >> stype = TUPLESORT OF type [ DESC | ASC ] > > > > This seems like a fairly enormous amount of conceptual (and code) > > infrastructure just to make it possible to build median() out of > > spare parts. It's also exposing some implementation details that > > I'd just as soon not expose in SQL. I'd rather just implement > > median as a special-purpose aggregate. > > yes, it is little bit strange - but when we talked last time about > this topic, I understand, so you dislike any special solution for > this functionality. So I searched different more general way. On the > other hand, I agree so special purpose aggregate (with a few changes > in nodeAgg) can be enough. The median (and additional forms) is > really special and there are not wide used use case. Which median do you plan to implement? Or do you plan to implement several different medians, each with distinguishing names? Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
2010/8/18 Tom Lane : > Pavel Stehule writes: >> I still thinking about a "median" type functions. My idea is to >> introduce a new syntax for stype definition - like > >> stype = type, or >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or >> stype = TUPLESTORE OF type, or >> stype = TUPLESORT OF type [ DESC | ASC ] > > This seems like a fairly enormous amount of conceptual (and code) > infrastructure just to make it possible to build median() out of spare > parts. It's also exposing some implementation details that I'd just as > soon not expose in SQL. I'd rather just implement median as a > special-purpose aggregate. Yes, it is a little bit strange - but when we talked about this topic last time, I understood that you disliked any special-case solution for this functionality, so I looked for a different, more general way. On the other hand, I agree that a special-purpose aggregate (with a few changes in nodeAgg) can be enough. The median (and related forms) is really special, and there is no widely used use case. Regards Pavel > > regards, tom lane >
Re: [HACKERS] proposal: tuplestore, tuplesort aggregate functions
Pavel Stehule writes: > I still thinking about a "median" type functions. My idea is to > introduce a new syntax for stype definition - like > stype = type, or > stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or > stype = TUPLESTORE OF type, or > stype = TUPLESORT OF type [ DESC | ASC ] This seems like a fairly enormous amount of conceptual (and code) infrastructure just to make it possible to build median() out of spare parts. It's also exposing some implementation details that I'd just as soon not expose in SQL. I'd rather just implement median as a special-purpose aggregate. regards, tom lane
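[Editor's note] The "spare parts" approach Tom alludes to can already be assembled today with an array-valued transition state, which also shows the cost Pavel's proposal is aimed at. This is a hedged, untested sketch; the names median_final and median are invented for illustration:

```sql
-- Hypothetical sketch: a user-defined median built from existing parts,
-- accumulating values into an array and sorting in the final function.
CREATE FUNCTION median_final(numeric[]) RETURNS numeric AS $$
  SELECT CASE
           WHEN cnt % 2 = 1 THEN sorted[cnt / 2 + 1]
           ELSE (sorted[cnt / 2] + sorted[cnt / 2 + 1]) / 2
         END
  FROM (SELECT array(SELECT v FROM unnest($1) AS v ORDER BY v) AS sorted,
               array_length($1, 1) AS cnt) s;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE median(numeric) (
  sfunc = array_append,     -- repeatedly copies the state array
  stype = numeric[],
  finalfunc = median_final
);
```

Used as SELECT median(x) FROM t. The in-memory array state is exactly what makes this impractical for very large inputs, which is the motivation for exposing a tuplesort-backed state instead.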
Re: [HACKERS] Progress indication prototype
On Wed, Aug 18, 2010 at 8:45 AM, Greg Stark wrote: > On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: >> Which is ideal for monitoring your own connection - having the info in >> the pg_stat_activity is also valuable for monitoring and system >> administration. Both would be ideal :-) > > Hm, I think I've come around to the idea that having the info in > pg_stat_activity would be very nice. I can just picture sitting in > pgadmin while a bunch of reports are running and seeing progress bars > for all of them... > > But progress bars alone aren't really the big prize. I would really > love to see the explain plans for running queries. This would improve > the DBAs view of what's going on in the system immensely. Currently > you have to grab the query and try to set up a similar environment for > it to run explain on it. If analyze has run since or if the tables > have grown or shrank or if the query was run with some constants as > parameters it can be awkward. If some of the tables in the query were > temporary tables it can be impossible. You can never really be sure > you're looking at precisely the same plan than the other user's > session is running. > > But stuffing the whole json or xml explain plan into pg_stat_activity > seems like it doesn't really fit the same model that the existing > infrastructure is designed around. It could be quite large and if we > want to support progress feedback it could change quite frequently. > > We do stuff the whole query there (up to a limited size) so maybe I'm > all wet and stuffing the explain plan in there would be fine? It seems to me that progress reporting could add quite a bit of overhead. For example, in the whole-database vacuum case, the most logical way to report progress would be to compute pages visited divided by pages to be visited. But the total number of pages to be visited is something that doesn't need to be computed in advance unless someone cares about progress. 
I don't think we want to incur that overhead in all cases just on the off chance someone might ask. We need to think about ways to structure this so that it only costs when someone's using it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
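[Editor's note] The denominator Robert describes for the whole-database case can be roughly approximated from statistics already kept in the catalogs. A hedged sketch; relpages is only an estimate maintained by VACUUM/ANALYZE, so an exact total would still require the extra up-front work he is worried about:

```sql
-- Rough denominator for whole-database vacuum progress:
-- total heap ('r') and TOAST ('t') pages, per last-known relpages estimates.
SELECT sum(relpages) AS total_pages
FROM pg_class
WHERE relkind IN ('r', 't');
```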
Re: [HACKERS] Progress indication prototype
On 18 August 2010 13:45, Greg Stark wrote: > On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: >> Which is ideal for monitoring your own connection - having the info in >> the pg_stat_activity is also valuable for monitoring and system >> administration. Both would be ideal :-) > > Hm, I think I've come around to the idea that having the info in > pg_stat_activity would be very nice. I can just picture sitting in > pgadmin while a bunch of reports are running and seeing progress bars > for all of them... > > But progress bars alone aren't really the big prize. I would really > love to see the explain plans for running queries. Do you mean just see the explain plan? Or see what stage of the plan the query has reached? I think the latter would be awesome. And if it's broken down by step, wouldn't it be feasible to know how far through that step it has got, at least for some steps? Obviously for ones with a LIMIT applied it wouldn't know how far through it had got, but for things like a sequential scan or sort it should be able to indicate how far through it is. -- Thom Brown Registered Linux user: #516935
Re: [HACKERS] security label support, part.2
On Wed, Aug 18, 2010 at 8:49 AM, Stephen Frost wrote: > In the end, I'm thinking that if the external security module wants to > enforce a check against all the children of a parent, they could quite > possibly handle that already and do it in such a way that it won't break > depending on the specific query. To wit, it could query the catalog to > determine if the current table is a parent of any children, and if so, > go check the labels/permissions/etc on those children. I'd much rather > have something where the permissions check either succeeds or fails > against the parent, depending on the permissions of the parent and its > children, than on what the query is itself and what conditionals are > applied to it. Interesting idea. Again, I haven't read the code, but seems worth further investigation, at least. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] security label support, part.2
* KaiGai Kohei (kai...@ak.jp.nec.com) wrote: > If rte->requiredPerms would not be cleared, the user of the hook will > be able to check access rights on the child tables, as they like. This would only be the case for those children which are being touched in the current query, which would depend on what conditionals are applied, what the current setting of check_constraints is, and possibly other factors. I do *not* like this approach. > How about an idea to add a new flag in RangeTblEntry which shows where > the RangeTblEntry came from, instead of clearing requiredPerms? > If the flag is true, I think ExecCheckRTEPerms() can simply skip checks > on the child tables. How about the external module just checking whether the current object being queried has any children, and if so, going and checking the labels/permissions/etc on those children? That way the query either always fails or never fails for a given caller, rather than sometimes working and sometimes not depending on the query. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] security label support, part.2
Robert, * Robert Haas (robertmh...@gmail.com) wrote: > If C1, C2, and C3 inherit from P, it's perfectly reasonable to grant > permissions to X on C1 and C2, Y on C3, and Z on C1, C2, C3, and P. I > don't think we should disallow that. Sure, it's possible to do things > that are less sane, but if we put ourselves in the business of > removing useful functionality because it might be misused, we'll put > ourselves out of business. > > Having said that, I'm not sure that the same arguments really hold > water in the world of label based security. Suppose we have > compartmentalized security: P is a table of threats, with C1 > containing data on nukes, C2 containing data on terrorists, and C3 > containing data on foreign militaries. If we create a label for each > of these threat types, we can apply that label to the corresponding > table; but what label shall we assign P? Logically, the label for P > should be set up in such a fashion that the only people who can read P > are those who can read C1, C2, and C3 anyway, but who is to say that > such a label exists? Even if KaiGai's intended implementation of > SE-PostgreSQL supports construction of such a label, who is to say > that EVERY conceivable labeling system will also do so? I don't see why using labels in the second case changes anything. Consider roles. If you only had a role that could see threats, a role that could see nukes, and a role that could see terrorists, but no role that could see all of them, it's the same problem. Additionally, this kind of problem *isn't* typically addressed with the semantics or the structure of inheritance - it's done with row-level security and is completely orthogonal to the inheritance issue. Imagine a new table, C4, is added to P and the admin configures it such that only the 'view_c4' role has access to that child table directly. Now, Z can see what's in C4 through P, even though Z doesn't have access to C4. In the old system, if Z's query happened to hit C4, the whole query would fail, but at least Z wouldn't see any C4 data. Other queries on P done by Z would be fine, so long as they didn't hit C4. > In fact, it > seems to me that it might be far more reasonable, in a case like this, > to ignore the *parent* label and look only at each *child* label, > which to me is an argument that we should set this up so as to allow > individual users of this hook to do as they like. I think it'd be more reasonable to do this for inheritance in general, but the problem is that people use it for partitioning, and there is a claim out there that it's against what the SQL spec says. The folks using inheritance for partitioning would probably prefer not to have to deal with setting up the permissions on the child tables. I think that's less of an issue now, but I didn't like the previous behavior where certain queries would work and certain queries wouldn't work against the parent table, either. > It's also worth pointing out that the hook in ExecCheckRTPerms() does > not presuppose label-based security. It could be used to implement > some other policy altogether, which only strengthens the argument that > we can't know how the user of the hook wants to handle these cases. This comes back around, in my view, to the distinction between really using inheritance for inheritance, vs using it for partitioning. If it's used for partitioning (which covers the vast majority of the cases where I've seen it used), then I think it should really be considered and viewed as a single object by the authorization system. I don't suppose we're going to get rid of inheritance-as-inheritance any time soon, though. In the end, I'm thinking that if the external security module wants to enforce a check against all the children of a parent, it could quite possibly handle that already and do it in such a way that it won't break depending on the specific query. To wit, it could query the catalog to determine if the current table is a parent of any children, and if so, go check the labels/permissions/etc on those children. I'd much rather have something where the permissions check either succeeds or fails against the parent, depending on the permissions of the parent and its children, than on what the query is itself and what conditionals are applied to it. Thanks, Stephen
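[Editor's note] The catalog lookup Stephen suggests is straightforward. A hedged sketch of the query an external security module could run (the parent table name 'p' is a placeholder), after which it would apply its own label/permission checks to each child, independent of which children the current query happens to touch:

```sql
-- Enumerate the direct children of a parent table via pg_inherits,
-- so their labels/permissions can be checked up front.
SELECT c.oid, c.relname
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'p'::regclass;
```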
Re: [HACKERS] Progress indication prototype
On Tue, Aug 17, 2010 at 11:29 PM, Dave Page wrote: > Which is ideal for monitoring your own connection - having the info in > the pg_stat_activity is also valuable for monitoring and system > administration. Both would be ideal :-) Hm, I think I've come around to the idea that having the info in pg_stat_activity would be very nice. I can just picture sitting in pgadmin while a bunch of reports are running and seeing progress bars for all of them... But progress bars alone aren't really the big prize. I would really love to see the explain plans for running queries. This would improve the DBA's view of what's going on in the system immensely. Currently you have to grab the query and try to set up a similar environment for it to run explain on it. If analyze has run since, or if the tables have grown or shrunk, or if the query was run with some constants as parameters, it can be awkward. If some of the tables in the query were temporary tables it can be impossible. You can never really be sure you're looking at precisely the same plan that the other user's session is running. But stuffing the whole json or xml explain plan into pg_stat_activity seems like it doesn't really fit the same model that the existing infrastructure is designed around. It could be quite large and if we want to support progress feedback it could change quite frequently. We do stuff the whole query there (up to a limited size) so maybe I'm all wet and stuffing the explain plan in there would be fine? -- greg
[HACKERS] proposal: tuplestore, tuplesort aggregate functions
Hello I am still thinking about "median"-type functions. My idea is to introduce a new syntax for the stype definition - like stype = type, or stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or stype = TUPLESTORE OF type, or stype = TUPLESORT OF type [ DESC | ASC ] When stype is ARRAY OF, the final and transition functions can be PL functions. When stype isn't scalar, sfunc can be left undefined (a built-in function is used), so we can implement an aggregate with only a final function. A median function could then be defined: CREATE FUNCTION num_median_final(internal) RETURNS numeric AS ... CREATE AGGREGATE median(numeric) (stype = TUPLESORT OF numeric, finalfunc = num_median_final); This feature impacts primarily the agg executor, and can be relatively simple - no (or only small) planner changes, minimal parser changes. The main reason for this feature is to give aggregates access to tuplestore and tuplesort. I hope this can solve the problems with computing a median and similar functions on very large datasets. comments? regards
Re: [HACKERS] security label support, part.2
2010/8/18 KaiGai Kohei : >> It's also worth pointing out that the hook in ExecCheckRTPerms() does >> not presuppose label-based security. It could be used to implement >> some other policy altogether, which only strengthens the argument that >> we can't know how the user of the hook wants to handle these cases. >> > If rte->requiredPerms would not be cleared, the user of the hook will > be able to check access rights on the child tables, as they like. > How about an idea to add a new flag in RangeTblEntry which shows where > the RangeTblEntry came from, instead of clearing requiredPerms? > If the flag is true, I think ExecCheckRTEPerms() can simply skip checks > on the child tables. Something along those lines might work, although I haven't yet scrutinized the code well enough to have a real clear opinion on what the best way of dealing with this is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [HACKERS] GROUPING SETS revisited
Hello I have hit a roadblock in the GROUPING SETS implementation. I am now playing with my own executor and planner node and I can't move forward :(. This feature will probably need a significant update of our agg implementation. It probably needs a structure similar to CTEs, though it can be somewhat reduced - there is a simple relation between the source query and the result query - but I am not sure whether this has to be implemented via subqueries. The second question is the relatively big difference between GROUP BY behavior and GROUP BY GROUPING SETS behavior. At the moment I don't see a way to join GROUP BY and GROUPING SETS together. Any ideas welcome Regards Pavel
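[Editor's note] The behavioral difference Pavel mentions is easiest to see through the standard rewrite: a GROUPING SETS query is semantically a UNION ALL of plain GROUP BY queries over the same source. A hedged sketch; the emp table and its columns are hypothetical:

```sql
-- GROUP BY GROUPING SETS ((dept), (loc)) over a hypothetical emp table
-- computes the same result as this UNION ALL of ordinary GROUP BYs:
SELECT dept, NULL AS loc, sum(salary) AS total FROM emp GROUP BY dept
UNION ALL
SELECT NULL, loc, sum(salary) FROM emp GROUP BY loc;
```

The implementation question is whether to execute it literally like this (via subqueries, rescanning the source once per grouping set) or to share a single scan across all the sets, which is where the executor/planner changes get hard.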
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 11:01, Michael Haggerty wrote: > Martijn van Oosterhout wrote: >> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote: >>> So let's take the simplest example: a branch BRANCH1 is created from >>> trunk commit T1, then some time later another FILE1 from trunk commit T3 >>> is added to BRANCH1 in commit B4. How should this series of events be >>> represented in a git repository? >> >> >> >>> The "exclusive" possibility is to ignore the fact that some of the >>> content of B4 came from trunk and to pretend that FILE1 just appeared >>> out of nowhere in commit B4 independent of the FILE1 in TRUNK: >>> >>> T0 -- T1 -- T2 T3 -- T4 TRUNK >>> \ >>> B1 -- B2 -- B3 -- B4 BRANCH1 >>> >>> This is also wrong, because it doesn't reflect the true lineage of FILE1. >> >> But the "true lineage" is not stored anywhere in CVS so I don't see why >> you need to fabricate it for git. Sure, it would be really nice if you >> could, but if you can't do it reliably, you may as well not do it at >> all. What's the loss? > > CVS does record (albeit somewhat ambiguously) the branch from which a > new branch sprouted. The history above might result from commands like > > cvs update -A > cvs tag -b BRANCH1 > cvs update -r BRANCH1 > cvs commit -m T2 > touch FILE1 cvs commit -m B1 > cvs add FILE1 > cvs commit -m T3 cvs commit -m B2 > > cvs commit -m B3 > cvs tag -b BRANCH1 FILE1 > > or the last step might have been an explicit merge into BRANCH1: > > cvs update -j T1 -j T3 > cvs commit -m B4 > > Either way, the CVS history relatively clearly indicates that content > was ported from TRUNK to BRANCH1. There is no way to distinguish > whether it was a cherry-pick (not recordable in git's history) vs. a > full merge without more information or more intelligence. Well, in *our* case we know that it was a "cherry-pick". Because we've done no full merges ;) So if there's a way for us to short-wire the tool, that'd be great. 
> Magnus Hagander wrote: >> Our requirements are simple: our cvs history is linear, the git >> history should be linear. It is *not* the same commit that's on head >> and the branch. They are two different commits, that happen to have >> the same commit message and mostly the same content. > > I don't think this is at all an issue of cvs2svn merging commits that > happen to have the same commit message and/or commit time. The merge > commits are all manufactured by cvs2svn to do two things: > > 1. Add content that needs to be on the branch, because a file was added > to the branch after the branch's creation. This *needs* to be done to > ensure that the branch has the correct content. Ok. > 2. Indicate the origin of the new branch content. This goal is debatable. I agree this is debatable. We've kind of debated it already (though not in exactly this context) and decided we'd rather have it appear as brand new content on this branch and not as a merge. >> Bottom line is, we want zero merge commits in the git repository. We >> may start using that sometime in the future (but for now, we've >> decided we don't want that even in the future), but we most >> *definitely* don't want it in the past. We don't care about >> "representing the proper heritage of FILE1" in git, because we never >> did in cvs. >> >> Is there some way to make cvs2git work this way, and just not bother >> even trying to create merge commits, or is that fundamentally >> impossible and we need to look at another tool? > > A merge is just a special case of content being taken from one branch > and added to another. Logically, the same thing happens when a branch > is created, and some of the same problems can occur in that situation. > A branch can be created using content from multiple source branches, > which cvs2git currently also represents as a merge. Can be, yes. 
AFAIK, we don't ever do that (though I can't swear to that, since there have been some funky things in our cvs repository earlier) > Assuming that you don't want to discard all record of where a branch > sprouted from, it is therefore necessary to choose a single parent > branch for each branch creation. To be sure, this choice can be > incorrect the same way as the merge commits discussed above are > incorrect. But one reasonable "mostly-exclusive" approach would be to > choose the most likely parent as the source of the branch and ignore all > others. Yes, I believe that is what we'd prefer, as it's what most closely matches how *we*'ve been using CVS. > cvs2git doesn't currently have this option. I'm not sure how much work > it would be to implement; probably a few days'. Alternatively, you Would this be something you'd consider doing, since it might be of interest to others? I'm sure if it's
Re: [HACKERS] git: uh-oh
Martijn van Oosterhout wrote:
> On Wed, Aug 18, 2010 at 08:25:45AM +0200, Michael Haggerty wrote:
>> So let's take the simplest example: a branch BRANCH1 is created from
>> trunk commit T1, then some time later another FILE1 from trunk commit T3
>> is added to BRANCH1 in commit B4. How should this series of events be
>> represented in a git repository?
>>
>> The "exclusive" possibility is to ignore the fact that some of the
>> content of B4 came from trunk and to pretend that FILE1 just appeared
>> out of nowhere in commit B4 independent of the FILE1 in TRUNK:
>>
>>   T0 -- T1 -- T2 -- T3 -- T4      TRUNK
>>          \
>>           B1 -- B2 -- B3 -- B4    BRANCH1
>>
>> This is also wrong, because it doesn't reflect the true lineage of FILE1.
>
> But the "true lineage" is not stored anywhere in CVS so I don't see why
> you need to fabricate it for git. Sure, it would be really nice if you
> could, but if you can't do it reliably, you may as well not do it at
> all. What's the loss?

CVS does record (albeit somewhat ambiguously) the branch from which a
new branch sprouted. The history above might result from commands like

    cvs update -A
    cvs tag -b BRANCH1
    cvs update -r BRANCH1
    cvs commit -m T2
    touch FILE1
    cvs commit -m B1
    cvs add FILE1
    cvs commit -m T3
    cvs commit -m B2
    cvs commit -m B3
    cvs tag -b BRANCH1 FILE1

or the last step might have been an explicit merge into BRANCH1:

    cvs update -j T1 -j T3
    cvs commit -m B4

Either way, the CVS history relatively clearly indicates that content
was ported from TRUNK to BRANCH1. There is no way to distinguish
whether it was a cherry-pick (not recordable in git's history) vs. a
full merge without more information or more intelligence.

Magnus Hagander wrote:
> Our requirements are simple: our cvs history is linear, the git
> history should be linear. It is *not* the same commit that's on head
> and the branch. They are two different commits, that happen to have
> the same commit message and mostly the same content.
I don't think this is at all an issue of cvs2svn merging commits that
happen to have the same commit message and/or commit time. The merge
commits are all manufactured by cvs2svn to do two things:

1. Add content that needs to be on the branch, because a file was added
   to the branch after the branch's creation. This *needs* to be done to
   ensure that the branch has the correct content.

2. Indicate the origin of the new branch content. This goal is debatable.

> Bottom line is, we want zero merge commits in the git repository. We
> may start using that sometime in the future (but for now, we've
> decided we don't want that even in the future), but we most
> *definitely* don't want it in the past. We don't care about
> "representing the proper heritage of FILE1" in git, because we never
> did in cvs.
>
> Is there some way to make cvs2git work this way, and just not bother
> even trying to create merge commits, or is that fundamentally
> impossible and we need to look at another tool?

A merge is just a special case of content being taken from one branch
and added to another. Logically, the same thing happens when a branch
is created, and some of the same problems can occur in that situation.
A branch can be created using content from multiple source branches,
which cvs2git currently also represents as a merge.

Assuming that you don't want to discard all record of where a branch
sprouted from, it is therefore necessary to choose a single parent
branch for each branch creation. To be sure, this choice can be
incorrect the same way as the merge commits discussed above are
incorrect. But one reasonable "mostly-exclusive" approach would be to
choose the most likely parent as the source of the branch and ignore
all others.

cvs2git doesn't currently have this option. I'm not sure how much work
it would be to implement; probably a few days'.
Alternatively, you could write a tool that would rewrite the ancestry
information in the repository *after* the cvs2git conversion using
.git/info/grafts (see git-filter-branch(1)). Such rewriting would have
to occur before the repository is published, because the rewriting will
change the hashes of most commits.

Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
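The grafts mechanism mentioned above can be exercised on a toy repository. The sketch below manufactures a merge commit of the kind cvs2svn creates, then uses .git/info/grafts plus git filter-branch (with no content filters, purely to bake the grafted parent list into real commits) to reduce the merge to a single parent. The repository layout and commit messages are illustrative assumptions, not PostgreSQL's actual history.

```shell
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # newer git pauses with a warning otherwise
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email demo@example.com && git config user.name demo
trunk=$(git symbolic-ref --short HEAD)   # default branch name varies by git version
echo t1 > trunkfile && git add trunkfile && git commit -qm "T1"
git checkout -qb BRANCH1
echo b1 > branchfile && git add branchfile && git commit -qm "B1"
git checkout -q "$trunk"
echo f1 > FILE1 && git add FILE1 && git commit -qm "T3 adds FILE1"
git checkout -q BRANCH1
git merge -q --no-edit "$trunk"          # the kind of merge commit cvs2svn manufactures
# Graft line format: <commit> <parent> [<parent>...]. Keep only the first
# parent of the merge, discarding the link back to trunk:
echo "$(git rev-parse HEAD) $(git rev-parse HEAD^1)" > .git/info/grafts
git filter-branch -f -- --all >/dev/null 2>&1   # rewrite history with grafted parents
rm .git/info/grafts                             # no longer needed after the rewrite
merges=$(git rev-list --merges --count BRANCH1)
echo "merge commits on BRANCH1: $merges"
```

After the rewrite, `git rev-list --merges --count BRANCH1` reports zero merges even with the grafts file removed, confirming the ancestry change is permanent; as noted above, all descendant commit hashes change, so this must happen before the repository is published.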
Re: [HACKERS] git: uh-oh
On Wed, Aug 18, 2010 at 08:25, Michael Haggerty wrote:
> Tom Lane wrote:
>> I lack git-fu pretty completely, but I do have the CVS logs ;-).
>> It looks like some of these commits that are being ascribed to the
>> REL8_3_STABLE branch were actually only committed on HEAD. For
>> instance my commit in contrib/xml2 on 28 Feb 2010 21:31:57 was
>> only in HEAD. It was back-patched a few hours later (1 Mar 3:41),
>> and that's also shown here, but the HEAD commit shouldn't be.
>>
>> I wonder whether the repository is completely OK and the problem
>> is that this webpage isn't filtering the commits correctly.
>
> Please don't panic :-)

We're not panic'ing just yet :-)

> The problem is that it is *impossible* to faithfully represent a CVS or
> Subversion history with its ancestry information in a git repository (or
> AFAIK any of the DVCS repositories). The reason is that CVS
> fundamentally records the history of single files, and each file can
> have a branching history that is incompatible with those of other files.
> For example, in CVS, a file can be added to a branch after the branch
> already exists, different files can be added to a branch from multiple
> parent branches, and even more perverse things are allowed. The CVS
> history can record this mish-mash (albeit with much ambiguity).

It can. IIRC we have cleaned a couple of such things out.

> Given the choice between two wrong histories, cvs2git uses the
> "inclusive" style. The result is that the ancestors of B4 include not
> only T0, T1, B1, B2, and B3 (as might be expected), but also T2 and T3.
> The display in the website that was quoted [2] seems to mash all of the
> ancestors together without showing the topology of the history, making
> the result quite confusing. The true history looks more like this:
>
> $ git log --oneline --graph REL8_3_10 master
> [...]
> | * 2a91f07 tag 8.3.10
> | * eb1b49f Preliminary release notes for releases 8.4.3, 8.3
> | * dcf9673 Use SvROK(sv) rather than directly checking SvTYP
> | * 1194fb9 Update time zone data files to tzdata release 201
> | * fdfd1ec Return proper exit code (3) from psql when ON_ERR
> | * 77524a1 Backport fix from HEAD that makes ecpglib give th
> | * 55391af Add missing space in example.
> | * 982aa23 Require hostname to be set when using GSSAPI auth
> | * cb58615 Update time zone data files to tzdata release 201
> | * ebe1e29 When reading pg_hba.conf and similar files, do no
> | * 5a401e6 Fix a couple of places that would loop forever if
> | * 5537492 Make contrib/xml2 use core xml.c's error handler,
> | * c720f38 Export xml.c's libxml-error-handling support so t
> | * 42ac390 Make iconv work like other optional libraries for
> | * b03d523 pgindent run on xml.c in 8.3 branch, per request
> | * 7efcdaa Add missing library and include dir for XSLT in M
> | * 6ab1407 Do not run regression tests for contrib/xml2 on M
> | * fff18e6 Backpatch MSVC build fix for XSLT
> | * 7ae09ef Fix numericlocale psql option when used with a nu
> | * de92a3d Fix contrib/xml2 so regression test still works w
> | * 80f81c3 This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | a08b04f Fix contrib/xml2 so regression test still works w
> * | 0d69e0f It's clearly now pointless to do backwards compat
> * | 4ad348c Buildfarm still unhappy, so I'll bet it's EACCES
> * | 6e96e1b Remove xmlCleanupParser calls from contrib/xml2.
> * | 5b65b67 add EPERM to the list of return codes to expect f
> | * a4067b3 Remove xmlCleanupParser calls from contrib/xml2.
> | * 91b76a4 Back-patch today's memory management fixups in co
> | * 5e74f21 Back-patch changes of 2009-05-13 in xml.c's memor
> | * 043041e This commit was manufactured by cvs2svn to crea
> | |\
> | |/
> |/|
> * | 98cc16f Fix up memory management problems in contrib/xml2
> * | 17e1420 Second try at fsyncing directories in CREATE DATA
> * | a350f70 Assorted code cleanup for contrib/xml2. No chang
> * | 3524149 Update complex locale example in the documentatio
> [...]
>
> The left branch is master, the right branch is the one leading to
> REL8_3_10. You can see that there are multiple merges from master to
> the branch, presumably when new files from trunk were ported to the
> branch. This is even easier to see using a graphical history browser
> like gitk.

Yeah, this is clearly the problem.

> There are good arguments for both the "inclusive" and the "exclusive"
> representation of history. The ideal would require a lot more
> intelligence and better heuristics (and slow down the conversion
> dramatically). But even the smartest conversion would still be wrong,
> because git is simply incapable of representing an arbitrary CVS
> history. The main practical result of the impedance mismatch is that it
> will be more difficult to merge between branches that originated in CVS
> (but that is no surprise!)

Our requirements are simple: our cvs history is linear, the git
history should be linear. It is *not* the same commit that's on