[HACKERS] pg_comparator table diff/sync

2007-05-12 Thread Erik 2.0
Is pg_comparator the only project out there that does what it does?  I
tried patching it, and it seems OK, but I'm not terribly confident in
my patch.  I'm hoping someone will tell me there's a great table-
driven rsync out there that everyone uses and I just don't know
about.

http://pgfoundry.org/tracker/index.php?func=detail&aid=1004603&group_id=137&atid=232
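
For context, the underlying trick is comparing per-row checksums rather
than shipping whole rows.  A rough sketch of the flavor of query involved
(hypothetical tables t1/t2 keyed by id with one text payload column;
pg_comparator itself folds the checksums into a tree so it can skip
identical ranges):

    -- rows missing or different on either side (sketch only)
    SELECT id, a.h AS h1, b.h AS h2
      FROM (SELECT id, md5(id::text || '|' || payload) AS h FROM t1) a
      FULL OUTER JOIN
           (SELECT id, md5(id::text || '|' || payload) AS h FROM t2) b
           USING (id)
     WHERE a.h IS DISTINCT FROM b.h;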




Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Simon Riggs
On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote:
> For comparison, here are the test results with vanilla CVS HEAD:
>
>   copy-head | 00:06:21.533137
>   copy-head | 00:05:54.141285

I'm slightly worried that the results for COPY aren't anywhere near as
good as the SELECT and VACUUM results. It isn't clear from those numbers
that the benefit really is significant.

Are you thinking that having COPY avoid cache spoiling is a benefit in
itself? Or do you see a pattern of benefit from your other runs?

(BTW what was wal_buffers set to? At least twice the ring buffer size,
hopefully).

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Luke Lonergan
Hi Simon,

On 5/12/07 12:35 AM, Simon Riggs [EMAIL PROTECTED] wrote:

> I'm slightly worried that the results for COPY aren't anywhere near as
> good as the SELECT and VACUUM results. It isn't clear from those numbers
> that the benefit really is significant.

COPY is bottlenecked on datum formation and format translation, which keep
its throughput very low, so I don't think we should expect the ring buffer
to make much of a dent.

- Luke 





[HACKERS] Use of ActiveSnapshot

2007-05-12 Thread Jan Wieck
The use of ActiveSnapshot throughout the code appears rather dangerous 
to me. It is a global pointer, assumed not to be set yet in some places, 
assumed to be saved and restored by the caller in others. The actual 
(context) memory it points to is sometimes explicitly freed, sometimes 
just left in the context and thrown away by MemoryContextDelete() 
without resetting ActiveSnapshot to NULL.


The comment for the call of pg_plan_queries() in utils/cache/plancache.c 
line 469, for example, is fatally wrong. Not only should the snapshot be 
set by all callers at this point, but if the call actually does replan 
the queries, the existing ActiveSnapshot is replaced with one allocated 
on the current memory context. If this happens to be inside of a nested 
SPI call sequence, the innermost SPI stack frame will free the snapshot 
data without restoring ActiveSnapshot to the one from the caller.


Either calling pg_plan_queries() with needSnapshot=false or saving and 
restoring ActiveSnapshot will prevent the backend from dumping core in 
the mentioned example, but I am not entirely sure which one is the 
right solution.
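
For illustration, a minimal sketch of the save-and-restore variant
(hypothetical fragment, not a tested patch; querytree_list and stmt_list
are stand-ins for whatever the caller is replanning):

    /* Hypothetical sketch only: protect the caller's snapshot */
    Snapshot    save_snapshot = ActiveSnapshot;
    List       *stmt_list;

    PG_TRY();
    {
        /*
         * With needSnapshot = true, replanning may overwrite
         * ActiveSnapshot with one allocated in the current context.
         */
        stmt_list = pg_plan_queries(querytree_list, NULL, true);
    }
    PG_CATCH();
    {
        ActiveSnapshot = save_snapshot; /* restore on error, too */
        PG_RE_THROW();
    }
    PG_END_TRY();

    ActiveSnapshot = save_snapshot;     /* restore on success */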



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #



[HACKERS] Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)

2007-05-12 Thread Jim C. Nasby
Moving to -hackers.

On Fri, May 11, 2007 at 04:37:44PM +0100, Heikki Linnakangas wrote:
>> If you know when the checkpoint ended, and you know how long each of the
>> pieces took, you can reconstruct the other times easily.  The way you
>> describe this, it is true--that the summary is redundant given the
>> detail--but if you put yourself in the shoes of a log file parser the
>> other way around is easier to work with.  Piecing together log entries
>> is a pain; splitting them is easy.
>>
>> If I had to keep only one line out of this, it would be the one with the
>> summary.  It would be nice to have it logged at INFO.
>
> Yeah, if we have the summary line we don't need the other lines and vice
> versa. I have sympathy for parsing log files; I've done that a lot in
> the past and I can see what you mean. Having the individual lines is
> nice when you're monitoring a running system; you don't get the summary
> line until the checkpoint is finished. I suppose we can have both the
> individual lines and the summary; the extra lines shouldn't hurt anyone,
> and you won't get them unless you turn on the new log_checkpoints
> parameter anyway.

Not to beat a dead horse, but do we really want to force folks to be
parsing logs for performance monitoring? Especially if that log parsing
is just going to result in data being inserted into a table anyway?

I know there's concern about performance of the stats system and maybe
that needs to be addressed, but pushing users to log parsing is a lot of
extra effort, non-standard, likely to be overlooked, and doesn't play
well with other tools. It also conflicts with the existing statistics
framework.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Re: [HACKERS] Performance monitoring

2007-05-12 Thread Joshua D. Drake



Jim C. Nasby wrote:
> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring? Especially if that log parsing
> is just going to result in data being inserted into a table anyway?
>
> I know there's concern about performance of the stats system and maybe
> that needs to be addressed, but pushing users to log parsing is a lot of
> extra effort, non-standard, likely to be overlooked, and doesn't play
> well with other tools. It also conflicts with the existing statistics
> framework.



One thing that doesn't seem to be getting looked at is the cost of 
logging. Logging is very expensive. I don't know whether it is more 
expensive than the stats system, but you can cut your tps in half by 
turning on any level of verbose logging.


Yes, that can be offset by pushing the logging to another spindle and 
being careful about what you log, but still.


Either way we are taking the hit; it is just a matter of where. IMO it 
would be better to have the information in the database, where it makes 
sense, than to push it out to a log that:


A. Will likely be forgotten
B. Is only accessible if you have shell access to the machine (not as 
common as all of us would like to think)


Sincerely,

Joshua D. Drake


--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




[HACKERS] I am back, catching up on email

2007-05-12 Thread Bruce Momjian
I returned on Wednesday from my trip to Australia and India.  (I skipped
London.)  I returned early because my brother-in-law died on May 5 (for
details see http://momjian.us/main/news.html).

Anyway, I am slow catching up on email for that reason.  I should be
caught up by Tuesday/Wednesday.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] plperl vs. bytea

2007-05-12 Thread Bruce Momjian

Added to TODO:

o Allow data to be passed in native language formats, rather
  than only text

 http://archives.postgresql.org/pgsql-hackers/2007-05/msg00289$


---

Andrew Dunstan wrote:
> Tom Lane wrote:
>> Andrew Dunstan [EMAIL PROTECTED] writes:
>>> Tino Wildenhain wrote:
>>>> Andrew Dunstan wrote:
>>>>> This does not need to be over-engineered, IMNSHO.
>>>> Well, could you explain where it would appear over-engineered?
>>>
>>> Anything that imposes extra requirements on type creators seems
>>> undesirable.
>>>
>>> I'm not sure either that the UUID example is a very good one. This whole
>>> problem arose because of performance problems handling large gobs of
>>> data, not just anything that happens to be binary.
>>
>> Well, we realize that bytea has got a performance problem, but are we so
>> sure that nothing else does?  I don't want to stick in a one-purpose
>> wart only to find later that we need a few more warts of the same kind.
>>
>> An example of something else we ought to be considering is binary
>> transmission of float values.  The argument in favor of that is not
>> so much performance (although text-and-back conversion is hardly cheap)
>> as it is that the conversion is potentially lossy, since float8out
>> doesn't by default generate enough digits to ensure a unique
>> back-conversion.
>>
>> ISTM there are three reasons for considering non-text-based
>> transmission:
>>
>> 1. Performance, as in the bytea case
>> 2. Avoidance of information loss, as for float
>> 3. Providing a natural/convenient mapping to the PL's internal data types,
>>    as we already do --- but incompletely --- for arrays and records
>>
>> It's clear that the details of #3 have to vary across PLs, but I'd
>> like it not to vary capriciously.  For instance plperl currently has
>> special treatment for returning perl arrays as SQL arrays, but AFAICS
>> from the manual not for going in the other direction; plpython and
>> pltcl overlook arrays entirely, even though there are natural mappings
>> they could and should be using.
>>
>> I don't know to what extent we should apply point #3 to situations other
>> than arrays and records, but now is the time to think about it.  An
>> example: working with the geometric types in a PL function is probably
>> going to be pretty painful for lack of simple access to the constituent
>> float values (not to mention the lossiness problem).
>>
>> We should also be considering some non-core PLs such as PL/Ruby and
>> PL/R; they might provide additional examples to influence our thinking.
>
> OK, we have a lot of work to do here, then.
>
> I can really only speak with any significant knowledge on the perl
> front. Fundamentally, it has three types of scalars: IV, NV and PV
> (integer, float, string). An IV can accommodate at least the largest
> integer or pointer type on the platform, an NV a double, and a PV an
> arbitrary string of bytes.
>
> As for structured types, as I noted elsewhere we have some of the work
> done for plperl. My suggestion would be to complete it for plperl, get
> it fully orthogonal, and then retrofit that to plpython/pltcl.
>
> I've actually been worried for some time that the conversion glue was
> probably imposing significant penalties on the non-native PLs, so I'm
> glad to see this getting some attention.
>
> cheers
>
> andrew

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Managing the community information stream

2007-05-12 Thread Bruce Momjian
Jim Nasby wrote:
> On May 6, 2007, at 8:18 AM, Andrew Dunstan wrote:
>> Oh, the answer to Bruce's question about when to create a feature
>> item? You could well do it at the time when today you create a TODO
>> item. However, we might even do better. For example, we might well
>> add feature requests that are denied. That would help people to see
>> if something has been proposed before.
>
> The problem with our current TODO process is that whether an item
> makes it onto the list is essentially determined by whether the idea
> caught a committer's attention, and whether that committer happened to
> think it was a good idea. That sets the bar pretty high for getting
> stuff on the list (which you need for a simple list like TODO), but it
> also means it's very subjective. (Of course 98% of the time that
> committer is Bruce, but I don't think that matters here...)

Users often request items be added to the TODO list, and I usually comply.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Managing the community information stream

2007-05-12 Thread Bruce Momjian
Jim Nasby wrote:
> On May 6, 2007, at 8:18 AM, Andrew Dunstan wrote:
>> Oh, the answer to Bruce's question about when to create a feature
>> item? You could well do it at the time when today you create a TODO
>> item. However, we might even do better. For example, we might well
>> add feature requests that are denied. That would help people to see
>> if something has been proposed before.

Uh, TODO has that:

Features We Do _Not_ Want

Do we need more items on that list?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Managing the community information stream

2007-05-12 Thread Bruce Momjian

To follow up on Andrew's idea of tracking things back to the TODO or bug
number:

We could have a universal developer number, something like PGD#23432, as
a PostgreSQL Developer number.  We could assign them for submissions to
the bugs list, where we already assign a number.  I could easily add
them to TODO items that don't already have a number from the bugs list,
and we could use the number for postings to the patches list that again
don't already have a number.  (The PGD numbers would have value ranges
assigned for specific uses: one range for bug reports, another for TODO
items, another for patches, etc.)

The idea is that if you are working on a TODO item, you mention that
number in the subject of any email discussing it and of any posting to
the patches list.  A web application could then read from the email
stream and pull out information about any item.  The only overhead is
people mentioning the assigned number consistently.
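
For what it's worth, the scraping side really is mechanical.  A
hypothetical sketch (not an existing tool) of pulling those references
out of an mbox:

    #!/usr/bin/perl
    # Hypothetical sketch: index an mbox's Subject lines by PGD number
    use strict;
    use warnings;

    my %threads;
    while (my $line = <>)
    {
        next unless $line =~ /^Subject:\s*(.*)/;
        my $subject = $1;
        $threads{$1} .= "  $subject\n" while $subject =~ /\bPGD#(\d+)\b/g;
    }
    print "PGD#$_:\n$threads{$_}" for sort { $a <=> $b } keys %threads;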

One problem is that our development isn't linear --- often TODO items
are the result of several email threads, and TODO items are split and
merged regularly, meaning that a PGD number could be partially complete
or be merged with another number.  When this happens, the number might
cause confusion, and I don't see a way to fix that easily.

---

Dave Page wrote:
> Bruce Momjian wrote:
>> The idea of the patch number in the subject line works with that
>> streaming model because it merely marks streams so they can be grouped.
>> The defining event that marks the stream is a post to the patches list.
>> We already number posts to the bugs list, so in a way we could improve
>> tracking there and somehow link it to TODO items and patch submissions,
>> but because many TODO items are not the result of bug reports but come
>> out of general discussions, I am not sure tracking would work as well
>> there.  And what about features?  Do you start assigning numbers there,
>> and what is your trigger event?  In my opinion, as you start trying to
>> place more structure on the stream, the stream itself starts to degrade
>> in its dynamism and ease of use.  To me, that is the fundamental issue,
>> and risk.
>
> Bruce,
>
> I cannot really add to that except to say that you neatly summarized
> what I've completely failed to in my last few emails to Andrew. I agree
> completely.
>
> Regards, Dave.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Re: Performance monitoring (was: [PATCHES] Logging checkpoints and other slowdown causes)

2007-05-12 Thread Greg Smith

On Sat, 12 May 2007, Jim C. Nasby wrote:


> Not to beat a dead horse, but do we really want to force folks to be
> parsing logs for performance monitoring?


All of the really interesting DBA-level information about checkpoints is 
now sitting in pg_stat_bgwriter.  There really is no reason I'd expect 
this information to be part of a normal performance monitoring setup; for 
me it has fit into troubleshooting and R&D.
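
For instance (a sketch; column names follow the 8.3-era view and could
change before release):

    -- checkpoint activity a DBA can poll without touching the logs
    SELECT checkpoints_timed,   -- checkpoints driven by checkpoint_timeout
           checkpoints_req,     -- checkpoints forced by WAL volume or command
           buffers_checkpoint,  -- buffers written at checkpoint time
           buffers_clean,       -- buffers written by the background writer
           buffers_backend      -- buffers written by ordinary backends
      FROM pg_stat_bgwriter;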


When I'm talking about parsing logs in this context, it's for answering 
questions like "how does fsync time correlate with the amount of data 
written at checkpoint time?"  Since I've found it educational to sort 
through that, I just didn't want the patch modified so that this was 
harder to do than it had to be.


If I thought it worth the trouble to expose that level of information via 
the stats interface, I'd have submitted that instead of this log-based 
patch.  This circles back to the previous discussion of whether this 
particular information is strictly developer-level.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Performance monitoring

2007-05-12 Thread Greg Smith

On Sat, 12 May 2007, Joshua D. Drake wrote:

> One thing that doesn't seem to be getting looked at is the cost of
> logging.


If any of this executed at something like the query level, sure, that 
would be really important.  But the majority of the logging I suggested 
here is of things that happen at checkpoint time.  The presumption is 
that the overhead of a checkpoint is considerably greater than that of 
writing a log line or even five.


The least intensive event I would like to be loggable is the creation and 
clearing of a new WAL segment, which is again a pretty small bit of log 
compared to the 16MB write.  I wouldn't mind seeing that exposed under 
pg_stats instead; I just had more interesting things to statify first.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



[HACKERS] What is happening on buildfarm member baiji?

2007-05-12 Thread Tom Lane
The last two runs on baiji have failed at the installcheck stage,
with symptoms that look a heck of a lot like the most recent system
catalog changes haven't taken effect (eg, it doesn't seem to know
about pg_type.typarray).  Given that the previous check step
passed, the most likely explanation seems to be that some part
of the install step failed --- I've not tried to reproduce the
behavior but it looks like it might be explained if the install
target's postgres.bki file was not getting overwritten.  So we
have two issues: what exactly is going wrong (some new form of
Vista brain death no doubt), and why isn't the buildfarm script
noticing?

regards, tom lane



Re: [HACKERS] Performance monitoring

2007-05-12 Thread Neil Conway
On Sat, 2007-05-12 at 14:26 -0700, Joshua D. Drake wrote:
> Either way we are taking the hit; it is just a matter of where. IMO it
> would be better to have the information in the database, where it makes
> sense, than to push it out to a log

If performance monitoring information is provided as a database object,
what would the right interface be? IMHO the problem with cleanly
presenting monitoring information within a normal database system is
that this sort of data is fundamentally dynamic and continuous: to
determine how the performance of the system changes over time, you need
to repeatedly rescan the table/view/SRF and recompute your analysis
essentially from scratch. Trying to get even simple information like
queries per second from pg_stat_activity is an example of how this can
be painful.
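
For example, something as basic as transactions per second turns into a
sample-and-diff dance against cumulative counters (a sketch using
pg_stat_database, since pg_stat_activity is even less direct):

    -- take one sample ...
    SELECT now() AS ts, sum(xact_commit + xact_rollback) AS xacts
      FROM pg_stat_database;
    -- ... wait, sample again, then compute
    --     (xacts_2 - xacts_1) / extract(epoch FROM ts_2 - ts_1)
    -- in the client; the view itself can never hand you a rate.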

<plug>
BTW, if the system included the concept of a continuous data *stream* as
a kind of database object, this problem would be much more tractable :)
In fact, there is some code in a version of TelegraphCQ that exposes
various information about the runtime state of the system as a set of
system-defined data streams -- like any other stream, users could then
use those streams in arbitrary queries.
</plug>

-Neil





Re: [HACKERS] What is happening on buildfarm member baiji?

2007-05-12 Thread Andrew Dunstan
Tom Lane wrote:
> The last two runs on baiji have failed at the installcheck stage,
> with symptoms that look a heck of a lot like the most recent system
> catalog changes haven't taken effect (eg, it doesn't seem to know
> about pg_type.typarray).  Given that the previous check step
> passed, the most likely explanation seems to be that some part
> of the install step failed --- I've not tried to reproduce the
> behavior but it looks like it might be explained if the install
> target's postgres.bki file was not getting overwritten.  So we
> have two issues: what exactly is going wrong (some new form of
> Vista brain death no doubt), and why isn't the buildfarm script
> noticing?



The script will not even run if the install directory exists:

    die "$buildroot/$branch has $pgsql or inst directories!"
        if ((!$from_source && -d $pgsql) || -d "inst");

But the install process is different for MSVC. It could be that we are
screwing up there.

I no longer have an MSVC box, so I can't tell so easily ;-(

cheers

andrew



