Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Dawid Kuroczko

On 2/17/07, Tom Lane [EMAIL PROTECTED] wrote:

Hannu Krosing [EMAIL PROTECTED] writes:
 How easy/hard would it be to create unique indexes on tinterval (unique
 here meaning non-overlapping)?

Overlapping is not an equality relation (it fails the transitive law),
so I'm not entirely sure what unique means in this context ... but I
can promise you you can't make it work with btree.
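
For a concrete sense of why overlaps fails the transitive law, here is a
minimal sketch using the standard SQL OVERLAPS predicate (dates chosen
arbitrarily): A overlaps B and B overlaps C, yet A does not overlap C.

  SELECT (date '2007-01-01', date '2007-01-04')
         OVERLAPS (date '2007-01-03', date '2007-01-06');  -- true
  SELECT (date '2007-01-03', date '2007-01-06')
         OVERLAPS (date '2007-01-05', date '2007-01-08');  -- true
  SELECT (date '2007-01-01', date '2007-01-04')
         OVERLAPS (date '2007-01-05', date '2007-01-08');  -- false

So overlaps cannot serve as btree equality, which demands transitivity.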


Hmm, let's assume two time intervals:

A (with a0 as start and a1 as end times)
B (with b0 as start and b1 as end times)

Now, we'd define operators as:

A is left of B when a0 < b0 AND a1 < b0
A is right of B when a0 > b1 AND a1 > b1

A is equal to B if (a0 >= b0 AND a0 <= b1) OR (a1 >= b0 AND a1 <= b1)
OR (a0 < b0 AND a1 > b1)
Actually, equal doesn't mean equal here; rather, it means overlaps.

Now, assuming a UNIQUE INDEX on such a table, the order would be preserved
since no two intervals can overlap.  And no overlapping data could be inserted
without breaking the non-overlap property.  And of course a non-unique index would
produce garbage (since left of/right of wouldn't make any sense anymore).
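
The equal (overlaps) test above can also be written as one SQL expression.
A sketch, assuming a hypothetical table pairs(a0, a1, b0, b1) holding one
interval pair per row:

  SELECT (a0 BETWEEN b0 AND b1)
      OR (a1 BETWEEN b0 AND b1)
      OR (a0 < b0 AND a1 > b1) AS a_overlaps_b
  FROM pairs;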

Interestingly, such non-overlapping datatypes could also make sense for
network addresses (with netmasks).



[HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Jakub Ouhrabka

Hi,

we've made a mistake and ran initdb for the database cluster with the wrong locale :-(

Now it's full of data. I've read in the docs that it's not possible to 
change the locale.


But I guess something like this would work:

a)
1) drop all indexes on text/varchar columns
2) change cluster locale
3) create all indexes on text/varchar columns

or even

b)
1) change cluster locale
2) reindex all indexes on text/varchar columns [I'm aware that until the 
reindex, queries using these indexes could return wrong answers]


Is it possible/safe to do a) or b)? How would I do the "change cluster 
locale" step? Where is this information stored?


Or the only way is to rebuild the database cluster from scratch?
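
For reference, the locale that initdb recorded is visible from SQL as
read-only settings, which makes for a quick sanity check:

  SHOW lc_collate;
  SHOW lc_ctype;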

Thanks,

Kuba




Re: [HACKERS] pg_restore fails with a custom backup file

2007-02-19 Thread Magnus Hagander
On Sat, Feb 17, 2007 at 08:40:54PM +0100, Magnus Hagander wrote:
 
 IIRC, there was a warning from pg_dump. I don't recall exactly what, and
 don't have the space to re-run the test on my laptop here, but I think
 it was from:
 write_msg(modulename, "WARNING: ftell mismatch with expected position -- ftell used\n");

Ok, I've confirmed that the output is this:
D:\prog\pgsql\inst\bin> pg_dump -Fc -Z0 test > out
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used
pg_dump: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used


Three warnings for that one dump - my guess would be one for each table
past the 2GB limit.

//Magnus



[HACKERS] pg_proc without oid?

2007-02-19 Thread Magnus Hagander
I notice that this patch: 
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/include/catalog/pg_proc.h.diff?r1=1.443&r2=1.444

inserts a bunch of XML related rows in pg_proc without specifying oid.
This breaks the fmgrtab generator on msvc. Most likely because I didn't
think of that case. But since all other rows in pg_proc.h contain the
oid, I just wanted to check if they're actually supposed to be without
oid, or if that was a mistake?

//Magnus



Re: [HACKERS] anyelement2 pseudotype

2007-02-19 Thread Tom Dunstan

Tom Lane wrote:

I realized that I can probably fix ATAddForeignKeyConstraint to do the
right thing by having it pass the two actual column types to
can_coerce_type, thus allowing check_generic_type_consistency to kick
in and detect the problem.


Yeah, I came to the same conclusion. No amount of refactoring in 
parse_coerce.c is going to get the original concrete types back to 
compare. That should fix the problem with arrays, enums and any 
potential future generic types without mentioning them explicitly in 
there, à la the hack that's there currently, thankfully.


Cheers

Tom




Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 09:27:06AM +0100, Jakub Ouhrabka wrote:
 But I guess something like this would work:
 
 a)
 1) drop all indexes on text/varchar columns
 2) change cluster locale
 3) create all indexes on text/varchar columns

You're going to miss the name columns, ie. every string index in
pg_catalog. Also, there are shared tables which are used in every DB.
You need to log into every DB in the cluster (don't forget template[01])
and reindex everything.

So, REINDEX DATABASE; seems to be a safer bet. In general this doesn't
actually work, since changing the locale may make two strings equal that
weren't before, thus possibly breaking a unique index, but it may be
possible.
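
A minimal sketch of that route, assuming the stored locale has already been
switched and with mydb standing in for each database in turn:

  -- connect to each database of the cluster and run:
  REINDEX DATABASE mydb;  -- the name must match the current database
  -- note template0 normally disallows connections;
  -- pg_database.datallowconn controls that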

I'd suggest single user mode at least, and make backups!

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Andreas 'ads' Scherbaum

Hello,

On Sat, 17 Feb 2007 06:49:42 -0800 (PST)
RPK [EMAIL PROTECTED] wrote:

 PostgreSQL, already a mature database, needs to have more options for
 recovery as compared to proprietary databases. I just worked with Oracle's
 FlashBack query feature in Oracle 9i and FlashBack Table feature in 10g.
 
 Future versions of PostgreSQL must have similar features which enable users
 to bring Table(s) and/or Database(s) to a desired Time Stamp.

There is a pgfoundry project which tries to achieve this:

http://pgfoundry.org/projects/tablelog/


Kind regards

-- 
Andreas 'ads' Scherbaum
German PostgreSQL Usergroup: http://www.pgug.de



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc. Most likely because I didn't
 think of that case. But since all other rows in pg_proc.h contain the
 oid, I just wanted to check if they're actually supposed to be without
 oid, or if that was a mistake?

It's intentional.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] n-gram search function

2007-02-19 Thread Guillaume Smet

On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

pg_trgm was developed for spelling correction and there is a threshold of
similarity, which is 0.3 by default. The README explains what it means.


Yes, I read it.


Similarity could be very low, since you didn't make a separate column, and the
length of the full string is used to normalize similarity.


Yep, that's probably my problem. Ignored records are a bit longer than
the others.

I tried the tip in README.pg_trgm to generate a table with all the words.

It can do the work in conjunction with tsearch2 and a bit of AJAX to
suggest the full words to the users. The reason why I was not using
tsearch2 is that it's sometimes hard to spell location names
correctly.

The only problem is that it is still quite slow on a 50k-row words
table, but I'll make further tests on a decent server this afternoon.
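
For anyone following along, the threshold Oleg mentions is tunable from SQL.
A small sketch using pg_trgm's functions (the words table name is made up):

  SELECT show_limit();             -- 0.3 by default
  SELECT set_limit(0.2);           -- loosen the match threshold
  SELECT word, similarity(word, 'marseile') AS sml
    FROM words
   WHERE word % 'marseile'         -- % filters by the current threshold
   ORDER BY sml DESC;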

--
Guillaume



[HACKERS] Short varlena headers and arrays

2007-02-19 Thread Gregory Stark

I had intended to make varlenas alignment 'c' and have heaptuple.c force
them to alignment 'i' if they required it. However I've noticed a problem that
makes me think I should do this the other way around.

The problem is that other places in the codebase use the alignment. In
particular arrays do. Also toasting.c expects to get a worst-case size from
att_align rather than a best-case. Also there's indextuple.c but probably I
should get to that in this round anyways.

So now I'm thinking it's best to leave them as alignment 'i' unless
heaptuple.c thinks it can get away without aligning them. This means we don't
have a convenient way for data types to opt out of this header compression.
But the more I think about it the less convinced I am that we need that. The
alignment inside the data type doesn't matter since you'll only be working
with detoasted versions of them unless you specifically go out of your way to
do otherwise.


Once this is done it may be worth having arrays convert to short varlenas as
well. Arrays of short strings hurt pretty badly currently:

postgres=# select pg_column_size(array['a','b','c','d']);
 pg_column_size 
----------------
             56
(1 row)
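
For comparison, a bare short string shows where the cost comes from: each
element carries its own 4-byte varlena header and is padded to 4-byte
alignment inside the array, so the 56 above would break down as roughly a
24-byte array header plus four 8-byte elements (a sketch; exact numbers can
vary by build):

  select pg_column_size('a'::text);  -- expect 5: 4-byte header + 1 data byte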

The only problem with this is if it's more likely for someone to stuff things
in an array and then read them back out without detoasting than it is for
someone to stuff them in a tuple. Probably the risk is the same. There is some
code that assumes it understands how arrays are laid out in execQual.c and
varlena.c.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Dimitri Fontaine
Hi list,

Here's a proposal of this idea which stole a good part of my night.
I'll present the idea first, then two use cases with some rationale and a few 
details. Please note I won't be able to participate in any development 
effort associated with this idea, may such a thing happen!

The bare idea is to provide a way to 'attach' multiple storage facilities (say 
volumes) to a given tablespace. Each volume may be attached in READ ONLY, 
READ WRITE or WRITE ONLY mode.
You can mix RW and WO volumes into the same tablespace, but can't have RO with 
any W form, or so I think.

It would be pretty handy to be able to add and remove volumes on a live 
cluster, and this could be a way to implement moving/extending tablespaces.
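
To make the idea concrete, purely invented syntax (nothing like this exists):

  CREATE TABLESPACE fast LOCATION '/mnt/disk0';
  ALTER TABLESPACE fast ADD VOLUME '/dev/shm/pgfast' MODE READ WRITE;
  ALTER TABLESPACE fast ADD VOLUME '/mnt/slow' MODE WRITE ONLY;
  ALTER TABLESPACE fast DROP VOLUME '/mnt/slow';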


Use Case A: better read performances while keeping data write reliability

The first application of this multiple volumes per tablespace idea is to keep 
a tablespace both into RAM (tmpfs or ramfs) and on disk (both RW).

Then PG should be able to read from both volumes when dealing with read 
queries, and would have to fwrite()/fsync() both volumes for each write.
Of course, write speed will be constrained by the slowest volume, but the 
quicker one would then be able to take away some amount of read queries 
meanwhile.

It would be neat if PG was able to account for each volume's relative write 
speed in order to assign weights to tablespace volumes, and have the planner or 
executor spread read queries among volumes depending on that.
For example if a single query has a plan containing several full scans (of 
indexes and/or tables) in the same tablespace, those could be done on 
different volumes. 

Use Case B: Synchronous Master Slave(s) Replication

By using a Distributed File System capable of being mounted from several nodes 
at the same time, we could have a configuration where a master node has 
('exports') a WO tablespace volume, and one or more slaves (depending on FS 
capability) configures a RO tablespace volume.

PG then has to be able to cope with a RO volume: the data are not written by 
PG itself (from the local node's point of view), so some limitations would 
certainly apply.
Will it be possible, for example, to add indexes to data on slaves?
I'd use the solution even without this, thus...

When the master/slave link is broken, the master can no longer write to the 
tablespace, as if it were a local disk failure of some sort, so this should 
prevent nasty desync problems: data is written on all W volumes or data is 
not written at all.


I realize this proposal is the first draft of a work to be done, and that I 
won't be able to do a lot more than draft this idea. This mail is sent 
to the hackers list in the hope someone there will find it worth 
considering and polishing...

Regards, and thanks for the good work ;)
-- 
Dimitri Fontaine




Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Ian Caulfield

On 17/02/07, Warren Turkal [EMAIL PROTECTED] wrote:

PERIOD(INT) is actually listed in Dr. Snodgrass's book. However, I am not
really sure about the semantics of the type. When would you use a
PERIOD(INT)?


It wouldn't be directly useful for temporal SQL, but I have a number
of tables in a database application where a range of integers is
mapped onto a 0-100 range (e.g. 1-5 might get mapped onto 1, 6-15 to
2, etc), which I'd like to store using a (preferably non-overlapping)
period type.


Also, please bring this discussion back on list so that it gets recorded. I
didn't want to post your private reply to the list without your permission.


Oops, meant to reply to the list originally.

Ian



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Magnus Hagander
On Mon, Feb 19, 2007 at 11:25:02AM +0100, Peter Eisentraut wrote:
 On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
  This breaks the fmgrtab generator on msvc. Most likely because I didn't
  think of that case. But since all other rows in pg_proc.h contain the
  oid, I just wanted to check if they're actually supposed to be without
  oid, or if that was a mistake?
 
 It's intentional.

Could you explain why, and what the expected result is? Since I can't
find any other examples of people doing it :-)

(will fix the vc stuff, of course, but still interested in knowing)

//Magnus



Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Csaba Nagy
 One option that I've heard before is to have vacuum after a single iteration
 (ie, after it fills maintenance_work_mem and does the index cleanup and the
 second heap pass), remember where it was and pick up from that point next
 time.

From my experience this is not acceptable... I have tables for which the
index cleanup takes hours, so no matter how low I set
maintenance_work_mem (in fact I set it high enough so there's only one
iteration), it would take too much time and the queue tables would get overly
bloated (which isn't happening now, as they get special cluster
treatment).

Cheers,
Csaba.





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 11:25:41AM +0100, Dimitri Fontaine wrote:
 Hi list,
 
 Here's a proposal of this idea which stole a good part of my night.
 I'll present the idea first, then two use cases with some rationale and a few 
 details. Please note I won't be able to participate in any development 
 effort associated with this idea, may such a thing happen!
 
 The bare idea is to provide a way to 'attach' multiple storage facilities (say 
 volumes) to a given tablespace. Each volume may be attached in READ ONLY, 
 READ WRITE or WRITE ONLY mode.
 You can mix RW and WO volumes into the same tablespace, but can't have RO with 
 any W form, or so I think.

Somehow this seems like implementing RAID within postgres, which seems
a bit outside of the scope of a DB.

 Use Case A: better read performances while keeping data write reliability
 
 The first application of this multiple volumes per tablespace idea is to keep 
 a tablespace both into RAM (tmpfs or ramfs) and on disk (both RW).

For example, I don't believe there is a restriction against having one
member of a RAID array being a RAM disk.

 Use Case B: Synchronous Master Slave(s) Replication
 
 By using a Distributed File System capable of being mounted from several 
 nodes 
 at the same time, we could have a configuration where a master node has 
 ('exports') a WO tablespace volume, and one or more slaves (depending on FS 
 capability) configures a RO tablespace volume.

Here you have the problem of row visibility. The data in the table isn't
very useful without the clog, and that's not stored in a tablespace...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Alvaro Herrera
Bruce Momjian wrote:
 Tom Lane wrote:
  Jim C. Nasby [EMAIL PROTECTED] writes:
   On Thu, Feb 08, 2007 at 01:54:13PM -0500, Tom Lane wrote:
   I would be satisfied if the returned command tag were something else,
   maybe NO OPERATION.
  
   TABLE blah DID NOT EXIST might be less confusing...
  
  You're confusing a command tag with a notice.  In the first place,
  we shouldn't assume that applications are ready to deal with
  indefinitely long command tags (the backend itself doesn't think they
  can be longer than 64 bytes); in the second place, they should be
  constant strings for the most part so that simple strcmp()s suffice
  to see what happened.  Command tags are meant for programs to deal
  with, more than humans.
 
 Yep.  Because IF EXISTS is in a lot of object destruction commands,
 adding a modified tag seems very confusing, because in fact the DROP
 TABLE did succeed, so to give any other tag seems incorrect.

I don't understand -- what problem you got with NO OPERATION?  It
seemed a sound idea to me.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] wishlist items ..

2007-02-19 Thread Hannu Krosing
On a fine day, Sat 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:
 Lukas Kahwe Smith wrote:
 
  I just wanted to bring up the wishlist todo items:
  http://developer.postgresql.org/index.php/Todo:WishlistFor83

What does/did the row Clustered/replication solutions refer to ?


-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com





Re: [HACKERS] wishlist items ..

2007-02-19 Thread Lukas Kahwe Smith

Hannu Krosing wrote:

On a fine day, Sat 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:

Lukas Kahwe Smith wrote:


I just wanted to bring up the wishlist todo items:
http://developer.postgresql.org/index.php/Todo:WishlistFor83


What does/did the row Clustered/replication solutions refer to ?


there was some discussion early on in 8.3 scoping to add some out of the 
box solutions for clustering and replication in order to reduce the 
barrier to entry for people who require such features.


regards,
Lukas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Hannu Krosing
On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:
 Hannu Krosing wrote:
  On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
  
 
  However, they don't have vacuum, we do.
 
  Right, and I think that is more or less because Oracle doesn't need
  it.  Vacuum's main purpose (correct me if I am wrong) is to
  recover/mark rows that are no longer used, and Oracle essentially
  reuses the space immediately. 
 
  Obviously with Oracle if you bloat out a table and delete a ton of
  rows then you have to rebuild the table, but that is more or less the
  same problem that PostgreSQL has and where vacuum full comes into
  play.
 
  The only benefit with the Oracle model is that you can achieve
  flashback, which is a very rarely used feature in my book.
  
  We can have flashbacks up to the last vacuum. It is just not exposed.
  Don't vacuum, and you have the whole history. (Actually you can't go for
  more than 2G transactions, or you get trx id rollover).
  
  To get a flashback query, you just have to construct a snapshot from
  that time and you are done. We don't store transaction times anywhere, so
  the flashback has to be by transaction id, but there is very little
  extra work involved. We just don't have syntax for saying SELECT ... AS
  SEEN BY TRANSACTION XXX
 
 Well this is certainly interesting. What do we think it would take to
 enable the functionality?

First we must run the query in serializable mode and replace the
snapshot with a synthetic one, which defines visibility at the start of
the desired transaction.

Probably it is a good idea to take a lock on all tables involved, to
avoid a vacuum being started on them while the query is running.

Also, we can't trust the DELETED flags in index pages, so we should
forbid index scans, or just always re-check the visibility in the heap.

Otherwise it would probably be enough to just scan tuples as usual, and
check if they were visible to the desired transaction, that is, they were
inserted before that transaction and they are not deleted before that
trx.

Of course this will not be true, once we have HOT/WIP with in-page
vacuuming.
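
In SQL terms the framing would look like this; only a sketch, since the
synthetic-snapshot step has no user-visible interface and the table name is
invented:

  BEGIN;
  SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
  -- here the backend would install a synthetic snapshot
  -- reflecting the chosen past transaction id
  SELECT * FROM accounts;  -- would see data as of that transaction
  COMMIT;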

-- 

Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

It seems nonorthogonal.  What if only some of the tables you mentioned did not 
exist?  Do you get SOME OPERATION?

There are also other cases where commands don't have an effect but we don't 
explicitly point that out.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

Hannu Krosing wrote:

On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:

Hannu Krosing wrote:

On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
To get a flashback query, you just have to construct a snapshot from
that time and you are done. We don't store transaction times anywhere, so
the flashback has to be by transaction id, but there is very little
extra work involved. We just don't have syntax for saying SELECT ... AS
SEEN BY TRANSACTION XXX

Well this is certainly interesting. What do we think it would take to
enable the functionality?


First we must run the query in serializable mode and replace the
snapshot with a synthetic one, which defines visibility at the start of
the desired transaction

probably it is a good idea to take a lock on all tables involved to
avoid a vacuum to be started on them when the query is running.

Would the xmin exported by that transaction prevent vacuum from removing
any tuples still needed for the flashback snapshot?

greetings, Florian Pflug



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Alvaro Herrera
Florian G. Pflug wrote:
 Hannu Krosing wrote:
 On a fine day, Sun 2007-02-18 at 14:27, Joshua D. Drake wrote:
 Hannu Krosing wrote:
 On a fine day, Sat 2007-02-17 at 22:49, Chad Wagner wrote:
 To get a flashback query, you just have to construct a snapshot from
 that time and you are done. We don't store transaction times anywhere, so
 the flashback has to be by transaction id, but there is very little
 extra work involved. We just don't have syntax for saying SELECT ... AS
 SEEN BY TRANSACTION XXX
 Well this is certainly interesting. What do we think it would take to
 enable the functionality?
 
 First we must run the query in serializable mode and replace the
 snapshot with a synthetic one, which defines visibility at the start of
 the desired transaction
 
 probably it is a good idea to take a lock on all tables involved to
 avoid a vacuum to be started on them when the query is running.
 Would the xmin exported by that transaction prevent vacuum from removing
 any tuples still needed for the flashback snapshot?

Sure, and that makes the mentioned lock unnecessary.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Zeugswetter Andreas ADI SD

  Well this is certainly interesting. What do we think it would take to 
  enable the functionality?
 
 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction

We could use something that controls global xmin.
It would ensure that global xmin does not advance beyond
what still needs to be visible. This would probably be a 
sliding time window, or a fixed point in time that is
released by the dba/user.

Then all below is not really different from a situation where
you had a long running tx.  

 probably it is a good idea to take a lock on all tables 
 involved to avoid a vacuum to be started on them when the 
 query is running.
 
 also, we can't trust the DELETED flags in index pages, so we 
 should forbid index scans, or just always re-check the 
 visibility in heap.
 
 Otherways it would probably be enough to just scan tuples as 
 usual, and check if they were visible to desired transaction, 
 that is they were inserted before that transaction and they 
 are not deleted before that trx.
 
 Of course this will not be true, once we have HOT/WIP with 
 in-page vacuuming.

Currently I think HOT does honor global xmin. There is no 
lookup for relevant xids, so parts of an update chain where
only a previous tuple or a later tuple can be visible are reused.
Else HOT would need to be told not to, in a scenario where
a backend can choose a snapshot at will.

Andreas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Zeugswetter Andreas ADI SD

  First we must run the query in serializable mode and replace the 
  snapshot with a synthetic one, which defines visibility at the start 
  of the desired transaction
  
  probably it is a good idea to take a lock on all tables involved to 
  avoid a vacuum to be started on them when the query is running.
  Would the xmin exported by that transaction prevent vacuum from 
  removing any tuples still needed for the flashback snapshot?
 
 Sure, and that makes the mentioned lock unnecessary.

The problem is that that transaction sets a historic snapshot at a later
time, so it is not yet running when vacuum looks at global xmin.
So something else needs to hold up global xmin (see prev post).

Andreas



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

Zeugswetter Andreas ADI SD wrote:
First we must run the query in serializable mode and replace the 
snapshot with a synthetic one, which defines visibility at the start 
of the desired transaction

probably it is a good idea to take a lock on all tables involved to 
avoid a vacuum to be started on them when the query is running.

Would the xmin exported by that transaction prevent vacuum from 
removing any tuples still needed for the flashback snapshot?

Sure, and that makes the mentioned lock unnecessary.


The problem is that that transaction sets a historic snapshot at a later
time, so it is not yet running when vacuum looks at global xmin.
So something else needs to hold up global xmin (see prev post).


I think to make this flashback stuff fly, you'd need to know the 
earliest xmin that you can still flashback to. Vacuum would advance
that xmin as soon as it starts working. So the case you'd need to
protect against would be a race condition when you start a vacuum
and a flashback transaction at the same time. But for that, some simple
semaphore should suffice, and a well-thought-out ordering of the actions
taken.

In the long run, you'd probably want to store the commit times of 
transactions somewhere, and add some guc that makes a vacuum assume
that recently committed transactions (say, in the last hour) are still
considered active. That would allow the dba to guarantee that he can always
flash back at least an hour.

greetings, Florian Pflug



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

 It seems nonorthogonal.  What if only some of the tables you mentioned did 
 not exist?  Do you get SOME OPERATION?

I'd say you get DROP TABLE as long as at least one table was dropped.

 There are also other cases where commands don't have an effect but we don't 
 explicitly point that out.

The precedent that I'm thinking about is that the command tag for COMMIT
varies depending on what it actually did.

regression=# begin;
BEGIN
regression=# select 1/0;
ERROR:  division by zero
regression=# commit;
ROLLBACK
regression=#

regards, tom lane



Re: [HACKERS] n-gram search function

2007-02-19 Thread Oleg Bartunov

On Mon, 19 Feb 2007, Guillaume Smet wrote:


On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

pg_trgm was developed for spelling correction and there is a threshold of
similarity, which is 0.3 by default. The README explains what it means.


Yes, I read it.

Similarity could be very low, since you didn't make a separate column, and the 
length of the full string is used to normalize similarity.


Yep, that's probably my problem. Ignored records are a bit longer than
the others.

I tried the tip in README.pg_trgm to generate a table with all the words.

It can do the work in conjunction with tsearch2 and a bit of AJAX to
suggest the full words to the users. The reason why I was not using
tsearch2 is that it's sometimes hard to spell location names
correctly.

The only problem is that it is still quite slow on a 50k-row words
table, but I'll make further tests on a decent server this afternoon.


You need to wait for GiN support.


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] pg_restore fails with a custom backup file

2007-02-19 Thread Magnus Hagander
On Sat, Feb 17, 2007 at 01:28:22PM -0500, Tom Lane wrote:
 Magnus Hagander [EMAIL PROTECTED] writes:
  I'd also like a comment from at least one other patch reviewer that
  the methods used are good.
 
 It looks reasonable as far as it goes.  One thought is that pg_dump
 really should have noticed that it was writing a broken archive.
 On machines where off_t is 32 bits, can't we detect the overflow
 situation?

Tested on MSVC as well, works. Also tested and doesn't break the build
on Linux (which shouldn't be affected at all).

So, patch applied to HEAD and 8.2.

//Magnus



Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL

2007-02-19 Thread Tom Lane
Dawid Kuroczko [EMAIL PROTECTED] writes:
 ... Now, assuming a UNIQUE INDEX on such a table, the order would be preserved
 since no two intervals can overlap.  And no overlapping data could be inserted
 without breaking the non-overlap property.  And of course a non-unique index would
 produce garbage (since left of/right of wouldn't make any sense anymore).

I think actually it doesn't work for unique indexes either :-( because
of dead tuples.  Consider that we have in the index

...
(1,2)
(6,8)   DEAD
(4,10)
(12,14)
...

Since under the given operators (6,8) and (4,10) are equal, btree will
not guarantee that those index entries appear in any particular relative
order.  Thus the above is a legal index configuration.  Now insert (3,5).
This should surely be rejected because it overlaps (4,10).  But what
may well happen is that it gets compared to (1,2) --- OK, it's greater
--- and to (6,8) --- OK, it's less --- and then the uniqueness check stops,
because if it's less than (6,8) then there is no need to search further.
Ooops.

*This* is why the transitive law is essential.

regards, tom lane



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Pavan Deolasee

On 2/19/07, Tom Lane [EMAIL PROTECTED] wrote:


Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
 I don't understand -- what problem you got with NO OPERATION?  It
 seemed a sound idea to me.

 It seems nonorthogonal.  What if only some of the tables you mentioned
 did not exist?  Do you get SOME OPERATION?

I'd say you get DROP TABLE as long as at least one table was dropped.



How about DROP TABLE cnt, where 'cnt' is the number of tables dropped?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Gregory Stark

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction

 We could use something that controls global xmin.
 It would ensure that global xmin does not advance beyond
 what still needs to be visible. This would probably be a 
 sliding time window, or a fixed point in time that is
 released by the dba/user.

Well there's another detail you have to cover aside from rolling back your
xmin. You have to find the rest of the snapshot including knowing what other
transactions were in-progress at the time you want to flash back to.

If you just roll back xmin and set xmax to the same value you'll get a
consistent view of the database but it may not match a view that was ever
current. That is, some of the transactions after the target xmin may have
committed before that xmin. So there was never a time in the database when
they were invisible but your new xmin was visible.

I think to do this you'll need to periodically record a snapshot and then
later restore one of those saved snapshots. Not sure where would be a good
place to record them. The WAL seems like a handy place but digging through the
WAL would be annoying.

Incidentally this is one of the things that would be useful for read-only
access to PITR warm standby machines.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 But I guess something like this would work:
 1) drop all indexes on text/varchar columns
 2) change cluster locale
 3) create all indexes on text/varchar columns

 You're going to miss the name columns, ie. every string index in
 pg_catalog.

But name is not locale-aware --- it just uses strcmp().  AFAIR there
aren't any locale-dependent indexes in the system catalogs.  So in
principle you could hack pg_control, restart the postmaster, and then
reindex every locale-dependent index.  Hacking pg_control would be the
hard part; you'll never get the CRC right if you do it manually.  Possibly
pg_resetxlog could be adapted to the purpose.

 I'd suggest single user mode at least, and make backups!

Yup, a filesystem backup would be a *real* good idea.  Not to mention
testing the procedure on a toy installation.

regards, tom lane



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc. Most likely because I didn't
 think of that case. But since all other rows in pg_proc.h contain the
 oid, I just wanted to check if they're actually supposed to be without
 oid, or if that was a mistake?

 It's intentional.

Kindly change that intention.

regards, tom lane



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Once this is done it may be worth having arrays convert to short varlenas as
 well.

Elements of arrays are not subject to being toasted by themselves, so
I don't think you can make that work.  At least not without breaking
wide swaths of code that works fine today.

regards, tom lane



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,

RAID and LVM too.  I can't get excited about re-inventing those wheels
when perfectly good implementations already exist for us to sit on top of.

regards, tom lane



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 16:26, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
  This breaks the fmgrtab generator on msvc. Most likely because I didn't
  think of that case. But since all other rows in pg_proc.h contain the
  oid, I just wanted to check if they're actually supposed to be without
  oid, or if that was a mistake?
 
  It's intentional.

 Kindly change that intention.

What is wrong?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 15:57, Tom Lane wrote:
 The precedent that I'm thinking about is that the command tag for COMMIT
 varies depending on what it actually did.

Some have also argued against that in the past, so I guess we just have 
different ideas of how it should work.  Not a problem.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 16:26, Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 10:16, Magnus Hagander wrote:
 This breaks the fmgrtab generator on msvc.

 It's intentional.
 
 Kindly change that intention.

 What is wrong?

Well, in the first place Gen_fmgrtab.sh is producing garbage:

#define F_CURSOR_TO_XML DATAINSERT
#define F_CURSOR_TO_XMLSCHEMA DATAINSERT
#define F_QUERY_TO_XML DATAINSERT
#define F_QUERY_TO_XML_AND_XMLSCHEMA DATAINSERT
#define F_QUERY_TO_XMLSCHEMA DATAINSERT
#define F_TABLE_TO_XML DATAINSERT
#define F_TABLE_TO_XML_AND_XMLSCHEMA DATAINSERT
#define F_TABLE_TO_XMLSCHEMA DATAINSERT
#define F_BYTEAOUT 31
#define F_CHAROUT 33

const FmgrBuiltin fmgr_builtins[] = {
  { 0, "cursor_to_xml", 5, true, false, cursor_to_xml },
  { 0, "cursor_to_xmlschema", 4, true, false, cursor_to_xmlschema },
  { 0, "query_to_xml", 4, true, false, query_to_xml },
  { 0, "query_to_xml_and_xmlschema", 4, true, false, query_to_xml_and_xmlschema },
  { 0, "query_to_xmlschema", 4, true, false, query_to_xmlschema },
  { 0, "table_to_xml", 4, true, false, table_to_xml },
  { 0, "table_to_xml_and_xmlschema", 4, true, false, table_to_xml_and_xmlschema },
  { 0, "table_to_xmlschema", 4, true, false, table_to_xmlschema },
  { 31, "byteaout", 1, true, false, byteaout },

The fact that that table is broken means you're incurring expensive
linear searches to invoke these functions.  It's only by chance that
it works at all...

In the second place, if you don't want to predetermine OIDs for your
functions then they shouldn't be in hardwired pg_proc.h rows at all.

regards, tom lane



Re: [HACKERS] n-gram search function

2007-02-19 Thread Guillaume Smet

On 2/19/07, Oleg Bartunov oleg@sai.msu.su wrote:

You need to wait for GiN support.


OK. Thanks.

If you need testers for this one, feel free to contact me. I'm very
interested in testing pg_trgm in conjunction with tsearch2.

--
Guillaume



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

Is there any place to hook in to create things like procedures or other SQL
objects that don't really need hard coded OIDs?

It seems like we could make the catalogs much easier to maintain by ripping
out everything that isn't needed by the system tables themselves and having
initdb create them by running a plain SQL script.

In particular I'm looking towards having all the operators and associated
hardware except for the basic btree operators for types used by the system
tables be created in plain SQL.

It might also reduce the pain OID conflicts cause.
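
To illustrate the shape such a script could take, one operator class written
as plain SQL (a sketch; the basic btree classes needed by the system tables
would stay hardwired, per the above):

  CREATE OPERATOR CLASS interval_ops
      DEFAULT FOR TYPE interval USING btree AS
          OPERATOR 1 < ,
          OPERATOR 2 <= ,
          OPERATOR 3 = ,
          OPERATOR 4 >= ,
          OPERATOR 5 > ,
          FUNCTION 1 interval_cmp(interval, interval);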

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Dimitri Fontaine
On Monday 19 February 2007 at 16:33, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,

 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.

I thought moving some knowledge about data availability into PostgreSQL code 
could provide some valuable performance benefit, allowing it to organize reads 
(for example, parallel table/index scans against different volumes) and to 
obtain data from the volume known to be quicker (or least used/loaded).

You're both saying RAID/LVM implementations provide good enough performance 
that PG doesn't have to go this way, if I understand correctly.
And distributed file systems are enough to get the replication stuff, without 
PG having to deal explicitly with the work involved.

Maybe I should have slept after all ;)

Thanks for your time and comments, regards,
-- 
Dimitri Fontaine



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
 Well, in the first place Gen_fmgrtab.sh is producing garbage:

Uh, ok, that needs fixing.

 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

Where else would you put them?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
 In the second place, if you don't want to predetermine OIDs for your
 functions then they shouldn't be in hardwired pg_proc.h rows at all.

 Where else would you put them?

SQL script maybe, much along the lines Greg was just mentioning.
(I'd been thinking myself earlier that pg_amop/amproc/etc would be a
whole lot easier to maintain if we could feed CREATE OPERATOR CLASS
commands to the bootstrap process.)  But getting there will take
nontrivial work; you can't just decide to leave out a few OIDs on the
spur of the moment.

Magnus, I'd suggest reverting whatever you did to your MSVC script,
so we'll find out the next time someone makes this mistake...

regards, tom lane



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread tomas

On Mon, Feb 19, 2007 at 04:00:09PM +0100, Florian G. Pflug wrote:
[...]
 In the long run, you'd probably want to store the commit times of 
 transactions somewhere, and add some guc that makes a vacuum assume
 that recently committed transactions (say, in the last hour) are still
 considered active [...]

Funny how some things recur:

  http://archives.postgresql.org/pgsql-hackers/2007-01/msg01301.php

(says I and seeks shelter beneath a big rock ;-)

Regards
-- tomás




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Tom Lane
Dimitri Fontaine [EMAIL PROTECTED] writes:
 You're both saying RAID/LVM implementations provide good enough performances 
 for PG not having to go this way, if I understand correctly.

There's certainly no evidence to suggest that reimplementing them
ourselves would be a productive use of our time.

regards, tom lane



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Martijn van Oosterhout
On Mon, Feb 19, 2007 at 05:10:36PM +0100, Dimitri Fontaine wrote:
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
 
 I thought moving some knowledge about data availability into PostgreSQL code 
 could provide some valuable performance benefit, allowing it to organize reads 
 (for example, parallel table/index scans against different volumes) and to 
 obtain data from the volume known to be quicker (or least used/loaded).

Well, organising requests to be handled quickly is a function of
LVM/RAID, so we don't go there. However, speeding up scans by having
multiple requests is an interesting approach, as would perhaps a
different random_page_cost for different tablespaces.

My point is, don't try to implement the mechanics of LVM/RAID into
postgres, instead, work on providing ways for users to take advantage
of these mechanisms if they have them. Look at it as if you already had
the LVM/RAID setup for your ideas: how do you get postgres to take
advantage of it?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Alvaro Herrera
Peter Eisentraut wrote:
 On Monday, 19 February 2007 at 16:50, Tom Lane wrote:
  Well, in the first place Gen_fmgrtab.sh is producing garbage:
 
 Uh, ok, that needs fixing.
 
  In the second place, if you don't want to predetermine OIDs for your
  functions then they shouldn't be in hardwired pg_proc.h rows at all.
 
 Where else would you put them?

But _why_ wouldn't you want to have fixed OIDs for the functions?  I'm
not seeing the benefit.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS

2007-02-19 Thread Bruce Momjian
Tom Lane wrote:
 Peter Eisentraut [EMAIL PROTECTED] writes:
  On Monday, 19 February 2007 at 13:12, Alvaro Herrera wrote:
  I don't understand -- what problem you got with NO OPERATION?  It
  seemed a sound idea to me.
 
  It seems nonorthogonal.  What if only some of the tables you mentioned did 
  not exist?  Do you get SOME OPERATION?
 
 I'd say you get DROP TABLE as long as at least one table was dropped.

If we went with DROP TABLE if any table was dropped, and NO OPERATION
for none, I am fine with that.  What I didn't want was a different NO
OPERATION-type of message for every object type.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Confusing message on startup after a crash while recovering

2007-02-19 Thread Florian G. Pflug

Tom Lane wrote:

Florian G. Pflug [EMAIL PROTECTED] writes:

I'd suggest that the text is changed to something along the line of:
database system was interrupted while in recovery at ...
If this has occurred more than once some data may be corrupted and
you may need to restore from the last backup.


It seems the real problem is that it's not specifying *which* data is
probably corrupted.  Maybe:

HINT: If recovery fails repeatedly, it probably means that the recovery log
data is corrupted; you may have to restore from your last full backup.


IMHO that wording would be fine too - the important point for me is to 
clearly state that corrupted data is maybe the _cause_ of the crash, and
not the _effect_ of the crash. And for the sake of consistency, the 
messages for abort-during-recovery and abort-during-archivelog-replay 
should be similar.



Also, do we want to suggest use of pg_resetxlog in the message?

I'd rather add some documentation on how to use pg_resetxlog to the
manual if it's not already there, and maybe reference that chapter in
a HINT message. In that manual chapter you can warn about the dangers
of pg_resetxlog, and put in advice to back up the database before 
using it. I think such a warning is important, because any documentation 
of pg_resetxlog is targeted at users who are not familiar with postgres
internals, and those users are likely to shoot themselves in the foot
if you point them to pg_resetxlog.

greetings, Florian Pflug



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Guillaume Smet

On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

Guillaume Smet wrote:
 On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 So add the session ID (%c) to log_line_prefix.

 It could work if log_line_prefix was added before every line but it's
 definitely not the case:
 myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
FROM lieu
LIMIT 10;
 if you execute:
 SELECT *
 FROM lieu
 LIMIT 10;

Interesting.  I wonder why you didn't report this as a bug before?
Maybe we could have discussed it and fixed it.


Perhaps because I thought it was not really a bug but the intended behaviour.
Syslog has the same behaviour, and it's quite logical when you consider
how queries are logged (I've spent a few hours in the logging code);
syslog simply adds the necessary context information.
I'm pretty sure I have explained the problem a few times on the lists,
though perhaps it was just on IRC.

From the feedback I have on pgFouine, very few people think it's a
real problem, probably because they don't use query logging as we do:
our production servers have it enabled all the time and we have a high
load on them so this particular case is a common case for us.

(Second try to move this discussion to -hackers)

--
Guillaume



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Alvaro Herrera
Guillaume Smet wrote:
 On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  Guillaume Smet wrote:
  On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  So add the session ID (%c) to log_line_prefix.
 
  It could work if log_line_prefix was added before every line but it's
  definitely not the case:
  myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : 
 SELECT *
 FROM lieu
 LIMIT 10;
  if you execute:
  SELECT *
  FROM lieu
  LIMIT 10;
 
 Interesting.  I wonder why didn't you report this as a bug before?
 Maybe we could have discussed it and fixed it.
 
 Perhaps because I thought it was not really a bug but the intended 
 behaviour.
 Syslog has the same behaviour and it's quite logical when you consider
 how queries are logged (I've spent a few hours in the logging code).
 Syslog has exactly the same behaviour but adds the necessary context
 information.

If it adds necessary context then it clearly does not have the same
behavior, because the problem is precisely that the context is missing.
I'd propose adding a log_entry_prefix separate from log_line_prefix; the
entry prefix would contain most of the stuff, and log_line_prefix would
be a minimal thing intended to be put in front of each _line_, so the
example you show above could be

myuser mydb 45d9d615.4abe LOG:  duration : 185.223 ms, statement : SELECT *
45d9d615.4abe FROM lieu
45d9d615.4abe LIMIT 10;

where you have

log_entry_prefix=%d %u 
log_line_prefix=%c 

Really, prefixing with a tab does not strike me as a great idea
precisely because it's ambiguous.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



[HACKERS] How can I merge a TargetEntry with a tuple?

2007-02-19 Thread Zoltan Boszormenyi

Hi,

I started working again on my
IDENTITY/GENERATED patch.
My question is $SUBJECT.
This code is in rewriteTargetlist():

new_expr = build_column_default(...);
new_tle = makeTargetEntry((Expr *) new_expr, ...);

Now, in ExecInsert() I have to compute
the default for IDENTITY/GENERATED
between ExecConstraints() and heap_insert().

How can I create a Datum out of either an Expr
or a TargetEntry (that contains the computed
constant out of the default expression) so I can
use it after I did a heap_deform_tuple()?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] wishlist items ..

2007-02-19 Thread David Fetter
On Mon, Feb 19, 2007 at 01:28:46PM +0100, Lukas Kahwe Smith wrote:
 Hannu Krosing wrote:
 On one fine day, Sat, 2007-02-17 at 13:35, Lukas Kahwe Smith wrote:
 Lukas Kahwe Smith wrote:
 
 I just wanted to bring up the wishlist todo items:
 http://developer.postgresql.org/index.php/Todo:WishlistFor83
 
 What does/did the row "Clustered/replication solutions" refer to?
 
 there was some discussion early on in 8.3 scoping to add some out of
 the box solutions for clustering and replication in order to reduce
 the barrier to entry for people who require such features.

I believe this has been started as a replication hooks project,
although I'm unsure of its current status.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Guillaume Smet

On 2/19/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

 If it adds the necessary context then it clearly does not have the same
 behavior,


I mean that the log_line_prefix behaviour is the same; the other
information is syslog-specific.


I'd propose adding a log_entry_prefix separate from log_line_prefix; the
entry prefix would contain most of the stuff, and log_line_prefix would
be a minimal thing intended to be put in front of each _line_, so the
example you show above could be


It could be a good idea.
It won't make me use stderr output but it will allow other people to
do so without any disadvantage :).


Really, prefixing with a tab does not strike me as a great idea
precisely because it's ambiguous.


Sure.

--
Guillaume



Re: [HACKERS] pg_proc without oid?

2007-02-19 Thread Peter Eisentraut
Tom Lane wrote:
 SQL script maybe, much along the lines Greg was just mentioning.

I would welcome that, although a similar suggestion was rejected a few 
years ago, which is why I didn't pursue it here.

 you can't just decide to leave out a few OIDs on the
 spur of the moment.

I still don't understand why that would be a problem, aside from the 
fmgrtab problem that is specific to pg_proc.  Other system catalogs 
also have mixed entries with and without explicit OIDs.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Peter Eisentraut
Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,

 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.

I expect that someone will point out that Windows doesn't support RAID 
or LVM, and we'll have to reimplement it anyway.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Magnus Hagander
Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway. 

Windows supports both RAID and LVM.

//Magnus




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Stefan Kaltenbrunner
Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway.

windows supports software raid just fine since Windows 2000 or so ...


Stefan



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Gregory Stark [EMAIL PROTECTED] writes:
 Once this is done it may be worth having arrays convert to short varlenas as
 well.

 Elements of arrays are not subject to being toasted by themselves, so
 I don't think you can make that work.  At least not without breaking
 wide swaths of code that works fine today.

You think it's more likely there are places that build arrays and then read
the items back without passing through detoast than there are places that
build tuples and do so?



Btw, I ran into some problems with system tables. Since many of them are read
using the GETSTRUCT method, and in that method the first varlena field should
be safely accessible, I would have to not skip the alignment for the first
varlena field in system tables. Instead I just punt on all system tables. The
only one where that seems like a loss is pg_statistic, and there the
biggest problem is the space wasted inside the arrays, not before the varlena
fields.

Also, int2vector and oidvector don't expect to be toasted, so I've skipped them
as well. If we want an escape hatch they would have to be marked as such.
For now I just hard-coded them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread August Zajonc
Gregory Stark wrote:
 Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
 
 First we must run the query in serializable mode and replace 
 the snapshot with a synthetic one, which defines visibility 
 at the start of the desired transaction
 We could use something that controls global xmin.
 It would ensure that global xmin does not advance beyond
 what still needs to be visible. This would probably be a 
 sliding time window, or a fixed point in time that is
 released by the dba/user.
 
 Well there's another detail you have to cover aside from rolling back your
 xmin. You have to find the rest of the snapshot including knowing what other
 transactions were in-progress at the time you want to flash back to.
 
 If you just roll back xmin and set xmax to the same value you'll get a
 consistent view of the database but it may not match a view that was ever
 current. That is, some of the transactions after the target xmin may have
 committed before that xmin. So there was never a time in the database when
 they were invisible but your new xmin was visible.
 
[...]
 Incidentally this is one of the things that would be useful for read-only
 access to PITR warm standby machines.
 

Couldn't you define things simply so that you get a consistent view
including all transactions started before transaction x? This is time
travel lite, but low-overhead, which I think is a key benefit of this
approach.

A huge value for this would be in the "oops, I deleted my data" category.
PostgreSQL rarely loses data, but clients seem to have a habit of doing
so, and then going "oops". This seems to happen most often when facing
something like a reporting deadline, where they are moving lots of stuff
around and making copies, and sometimes delete the wrong company
recordset or equivalent, even with confirmation dialogs at the app level.

This would give the client a quick and easy "oops" procedure. The DBA sets
the guc to 1hr and tells the client: if you make a big mistake, stop the
database server as follows and call. Frankly, it would bail a few DBAs out as well.

The key is how lightweight the setup could be, which matters because
clients are not always willing to pay for a PITR setup. The low overhead
would mean you'd feel fine about setting the guc to 1hr or so.

As a percentage of total installed instances, I suspect the share with PITR
is small. I've got stuff I snapshot nightly, but that's it, so I don't have
an easy out from the "oops" query either.

- August








Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Guillaume Smet wrote:

Why not simply put something like %log_table% in the sql file and let 
the admin replace it with sed or whatever he likes?


This is a reasonable approach.  I would suggest that no special characters 
be used though, so that the SQL could be used as-is by a DBA who doesn't 
even know about or want to use tools like sed.  I also think the default 
should be a pg_ name like the pg_log I suggested, partly because I'd like 
this to be completely internal one day--just push the logs into the 
database directly without even passing through an external file first. 
Also, something like pg_log is unlikely to cause a conflict with existing 
tables.  I would bet there's already databases out there with tables 
called log_table, for example, but everyone already avoids naming 
application tables starting with pg_.


A workable syntax might be

INSERT INTO "pg_log" ...

The redundant quotation marks will make it easier to do a search/replace 
to change the table name without worrying about accidentally impacting the 
text of the message, so that even people who aren't aware how to build a 
regular expression that only modifies the first match will probably be OK.


I consider using the same name as the default log directory helpful, but 
would understand that others might consider it confusing to overload the 
name like that.
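
For concreteness, here is a sketch of the kind of target table such INSERT
statements would populate; the column list is purely illustrative, not a
proposed format:

   CREATE TABLE "pg_log" (
       log_time   timestamptz,   -- when the entry was written
       username   text,
       database   text,
       message    text
   );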


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Peter Eisentraut
Magnus Hagander wrote:
 Windows supports both RAID and LVM.

Oh good, so we've got that on record. :)

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Florian G. Pflug

August Zajonc wrote:

Gregory Stark wrote:

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

First we must run the query in serializable mode and replace 
the snapshot with a synthetic one, which defines visibility 
at the start of the desired transaction

We could use something that controls global xmin.
It would ensure, that global xmin does not advance bejond
what still needs to be visible. This would probably be a 
sliding time window, or a fixed point in time that is

released by the dba/user.

Well there's another detail you have to cover aside from rolling back your
xmin. You have to find the rest of the snapshot including knowing what other
transactions were in-progress at the time you want to flash back to.

If you just roll back xmin and set xmax to the same value you'll get a
consistent view of the database but it may not match a view that was ever
current. That is, some of the transactions after the target xmin may have
committed before that xmin. So there was never a time in the database when
they were invisible but your new xmin was visible.

[...]
Incidentally this is one of the things that would be useful for read-only
access to PITR warm standby machines.



Couldn't you define things simply to be that you get a consistent view
including all transactions started before x transaction? This is time
travel lite, but low overhead which I think is a key benefit of this
approach.


I was thinking along the same line. Flashback is probably only really
useful on databases that are mostly read-only, but with a few users
who update data. You'd use flashback to undo catastrophic changes made
by accident, and would probably gladly accept undoing a little
more work than strictly necessary.

On the contrary, if you're running an online shop where people buy stuff
24/7, and, say, somebody accidentally deletes some products, then you
won't want to lose the orders placed during that last hour, but will
rather try to regenerate those products from your last backup.

So I don't think that it's too important exactly what snapshot you get,
which makes the xmin=xmax idea feasible.

The same holds true for PITR warm standby (read-only queries on PITR
slaves). This would be used for reporting, or load-balancing of searches
in fairly static data - all of which won't depend on the exact snapshot
you get.


greetings, Florian Pflug




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Joshua D. Drake
Stefan Kaltenbrunner wrote:
 Peter Eisentraut wrote:
 Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those
 wheels when perfectly good implementations already exist for us to
 sit on top of.
 I expect that someone will point out that Windows doesn't support RAID 
 or LVM, and we'll have to reimplement it anyway.
 
 windows supports software raid just fine since Windows 2000 or so ...

Longer than that... it supported mirroring and RAID 5 in NT4, and
possibly even NT 3.51, IIRC.


Joshua D. Drake


 
 
 Stefan
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Deadlock with pg_dump?

2007-02-19 Thread Simon Riggs
On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:
 Simon Riggs wrote:
  On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
   Chris Campbell [EMAIL PROTECTED] writes:
Is there additional logging information I can turn on to get more  
details? I guess I need to see exactly what locks both processes  
hold, and what queries they were running when the deadlock occurred?  
Is that easily done, without turning on logging for *all* statements?
   
   log_min_error_statement = error would at least get you the statements
   reporting the deadlocks, though not what they're conflicting against.
  
  Yeh, we need a much better locking logger for performance analysis.
  
  We really need to dump the whole wait-for graph for deadlocks, since
  this might be more complex than just two statements involved. Deadlocks
  ought to be so infrequent that we can afford the log space to do this -
  plus if we did this it would likely lead to fewer deadlocks.
  
  For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
  that would allow you to dump the wait-for graph for any data-level locks
  that wait too long, rather than just those that deadlock. Many
  applications experience heavy locking because of lack of holistic
  design. That will also show up the need for other utilities to act
  CONCURRENTLY, if possible.
 
 Old email, but I don't see how our current output is not good enough?
 
  test=> lock a;
   ERROR:  deadlock detected
   DETAIL:  Process 6855 waits for AccessExclusiveLock on relation 16394 of
   database 16384; blocked by process 6795.
   Process 6795 waits for AccessExclusiveLock on relation 16396 of database
   16384; blocked by process 6855.
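
For reference, a report like the one above comes from a two-session
sequence such as this (table names a and b assumed):

   -- session 1:
   BEGIN;
   LOCK TABLE a;
   -- session 2:
   BEGIN;
   LOCK TABLE b;
   LOCK TABLE a;   -- session 2 now waits for session 1
   -- session 1:
   LOCK TABLE b;   -- after deadlock_timeout: ERROR:  deadlock detected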

This detects deadlocks, but it doesn't detect lock waits. 

When I wrote that, it was previous experience driving me. Recent client
experience has highlighted the clear need for this. We had a lock wait
of 50 hours because of an RI check; that's the kind of thing I'd like to
see show up in the logs somewhere.

Lock wait detection can be used to show up synchronisation points that
have been inadvertently designed into an application, so it's a useful
tool in investigating performance issues.

I have a patch implementing the logging as agreed with Tom, will post to
patches later tonight.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Andrew Sullivan
On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,
 
 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.

Ok, warning, this is a "you know what would be sweet" moment.

What would be nice is to be able to detach one of the volumes, and
know the span of the data in there without being able to access the
data.

The problem that a lot of warehouse operators have is something like
this: "We know we have all this data, but we don't know what we will
want to do with it later.  So keep it all.  I'll get back to you when
I want to know something."

It'd be nice to be able to load up all that data once, and then shunt
it off into (say) read-only media.  If one could then run a query
that would tell one which spans of data are candidates for the
search, you could bring back online (onto reasonably fast storage,
for instance) just the volumes you need to read.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Users never remark, Wow, this software may be buggy and hard 
to use, but at least there is a lot of code underneath.
--Damien Katz



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Joshua D. Drake
Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
 Martijn van Oosterhout kleptog@svana.org writes:
 Somehow this seems like implementing RAID within postgres,
 RAID and LVM too.  I can't get excited about re-inventing those wheels
 when perfectly good implementations already exist for us to sit on top of.
 
 Ok, warning, this is a "you know what would be sweet" moment.

The dreaded words from a developer's mouth to every manager in the world.

Joshua D. Drake



-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/




Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Joshua D. Drake wrote:


Longer than that... it supported mirroring and raid 5 in NT4 and
possibly even NT3.51 IIRC.


Mirroring and RAID 5 go back to Windows NT 3.1 Advanced Server in 1993:

http://support.microsoft.com/kb/114779
http://www.byte.com/art/9404/sec8/art7.htm

The main source of confusion about current support for this feature is
that the desktop/workstation versions of Windows don't have it.  For
Windows XP, you need the Professional edition to get dynamic disk
support; it's not in the Home edition.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] proposed todo: use insert/update returning anywhere a table is allowed

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Allow INSERT/UPDATE ... RETURNING inside a SELECT 'FROM' clause

  http://archives.postgresql.org/pgsql-general/2006-09/msg00803.php
  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00693.php

---

Jeff Davis wrote:
 On Tue, 2007-02-06 at 16:54 -0500, Merlin Moncure wrote:
  It would be great to be able to join to update, returning, etc.  It
  looks like the code was deliberately tied off as-is and I was
  surprised not to see a todo for this.
  
  the basic idea is to be able to do things like:
  select * from (update foo set id = 1 returning *) q;
  
  which currently syntax errors out on 'set' which is a bit odd.
  
  here are tom's brief notes on it and also some context on how select
  combined with insert/update might influence triggers:
  
  http://archives.postgresql.org/pgsql-hackers/2006-10/msg00693.php
  
 
 Also a relevent thread:
 
 http://archives.postgresql.org/pgsql-general/2006-09/msg00803.php
 
 Regards,
   Jeff Davis
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Bruce Momjian
Joshua D. Drake wrote:
 Andrew Sullivan wrote:
  On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
  Somehow this seems like implementing RAID within postgres,
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
  
  Ok, warning, this is a you know what would be sweet moment.
 
 The dreaded words from a developers mouth to every manager in the world.

Yea, I just instinctively hit delete when I saw that phrase.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread August Zajonc

On Mon, 19 Feb 2007 20:30:59 +0100, Florian G. Pflug [EMAIL PROTECTED]
said:
 August Zajonc wrote:
  Gregory Stark wrote:
  
  Couldn't you define things simply to be that you get a consistent view
  including all transactions started before x transaction? This is time
  travel lite, but low overhead which I think is a key benefit of this
  approach.
 
 I was thinking along the same line. Flashback is probably ony really
 usefull on databases that are mostly read-only, but with a few users
 who update data. You'd use flashback to undo catastrophic changes done
 by accident, and probably will gladly accept that you undo a little
 more work than strictly necessary.
 
 On the contrary, if you're running a online shop were people buy stuff
 24/7, and, say, somebody accidentally deletes some producs, than you
 won't want to loose the orders happened during that last hour, but will
 rather try to regenerate that products from your last backup.

Hopefully people doing order systems are using PITR or similar :) 

For the time travel lite case, it's just a matter of clear definition.
You get all transactions that were *started* before and up to transaction x.
If a transaction rolled back you still won't see it, so you're still
getting a consistent view. But if it committed after your marker you
will see it. That seems ok to me. In fact, I suspect folks think of
transactions as happening more or less when they get sent to the DB, so
this may map more directly to what people expect.

The one caveat would be if you started a long-running transaction, then
did the "oops" trx 5 minutes later, and then started time travel
*before* the long-running trx committed. In that case you wouldn't see
that long-running trx, so the definition would need to be modified to
something like "all trx started before x that were no longer running
when you time travel". Don't know if it is worth a NOTICE in the logs if
you time travel back but there are IDs of transactions from before
your xmin that are still running (and that, if you waited a bit, might
become visible in your time travel view).

If Jan gets his way with a timestamp on trx commit, then you can do
"started before time x", which may be more user-friendly.

For PITR I'd imagine you might actually be able to get the visibility
right, no? I haven't looked deeply enough into the WAL logs to understand
how the partial playback scenario works. If the WAL logs are ordered on
trx commit time, then you'd get proper visibility.

- August



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Alvaro Herrera wrote:

We already have a combined GUC option that is used to change two 
different things (DateStyle) and I regularly see people confused about 
how to use it.


You already have a combined GUC option called log_destination that's 
sitting in the appropriate area of the configuration file, doing something 
similar to what's needed for the new feature.  People confused by that are 
already confused.



Also, sql is not really a destination -- it is a format.


A log file with a different name is another destination.  eventlog is 
certainly a different format and it's sitting happily as an option there. 
I haven't heard anyone make a useful argument yet as to how insert/sql 
logs are any different than the current way that stderr, syslog, and 
eventlog are all possibilities now for log_destination, each with their 
own little quirks (and in the case of syslog, their own additional GUC 
parameters).


That way you can choose to have one or the other, or both if you're 
really dumb.


The fact that you're characterizing people who might want both as "really 
dumb" tells me you're not familiar with enterprise logging requirements. I 
already commented on situations where wanting both types of output going 
at once is going to absolutely be a requirement in some environments for 
this feature addition to be useful; there are a lot of large operations 
that rely heavily on features like syslog to help manage their systems. 
Most of the places I've worked at, the syslog server where the analysis is 
running wasn't necessarily even in the same state as the machine 
generating the log entries.


I know I can't deploy this feature unless it operates in parallel with the 
existing text-based info going to syslog, both because of that and because 
of transition issues--I can't disrupt the existing logs to test a new log 
mechanism until that new mechanism has proven itself.  I'll probably 
deploy it with both turned on forever once it's available.


As for your comments on syslog vs. stderr, I completely agree with 
Guillaume's response to you on that subject.  The stderr output is 
difficult to use for the reasons he describes, but the kind of 
environments that use complicated logging aren't relying on that anyway. 
I wouldn't get distracted by fixing that implementation when it's 
functional enough for most who are satisfied with stderr output.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Alvaro Herrera
Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
  RAID and LVM too.  I can't get excited about re-inventing those wheels
  when perfectly good implementations already exist for us to sit on top of.
 
 Ok, warning, this is a you know what would be sweet moment.
 
 What would be nice is to be able to detach one of the volumes, and
 know the span of the data in there without being able to access the
 data.
 
 The problem that a lot of warehouse operators have is something like
 this: We know we have all this data, but we don't know what we will
 want to do with it later.  So keep it all.  I'll get back to you when
 I want to know something.

You should be able to do that with tablespaces and VACUUM FREEZE, the
point of the latter being that you can take the disk containing the
read-only data offline, and still have the data readable after
plugging it back in, no matter how far along the transaction ID counter
is.
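
For illustration, a minimal sketch of that approach; the tablespace name,
path, and table name are assumed:

   CREATE TABLESPACE archive_ts LOCATION '/mnt/archive';
   ALTER TABLE sales_2006 SET TABLESPACE archive_ts;  -- park the static data
   VACUUM FREEZE sales_2006;  -- freeze tuples so they remain visible no
                              -- matter how far the XID counter advances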

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] referential Integrity and SHARE locks

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Allow UPDATEs on only non-referential-integrity columns not to conflict
  with referential integrity locks

  http://archives.postgresql.org/pgsql-hackers/2007-02/msg00073.php
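
  For illustration, the kind of conflict this TODO is about (a minimal
  sketch; table names assumed):

    CREATE TABLE parent (id int PRIMARY KEY, note text);
    CREATE TABLE child  (pid int REFERENCES parent);
    INSERT INTO parent VALUES (1, 'x');
    -- session 1, inside a transaction: the RI trigger takes a SHARE
    -- lock on the referenced parent row
    INSERT INTO child VALUES (1);
    -- session 2: blocks on that lock, even though it never touches
    -- the key column
    UPDATE parent SET note = 'y' WHERE id = 1;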


---

Jan Wieck wrote:
 On 2/8/2007 2:46 PM, Marc Munro wrote:
  On Thu, 2007-08-02 at 14:33 -0500, Tom Lane wrote:
  Marc Munro [EMAIL PROTECTED] writes:
   Yes in this case, T1 must abort because the record it was going to
   update has disappeared from underneath it.  I don't see how this is
   significantly different from the same race for the record if the table
   had no RI constraints.  The only difference that I can see, is that T1
   now has some locks that it must relinquish as the transaction aborts.
  
  No, the difference is there would have been no error at all before;
  if the record were deleted before T1 got to it then it wouldn't have
  attempted to update it.  I really don't think you can make it work
  to perform updates or deletes on a record you have not yet locked.
  
  The record would be locked before the update or delete is attempted,
  however it would not be locked until the referential integrity
  constraints have succeeded in acquiring their locks.
  
  It is becoming clear to me that I am missing something but I still don't
  know what it is.  If anyone can see it and explain it I'd really
  appreciate it.
 
 I think you are missing the fact that the exclusive row lock on UPDATE 
 is taken before any triggers are fired at all, even BEFORE ROW triggers. 
 This is necessary in order to prevent the row being updated or removed 
 concurrently while the triggers are executing. Since BEFORE ROW triggers 
 can modify the content of the row (including the foreign key), the RI 
 check and lock of the referenced row cannot happen before other BR 
 triggers are completed.
 
 In order to make your idea fly, the RI check trigger on INSERT or UPDATE 
 would have to be fired before taking the row lock considering the NEW 
 values for referencing columns as they are thus far. Since the row isn't 
 locked at this time, it can change or disappear while the RI trigger is 
 executing, so the check and lock has to be redone later with the actual 
 row that got locked and after all BR triggers are done with it.
 
 
 Jan
 
 -- 
 #==#
 # It's easier to get forgiveness for being wrong than for being right. #
 # Let's break this rule - forgive me.  #
 #== [EMAIL PROTECTED] #
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Proposed adjustments in MaxTupleSize andtoastthresholds

2007-02-19 Thread Bruce Momjian

Added to TODO:

* Consider allowing configuration of TOAST thresholds

  http://archives.postgresql.org/pgsql-hackers/2007-02/msg00213.php
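
  For context, the per-column storage knob that already exists; the TODO is
  about making the size thresholds configurable as well (table and column
  names assumed):

    ALTER TABLE docs ALTER COLUMN body SET STORAGE EXTERNAL; -- out-of-line, uncompressed
    ALTER TABLE docs ALTER COLUMN body SET STORAGE MAIN;     -- compressed, kept inline if possible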

---

Simon Riggs wrote:
 On Mon, 2007-02-05 at 19:18 -0500, Jan Wieck wrote:
  On 2/5/2007 11:52 AM, Tom Lane wrote:
   Simon Riggs [EMAIL PROTECTED] writes:
   Sounds like a good time to suggest making these values configurable,
   within certain reasonable bounds to avoid bad behaviour.
   
   Actually, given what we've just learned --- namely that choosing these
   values at random is a bad idea --- I'd want to see a whole lot of
   positive evidence before adding such a configuration knob.
  
  Some of the evidence is TOAST itself. Every time you do not SET a column 
  that has been toasted into external storage during an UPDATE, you win 
  because the column's data isn't read during the scan for the row to 
  update, it isn't read during heap_update(), it isn't actually updated at 
  all (the toast reference is copied as-is and the external value reused), 
  and not a single byte of the external data is bloating WAL. If someone 
  knows that 99% of their updates will not hit certain text columns in 
  their tables, actually forcing them to be compressed no matter what and 
  to be stored external if they exceed 100 bytes will be a win.
 
 Yes, thats the main use case.
 
  Of course, this is a bit different from Simon's approach. What I 
  describe here is a per pg_attribute configuration to enforce a certain 
  new toaster behavior. Since we already have something that gives the 
  toaster a per column cluestick (like not to bother trying to compress), 
  it might be much easier to implement than Simon's proposal. It would 
  require that the toaster goes over the initial heap tuple for those 
  specially configured columns even if the tuple is below the toast 
  threshold, which suggests that a pg_class.relhasspecialtoastneeds could 
  be useful. But I think as far as fine-tuning capabilities go, a
  column-insensitive maximum tuple size is insufficient anyway.
 
 Well, sounds like we both want the same thing. The only discussion seems
 to be about user interface.
 
 Setting it per column is much better for very fine tuning, but setting
 them in isolation doesn't help decide what to do when you have lots of
 medium length strings where the sum exceeds the toast target.
 
 IMHO it would be better to have a column-level storage priority (default
 0) and then an table-level settable toast target. So we start applying
 the storage handling mechanisms on the highest priority columns and keep
 going in descending order until we are under the limit for the table.
 
 ALTER TABLE foo
   ALTER COLUMN foocol 
   SET STORAGE EXTERNAL PRIORITY 5
   WITH 
   (toast_target = 400);   /* must be MAXALIGNed value */
 
 Equal priorities are allowed, in which case lowest attribute id wins,
 i.e. current behaviour remains the default.
 
 -- 
   Simon Riggs 
   EnterpriseDB   http://www.enterprisedb.com
 
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] libpq docs about PQfreemem

2007-02-19 Thread Bruce Momjian

I have updated the PQfreemem documentation; patch attached.  Backpatched
to 8.2.X.

---

Zeugswetter Andreas ADI SD wrote:
 
   future for some reason. (the docs for the other functions say you have to
   use PQfreemem without mentioning any exceptions)
   
   Thoughts? Rip out or update?
  
  Are you saying that almost all Win32 binaries and libraries now can
 free
  across DLLs?
 
 You can under very narrow conditions. You need to force code generation 
 for "Multithreaded DLL" run-time libraries (e.g. in VC6 msvcrt.dll) 
 for all exe's and dll's.
 This is bad for debugging, memory checkers and probably impossible
 when using different compilers.
 So you really need PQfreemem.
 
 Andreas

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/libpq.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/libpq.sgml,v
retrieving revision 1.231
diff -c -c -r1.231 libpq.sgml
*** doc/src/sgml/libpq.sgml	16 Feb 2007 16:37:29 -0000	1.231
--- doc/src/sgml/libpq.sgml	19 Feb 2007 22:04:28 -0000
***************
*** 2617,2625 ****
      <function>PQescapeBytea</function>,
      <function>PQunescapeBytea</function>,
      and <function>PQnotifies</function>.
!    It is needed by Microsoft Windows, which cannot free memory across
!    DLLs, unless multithreaded DLLs (<option>/MD</option> in VC6) are used.
!    On other platforms, this function is the same as the standard library function <function>free()</>.
     </para>
    </listitem>
   </varlistentry>
--- 2617,2629 ----
      <function>PQescapeBytea</function>,
      <function>PQunescapeBytea</function>,
      and <function>PQnotifies</function>.
!    It is particularly important that this function, rather than
!    <function>free()</>, be used on Microsoft Windows.  This is because
!    allocating memory in a DLL and releasing it in the application works
!    only if multithreaded/single-threaded, release/debug, and static/dynamic
!    flags are the same for the DLL and the application.  On non-Microsoft
!    Windows platforms, this function is the same as the standard library
!    function <function>free()</>.
     </para>
    </listitem>
   </varlistentry>



Re: [HACKERS] Howto change db cluster locale on-the-fly

2007-02-19 Thread Jakub Ouhrabka

Hi Tom,

 Hacking pg_control would be the hard part; you'll never get the CRC
 right if you do it manually.  Possibly pg_resetxlog could be adapted
 to the purpose.

thanks for your valuable answer! I looked at pg_resetxlog.c, but I'm no 
pg-internals expert - would something like this work?


1) normally shut down database
2) hack pg_resetxlog to set locale to wanted value
3) run pg_resetxlog -f (rewrite pg_control - everything would be guessed 
with the exception of overloaded locale)

4) start database

We won't miss any transactions and there won't be any inconsistency in 
the data because the server was shut down cleanly, right?


Thanks,

Kuba

Tom Lane wrote:

Martijn van Oosterhout kleptog@svana.org writes:

But I guess something like this would work:
1) drop all indexes on text/varchar columns
2) change cluster locale
3) create all indexes on text/varchar columns



You're going to miss the name columns, ie. every string index in
pg_catalog.


But "name" is not locale-aware --- it just uses strcmp().  AFAIR there
aren't any locale-dependent indexes in the system catalogs.  So in
principle you could hack pg_control, restart the postmaster, and then
reindex every locale-dependent index.  Hacking pg_control would be the
hard part; you'll never get the CRC right if you do it manually.  Possibly
pg_resetxlog could be adapted to the purpose.


I'd suggest single user mode at least, and make backups!


Yup, a filesystem backup would be a *real* good idea.  Not to mention
testing the procedure on a toy installation.

regards, tom lane




Re: [HACKERS] Deadlock with pg_dump?

2007-02-19 Thread Simon Riggs
On Mon, 2007-02-19 at 19:38 +, Simon Riggs wrote:
 On Tue, 2007-02-13 at 22:19 -0500, Bruce Momjian wrote:
  Simon Riggs wrote:
   On Thu, 2006-10-26 at 18:45 -0400, Tom Lane wrote:
Chris Campbell [EMAIL PROTECTED] writes:
 Is there additional logging information I can turn on to get more  
 details? I guess I need to see exactly what locks both processes  
 hold, and what queries they were running when the deadlock occurred?  
 Is that easily done, without turning on logging for *all* statements?

log_min_error_statement = error would at least get you the statements
reporting the deadlocks, though not what they're conflicting against.
   
   Yeh, we need a much better locking logger for performance analysis.
   
   We really need to dump the whole wait-for graph for deadlocks, since
   this might be more complex than just two statements involved. Deadlocks
   ought to be so infrequent that we can afford the log space to do this -
   plus if we did this it would likely lead to fewer deadlocks.
   
   For 8.3 I'd like to have a log_min_duration_lockwait (secs) parameter
   that would allow you to dump the wait-for graph for any data-level locks
   that wait too long, rather than just those that deadlock. Many
   applications experience heavy locking because of lack of holistic
   design. That will also show up the need for other utilities to act
   CONCURRENTLY, if possible.
  
  Old email, but I don't see how our current output is not good enough?
  
  test=> lock a;
  ERROR:  deadlock detected
  DETAIL:  Process 6855 waits for AccessExclusiveLock on relation 16394 of
  database 16384; blocked by process 6795.
  Process 6795 waits for AccessExclusiveLock on relation 16396 of database
  16384; blocked by process 6855.
 
 This detects deadlocks, but it doesn't detect lock waits. 
 
 When I wrote that it was previous experience driving me. Recent client
 experience has highlighted the clear need for this. We had a lock wait
 of 50 hours because of an RI check; thats the kind of thing I'd like to
 show up in the logs somewhere.
 
 Lock wait detection can be used to show up synchronisation points that
 have been inadvertently designed into an application, so its a useful
 tool in investigating performance issues.
 
 I have a patch implementing the logging as agreed with Tom, will post to
 patches later tonight.

Patch for discussion; it includes doc entries at the top of the patch, so
it's fairly clear how it works.

Output is an INFO message, so that it can trigger log_min_error_statement
when it generates a message, allowing us to see the SQL statement that is
waiting. This lets a message be generated before the statement completes,
which is important because in some cases the statement may never complete,
so simply logging a list of pids won't always tell you what the SQL was
that was waiting.

Other approaches are possible...

Comments?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com

Index: doc/src/sgml/config.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.108
diff -c -r1.108 config.sgml
*** doc/src/sgml/config.sgml	1 Feb 2007 00:28:16 -0000	1.108
--- doc/src/sgml/config.sgml	6 Feb 2007 12:31:49 -0000
***************
*** 2936,2941 ****
--- 2936,2965 ----
       </listitem>
      </varlistentry>
  
+      <varlistentry id="guc-log-lock-waits" xreflabel="log_lock_waits">
+       <term><varname>log_lock_waits</varname> (<type>boolean</type>)</term>
+       <indexterm>
+        <primary><varname>log_lock_waits</> configuration parameter</primary>
+       </indexterm>
+       <listitem>
+        <para>
+ 		Controls whether log messages are produced when a statement is forced
+ 		to wait when trying to acquire locks on database objects. The threshold
+ 		time is the value of the <xref linkend="guc-deadlock-timeout"> parameter.
+ 		The log messages generated are intended for use during specific
+ 		investigations into application performance issues and subsequent tuning.
+ 		It is designed for use in conjunction with <varname>log_min_error_statement</>.
+ 		Log messages indicating long lock waits might indicate problems with
+ 		applications accessing the database or possibly disconnection issues.
+ 		If no such problem exists, it might indicate that <varname>deadlock_timeout</>
+ 		could be set higher. Log messages might also indicate that certain
+ 		deadlocks have been avoided. In those cases, decreasing the value of
+ 		<varname>deadlock_timeout</> might resolve lock wait situations faster,
+ 		thereby reducing contention. By default, this form of logging is <literal>off</>.
+        </para>
+       </listitem>
+      </varlistentry>
+ 
       </variablelist>
      </sect2>
     </sect1>
Index: src/backend/storage/lmgr/deadlock.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/lmgr/deadlock.c,v
retrieving revision 

Re: [HACKERS] --enable-debug does not work with gcc

2007-02-19 Thread Bruce Momjian

Added to developer's FAQ:

<P><I>src/Makefile.custom</I> can be used to set environment variables,
like <I>CUSTOM_COPT</I>, that are used for every compile.



---

Magnus Hagander wrote:
  If we did what you suggest, then --enable-debug would cause performance
  degradation, which would cause people to not use it, which would result
  in most binaries being completely undebuggable rather than only partially.
  Doesn't sound like a good tradeoff to me.
 
  Personally, in my development tree I use a Makefile.custom containing
 
  # back off optimization unless profiling
  ifeq ($(PROFILE),)
 CFLAGS:= $(patsubst -O2,-O1,$(CFLAGS))
  endif
 
  -O1 still generates uninitialized variable warnings but the code is a
  lot saner to step through ... not perfect, but saner.  It's been a
  workable compromise for a long time.  I don't recommend developing with
  -O0, exactly because it disables some mighty valuable warnings.
  
  Agreed.  I use -O1 by default myself, unless I am doing performance testing.
  
 Something for the developers FAQ perhaps? I confess I did not know of
 Makefile.custom :-D
 
 //Magnus
 
 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] TopPlan, again

2007-02-19 Thread Simon Riggs
On Sun, 2007-02-18 at 18:19 -0500, Tom Lane wrote:
 While thinking about having a centralized plan cache for managing plan
 invalidation, I got annoyed again about the fact that the executor needs
 access to the Query tree.  This means that we'll be storing *three*
 representations of any cached query: raw parsetree for possible
 regeneration, plus parsed Query tree and Plan tree.
...
 After looking over the code it seems that the executor needs a limited
 subset of the Query fields, namely
...
   into
   intoOptions
   intoOnCommit (why is this separate from intoOptions?)
   intoTableSpaceName
...

 which I think we should put into a new TopPlan node type.

All else sounds good, but why would we be caching a plan that used these
fields? Anybody re-executing a CREATE TABLE AS SELECT on the same table
isn't somebody we should be helping. ISTM that we'd be able to exclude
them from the TopPlan on that basis, possibly creating an Into node to
reduce the clutter.



Couple of incidental points on plan invalidation:
- We need to consider how the planner uses parameter values. Currently
the unnamed query utilises the first bind parameters to plan the query.
Doing that when we have a central plan cache will definitely cause
problems in some applications that repeatedly re-specify the same
parameter within their own session, but differ across sessions. Sounds
bizarre, but assuming that all users of the same query want it optimised
the same way is not a good assumption in all cases. I'm completely in
favour of a centralized plan cache in all other ways...

- I'd like to make it impossible to re-plan the output columns of
queries with unspecified output columns, e.g. * or foo.*,
since re-planning makes it possible for the results of the query to
change during re-execution. I've never seen an application using dynamic
queries that allowed for the possibility that the result metadata might
change as we re-execute, and allowing it would seem likely to break more
applications than we'd really want. It will also allow us to remove the
Metadata call from the v3 protocol at Exec time, as David Strong
suggested last year on pgsql-jdbc.

- It would be good to allow for exec-time constraint exclusion, which
would allow caching plans that rely on CE with stable functions (e.g. col >
CURRENT_DATE). That may change the design, even though that's not an
8.3 thing at all.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] TopPlan, again

2007-02-19 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 After looking over the code it seems that the executor needs a limited
 subset of the Query fields, namely
 ...
 which I think we should put into a new TopPlan node type.

 All else sounds good, but why would we be caching a plan that used these
 fields?

Um, what's your point?  I certainly have no desire to support two
different Executor APIs depending on whether we think the command might
be worth caching or not.

regards, tom lane



Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Bruce Momjian
Pavel Stehule wrote:
 Hello,
 
 I miss doc for this operator

Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.
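
For example, a quick sketch of the construct as now documented:

   SELECT 1 IS OF (integer);                -- true
   SELECT 'abc'::text IS NOT OF (integer);  -- true
   SELECT now() IS OF (timestamptz, date);  -- true: the type is in the list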

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/func.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.365
diff -c -c -r1.365 func.sgml
*** doc/src/sgml/func.sgml	19 Feb 2007 22:51:44 -0000	1.365
--- doc/src/sgml/func.sgml	20 Feb 2007 00:24:15 -0000
***************
*** 411,416 ****
--- 411,432 ----
    <literal>IS NOT NULL</literal>, respectively, except that the input
    expression must be of Boolean type.
   </para>
+  
+  <para>
+   <indexterm>
+    <primary>IS OF</primary>
+   </indexterm>
+   <indexterm>
+    <primary>IS NOT OF</primary>
+   </indexterm>
+   It is possible to check the data type of an expression using the
+   constructs
+ <synopsis>
+ <replaceable>expression</replaceable> IS OF (typename, ...)
+ <replaceable>expression</replaceable> IS NOT OF (typename, ...)
+ </synopsis>
+   They return a boolean value based on whether the expression's data
+   type is one of the listed data types.
+  </para>
  </sect1>
 
 <sect1 id="functions-math">
Index: doc/src/sgml/syntax.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/syntax.sgml,v
retrieving revision 1.115
diff -c -c -r1.115 syntax.sgml
*** doc/src/sgml/syntax.sgml	6 Feb 2007 09:16:07 -0000	1.115
--- doc/src/sgml/syntax.sgml	20 Feb 2007 00:24:15 -0000
***************
*** 847,853 ****
     <row>
      <entry><token>IS</token></entry>
      <entry></entry>
!     <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS UNKNOWN</>, <literal>IS NULL</></entry>
     </row>
  
     <row>
--- 847,854 ----
     <row>
      <entry><token>IS</token></entry>
      <entry></entry>
!     <entry><literal>IS TRUE</>, <literal>IS FALSE</>, <literal>IS
!     UNKNOWN</>, <literal>IS NULL</>, <literal>IS OF</></entry>
     </row>
  
     <row>



Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Joe Conway

Bruce Momjian wrote:

Pavel Stehule wrote:

Hello,

I miss doc for this operator


Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.


Here's the reason -- see this thread:
http://archives.postgresql.org/pgsql-patches/2003-08/msg00062.php

Joe




Re: [HACKERS] ToDo: add documentation for operator IS OF

2007-02-19 Thread Bruce Momjian
Joe Conway wrote:
 Bruce Momjian wrote:
  Pavel Stehule wrote:
  Hello,
 
  I miss doc for this operator
  
  Strange that IS [NOT] OF wasn't documented, especially seeing it was added in
  PostgreSQL 7.3.  Anyway, it's now documented and backpatched to 8.2.X.
 
 Here's the reason -- see this thread:
 http://archives.postgresql.org/pgsql-patches/2003-08/msg00062.php

Wow, interesting.  I do remember that now.  Should I revert the
documentation addition and add a comment to gram.y?

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] autovacuum next steps

2007-02-19 Thread Galy Lee


Gregory Stark wrote:

If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle
of vacuums on smaller tables, then resume, that problem would go away. That
sounds too difficult though, but perhaps we could do something nearly as good.


I think giving vacuum this interrupt-and-resume capability would be quite 
useful for large tables.


It would give autovacuum more flexibility to create a good scheduling 
scheme. Sometimes it takes a whole day to vacuum a large table 
(tables of hundreds of GB may qualify); setting cost_delay can make it 
last even longer, up to several days. If the system has a maintenance 
window, the vacuum task for the large table can be split to fit into 
that window by the interrupt-and-resume feature.



One option that I've heard before is to have vacuum after a single iteration
(ie, after it fills maintenance_work_mem and does the index cleanup and the
second heap pass), remember where it was and pick up from that point next
time.



Even a single iteration may take a long time, so a break only at the 
boundary of an iteration is not very useful. I think it is not so 
difficult to get vacuum to remember where it left off and restart 
from that point next time. The following is the basic idea.


A typical vacuum process mainly have the following phases:
 Phase 1. scan heap
 Phase 2. scan and sweep index
 Phase 3. sweep heap
 Phase 4. update FSM
 Phase 5. truncate CLOG

When vacuum is interrupted, we can save the collected information to 
disk and restore it later when vacuum restarts: we can remember the 
dead tuple list and the block number it has scanned in phase 1; the 
indexes it has cleaned up in phase 2; and the tuples it has swept in 
phase 3. Before exiting from vacuum, we can also merge the free space 
information into the FSM.


We are working on this feature now.  I will propose it later for 
discussion with you.


Best Regards
Galy Lee
--
NTT Open Source Software Center



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread David Fetter
On Mon, Feb 19, 2007 at 02:50:34PM -0500, Andrew Sullivan wrote:
 On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
  Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
  RAID and LVM too.  I can't get excited about re-inventing those
  wheels when perfectly good implementations already exist for us to
  sit on top of.
 
 Ok, warning, this is a "you know what would be sweet" moment.
 
 What would be nice is to be able to detach one of the volumes, and
 know the span of the data in there without being able to access the
 data.
 
 The problem that a lot of warehouse operators have is something like
 this: We know we have all this data, but we don't know what we will
 want to do with it later.  So keep it all.  I'll get back to you
 when I want to know something.
 
 It'd be nice to be able to load up all that data once, and then
 shunt it off into (say) read-only media.  If one could then run a
 query that would tell one which spans of data are candidates for the
 search, you could bring back online (onto reasonably fast storage,
 for instance) just the volumes you need to read.

Isn't this one of the big use cases for table partitioning?
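
For the archives, here is a sketch of how that maps onto 8.2-style
partitioning (all names hypothetical): old partitions can live in a
tablespace sitting on cheap or read-only storage, and constraint exclusion
keeps queries off the spans that can't match:

CREATE TABLE measurements (logdate date NOT NULL, reading int);

-- one child per span, placed in an archival tablespace:
CREATE TABLE measurements_2006 (
    CHECK (logdate >= '2006-01-01' AND logdate < '2007-01-01')
) INHERITS (measurements) TABLESPACE archive_space;

-- with constraint_exclusion = on, this scans only the children whose
-- CHECK constraints can satisfy the predicate:
SET constraint_exclusion = on;
SELECT count(*) FROM measurements
 WHERE logdate >= '2006-06-01' AND logdate < '2006-07-01';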

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 A workable syntax might be
 INSERT INTO pg_log ...

Why is this still under discussion?  I thought we'd agreed that COPY
format was the way to go.

regards, tom lane



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Tom Lane
August Zajonc [EMAIL PROTECTED] writes:
 The key is how lightweight the setup could be, which matters because
 clients are not always willing to pay for a PITR setup. The low overhead
 would mean you'd feel fine about setting guc to 1hr or so.

This would have exactly the same performance consequences as always
having an hour-old open transaction.  I'm afraid that describing it
as low overhead is mere wishful thinking: it would cripple vacuuming
of high-update tables and greatly increase the typical load on pg_clog
and pg_subtrans.  We already know that pg_subtrans contention can be a
source of context-swap storms, with the size of the window back to
GlobalXmin being the controlling factor for how bad it gets.
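
To see why, compare it to the familiar long-running-transaction case
(a sketch; the table name is made up):

-- session 1: holds back GlobalXmin, exactly what a one-hour
-- flashback window would do continuously
BEGIN;
SELECT count(*) FROM busy_table;
-- ...sits idle for an hour...

-- session 2: meanwhile, VACUUM cannot reclaim any row version
-- deleted or updated after session 1's snapshot was taken
VACUUM VERBOSE busy_table;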

It's possible that this last could be addressed by separating the
concept of old enough to be vacuumed from GlobalXmin, but it's
certainly not a trivial thing.

regards, tom lane



Re: [HACKERS] pgsql: Fix for plpython functions; return true/false for boolean,

2007-02-19 Thread Bruce Momjian

I am still waiting for a plpython patch that has Python version
checking.

---

Guido Goldstein wrote:
 Peter Eisentraut wrote:
  Guido Goldstein wrote:
  Is it possible to tell me which python versions you want to
  support?
  
  The issue isn't so much which versions we want to support.  There is 
  certainly some flexibility with that.  But when a patch breaks the 
  buildfarm a) unannounced and b) without any apparent feature gain, then 
  people get annoyed.
 
 If this breaks the buildfarm, that's not my fault -- unless you can
 tell me what I'm supposed to do about the buildfarm.
 
 If you mean that plpython didn't compile, fine; simply tell
 the people what version they should consider when sending
 in patches.
 
 I've checked the patch with postgres 8.1.3 and 8.2.1
 with python 2.4 and 2.5 on intel 32 bit and amd 64 bit
 systems; all systems running linux.
 
 *And* it's not a feature patch but a bug-fixing one!
 Python is a language with strong typing, so silently
 converting a datatype is a bug -- not a feature.
 Btw, you'll lose the type information of boolean columns in
 trigger functions (NEW and OLD dicts, no explicit parameters),
 which does cause problems.
 
  That said, we certainly try to support a few more versions of Python 
 [...]
 
 If you want to support python 2.3 use the attached patch, which also
 works for the newer python versions.
 The Python 2.3 branch is the oldest _officially_ supported python version.
 
 Anyway, to circumvent the above-mentioned point a), I herewith announce
 that the included patch might break the buildfarm.
 
 Cheers
Guido
 


 

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-02-19 Thread Greg Smith
I have a WIP patch that adds the main detail I have found I need to 
properly tune checkpoint and background writer activity.  I think it's 
almost ready to submit (you can see the current patch against 8.2 at 
http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt ) 
after making it a bit more human-readable.  But I've realized that along 
with that, I need some guidance in regards to what log level is 
appropriate for this information.


An example works better than explaining what the patch does:

2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote 
checkpoint_segments)

2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034 
write=279956 us sync=17739974 us


Remember that Load distributed checkpoint discussion back in December? 
You can see exactly how bad the problem is on your system with this log 
style (this is from a pgbench run where it's positively awful--it really 
does take over 17 seconds for the fsync to execute, and there are clients 
that are hung the whole time waiting for it).


I also instrumented the background writer.  You get messages like this:

2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5 
Unscanned = 23/54


This shows that we wrote the maximum number of pages we were allowed to 
write (5 of 5), while failing to scan almost half (23) of the 54 buffers we 
meant to look at.  By watching this output while the system is under load, 
I found I was able to do bgwriter optimization that used to take me days 
of frustrating testing in hours.  I've been waiting for a good guide to 
bgwriter tuning since 8.1 came out.  Once you have this, combined with 
knowing how many buffers were dirty at checkpoint time because the 
bgwriter didn't get to them in time (the number you want to minimize), the 
tuning guide practically writes itself.
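
If you want to follow along on your own system, the knobs this tuning
revolves around can be listed with something like (8.2-era names):

SELECT name, setting
  FROM pg_settings
 WHERE name LIKE 'bgwriter%'
    OR name LIKE 'checkpoint%';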


So my question is...what log level should all this go at?  Right now, I 
have the background writer stuff adjusting its level dynamically based on 
what happened; it logs at DEBUG2 if it hits the write limit (which should 
be a rare event once you're tuned properly), DEBUG3 for writes that 
scanned everything they were supposed to, and DEBUG4 if it scanned but 
didn't find anything to write.  The source of checkpoint information logs 
at DEBUG1, the fsync/write info at DEBUG2.


I'd like to move some of these up.  On my system, I even have many of the 
checkpoint messages logged at INFO (the source of the checkpoint and the 
total write time line).  It's a bit chatty, but when you get some weird 
system pause issue it makes it easy to figure out if checkpoints were to 
blame.  Given how useful I feel some of these messages are to system 
tuning, and to explaining what currently appears as inexplicable pauses, I 
don't want them to be buried at DEBUG levels where people are unlikely to 
ever see them (I think some people may be concerned about turning on 
things labeled DEBUG at all).  I am aware that I am too deep into this to 
have an unbiased opinion at this point though, which is why I ask for 
feedback on how to proceed here.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] \copy (query) delimiter syntax error

2007-02-19 Thread Andrew Dunstan


I think I'll go with Tom's Plan B for HEAD, but not do anything more for 
8.2 than has already been done.


cheers

andrew


Bruce Momjian wrote:
 Did we come to a conclusion on this?

 ---

 Andrew Dunstan wrote:
  Andrew Dunstan wrote:
   Tom Lane wrote:
    Andrew Dunstan [EMAIL PROTECTED] writes:
     The consequence will be, though, that psql will accept a syntax for
     \copy (query) ... that the corresponding backend command would
     reject were we not transforming it. That strikes me as potentially
     confusing.

    Perhaps.  What about plan B: remove the legacy syntax support in \copy?
    IIRC it has not been documented since 7.2, so maybe we can finally throw
    it overboard.  Thoughts?

   I like it for 8.3 - but maybe my present patch would be better for
   8.2, as it involves less behaviour change.

  While we decide this issue, which can be worked around in any case, I am
  going to commit the part of the patch that nobody has objected to (and
  which will fix Michael's original complaint), on HEAD and 8.2 stable, so
  we can get some testing going.

  cheers

  andrew




Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Mon, 19 Feb 2007, Alvaro Herrera wrote:
 Also, sql is not really a destination -- it is a format.

 A log file with a different name is another destination.  eventlog is 
 certainly a different format and it's sitting happily as an option there. 
 I haven't heard anyone make a useful argument yet as to how insert/sql 
 logs are any different than the current way that stderr, syslog, and 
 eventlog are all possibilities now for log_destination, each with their 
 own little quirks (and in the case of syslog, their own additional GUC 
 parameters).

Since the sql format doesn't make any sense for syslog or eventlog
output, I tend to agree that treating it as a destination is a
reasonable answer.  It's going to be a bit non-orthogonal no matter
which way we jump, but this seems like the most natural and useful
extension from where we are.  To me anyway ... YMMV ...

regards, tom lane



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Greg Smith

On Mon, 19 Feb 2007, Tom Lane wrote:


Why is this still under discussion?  I thought we'd agreed that COPY
format was the way to go.


Joshua Drake said "COPY would be a good option, but INSERT is probably 
what I would use as the default. The most use I see for this is something 
where I am tailing out the log and inserting live into a log db..." and I 
completely agreed with him--that's also how all the applications I'd like 
to build around this feature are expected to operate.  No one said 
anything else on this topic to defend COPY as the right choice until you 
just brought it back up here.


The arguments for COPY are performance and that you don't need to specify 
the table name.  INSERT is slower and you need a name, but it's easier to 
build a UNIX tool style pipeline to import it in real-time.
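
To make the pipeline case concrete, here is the sort of thing I have in
mind -- the pg_log table and its columns are hypothetical, but the point is
that every emitted line stands alone:

-- an importer can pipe the log straight into psql, e.g.
--   tail -f postgresql-insert.log | psql logdb
INSERT INTO pg_log (log_time, username, message)
  VALUES ('2007-02-19 21:53:24 EST', 'postgres', 'checkpoint starting');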


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] New feature request: FlashBack Query

2007-02-19 Thread Jonah H. Harris

On 2/17/07, Joshua D. Drake [EMAIL PROTECTED] wrote:

My understanding is that the main difference is that rollbacks are
inexpensive for us, but expensive for Oracle.


Yes, Oracle is optimized for COMMIT, we're optimized for ROLLBACK :)

In all seriousness, last time I checked Oracle's MVCC was covered by
two patents.

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [HACKERS] Short varlena headers and arrays

2007-02-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Elements of arrays are not subject to being toasted by themselves, so
 I don't think you can make that work.  At least not without breaking
 wide swaths of code that works fine today.

 You think it's more likely there are places that build arrays and then read
 the items back without passing through detoast than there are places that
 build tuples and do so?

The former is valid per the coding rules, the latter is not, so...

regards, tom lane



Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements

2007-02-19 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 The arguments for COPY are performance and that you don't need to specify 
 the table name.  INSERT is slower and you need a name, but it's easier to 
 build a UNIX tool style pipeline to import it in real-time.

I can't believe that any production situation could tolerate the
overhead of one-commit-per-log-line.  So a realistic tool for this
is going to have to be able to wrap blocks of maybe 100 or 1000 or so
log lines with BEGIN/COMMIT, and that is exactly as difficult as
wrapping them with a COPY command.  Thus, I disbelieve your argument.
We should not be designing this around an assumed use-case that will
only work for toy installations.
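
To spell that out, framing a batch of the hypothetical pg_log lines from
upthread either way is the same amount of work for the tool:

-- INSERT framing, one transaction per batch:
BEGIN;
INSERT INTO pg_log (log_time, message)
  VALUES ('2007-02-19 21:53:24', 'checkpoint starting');
INSERT INTO pg_log (log_time, message)
  VALUES ('2007-02-19 21:53:42', 'checkpoint complete');
COMMIT;

-- COPY framing of the same batch (tab-separated data):
COPY pg_log (log_time, message) FROM stdin;
2007-02-19 21:53:24	checkpoint starting
2007-02-19 21:53:42	checkpoint complete
\.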

regards, tom lane



Re: [HACKERS] Plan invalidation design

2007-02-19 Thread Simon Riggs
On Sat, 2007-02-17 at 12:48 -0500, Tom Lane wrote:

 Relcache inval casts a fairly wide net; for example, adding or dropping an
 index will invalidate all plans using the index's table whether or not
 they used that particular index, and I believe that VACUUM will also
 result in a relcache inval due to updating the table's pg_class row.
 I think this is a good thing though --- for instance, after adding an
 index it seems a good idea to replan to see if the new index is useful,
 and replanning after a VACUUM is useful if the table has changed size
 enough to warrant a different plan.  OTOH this might mean that plans on a
 high-update-traffic table never survive very long because of autovacuum's
 efforts.  If that proves to be a problem in practice we can look at ways
 to dial down the number of replans, but for the moment I think it's more
 important to be sure we *can* replan at need than to find ways to avoid
 replans.

Just some info on that: In an update-intensive scenario, I'm seeing
VACUUMs every 2 minutes on the heaviest hit tables on CVS HEAD on a
medium-powered 4-CPU server. Re-planning multiple queries on 100+
sessions every few minutes would not be good.

It seems a reasonable working assumption that HOT will reduce that
requirement considerably, but it's something to watch. Thanks for drawing
attention to it.

Presumably ANALYZE would have the same effect?
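
For context, the sort of cached plan at stake looks like this (a sketch
with a pgbench-style table):

PREPARE acct_lookup (int) AS
    SELECT abalance FROM accounts WHERE aid = $1;  -- planned here
EXECUTE acct_lookup(42);
VACUUM accounts;          -- touches accounts' pg_class row
EXECUTE acct_lookup(42);  -- would use a fresh plan under the proposal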

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Robert Treat
On Monday 19 February 2007 15:08, Bruce Momjian wrote:
 Joshua D. Drake wrote:
  Andrew Sullivan wrote:
   On Mon, Feb 19, 2007 at 10:33:24AM -0500, Tom Lane wrote:
   Martijn van Oosterhout kleptog@svana.org writes:
   Somehow this seems like implementing RAID within postgres,
  
   RAID and LVM too.  I can't get excited about re-inventing those wheels
   when perfectly good implementations already exist for us to sit on top
   of.
  
   Ok, warning, this is a "you know what would be sweet" moment.
  
  The dreaded words from a developer's mouth to every manager in the world.

 Yea, I just instinctively hit delete when I saw that phrase.

Too bad... I know Oracle can do what he wants... possibly other db systems as 
well. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



[HACKERS] HOT WIP Patch - version 2

2007-02-19 Thread Pavan Deolasee

Reposting - looks like the message did not get through in the first
attempt. My apologies if multiple copies are received.


This is the next version of the HOT WIP patch. Since the last patch that
I sent out, I have implemented the HOT-update chain pruning mechanism.

When following a HOT-update chain from an index fetch, if we notice that
the root tuple is dead and HOT-updated, we try to prune the chain to the
smallest possible length. To do that, the share lock is upgraded to an
exclusive lock and the tuple chain is followed until we find a live or
recently-dead tuple. At that point, the root t_ctid is made to point to
that tuple. In order to preserve the xmax/xmin chain, the xmax of the root
tuple is also updated to the xmin of the found tuple. Since this xmax is
also < RecentGlobalXmin and belongs to a committed transaction, the
visibility of the root tuple remains the same.

The intermediate heap-only tuples are removed from the HOT-update chain.
The HOT-updated status of these tuples is cleared and their respective
t_ctids are made to point to themselves. These tuples are now unreachable
and ready for vacuuming. This entire action is logged in a single
WAL record.

During vacuuming, we keep track of the number of root tuples vacuumed.
If this count is zero, the index cleanup step is skipped. This avoids
unnecessary index scans whenever possible.

This patch should apply cleanly on current CVS HEAD and pass all regression
tests. I am still looking for review comments on the first WIP patch. If
anyone has already looked through it and is interested in the incremental
changes, please let me know and I can post them.

What's next?
------------

ISTM that the basic HOT-updates and the ability to prune the HOT-update
chain should help us reduce index bloat, limit the overhead of ctid
following in index fetches, and efficiently vacuum heap-only tuples. IMO
the next important, but less troublesome, thing to tackle is reusing space
within a block without a complete vacuum of the table. This would let us do
many more HOT-updates and thus further reduce index/heap bloat.
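
The workload I have in mind is the common one where updates never touch an
indexed column (hypothetical schema):

CREATE TABLE counters (id int PRIMARY KEY, hits bigint NOT NULL DEFAULT 0);

-- 'hits' is not indexed, so each such update can be a HOT-update: the new
-- version goes into the same block with no new index entry, and the old
-- version later becomes prunable as described above
UPDATE counters SET hits = hits + 1 WHERE id = 1;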

I am thinking of reusing the DEAD heap-only tuples that get removed from
the HOT-update chain as part of the pruning operation. Since these tuples,
once removed from the chain, are neither reachable nor referenced by any
index, they could readily be used for storing newer versions of the same or
other rows in the block. How about setting LP_DELETE on these tuples as
part of the prune operation? LP_DELETE is unused for heap tuples, if I am
not mistaken. Other information, such as length and offset, is maintained
as-is. When we run out of space for an update-within-the-block, we traverse
all the line pointers looking for LP_DELETEd items. If any of these items
has enough space to store the new tuple, that item is reused. Does anyone
see any issue with doing this? Also, any suggestions for doing it better?

If the page gets really fragmented, we can try to grab a VACUUM-strength
lock on the page and de-fragment it. The lock is taken conditionally to
avoid deadlocks. This is done in the heap_update() code path, so it would
add some overhead, but it may still prove better than putting the tuple in
a different block and doing the corresponding index insert(s). Also, since
we are more concerned about large tables, the chances of being able to
upgrade the exclusive lock to a vacuum-strength lock are high. Comments?

If there are no objections, I am planning to work on the first part while
Nikhil takes up the second task of block-level retail vacuum.
Your comments on these issues and the patch are really appreciated.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


NewHOT-v2.0.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] Multiple Storage per Tablespace, or Volumes

2007-02-19 Thread Robert Treat
On Monday 19 February 2007 11:27, Martijn van Oosterhout wrote:
 On Mon, Feb 19, 2007 at 05:10:36PM +0100, Dimitri Fontaine wrote:
   RAID and LVM too.  I can't get excited about re-inventing those wheels
   when perfectly good implementations already exist for us to sit on top
   of.
 
  I thought moving some knowledge about data availability into PostgreSQL
  code could provide some valuable performance benefit, allowing it to
  organize reads (for example, parallel table/index scans against
  different volumes) and to obtain data from the volume known to be
  'quicker' (or least used/loaded).

 Well, organising requests to be handled quickly is a function of
 LVM/RAID, so we don't go there. However, speeding up scans by having
 multiple requests is an interesting approach, as would perhaps a
 different random_page_cost for different tablespaces.


On one of my systems I have one tablespace for read-only data (99-1 
read/write), one for read-mostly data (90-10), and one for write-mostly data 
(40-60).  The breakdown is based on a combination of the underlying hardware 
and usage patterns of the tables involved. I suspect that isn't that uncommon 
really.  I've often thought that being able to set GUC variables on a 
specific tablespace (like you can do for users) would allow for a lot of 
flexibility in tuning queries that go across different tablespaces. 
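
For comparison, the per-user form that exists today, next to the
per-tablespace form I am wishing for (which does not exist):

-- existing: bind a planner GUC to a role
ALTER ROLE reporting SET random_page_cost = 2.0;

-- wished-for analogue, something like:
--   ALTER TABLESPACE fast_disks SET random_page_cost = 1.5;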

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


