Re: [HACKERS] Fixing insecure security definer functions
On 2/15/07, Tom Lane [EMAIL PROTECTED] wrote:
> Merlin Moncure [EMAIL PROTECTED] writes:
>> yikes! If you guys go through with forcing functions to attach to
>> objects when they are created, it will break almost every project I've
>> ever worked on :(. The schema/function combo fits into all kinds of de
>> facto partitioning strategies and organization methods.
>
> If you read a bit further, I did suggest providing an option to retain
> the current behavior. I don't think it should be the default though.

Yeah, I saw that, but the issue is really deeper: functions that create functions, etc. Changing the default behavior affects how functions work in a really fundamental way... all pl/pgsql code that can float over schemas would have to be checked. In the worst case, this could mean converting large libraries to dynamic sql or creating thousands of additional functions... ugh.

Maybe there could be a GUC setting (function default function schema path = current path / path null)? It would seem only appropriate to have security definer raise a warning/error for path null though. Then we could debate about how that should be set by default, but nobody really loses that argument.

merlin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [HACKERS] Chatter on DROP SOMETHING IF EXISTS
"Jim C. Nasby" [EMAIL PROTECTED] writes:
> On Thu, Feb 08, 2007 at 01:54:13PM -0500, Tom Lane wrote:
>> I would be satisfied if the returned command tag were something else,
>> maybe NO OPERATION.
>
> TABLE blah DID NOT EXIST might be less confusing...

You're confusing a command tag with a notice. In the first place, we shouldn't assume that applications are ready to deal with indefinitely long command tags (the backend itself doesn't think they can be longer than 64 bytes); in the second place, they should be constant strings for the most part so that simple strcmp()s suffice to see what happened. Command tags are meant for programs to deal with, more than humans.

regards, tom lane
[HACKERS] buildfarm failure in XML code
UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01

-- Alvaro Herrera                  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[HACKERS] Mail getting through?
I sent two emails last night, one to -patches and one to -hackers. Neither seems to have come through. Has anyone else seen them?

On -patches I posted an updated patch that was functionally a noop but changed the macro API to SET_VARSIZE(). It also replaced the VARATT_SIZE and VARATT_DATA macros with VARSIZE and VARDATA, changed inet to call detoast, and changed arrays, inet, geometric data types and a handful of others to use SET_VARSIZE instead of accessing a struct member directly.

On -hackers I reposted Tom's email where he proposed two sets of bitpatterns with tradeoffs, and also included an additional one that was the second of his with the 2-byte cases removed. In it I said that removing the 2-byte cases had no advantages, but actually since then I've thought of one: it makes the toaster code simpler, since it can just set a bit in the four-byte header just as it does now. It doesn't have to worry about converting to a 2-byte header.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT WIP Patch - version 1
On Thu, 2007-02-15 at 10:49, Heikki Linnakangas wrote:
> > We already log tuple removals by normal vacuums.
>
> We can't use that wal entry as it is: if a dead tuple is in the middle
> of an update chain, it needs to be unlinked from the chain. But I don't
> see any particular problem with that, it just needs to be wal logged
> like every other data changing operation.
>
> Do we actually ever want to remove dead tuples from the middle of the
> chain? If a tuple in the middle of the chain is dead, surely every tuple
> before it in the chain is dead as well, and we want to remove them as
> well. I'm thinking, removing tuples from the middle of the chain can be
> problematic, because we'd need to fiddle with the xmin/xmax of the other
> tuples to make them match. Or change the tuple-following logic to not do
> the xmin=xmax check, but it's a nice robustness feature.

What kind of robustness does it provide? In other words, what failure scenario does this guard against? I can't see the case where the xmin=xmax check can not succeed, at least not for same page tuples.

-- Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] [COMMITTERS] pgsql: Functions for mapping table data and table schemas to XML (a.k.a.
On Friday, 16 February 2007 08:46, Peter Eisentraut wrote:
> Log Message:
> Functions for mapping table data and table schemas to XML (a.k.a. XML export)

Breaks various platforms. I'm on it.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Mail getting through?
Gregory Stark wrote:
> I sent two emails last night, one to -patches and one to -hackers.
> Neither seems to have come through. Has anyone else seen them?

Not yet, but I got bounces from the server saying "451 - Server configuration problem" or something to that effect. Since my emails were eventually delivered, I'd expect yours to be too.

-- Alvaro Herrera                  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] pg_restore fails with a custom backup file
On Fri, Feb 16, 2007 at 02:09:41PM +0900, Yoshiyuki Asaba wrote:
> > > Does not compile on my MinGW - errors in the system headers
> > > (unistd.h, io.h) due to changing the argument format for chsize().
> > > The change of off_t propagated into parts of the system headers,
> > > thus chaos was ensured. I still think we need to use a pgoff_t.
> > > Will look at combining these two approaches.
> >
> > Here's a patch that tries this. *needs more testing*. But built with
> > this patch, I can dump and restore a table at the end of a 10gb
> > database without errors.
>
> I tried the attached patch. But I got the following error.
>
> pg_backup_archiver.o(.text+0x1fa4): In function `allocAH':
> C:/msys/1.0/home/y-asaba/postgresql-8.2.3-patch/src/bin/pg_dump/pg_backup_archiver.c:1580: undefined reference to `fseeko64'
> ...
> make[3]: *** [pg_dump] Error 1
>
> $ uname -sr
> MINGW32_NT-5.1 1.0.10(0.46/3/2)
>
> Is MINGW version too old?

I think so. It seems this was added in version 1.24 of stdio.h in mingw (http://cygwin.com/cgi-bin/cvsweb.cgi/src/winsup/mingw/include/stdio.h?cvsroot=src). Could you try upgrading mingw and see if that helps? Or possibly installing a different version side-by-side (if they even allow that)?

//Magnus
Re: [HACKERS] HOT for PostgreSQL 8.3
On Wed, 2007-02-14 at 10:41, Tom Lane wrote:
> Hannu Krosing [EMAIL PROTECTED] writes:
>> OTOH, for same page HOT tuples, we have the command and trx ids stored
>> twice: first as cmax,xmax of the old tuple and as cmin,xmin of the
>> updated tuple. One of these could probably be used for in-page HOT
>> tuple pointer.
>
> This proposal seems awfully fragile, because the existing
> tuple-chain-following logic *depends for correctness* on comparing each
> tuple's xmin to prior xmax.

What kinds of correctness guarantees does this give for same-page tuples?

The comparing of each tuple's xmin to prior xmax should stay for inter-page ctid links. Mostly you can think of the same-page HOT chain as one extended tuple when looking at it from outside of that page.

> I don't think you can just wave your hands and say we don't need that
> cross-check. Furthermore it seems to me you haven't fixed the problem,
> which is that you can't remove the chain member that is being pointed at
> by off-page links (either index entries or a previous generation of the
> same tuple).

You can't remove any tuples before they are invisible for all transactions (i.e. dead). And being dead implies that all previous versions are dead as well. So if I can remove a tuple, I can also remove all its previous versions as well. Or are you trying to say that VACUUM follows ctid links of dead tuples for some purpose?

The problem I am trying to fix is reusing in-page space without need to touch indexes.

> As described, you've made that problem worse because you're trying to
> say we don't know which of the chain entries is pointed at.

There should be a flag, say HOT_CHAIN_ENTRY, for the tuple the index(es) point at. And this should be the preferred CTID for inserting new versions once the old one is dead.
-- Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
Re: [HACKERS] buildfarm failure in XML code
On Friday, 16 February 2007 14:59, Alvaro Herrera wrote:
> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01

It needs a rebuild after the fix.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
[HACKERS] Intermittent buildfarm failures due to timestamptz test
Multiple systems are occasionally failing the timestamptz test at the check stage. From what I can tell, this is due to the newly added ISO week checks referring to TIMESTAMP_TBL, used by the timestamp test, instead of TIMESTAMPTZ_TBL. Both checks run in the same segment, so the test fails if the timing between the timestamp and timestamptz checks doesn't match.

http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jackal&dt=2007-02-16%2012:21:01

-- Seneca [EMAIL PROTECTED]
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Friday, 16 February 2007 14:59, Alvaro Herrera wrote:
>> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MAX
>> UX:acomp: ERROR: xml.c, line 2188: undefined symbol: INT64_MIN
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=warthog&dt=2007-02-16%2009:06:01
>
> It needs a rebuild after the fix.

This fix doesn't fix anything. We have always in the past managed to avoid assuming that int64 actually is available; I don't intend to give the xml code a free pass to break that, especially for such an utterly marginal purpose as this code has. I'm also wondering why xml.c is the only place anywhere in the code that uses stdint.h.

regards, tom lane
[HACKERS] Mail getting through? Short varlena headers
I sent two emails last night, one to -patches and one to -hackers. Neither seems to have come through. Has anyone else seen them? I sent this email once already too and it hasn't come through. I'm resending it from a different account now.

On -patches I posted an updated patch that was functionally a noop but changed the macro API to SET_VARSIZE(). It also replaced the VARATT_SIZE and VARATT_DATA macros with VARSIZE and VARDATA, changed inet to call detoast, and changed arrays, inet, geometric data types and a handful of others to use SET_VARSIZE instead of accessing a struct member directly.

On -hackers I reposted Tom's email where he proposed two sets of bitpatterns with tradeoffs, and also included an additional one that was the second of his with the 2-byte cases removed. In it I said that removing the 2-byte cases had no advantages, but actually since then I've thought of one: it makes the toaster code simpler, since it can just set a bit in the four-byte header just as it does now. It doesn't have to worry about converting to a 2-byte header. So I'm thinking of doing it for now at least.

I still think paying 2 bytes on virtually every datum is silly: even if mathematically it's only a 2% space savings, that's still a 2% performance penalty on sequential scans, and it gains us nothing except a few lines of code saved in tuptoaster.c.

Comments?

-- greg
Re: [HACKERS] buildfarm failure in XML code
On Friday, 16 February 2007 16:09, Tom Lane wrote:
> This fix doesn't fix anything.

Sure, it fixes the compilation failures.

> We have always in the past managed to avoid assuming that int64
> actually is available; I don't intend to give the xml code a free pass
> to break that,

I don't intend that either, but a proper fix will be more elaborate. I will work on that.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Hannu Krosing [EMAIL PROTECTED] wrote:
> On Wed, 2007-02-14 at 10:41, Tom Lane wrote:
>> Hannu Krosing [EMAIL PROTECTED] writes:
>>> OTOH, for same page HOT tuples, we have the command and trx ids
>>> stored twice: first as cmax,xmax of the old tuple and as cmin,xmin of
>>> the updated tuple. One of these could probably be used for in-page
>>> HOT tuple pointer.
>>
>> This proposal seems awfully fragile, because the existing
>> tuple-chain-following logic *depends for correctness* on comparing
>> each tuple's xmin to prior xmax.
>
> What kinds of correctness guarantees does this give for same-page
> tuples?

I agree with Tom that the xmin/xmax check does help to guarantee correctness. I myself have used it often during HOT development to find/fix bugs. But ISTM that we don't need it, at least for the in-page tuple chain, if we are careful. So if removing this buys us something important, I am all for it.

> The comparing of each tuple's xmin to prior xmax should stay for
> inter-page ctid links.

Agree.

> Mostly you can think of the same-page HOT chain as one extended tuple
> when looking at it from outside of that page.
>
>> I don't think you can just wave your hands and say we don't need that
>> cross-check. Furthermore it seems to me you haven't fixed the problem,
>> which is that you can't remove the chain member that is being pointed
>> at by off-page links (either index entries or a previous generation of
>> the same tuple).
>
> You can't remove any tuples before they are invisible for all
> transactions (i.e. dead). And being dead implies that all previous
> versions are dead as well. So if I can remove a tuple, I can also
> remove all its previous versions as well. Or are you trying to say that
> VACUUM follows ctid links of dead tuples for some purpose ?

The only exception to this would be the case of aborted updates. In that case a tuple is dead, but the one pointing to it is still live. But I don't see any reason somebody would want to follow a chain past a live tuple.
Not sure about the VACUUM FULL code path though. That's the only place other than EvalPlanQual where we follow the ctid chain.

> The problem I am trying to fix is reusing in-page space without need to
> touch indexes.

Can we do some kind of indirection from the root line pointer? Haven't completely thought it through yet, but the basic idea is to release the actual space consumed by the root tuple once it becomes dead, but store the offnum of the new root in the line pointer of the original root tuple. We may need to flag the line pointer for that, but if I am not wrong, LP_DELETE is not used for heap tuples. We would waste 4 bytes of line pointer until the tuple is COLD updated and the entire chain and the associated index entry are removed.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT for PostgreSQL 8.3
> As described, you've made that problem worse because you're trying to
> say we don't know which of the chain entries is pointed at.
>
> There should be a flag, say HOT_CHAIN_ENTRY for the tuple the index(es)
> point at.

it's called HEAP_UPDATE_ROOT

> And this should be the preferred CTID for inserting new versions once
> the old one is dead.

This is not possible, see my reply to Bruce (maybe unless the whole hot chain is dead). (Because that would need a back pointer, so readers arriving at the root find the visible tuple.)

Andreas
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
>> There should be a flag, say HOT_CHAIN_ENTRY for the tuple the
>> index(es) point at.
>
> it's called HEAP_UPDATE_ROOT

Just to avoid any confusion with the patch I sent out this week: we are setting HEAP_UPDATE_ROOT on all tuples which are HOT-updated. We set HEAP_ONLY_TUPLE for all tuples which do not have an index reference. So maybe the combination (HEAP_UPDATE_ROOT & ~HEAP_ONLY_TUPLE) can be used to identify the index-referenced tuple in a HOT-update chain.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Intermittent buildfarm failures due to timestamptz test
Seneca Cunningham wrote:
> Multiple systems are occasionally failing the timestamptz test at the
> check stage. From what I can tell, this is due to the newly added ISO
> week checks referring to TIMESTAMP_TBL used by the timestamp test
> instead of TIMESTAMPTZ_TBL.
>
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=jackal&dt=2007-02-16%2012:21:01

Alvaro has committed a fix for this.

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] HOT for PostgreSQL 8.3
> Just to avoid any confusion with the patch I sent out this week, we are
> setting HEAP_UPDATE_ROOT on all tuples which are HOT-updated. We set
> HEAP_ONLY_TUPLE for all tuples which do not have an index reference. So
> maybe the combination (HEAP_UPDATE_ROOT & ~HEAP_ONLY_TUPLE) can be used
> to identify the index-referenced tuple in a HOT-update chain.

Oh sorry, thanks for the clarification. Imho HEAP_UPDATE_ROOT should be renamed for this meaning then (or what does ROOT mean here?). Maybe HEAP_UPDATE_CHAIN?

Andreas
Re: [HACKERS] Mail getting through? Short varlena headers
Gregory Stark [EMAIL PROTECTED] writes:
> In it I said that removing the 2-byte cases had no advantages but
> actually since then I've thought of one. It makes the toaster code
> simpler since it can just set a bit in the four-byte header just as it
> does now. It doesn't have to worry about converting to a 2-byte header.

Run that by me again? A toast pointer datum ought to have a 1-byte header, since it's only twenty-something bytes long.

regards, tom lane
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
> On Friday, 16 February 2007 16:09, Tom Lane wrote:
>> This fix doesn't fix anything.
>
> Sure, it fixes the compilation failures.

Not here:

gcc -O1 -Wall -Wmissing-prototypes -Wpointer-arith -Winline -fno-strict-aliasing -g -I../../../../src/include -D_XOPEN_SOURCE_EXTENDED -c -o xml.o xml.c
xml.c: In function `map_sql_type_to_xmlschema_type':
xml.c:2192: `INT64_MAX' undeclared (first use in this function)
xml.c:2192: (Each undeclared identifier is reported only once
xml.c:2192: for each function it appears in.)
xml.c:2192: `INT64_MIN' undeclared (first use in this function)
make[4]: *** [xml.o] Error 1

And this is in a build WITHOUT xml enabled --- why is this code being compiled at all? Kindly fix.

regards, tom lane
Re: [HACKERS] Mail getting through? Short varlena headers
Tom Lane [EMAIL PROTECTED] writes:
> Run that by me again? A toast pointer datum ought to have a 1-byte
> header, since it's only twenty-something bytes long.

I was referring to compressed inline data. By not having inline compressed data use 2-byte headers, it eliminates having to check a lot of corner cases and reduces the changes in tuptoaster.c, since it means pg_lzcompress can return a normal 4-byte header and nobody has to convert it to a 2-byte header. So I am doing that for now. I suspect we'll never get around to reintroducing 2-byte headers, but we could if we wanted to. It would be a small change everywhere else but an annoying bunch of fiddly changes in tuptoaster.c.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] HOT for PostgreSQL 8.3
On 2/16/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:
> Oh sorry. Thanks for the clarification. Imho HEAP_UPDATE_ROOT should be
> renamed for this meaning then (or what does ROOT mean here ?). Maybe
> HEAP_UPDATE_CHAIN ?

Yes, you are right. There is some disconnect between what Simon had originally posted and the patch I sent out. Hopefully as we discuss HOT more here, everything will converge.

Thanks, Pavan

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] anyelement2 pseudotype
Tom Lane wrote:
> So it seems neither can_coerce_type() nor find_coercion_pathway() are
> really particularly well thought out in terms of what they test or
> don't test. I'm not very sure what a good refactoring would look like,
> but I am sure that I don't want all their call sites having to
> individually account for ANYfoo types. Any thoughts?

Yeah, I remember thinking at the time that some of it was a bit backwards, but it's been almost 6 months since I did the original enum patch, so I'll need to refresh my memory. I'll have a look over the weekend and see if I can come up with something that'll work for these various cases. To begin with I'll need to do a survey of the call sites to see what they really need, since perhaps it isn't what the coerce functions are currently offering. :)

I completely agree that anything requiring call sites to understand specifics about ANY* types is a bad idea. The most that we would want would be a generic IsGeneric(typoid) macro, but it would be nice to hide that inside a coerce function as well. We'll see.

Cheers
Tom
Re: [HACKERS] buildfarm failure in XML code
Tom Lane wrote:
> Not here:

Fixed.

> And this is in a build WITHOUT xml enabled --- why is this code being
> compiled at all?

The enablement pertains to libxml, which this code doesn't use.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] buildfarm failure in XML code
Peter Eisentraut [EMAIL PROTECTED] writes:
>> Not here:
> Fixed.

Looks good, thanks.

regards, tom lane
Re: [HACKERS] anyelement2 pseudotype
Tom Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> So it seems neither can_coerce_type() nor find_coercion_pathway() are
>> really particularly well thought out in terms of what they test or
>> don't test. I'm not very sure what a good refactoring would look like,
>> but I am sure that I don't want all their call sites having to
>> individually account for ANYfoo types. Any thoughts?
>
> To begin with I'll need to do a survey of the call sites to see what
> they really need, since perhaps it isn't what the coerce functions are
> currently offering. :)

I realized that I can probably fix ATAddForeignKeyConstraint to do the right thing by having it pass the two actual column types to can_coerce_type, thus allowing check_generic_type_consistency to kick in and detect the problem. I haven't got round to trying that (up to my rear in planner bugs ATM :-() but I think the immediate problem can be dealt with without refactoring. Still, if you have any ideas for making this code cleaner, I'm all ears.

regards, tom lane
[HACKERS] Confusing message on startup after a crash while recovering
Hi,

When postgres crashes during recovery, and is then restarted, it says:

  database system was interrupted while in recovery at ...
  This probably means that some data is corrupted and you will have to
  use the last backup for recovery.

When I first read that message, I assumed that there are cases where postgres can't recover from a crash that happened during recovery. I guessed that some operations done during wal restore are not idempotent, and lead to corrupt data if performed twice. Only after actually reading the source code of xlog.c, and seeing that a similar (but better worded) warning is output after a crash during archive log replay, did I realize that this warning probably just means that corrupt data could be the _cause_ of the crash during recovery, not _caused by_ a crash during recovery.

I'd suggest that the text is changed to something along the lines of:

  database system was interrupted while in recovery at ...
  If this has occurred more than once some data may be corrupted and you
  may need to restore from the last backup.

This would also match the message for "interrupted while doing archive log replay" more closely.

greetings, Florian Pflug
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating point numbers and their errors.

On Sat, Feb 10, 2007 at 12:20:28AM -0700, Warren Turkal wrote:
> Temporal Extensions for PostgreSQL
> by: Warren Turkal
>
> I would like to see a comprehensive solution to time-varying (or
> temporal) tables in PostgreSQL. I specifically want to see support for
> valid-time, transaction-time, and bitemporal (valid-time and
> transaction-time) tables. I will be deferring the descriptions of much
> of the functionality to Dr. Richard T. Snodgrass's _Developing
> Time-Oriented Database Applications in SQL_ at [1]. The mangled pages
> 30-31 are at [2].
>
> a) Functionality
>
> Dr. Richard T. Snodgrass has worked on defining the semantics of
> temporal databases very completely in several writings. He was also
> involved in an unsuccessful effort to standardize temporal extensions
> to SQL. I believe his book does a good job of presenting the semantics
> of temporal databases and describing extensions to SQL that make the
> data much more natural to work with.
>
> b) How current solutions fall flat
>
> Current solutions fall flat due to the extreme complexity of
> implementing valid-time and transaction-time semantics on tables by
> adding columns to track all of the data. Please see chapter 11 of [1]
> for a more complete description of this complexity. Chapter 12 of [1]
> goes on to lay out new syntax for SQL that will make dealing with data
> of this nature much more natural.
> c) Examples
>
> -- create normal table
> CREATE TABLE products (
>   id SERIAL PRIMARY KEY,
>   description TEXT
> );
>
> -- Add valid-time support to the table with granularity of timestamp.
> ALTER TABLE products ADD VALIDTIME PERIOD(TIMESTAMP WITH TIME ZONE);
>
> -- Insert row valid from 2006-01-01 to just before 2007-01-01
> VALIDTIME PERIOD '[2006-01-01 - 2007-01-01)'
> INSERT INTO products ( description ) VALUES ( 'red ball' );
>
> -- Insert row valid from 2007-01-01 to just before 2008-01-01
> -- Should be smart enough to realize that id=777 does not conflict in
> -- this time of validity.
> VALIDTIME PERIOD '[2007-01-01 - 2008-01-01)'
> INSERT INTO products ( id, description ) VALUES ( 777, 'blue ball' );
>
> -- Select history of products with id=777
> VALIDTIME SELECT * FROM products WHERE id=777;
>  id  | description |       valid_period
> -----+-------------+---------------------------
>  777 | red ball    | [2006-01-01 - 2007-01-01)
>  777 | blue ball   | [2007-01-01 - 2008-01-01)
>
> -- Select current products with id=777
> -- The date when the query was run was 2007-02-10.
> SELECT * FROM products WHERE id=777;
>  id  | description
> -----+-------------
>  777 | blue ball
>
> There are many more details in chapter 12 of [1].
>
> d) New stuff (dependencies, indices, syntax, libraries)
>
> One of the base-level additions is the PERIOD datatype. I think that
> implementing temporal support is reliant on developing such a type. The
> description of this datatype is laid out in chapter 4 of [1]. The SQL
> syntax is present in chapter 12 of [1]. I see this as the first piece
> that needs to be implemented in order to take steps toward a DBMS that
> supports full temporal capabilities. I think that PERIOD can largely
> reuse the date/time functionality for parsing of literals and for
> comparisons. The RTREE seems to nicely incorporate the needed indexing
> of the PERIOD type. The syntax of the parser will have to be extended
> to handle the PERIOD literals and constructor. I don't believe any
> additional libraries will be required.
> There are also extensions to the syntax of table creation, table
> altering, querying, inserting, and updating on temporal tables. These
> are all discussed in some detail in chapter 12 of [1]. I don't think
> that any of these changes will require new libraries. The semantics of
> temporal tables and querying them could have a dramatic effect on how
> things like primary keys and unique constraints work. I would like to
> get some comments about this from the community.
>
> e) See Also
>
> Additional resources can be found at Dr. Richard T. Snodgrass's website
> at [3], including the SQL valid-time table support spec at [4] and the
> SQL transaction-time table support spec at [5].
>
> Thoughts? Questions? Comments?
>
> [1] http://www.cs.arizona.edu/~rts/tdbbook.pdf
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
Jim C. Nasby wrote: My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating points and their errors. FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. -- Alvaro Herrera   http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
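The boundary-condition pitfall mentioned above is easy to demonstrate. A minimal hypothetical sketch, using plain integers for clarity: a naive closed-interval test counts two adjacent periods as overlapping, while the half-open test a period type would use does not:

```python
def overlaps_closed(s1, e1, s2, e2):
    # Naive closed-interval test: a shared endpoint counts as overlap.
    return s1 <= e2 and s2 <= e1

def overlaps_half_open(s1, e1, s2, e2):
    # Half-open [s, e) test: adjacent periods sharing an endpoint are disjoint.
    return s1 < e2 and s2 < e1
```

Hand-writing the closed variant against start_time/end_time columns is exactly the kind of off-by-one-boundary bug a dedicated type would rule out.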
[HACKERS] autovacuum next steps
After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: First, we introduce an autovacuum_max_workers parameter, to limit the total number of workers that can be running at any time. Use this number to create extra PGPROC entries, etc., similar to the way we handle the prepared xacts stuff. The default should be low, say 3 or 4. The launcher sends a worker into a database just like it does currently. This worker determines what tables need vacuuming per the pg_autovacuum settings and pgstat data. If it's more than one table, it puts the number of tables in shared memory and sends a signal to the launcher. The launcher then starts min(autovacuum_max_workers - currently running workers, tables to vacuum - 1) more workers to process that database. Maybe we could have a per-database max-workers parameter in pg_database to use as a limit here as well. Each worker, including the initial one, starts vacuuming tables according to pgstat data. They recheck the pgstat data after finishing each table, so that a table vacuumed by another worker is not processed twice (maybe problematic: a table with a high update rate may be vacuumed more than once. Maybe this is a feature, not a bug). Once autovacuum_naptime has passed, if the workers have not finished yet, the launcher wants to vacuum another database. At this point, the launcher wants some of the workers processing the first database to exit early as soon as they finish one table, so that they can help vacuum the other database. It can do this by setting a flag in shmem that the workers check when finished with a table; if the flag is set, they exit instead of continuing with another table. The launcher then starts a worker in the second database. The launcher does this until the number of workers is even among both databases.
This can be done until there is one worker per database; so at most autovacuum_max_workers databases can be under automatic vacuuming at any time, one worker each. When there are autovacuum_max_workers databases under vacuum, the launcher doesn't have anything else to do until some worker exits on its own. When there is a single worker processing a database, it does not recheck pgstat data after each table. This is to prevent a high-update-rate table from starving the vacuuming of other databases. How does this sound? -- Alvaro Herrera   http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
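The launcher's arithmetic in the plan above is simple enough to sketch. This hypothetical Python fragment mirrors the min(autovacuum_max_workers - currently running workers, tables to vacuum - 1) rule; the function name is illustrative only:

```python
def extra_workers(max_workers, running, tables_to_vacuum):
    # Workers the launcher starts beyond the initial one, per the rule above;
    # clamped at zero so a saturated system starts none.
    return max(0, min(max_workers - running, tables_to_vacuum - 1))
```

With the proposed default of 3 workers, one worker already running, and 5 tables due in its database, the launcher would start two more workers.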
Re: [HACKERS] Confusing message on startup after a crash while recovering
Florian G. Pflug [EMAIL PROTECTED] writes: I'd suggest that the text is changed to something along the lines of: database system was interrupted while in recovery at ... If this has occurred more than once, some data may be corrupted and you may need to restore from the last backup. It seems the real problem is that it's not specifying *which* data is probably corrupted. Maybe: HINT: If recovery fails repeatedly, it probably means that the recovery log data is corrupted; you may have to restore from your last full backup. Also, do we want to suggest use of pg_resetxlog in the message? regards, tom lane
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Fri, Feb 16, 2007 at 05:39:24PM -0300, Alvaro Herrera wrote: FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. The book I cited has a very complete description of the period data type, including details on what extensions to SQL are needed. I am very interested in starting a robust implementation of the period datatype. I think the datetime infrastructure will already do most of the needed parsing and packing of the hard parts of the period datatype (namely the date and time formats). I will investigate tinterval to see if it meets the needs of the PERIOD datatypes. I agree with focusing on the PERIOD datatype. I think that is a major part of the foundation for temporal extensions and would have to be implemented first. Therefore, I present the following plan for getting there.

1) Focus first on PERIOD(DATE) to keep things as simple as possible.
2) Implement a first cut of the period datatype that only handles storing two dates. (Maybe tinterval will get us here for free?)
3) Add information to the datatype for open or closed intervals on the beginning and ending sides of the period.

I could probably have this done in time for the freeze with some mentoring. I could probably even start implementation of some indices and operator functions for the type. This functionality is what I expect to have a shot at making an appearance in 8.3. It will be minimally functional at this point. The next project will be altering the parser to be able to construct and operate on PERIOD types with the syntax extensions to SQL in Dr. Snodgrass's book. Once all of the syntax is implemented for PERIOD(DATE), the next project will be to extend support to PERIOD(TIMESTAMP WITH TIME ZONE). Again, I think the datetime infrastructure will be very useful here.
wt
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: [Snip Detailed Description] How does this sound? On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). In general, the only time it's a good idea to have multiple vacuums running at the same time is when a big table is starving a small hot table and causing bloat. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. Thoughts?
Re: [HACKERS] autovacuum next steps
In an ideal world, I think you want precisely one vacuum process running per tablespace, on the assumption that each tablespace represents a distinct physical device. The cases where we currently find ourselves wanting more are where small tables are due for vacuuming more frequently than the time it takes for a large table to receive a single full pass. If we could have autovacuum interrupt a vacuum in mid-sweep, perform a cycle of vacuums on smaller tables, then resume, that problem would go away. That sounds too difficult, though, but perhaps we could do something nearly as good. One option that I've heard before is to have vacuum, after a single iteration (ie, after it fills maintenance_work_mem and does the index cleanup and the second heap pass), remember where it was and pick up from that point next time. If instead autovacuum could tell vacuum exactly how long to run for (or calculate how many pages that represents based on cost_delay), then it could calculate when it will next need to schedule another table in the same tablespace and try to arrange for the vacuum of the large table to be done by then. Once there are no smaller, more frequently vacuumed tables due to be scheduled, it would start vacuum for the large table again, and it would resume from where the first one left off. This only works if the large tables really don't need to be vacuumed so often that autovacuum can't keep up. Our current situation is that there is a size at which this happens. But arranging to have only one vacuum process per tablespace will only make that less likely to happen, rather than more. I think the changes to vacuum itself are pretty small to get it to remember where it left off last time and start from mid-table. I'm not sure how easy it would be to get autovacuum to juggle all these variables, though. Of course, users may not create separate tablespaces for physical devices, or they may set cost_delay so high that you really do need more vacuum processes, etc.
So you probably still need a num_vacuum_daemons, but the recommended setting would be the same as the number of physical devices, and autovacuum could try to divide the daemons equally between tablespaces, which would amount to the same thing. -- Gregory Stark   EnterpriseDB   http://www.enterprisedb.com
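The remember-where-it-was idea above can be sketched as a bounded sweep that returns a resume point. This is a hypothetical illustration in Python, not the actual vacuum code; the table representation is invented:

```python
def vacuum_pass(table, start_page, page_budget):
    # Process at most page_budget pages starting at start_page, then yield.
    # Returns the page to resume from next time, or None when the sweep is done.
    end = min(start_page + page_budget, table["npages"])
    for page in range(start_page, end):
        table["vacuumed"].add(page)  # stand-in for the real per-page work
    return end if end < table["npages"] else None
```

With something like this, autovacuum could interleave partial passes over a large table with full passes over the small hot tables in the same tablespace.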
Re: [HACKERS] autovacuum next steps
Matthew T. O'Connor wrote: Alvaro Herrera wrote: After staring at my previous notes for autovac scheduling, it has become clear that the basics of it are not really going to work as specified. So here is a more realistic plan: [Snip Detailed Description] How does this sound? On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). Yeah, I understand that. However, I think that can be remedied by using a reasonable autovacuum_vacuum_cost_delay setting, so that each worker uses less than the total I/O available. The main point of the proposal is to allow multiple workers on a DB while also allowing multiple databases to be processed in parallel. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. How would you define what's a hot table? -- Alvaro Herrera   http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: Matthew T. O'Connor wrote: On first blush, I'm not sure I like this, as it doesn't directly attack the table starvation problem, and I think it could be a net loss of speed. VACUUM is I/O bound; as such, just sending multiple vacuum commands at a DB isn't going to make things faster: you are now going to have multiple processes reading from multiple tables at the same time. I think in general this is a bad thing (unless we someday account for I/O made available from multiple tablespaces). Yeah, I understand that. However, I think that can be remedied by using a reasonable autovacuum_vacuum_cost_delay setting, so that each worker uses less than the total I/O available. The main point of the proposal is to allow multiple workers on a DB while also allowing multiple databases to be processed in parallel. So you are telling people to choose an autovacuum_delay so high that they need to run multiple autovacuums at once to keep up? I'm probably being too dramatic, but it seems inconsistent. I think we can extend the current autovacuum stats to add one more column that specifies is hot or something to that effect. Then when the AV launcher sends a worker to a DB, it will first look for tables marked as hot and work on them. While working on hot tables, the launcher need not send any additional workers to this database. If the launcher notices that a worker is working on regular tables, it can send another worker which will look for hot tables to work on; if the worker doesn't find any hot tables that need work, then it exits, leaving the original worker to continue plodding along. How would you define what's a hot table? I wasn't clear; I would have the admin specify it, and we can store it as an additional column in the pg_autovacuum_settings table. Or perhaps if the table is below some size threshold and autovacuum sees that it needs to be vacuumed every time it checks it, 10 times in a row or something like that.
Re: [HACKERS] autovacuum next steps
[EMAIL PROTECTED] (Alvaro Herrera) writes: When there is a single worker processing a database, it does not recheck pgstat data after each table. This is to prevent a high-update-rate table from starving the vacuuming of other databases. This case is important; I don't think that having multiple workers fully alleviates the problem condition. Pointedly, you need to have a way of picking up tables often enough to avoid the XID rollover problem. That may simply require that on some periodic basis, a query is run to queue up tables that are getting close to having an XID problem. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://linuxfinances.info/info/finances.html Rules of the Evil Overlord #189. I will never tell the hero Yes I was the one who did it, but you'll never be able to prove it to that incompetent old fool. Chances are, that incompetent old fool is standing behind the curtain. http://www.eviloverlord.com/
Re: [HACKERS] autovacuum next steps
Alvaro Herrera [EMAIL PROTECTED] writes: Each worker, including the initial one, starts vacuuming tables according to pgstat data. They recheck the pgstat data after finishing each table, so that a table vacuumed by another worker is not processed twice (maybe problematic: a table with a high update rate may be vacuumed more than once. Maybe this is a feature, not a bug). How are you going to make that work without race conditions? ISTM practically guaranteed that all the workers will try to vacuum the same table. Once autovacuum_naptime has passed, if the workers have not finished yet, the launcher wants to vacuum another database. This seems a rather strange design, as it will encourage concentrations of workers in a single database. Wouldn't it be better to spread them out among multiple databases by default? regards, tom lane
[HACKERS] n-gram search function
Hi, Is anybody working on implementing n-gram search functionality for text type data? tsearch2 is great for long text, but it's not appropriate for short (10-100 bytes) text data. What I want to achieve is a fast partial match search using indexes, i.e. foo ~ 'bar' or foo LIKE '%bar%' type matching. -- Tatsuo Ishii SRA OSS, Inc. Japan
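The idea behind answering foo LIKE '%bar%' from an n-gram index can be sketched in a few lines. This hypothetical Python fragment uses bare 3-grams with no padding (a simplification; a real implementation such as pg_trgm has its own extraction rules) and shows only the index-side pre-filter; candidate matches must still be rechecked against the actual string:

```python
def trigrams(s):
    # All contiguous 3-character substrings; empty for strings shorter than 3.
    return {s[i:i + 3] for i in range(len(s) - 2)}

def may_contain(text, pattern):
    # Index pre-filter: text can contain pattern only if it has every trigram
    # of the pattern.  False positives are possible, so recheck with LIKE.
    return trigrams(pattern) <= trigrams(text)
```

An inverted index from trigram to row makes this pre-filter an index lookup instead of a sequential scan, which is exactly the shortcut a sequential LIKE '%bar%' lacks.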
[HACKERS] 8.3 patches hold queue empty
I have completed processing of the 8.3 patches hold queue. There are some emails where I am waiting on a reply from the authors, but I will now just handle them as part of the normal patch process. I will now return to processing patches as they come in, and deal with the patches that are now waiting. -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [pgsql-patches] [PATCHES] [HACKERS] [Fwd: Index Advisor]
I need someone to review this patch to make sure the API used is logical. You can do that by reading the README file in the patch. To me, the procedure seems overly complicated, and too restrictive. The patch is in the patches queue. --- Gurjeet Singh wrote: On 1/20/07, Bruce Momjian [EMAIL PROTECTED] wrote: I can't read a 7z file on my end. Please email me the file and I will put it at a URL. --- Gurjeet Singh wrote: Please find attached the patches ported to HEAD as of now. The patch to the contrib modules is the same as before; the version number has been kept but branch designator has been changed. 1) pg_post_planner_plugin-HEAD_20070116-v2.patch.gz 2) pg_index_adviser-HEAD_20070116-v26.7z I am attaching the .gz versions of both the patches, and CC'ing to -patches also. If it doesn't turn up on -patches even this time, then please do the needful. Thanks and best regards, -- [EMAIL PROTECTED] [EMAIL PROTECTED] gmail | hotmail | yahoo }.com [ Attachment, skipping... ] [ Attachment, skipping... ] -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] integer datetimes
OK, mention removed. We can always re-add it if we find we need to warn people away from integer timestamps again. --- Magnus Hagander wrote: On Wed, Feb 14, 2007 at 12:38:12PM -0500, Andrew Dunstan wrote: Tom Lane wrote: Magnus Hagander [EMAIL PROTECTED] writes: Our docs for the integer datetime option say: Note also that the integer datetimes code is newer than the floating-point code, and we still find bugs in it from time to time. Is the last sentence about bugs really true anymore? At least the buildfarm seems to have a lot *more* machines with it enabled than without. Buildfarm proves only that the regression tests don't expose any bugs, not that there aren't any. (I'm thinking about making it the default for the vc++ build, which is why I came across that.) FWIW, there are several Linux distros that build their RPMs that way, so it's not like people aren't using it. But it seems like we find bugs in the datetime/interval stuff all the time, as people trip over different weird edge cases. I think it's disappointing, to say the least, that we treat this code as a sort of second-class citizen. BTW, the buildfarm has a majority of machines using it by design - it's in the default set of options in the distributed config file. If we think there are bugs we haven't found, then we need to engage in some sort of analytical effort to isolate them. I don't see any reason in principle why this code should be any more buggy than the float-based datetimes, and I see plenty of reason in principle why we should make sure it's right. That was exactly what I thought, which is why I was kinda surprised to see that note in the configure stuff. If we go with that, then we can say that *any* new feature is less tested, no?
;-) //Magnus

-- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +

Index: doc/src/sgml/installation.sgml
===
RCS file: /cvsroot/pgsql/doc/src/sgml/installation.sgml,v
retrieving revision 1.282
diff -c -c -r1.282 installation.sgml
*** doc/src/sgml/installation.sgml	3 Feb 2007 23:01:06 -	1.282
--- doc/src/sgml/installation.sgml	17 Feb 2007 01:24:57 -
***
*** 965,973 
        the full range (see
        ![%standalone-include[the documentation about datetime datatypes]]
        ![%standalone-ignore[xref linkend=datatype-datetime]]
!       for more information). Note also that the integer datetimes code is
!       newer than the floating-point code, and we still find bugs in it from
!       time to time.
       /para
      /listitem
     /varlistentry
--- 965,971 
        the full range (see
        ![%standalone-include[the documentation about datetime datatypes]]
        ![%standalone-ignore[xref linkend=datatype-datetime]]
!       for more information).
       /para
      /listitem
     /varlistentry
Re: [HACKERS] NULL and plpgsql rows
Added to TODO under PL/pgSQL: o Allow row and record variables to be set to NULL constants, and allow NULL tests on such variables. Because a row is not scalar, do not allow assignment from NULL-valued scalars. --- Jim C. Nasby wrote: On Tue, Feb 13, 2007 at 05:55:11PM -0500, Bruce Momjian wrote: Is there a TODO here? --- Jim Nasby wrote: On Oct 2, 2006, at 6:28 PM, Tom Lane wrote: Jim C. Nasby [EMAIL PROTECTED] writes: However, the test right above that means that we'll fail if the user tries something like row_variable := NULL;: The patch you seem to have in mind would allow row_variable := int_variable; to succeed if the int_variable chanced to contain NULL, which is surely not very desirable. Well, that's Tom's objection, though I'm not sure if by 'int_variable' he means 'internal' or 'integer'. Personally, I think it would be useful to just allow setting a row or record variable to NULL as I showed it above; ie: no variables involved. This is something you might want to do to invalidate a row/record variable after taking some action (perhaps deleting a row). You'd also think that you should be able to detect if a record variable is null, as you can with a row. So, I suggest: * Allow row and record variables in plpgsql to be set to NULL It's not clear if it's a wise idea to allow this assignment from a variable. It may be better to only allow explicitly setting them, ie: row_variable := NULL; * Allow testing a record variable to see if it's NULL Currently works for row variables, but not record variables -- Jim Nasby   [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] autovacuum next steps
Alvaro Herrera wrote: Once autovacuum_naptime... autovacuum_max_workers... How does this sound? The knobs exposed on autovacuum feel kinda tangential to what I think I'd really want to control. IMHO vacuum_mbytes_per_second would be quite a bit more intuitive than cost_delay, naptime, etc. ISTM I can relatively easily estimate and/or spec out how much extra I/O bandwidth I have per device for vacuum; and I would pretty much want vacuum to be constantly running on whichever table needs it the most, so long as it can stay under that bandwidth limit. Could vacuum have a tunable that says X MBytes/second (perhaps per device) and have it measure how much I/O it's actually doing and try to stay under that limit? For more fine-grained control, a cron job could go around setting different MBytes/second limits during peak times vs idle times. If people are concerned about CPU-intensive vacuums instead of I/O-intensive ones (does anyone experience that?), another tunable vacuum_percent_of_cpu would be more straightforward than delay_cost, cost_page_hit, etc. But I'd be a bit surprised if CPU-intensive vacuums are common.
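The vacuum_mbytes_per_second idea suggested above amounts to sleeping whenever the work done runs ahead of a byte budget. A hypothetical Python sketch of that pacing loop (function and parameter names are illustrative, not an actual PostgreSQL setting):

```python
import time

def paced_scan(read_page, npages, page_bytes=8192, mbytes_per_second=4.0):
    # After each page, sleep until wall-clock time catches up with
    # bytes_done / budget, keeping sustained I/O under the configured rate.
    budget = mbytes_per_second * 1024 * 1024  # bytes per second
    start = time.monotonic()
    bytes_done = 0
    for page in range(npages):
        read_page(page)
        bytes_done += page_bytes
        target = bytes_done / budget
        lag = target - (time.monotonic() - start)
        if lag > 0:
            time.sleep(lag)
    return bytes_done
```

The cron-job idea then reduces to calling this with a different mbytes_per_second for peak versus idle hours, rather than juggling cost_delay, cost_page_hit, and friends.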
Re: [HACKERS] add to ToDo, please
Pavel Stehule wrote: Hello please add to ToDo: Holdable cursor support in SPI Added: * Allow holdable cursors in SPI -- Bruce Momjian [EMAIL PROTECTED] http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
[HACKERS] Invalid to_date patterns (was: [PATCHES] [GENERAL] ISO week dates)
On 2/17/07, Alvaro Herrera [EMAIL PROTECTED] wrote: Bruce Momjian wrote: Maybe now would be an appropriate time to discuss the open questions in the submitting email: Brendan Jurd wrote: I'd also like to raise the topic of how conversion from text to ISO week dates should be handled, where the user has specified a bogus mixture of fields. Existing code basically ignores these issues; for example, if a user were to call to_date('1998-01-01 2454050', 'YYYY-MM-DD J') the function returns 2006-01-01, a result of setting the year field from YYYY, then overwriting year, month and day with the values from the Julian date in J, then setting the month and day normally from MM and DD. 2006-01-01 is not a valid representation of either of the values the user specified. Now you might say ask a silly question, get a silly answer; the user shouldn't send nonsense arguments to to_date and expect a sensible result. But perhaps the right way to respond to a broken timestamp definition is to throw an error, rather than behave as though everything has gone to plan and return something which is not correct. The same situation can arise if the user mixes ISO and Gregorian data; how should Postgres deal with something like to_date('2006-250', 'IYYY-DDD')? The current behaviour in my patch is actually to assume that the user meant to say 'IYYY-IDDD', since the 250th Gregorian day of the ISO year 2006 is total gibberish. But perhaps it should be throwing an error message. My thinking is that erroneous patterns should throw an error, and not try to second-guess the user. (IIRC this was being discussed in some other thread not long ago). It seems to me there are basically two different responses to the problem of invalid patterns. One is to reject all patterns which potentially under- or over-constrain the date value, and the other is to only reject those patterns which, when applied to the given date string, actually cause a conflict.
For example, on the surface the pattern 'YYYY-MM-DD J' would appear to be invalid, because it specifies the date using both the Gregorian and Julian conventions. You could argue that the whole idea of using a pattern like this is bogus, and reject the pattern as soon as it is parsed. On the other hand, if a user called to_date('2007-02-17 2454149', 'YYYY-MM-DD J') and you attempted to resolve the pattern, you would find that the Julian date and the Gregorian date agree perfectly with each other, and there is no reason to reject the conversion. My gut reaction at first was to go with the former approach. It's programmatically simpler, and it's easier to explain in documentation/error messages. But then it occurred to me that one of the use cases for to_date is slurping date information out of textual reports which may contain redundant date information. If a user wanted to parse something like 2007-02-17 Q1, he would probably try 'YYYY-MM-DD QQ', even though this pattern is logically over-constraining. Would it be fair to throw an error in such a case? Please let me know what you think. BJ
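The reject-only-actual-conflicts approach described above can be sketched by resolving each redundant field group to a candidate date and comparing. A hypothetical Python sketch (the Julian-day conversion is anchored at JDN 2451545 = 2000-01-01; function names are illustrative, not to_date internals):

```python
from datetime import date, timedelta

def from_julian(jdn):
    # Julian day number to Gregorian date, anchored at JDN 2451545 = 2000-01-01.
    return date(2000, 1, 1) + timedelta(days=jdn - 2451545)

def resolve(ymd=None, julian=None):
    # Build one candidate date per field group supplied, then insist that
    # redundant groups agree instead of silently merging their fields.
    candidates = []
    if ymd is not None:
        candidates.append(date(*ymd))
    if julian is not None:
        candidates.append(from_julian(julian))
    if not candidates:
        raise ValueError("date is underconstrained")
    if any(c != candidates[0] for c in candidates[1:]):
        raise ValueError("conflicting date fields")
    return candidates[0]
```

Under this scheme the over-constrained but consistent '2007-02-17 2454149' input succeeds, while the inconsistent '1998-01-01 2454050' input raises instead of yielding 2006-01-01.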
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Fri, 16 Feb 2007, Alvaro Herrera wrote: Jim C. Nasby wrote: My suggestion would be to focus on a period data type first and foremost, as that's something that could be readily used by a lot of folks. Of particular note, it's difficult to query tables that have start_time and end_time fields to define a period; it's easy to screw up the boundary conditions, and it's also hard to make those queries perform well without going to extra lengths (such as defining a 'bogus' GiST index on something like box(point(start,start),point(end,end))). And it's not possible to do that in a way that avoids floating points and their errors. FWIW there's already a type called tinterval that stores (start,end). I don't think it's very much documented; maybe it can be extended or used as base for a new, more complete and robust type, indexable in a more natural way, etc etc. RI-Tree (Relational Interval Tree) http://www.dbs.informatik.uni-muenchen.de/Forschung/CAD/presentations/RI-Tree.pdf looks promising for that purpose. Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Re: [HACKERS] n-gram search function
3-gram search is implemented as contrib/pg_trgm. It currently uses a GiST index, but may be enhanced with GiN. Oleg On Sat, 17 Feb 2007, Tatsuo Ishii wrote: Hi, Is anybody working on implementing n-gram search functionality for text type data? tsearch2 is great for long text, but it's not appropriate for short (10-100 bytes) text data. What I want to achieve is a fast partial match search using indexes, i.e. foo ~ 'bar' or foo LIKE '%bar%' type matching. -- Tatsuo Ishii SRA OSS, Inc. Japan Regards, Oleg _ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83