Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Teodor Sigaev

I won't have access to the original testcase and server for a few days
but I just redid some testing on a slower personal box of mine with a
smaller(but similiar) testset and on that box I could not reproduce that
issue.
So the problem is either caused by the size of the table or somehow by
the data itself :-(


We tested gin with 5 millions records (but not a wikipedia's text, but with blog 
records which is usually mush shorter) and index creation time was near 8 hours 
on Dual Xeon 3.2, but we didn't try GiST. In other hand, creating GiST index on 
0.5 millions emails took about 12 hours.


That's why 7 minutes is very suspicious result.

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote:
> We tested gin with 5 millions records (but not a wikipedia's text, but with 
> blog records which is usually mush shorter) and index creation time was 
> near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand, 
> creating GiST index on 0.5 millions emails took about 12 hours.
> 
> That's why 7 minutes is very suspicious result.

Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Oleg Bartunov

On Wed, 28 Jun 2006, Martijn van Oosterhout wrote:


On Wed, Jun 28, 2006 at 01:09:23PM +0400, Teodor Sigaev wrote:

We tested gin with 5 millions records (but not a wikipedia's text, but with
blog records which is usually mush shorter) and index creation time was
near 8 hours on Dual Xeon 3.2, but we didn't try GiST. In other hand,
creating GiST index on 0.5 millions emails took about 12 hours.

That's why 7 minutes is very suspicious result.


Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.


it's on 8.1.4 and speedup is about 10 times.



Have a nice day,



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] GIN index creation extremely slow ?

2006-06-28 Thread Teodor Sigaev

Tom did commit a patch a while ago which made a huge difference in
index creation time for tsearch by changing one routine. I don't know
if it got backpatched, so it might be worth checking people are working
on the same version.


I saw that patch, but I still think that 7 minutes is too small :)

--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] Help with casting and comparing.

2006-06-28 Thread Tzahi Fadida
Hi,

I need help finding out how to determine if two types are equality compatible
and compare them.

I am using the following call to check for equality between two values:
DatumGetBool(\
FunctionCall2(&(fctx->tupleSetAttEQFunctions[attID]->eq_opr_finfo)\
  , lvalue, rvalue))

The structure fctx->tupleSetAttEQFunctions[attID]->eq_opr_finfo currently 
holds a reference to an equality function that is proper for the type of 
lvalue and rvalue.

Currently i only allow two values only of the same type but i wish to allow
to compare values like "20.2"=?20.2 or 20=?20.0 etc...

The first step is to find out if two attributes are equality and casting 
compatible, i.e., if one type can be cast to the other type so they can be 
compared. Or, just equality compatible and the casting is done somehow behind
the scene.
Finally, i have to use a function to compare the two values.

-- 
Regards,
Tzahi.
--
Tzahi Fadida
Blog: http://tzahi.blogsite.org | Home Site: http://tzahi.webhop.info
WARNING TO SPAMMERS:  see at 
http://members.lycos.co.uk/my2nis/spamwarning.html

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-28 Thread Yoshiyuki Asaba
Hi,

From: Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Tue, 27 Jun 2006 14:43:57 -0400

> >> What I would think might help is a patch on the libpq side (because it
> >> *does* use a nonblocking socket) to avoid sending more than 
> >> 8K per WSASend call.
> 
> > It could definitly be a good idea to have a patch there *as well*, but I
> > think they'd both be affected.
> 
> On the libpq side, sending large messages is probably rare except for
> COPY IN mode.  Has anyone noticed performance issues specifically with
> COPY IN?

I think libpq interface does not use non-blocking socket. Because
'FRONTEND' symbol is enabled.

src/include/port/win32.h
  #ifndef FRONTEND
  #define socket(af, type, protocol) pgwin32_socket(af, type, protocol)
  #define accept(s, addr, addrlen) pgwin32_accept(s, addr, addrlen)
  #define connect(s, name, namelen) pgwin32_connect(s, name, namelen)
  #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
  #define recv(s, buf, len, flags) pgwin32_recv(s, buf, len, flags)
  #define send(s, buf, len, flags) pgwin32_send(s, buf, len, flags)

I think this is only server-side problem. Is this right?

Regards,

--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-28 Thread Tom Lane
Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> I think libpq interface does not use non-blocking socket.

Not unless the Windows port has also disabled pg_set_noblock ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Help with casting and comparing.

2006-06-28 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 03:25:57PM +0300, Tzahi Fadida wrote:
> Hi,
> 
> I need help finding out how to determine if two types are equality compatible
> and compare them.



> Currently i only allow two values only of the same type but i wish to allow
> to compare values like "20.2"=?20.2 or 20=?20.0 etc...
> 
> The first step is to find out if two attributes are equality and casting 
> compatible, i.e., if one type can be cast to the other type so they can be 
> compared. Or, just equality compatible and the casting is done somehow behind
> the scene.

There's two ways an equality could work. For example, there are
equality operators that take parameters of different types. That's the
easy case. Then you have binary compatable types, and then types with
actual conversion functions.

Fortunatly the backend contains functions that do all this already.
Check out parser/parse_oper.c, in particular oper() and
compatable_oper().

You may have to be prepared to handle a parsetree to do the actual
work.

/* oper() -- search for a binary operator
 * Given operator name, types of arg1 and arg2, return oper struct.
 *
 * IMPORTANT: the returned operator (if any) is only promised to be
 * coercion-compatible with the input datatypes.  Do not use this if
 * you need an exact- or binary-compatible match; see compatible_oper.
...etc...

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


[HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Phil Frost
I have an optimization I'd like to see which I think should be pretty
easy for someone familiar with the planner code to implement. My
situation is this: I have an application using veil[1]. Essentially, I
have a schema "private" and another "public". Private contains regular
tables, where private contains views on those tables, like "create view
public.foo as select * from foo where i_have_global_priv('select_foo')",
and i_have_global_priv is a stable function.

My problem is that in several situations, postgresql is planning a
sequential scan with i_have_global_priv(n) as a filter, where N is some
constant literal specified in the view definition. This leads to the
function being called hundreds of thousands of times, which makes my
query orders of magnitude slower.

In some cases, the planner already optimizes this by moving the "where
i_have_global_priv(n)" qualification out of the seq scan filter and into
the one-time filter of a result node. The relevant function in the code
seems to be pull_constant_clauses, called from query_planner in
planmain.c around line 118.

By experimentation, it seems that this optimization will not be made on
either side of an outer join. For example:

dew=# explain select * from
(select * from private.orderitem where i_have_global_priv(28)) as oi
join (
select * from private.orderitemproduct where i_have_global_priv(32)
) as oip using (objectid);
  QUERY PLAN
   
---
 Result  (cost=96.56..402.70 rows=5004 width=325)
   One-Time Filter: (i_have_global_priv(28) AND i_have_global_priv(32))
   ->  Hash Join  (cost=96.55..402.69 rows=5004 width=325)
 Hash Cond: ("outer".objectid = "inner".objectid)
 ->  Seq Scan on orderitem  (cost=0.00..165.44 rows=6044 width=306)
 ->  Hash  (cost=84.04..84.04 rows=5004 width=23)
   ->  Seq Scan on orderitemproduct  (cost=0.00..84.04 rows=5004 
width=23)

dew=# explain select * from
(select * from private.orderitem where i_have_global_priv(28)) as oi
left join (
select * from private.orderitemproduct where i_have_global_priv(32)
) as oip using (objectid);
   QUERY PLAN   
 
-
 Hash Left Join  (cost=100.72..301.94 rows=2015 width=325)
   Hash Cond: ("outer".objectid = "inner".objectid)
   ->  Seq Scan on orderitem  (cost=0.00..180.55 rows=2015 width=306)
 Filter: i_have_global_priv(28)
   ->  Hash  (cost=96.55..96.55 rows=1668 width=23)
 ->  Seq Scan on orderitemproduct  (cost=0.00..96.55 rows=1668 width=23)
   Filter: i_have_global_priv(32)


Notice that the cross join plan results in i_have_global_priv being
called just twice -- once for each privilege being checked, while the
left join plan will result in it being called once for each row.

So, is this something I can coerce someone into doing? It would be very
much appreciated here.


[1] 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Martijn van Oosterhout
On Wed, Jun 28, 2006 at 10:35:37AM -0400, Phil Frost wrote:
> I have an optimization I'd like to see which I think should be pretty
> easy for someone familiar with the planner code to implement. My
> situation is this: I have an application using veil[1]. Essentially, I
> have a schema "private" and another "public". Private contains regular
> tables, where private contains views on those tables, like "create view
> public.foo as select * from foo where i_have_global_priv('select_foo')",
> and i_have_global_priv is a stable function.
> 
> My problem is that in several situations, postgresql is planning a
> sequential scan with i_have_global_priv(n) as a filter, where N is some
> constant literal specified in the view definition. This leads to the
> function being called hundreds of thousands of times, which makes my
> query orders of magnitude slower.

Is the function marked stable or immutable?

In the examples you give the planner can't move the function around the
tree because that would change the output of the query. For inner joins
it's ok, for outer joins it's much more tricky.

I thought the planner would evaluate constant conditions early on which
I why I'm asking about the function.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] Help with casting and comparing.

2006-06-28 Thread Tom Lane
Martijn van Oosterhout  writes:
> On Wed, Jun 28, 2006 at 03:25:57PM +0300, Tzahi Fadida wrote:
>> I need help finding out how to determine if two types are equality compatible
>> and compare them.

> Fortunatly the backend contains functions that do all this already.
> Check out parser/parse_oper.c, in particular oper() and
> compatable_oper().

Note that this still leaves the question of what operator to search for,
and where to look for it.  The current system doesn't really provide an
adequate way of identifying a suitable equality operator; you kind of
have to take it on faith that people won't have made "=" do unexpected
things (an assumption already violated by some builtin datatypes ...).
We've been moving gradually in the direction of relying on btree
operator classes to give us a better understanding of which operators
really act like equality, but it's far from all done.

The most recent thread about fixing this was
http://archives.postgresql.org/pgsql-hackers/2006-02/msg00960.php
Nothing much has been done since then as far as fixing foreign-key
checks, but you might want to look at the code for interpreting row
value comparisons (make_row_comparison_op in parse_expr.c).
SelectSortFunction in tuplesort.c is another example of looking for
btree info to infer the behavior of an operator.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Phil Frost
On Wed, Jun 28, 2006 at 05:11:59PM +0200, Martijn van Oosterhout wrote:
> On Wed, Jun 28, 2006 at 10:35:37AM -0400, Phil Frost wrote:
> > I have an optimization I'd like to see which I think should be pretty
> > easy for someone familiar with the planner code to implement. My
> > situation is this: I have an application using veil[1]. Essentially, I
> > have a schema "private" and another "public". Private contains regular
> > tables, where private contains views on those tables, like "create view
> > public.foo as select * from foo where i_have_global_priv('select_foo')",
> > and i_have_global_priv is a stable function.
> > 
> > My problem is that in several situations, postgresql is planning a
> > sequential scan with i_have_global_priv(n) as a filter, where N is some
> > constant literal specified in the view definition. This leads to the
> > function being called hundreds of thousands of times, which makes my
> > query orders of magnitude slower.
> 
> Is the function marked stable or immutable?
> 
> In the examples you give the planner can't move the function around the
> tree because that would change the output of the query. For inner joins
> it's ok, for outer joins it's much more tricky.
> 
> I thought the planner would evaluate constant conditions early on which
> I why I'm asking about the function.

i_have_global_priv is a stable function.

The planner in fact can move the function around without changing the
output. I can make it do so by putting "offset 0" in the subqueries:

dew=# explain select * from
(select * from private.orderitem where i_have_global_priv(28) offset 0) as 
oi
left join (
select * from private.orderitemproduct where i_have_global_priv(32) 
offset 0
) as oip using (objectid);
QUERY PLAN  
   
---
 Merge Right Join  (cost=1310.33..3603.67 rows=151221 width=187)
   Merge Cond: ("outer".objectid = "inner".objectid)
   ->  Sort  (cost=441.55..454.06 rows=5004 width=45)
 Sort Key: oip.objectid
 ->  Subquery Scan oip  (cost=0.00..134.08 rows=5004 width=45)
   ->  Limit  (cost=0.00..84.04 rows=5004 width=23)
 ->  Result  (cost=0.00..84.04 rows=5004 width=23)
   One-Time Filter: i_have_global_priv(32)
   ->  Seq Scan on orderitemproduct  (cost=0.00..84.04 
rows=5004 width=23)
   ->  Sort  (cost=868.78..883.89 rows=6044 width=146)
 Sort Key: oi.objectid
 ->  Limit  (cost=0.00..165.44 rows=6044 width=306)
   ->  Result  (cost=0.00..165.44 rows=6044 width=306)
 One-Time Filter: i_have_global_priv(28)
 ->  Seq Scan on orderitem  (cost=0.00..165.44 rows=6044 
width=306)

The transformation is from this:

   ->  Seq Scan on orderitem  (cost=0.00..180.55 rows=2015 width=306)
 Filter: i_have_global_priv(28)

to this:

   ->  Result  (cost=0.00..165.44 rows=6044 width=306)
 One-Time Filter: i_have_global_priv(28)
 ->  Seq Scan on orderitem  (cost=0.00..165.44 rows=6044 width=306)

which produce the same result. However, I'm not about to put "offset 0"
in all my view definitions, as that would prevent a number of other
extremely desirable optimizations.

Can a Result node not be an input to an outer join node? That would make
me sad :(

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Tom Lane
Phil Frost <[EMAIL PROTECTED]> writes:
> The planner in fact can move the function around without changing the
> output.

Not when it's within the nullable side of an outer join --- moving a
WHERE clause up out of that would make the difference between no row
out, and a null-extended row out, which are certainly not the same.

I'm not sure why it's not pulling up from the left side of the left join
though.  That might be a bug.  What PG version is this exactly?

Of course the real question is why is your app generating such poorly
phrased queries ;-)

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] SO_SNDBUF size is small on win32?

2006-06-28 Thread Yoshiyuki Asaba
From: Tom Lane <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] SO_SNDBUF size is small on win32? 
Date: Wed, 28 Jun 2006 09:54:21 -0400

> Yoshiyuki Asaba <[EMAIL PROTECTED]> writes:
> > I think libpq interface does not use non-blocking socket.
> 
> Not unless the Windows port has also disabled pg_set_noblock ...

Sorry, I misunderstood.

I tried to occur this issue on msys.

 % cat test.sh
 export PGHOST=xxx
 export PGPORT=5432
 export PGDATABASE=test
 
 dropdb $PGDATABASE
 createdb
 psql -c 'CREATE TABLE t1 (a int, b text)'
 i=0
 while [ $i -lt 50 ]; do
   psql -c "insert into t1 values ($i, repeat('x', 1))"
   i=`expr $i + 1`
 done
 pg_dump -a > dump
 time psql -f dump

 % sh test.sh

But, I did not occur this issue... Does anyone occur this issue?
--
Yoshiyuki Asaba
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Buildfarm member platypus is showing a regression failure that I'm
surprised we have not seen before:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=platypus&dt=2006-06-28%2014:05:01

Basically what this is showing is that when there is more than one
referencing table, the order in which things get done is dependent
on chance locations of system catalog entries.  That results in
cosmetic differences in which of multiple violations gets reported,
or in the order of "truncate cascades to" notices.

Given our push to have the buildfarm "all green all the time",
I don't think I want to just live with occasional failures.
Seems like the alternatives are

1. Find a way to make the processing order consistent (eg by driving it
off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

2. Install multiple expected files for the truncate test.

3. Dumb down the test cases so that they don't test multiple-cascade
situations.

Don't much care for any of these :-(.

Also, it seems possible that not-so-cosmetic problems could occur, for
instance deadlock between two backends trying to truncate the same
tables in different orders.  That suggests that answer #1 would be the
best way to fix it, but that would mean ordering the tables consistently
before we even get any locks on them, which seems hard.

Thoughts?

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Phil Frost
On Wed, Jun 28, 2006 at 11:40:52AM -0400, Tom Lane wrote:
> Phil Frost <[EMAIL PROTECTED]> writes:
> > The planner in fact can move the function around without changing the
> > output.
> 
> Not when it's within the nullable side of an outer join --- moving a
> WHERE clause up out of that would make the difference between no row
> out, and a null-extended row out, which are certainly not the same.
> 
> I'm not sure why it's not pulling up from the left side of the left join
> though.  That might be a bug.  What PG version is this exactly?
> 
> Of course the real question is why is your app generating such poorly
> phrased queries ;-)

Sure it can't pull the condition to the root result node, but it can
make an intermediate result node that is a child of the join and wraps
the sequential scan. "offset 0" makes it do this. I'd like this:

create table a(i int);
create table b(i int);
create function stable_function() returns bool language plpgsql stable as $$
begin return true; end $$;
create view c as select * from b where stable_function();
explain select * from a left join c using (i);
   QUERY PLAN
-
 Merge Right Join  (cost=220.32..338.32 rows=7629 width=4)
   Merge Cond: ("outer".i = "inner".i)
   ->  Sort  (cost=70.54..72.32 rows=713 width=4)
 Sort Key: b.i
 ->  Seq Scan on b  (cost=0.00..36.75 rows=713 width=4)
   Filter: stable_function()
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4)
 Sort Key: a.i
 ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)

to become this:

   QUERY PLAN
-
 Merge Right Join  (cost=220.32..338.32 rows=7629 width=4)
   Merge Cond: ("outer".i = "inner".i)
   ->  Sort  (cost=70.54..72.32 rows=713 width=4)
 Sort Key: b.i
 ->  Result
   One-Time Filter: stable_function()
   ->  Seq Scan on b  (cost=0.00..36.75 rows=713 width=4)
 Filter: stable_function()
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4)
 Sort Key: a.i
 ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)

That will make the same results. Maybe there is something about the
implementation that I don't understand that makes it hard, but the
concept is simple: before you do a seq scan on b, you call
stable_function(), and if it returns true, you just do the sequential
scan without calling stable_function() for each row. If it returns
false, you can not do the sequental scan at all, and return the empty
set immediately.

I wasn't aware my queries are "badly phrased". The application generates
quite nice queries like "select * from saleorder_summary", which is a view
along the lines of 'select * from "order" left join saleorder using
(objectid)'. "order" and "saleorder" are views like "select * from
private.order where i_have_global_priv(20)". The subqueries are in the
examples I gave just to make it simpler to demonstrate.

The only other way I can think of phrasing a query like that is perhaps

select *
from private.order
left join purchaseorder on (
order.objectid = purchaseorder.objectid and i_have_global_priv(31)
)

This of course would not only be hugely inconvinent, but would require
that regular users have unrestricted access to the base tables, which
totally defeats the purpose of using veil. Also, that too is not
optimized as well as it could be:

test=# explain select * from a left join b on (a.i = b.i and stable_function());
   QUERY PLAN
-
 Merge Left Join  (cost=299.56..710.97 rows=7633 width=8)
   Merge Cond: ("outer".i = "inner".i)
   Join Filter: stable_function()
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4)
 Sort Key: a.i
 ->  Seq Scan on a  (cost=0.00..31.40 rows=2140 width=4)
   ->  Sort  (cost=149.78..155.13 rows=2140 width=4)
 Sort Key: b.i
 ->  Seq Scan on b  (cost=0.00..31.40 rows=2140 width=4)

stable_function() will still be called multiple times needlessly.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Index corruption

2006-06-28 Thread Marc Munro
We have now experienced index corruption on two separate but identical
slony clusters.  In each case the slony subscriber failed after
attempting to insert a duplicate record.  In each case reindexing the
sl_log_1 table on the provider fixed the problem.

The latest occurrence was on our production cluster yesterday.  This has
only happened since we performed kernel upgrades and we are uncertain
whether this represents a kernel bug, or a postgres bug exposed by
different timings in the new kernel.

Our systems are:

Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875
Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux
kernel boot option: elevator=deadline
16 Gigs of RAM
postgresql-8.0.3-1PGDG
Bonded e1000/tg3 NICs with 8192 MTU.
Slony 1.1.0

NetApp FAS270 OnTap 7.0.3
Mounted with the NFS options
rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=600,tcp,noac
Jumbo frames 8192 MTU.

All postgres data and logs are stored on the netapp.

In the latest episode, the index corruption was coincident with a
slony-induced vacuum.  I don't know if this was the case with our test
system failures.

What can we do to help identify the cause of this?  I believe we will be
able to reproduce this on a test system if there is some useful
investigation we can perform.

__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

2006-06-28 Thread Thomas Hallgren

Martijn van Oosterhout wrote:

On Wed, Jun 28, 2006 at 01:56:47PM +0200, Thomas Hallgren wrote:
A user that is trusted with installing a C-function in the backend is 
free to scan the process memory anyway so in what way did that increase 
the security? IMHO, the only relevant security in that context is to 
have trusted people install trusted modules. I'm surprised that 
something like that made you remove significant functionality.


You're missing the point. The type output function is not generally a
priveledged function. Think bpcharout, text_out, numeric_out, etc...
These can be called by users directly and the input to those functions
cannot be trusted.

Ah, OK that makes sense. An alternative solution when the signature was changed could 
perhaps have been to pass one single argument, a structure appointing the data and its 
associated type. My idea would work if the data and its type lived together always from the 
moment its instantiated (read from disk or otherwise) and until death do them apart (or the 
data is stored on disk, in which case the tupledesc knows what it is). I guess that would 
imply a major rewrite and that my desire to have a RAW fixed length type isn't enough 
motivation to do that :-)


Instead, I would like to humbly request the inclusion of a UUID datatype (or an opaque 128 
bit datatype) in the core package. It's increasingly common and some databases (MS 
SQLServer) already have built in support for it.


Regards,
Thomas Hallgren


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Tom Lane wrote:

> 1. Find a way to make the processing order consistent (eg by driving it
> off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

Hmm, what about

1. get the complete list of tables to truncate, AccessShareLock'ed, get
their names
2. release locks
3. sort the list lexicographically (or by Oid, whatever)
4. acquire the stronger locks, in list order, taking care of not
aborting if a table is no longer there
5. truncate

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

2006-06-28 Thread Jim C. Nasby
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> Instead, I would like to humbly request the inclusion of a UUID datatype 
> (or an opaque 128 bit datatype) in the core package. It's increasingly 

ISTM that we get enough requests for this that it's probably worth
doing.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> 1. Find a way to make the processing order consistent (eg by driving it
>> off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

> Hmm, what about

> 1. get the complete list of tables to truncate, AccessShareLock'ed, get
> their names
> 2. release locks
> 3. sort the list lexicographically (or by Oid, whatever)
> 4. acquire the stronger locks, in list order, taking care of not
> aborting if a table is no longer there
> 5. truncate

Releasing locks is no good ... what if someone adds/drops FK constraints
while you've not got any lock?

One thing I was toying with was to add an index to pg_constraint on,
say, (confrelid, conrelid), and to replace the existing seqscans for FK
constraints with scans using this index.  The second-column ordering
would guarantee everybody visits the entries in the same order.  Not
sure about overall performance implications ... in a small database,
several indexscans might take more time than one seqscan.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as primary keys

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> Instead, I would like to humbly request the inclusion of a UUID datatype 
> (or an opaque 128 bit datatype) in the core package. It's increasingly 
> common and some databases (MS SQLServer) already have built in support for 
> it.

We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the
type has value, that is is generic, and would be widely used without
being abused. All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Jim C. Nasby
On Wed, Jun 28, 2006 at 01:13:42PM -0400, Tom Lane wrote:
> One thing I was toying with was to add an index to pg_constraint on,
> say, (confrelid, conrelid), and to replace the existing seqscans for FK
> constraints with scans using this index.  The second-column ordering
> would guarantee everybody visits the entries in the same order.  Not
> sure about overall performance implications ... in a small database,
> several indexscans might take more time than one seqscan.

In a small database, both operations are likely to be plenty fast for
TRUNCATE, though. Surely the performance impact of getting the requisite
locks would far exceed any catalog scan times, no? And if you were doing
TRUNCATE's very often, I'd expect the right pages to be in cache
anyway...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread A.M.
On Wed, June 28, 2006 1:14 pm, [EMAIL PROTECTED] wrote:
> On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
>
>> Instead, I would like to humbly request the inclusion of a UUID
>> datatype (or an opaque 128 bit datatype) in the core package. It's
>> increasingly common and some databases (MS SQLServer) already have built
>> in support for it.
>
> We have it. We're just not putting in the effort required to have it
> included in core, as it's too much effort to convince people that the type
> has value, that is is generic, and would be widely used without being
> abused. All the geometric types that I'll never use in core, with few or
> no uses, including functions to operate on these types, and no UUID
> type... Hehe... To me, that's irony... :-)

Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Index corruption

2006-06-28 Thread Tom Lane
Marc Munro <[EMAIL PROTECTED]> writes:
> We have now experienced index corruption on two separate but identical
> slony clusters.  In each case the slony subscriber failed after
> attempting to insert a duplicate record.  In each case reindexing the
> sl_log_1 table on the provider fixed the problem.

Please be more specific about what you mean by "index corruption"
... what were the exact symptoms?

> postgresql-8.0.3-1PGDG

The *first* thing you should do is update to 8.0.8 and see if the
problem is still there.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread Jim C. Nasby
On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
> On Wed, June 28, 2006 1:14 pm, [EMAIL PROTECTED] wrote:
> > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> >
> >> Instead, I would like to humbly request the inclusion of a UUID
> >> datatype (or an opaque 128 bit datatype) in the core package. It's
> >> increasingly common and some databases (MS SQLServer) already have built
> >> in support for it.
> >
> > We have it. We're just not putting in the effort required to have it
> > included in core, as it's too much effort to convince people that the type
> > has value, that is is generic, and would be widely used without being
> > abused. All the geometric types that I'll never use in core, with few or
> > no uses, including functions to operate on these types, and no UUID
> > type... Hehe... To me, that's irony... :-)
> 
> Is it on pgfoundry? From past discussions, the new criteria for getting
> something into core is to first determine if it is successful on
> pgfoundry.

If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...

Looking back through the list archives I think you'd find this comes up
at least every few months.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Andrew Dunstan

Tom Lane wrote:


Buildfarm member platypus is showing a regression failure that I'm
surprised we have not seen before:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=platypus&dt=2006-06-28%2014:05:01

Basically what this is showing is that when there is more than one
referencing table, the order in which things get done is dependent
on chance locations of system catalog entries.  That results in
cosmetic differences in which of multiple violations gets reported,
or in the order of "truncate cascades to" notices.

Given our push to have the buildfarm "all green all the time",
I don't think I want to just live with occasional failures.
Seems like the alternatives are

1. Find a way to make the processing order consistent (eg by driving it
off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

2. Install multiple expected files for the truncate test.

3. Dumb down the test cases so that they don't test multiple-cascade
situations.

Don't much care for any of these :-(.

Also, it seems possible that not-so-cosmetic problems could occur, for
instance deadlock between two backends trying to truncate the same
tables in different orders.  That suggests that answer #1 would be the
best way to fix it, but that would mean ordering the tables consistently
before we even get any locks on them, which seems hard.

Thoughts?


 



If this were a significant risk wouldn't we have seen many such failures 
before now? I guess we don't expect to see concurrent truncates being 
run. Probably worth protecting against, but also probably something of a 
corner case.


In the absence of a fix I'd go for the extra regression result file.

cheers

andrew


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 12:38:50PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
> > On Wed, June 28, 2006 1:14 pm, [EMAIL PROTECTED] wrote:
> > > On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:
> > >> Instead, I would like to humbly request the inclusion of a UUID
> > >> datatype (or an opaque 128 bit datatype) in the core package. It's
> > >> increasingly common and some databases (MS SQLServer) already have built
> > >> in support for it.
> > > We have it. We're just not putting in the effort required to have it
> > > included in core, as it's too much effort to convince people that the type
> > > has value, that is is generic, and would be widely used without being
> > > abused. All the geometric types that I'll never use in core, with few or
> > > no uses, including functions to operate on these types, and no UUID
> > > type... Hehe... To me, that's irony... :-)
> > Is it on pgfoundry? From past discussions, the new criteria for getting
> > something into core is to first determine if it is successful on
> > pgfoundry.
> If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
> downloaded it. But I find that exceptionally hard to believe...
> 
> Looking back through the list archives I think you'd find this comes up
> at least every few months.

I've downloaded the version off pgfoundry.org. It is broken. It leaks
memory, and if memory is correct it can cause the client to core dump.

Two of us worked on a re-write based off a different UUID system library,
and I've been happily using it in production for a year or so. I don't
believe either of us have bothered to market it. Each time it comes up,
a number of people on this list shut it down, and it doesn't seem worth
the effort to convince them otherwise. They can have their ivory tower,
and I can have my plugin.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread Andrew Dunstan

Jim C. Nasby wrote:


On Wed, Jun 28, 2006 at 01:18:39PM -0400, A.M. wrote:
 


On Wed, June 28, 2006 1:14 pm, [EMAIL PROTECTED] wrote:
   


On Wed, Jun 28, 2006 at 06:39:16PM +0200, Thomas Hallgren wrote:

 


Instead, I would like to humbly request the inclusion of a UUID
datatype (or an opaque 128 bit datatype) in the core package. It's
increasingly common and some databases (MS SQLServer) already have built
in support for it.
   


We have it. We're just not putting in the effort required to have it
included in core, as it's too much effort to convince people that the type
has value, that is is generic, and would be widely used without being
abused. All the geometric types that I'll never use in core, with few or
no uses, including functions to operate on these types, and no UUID
type... Hehe... To me, that's irony... :-)
 


Is it on pgfoundry? From past discussions, the new criteria for getting
something into core is to first determine if it is successful on
pgfoundry.
   



If http://lnk.nu/pgfoundry.org/a86.php is accurate, then no one has ever
downloaded it. But I find that exceptionally hard to believe...

Looking back through the list archives I think you'd find this comes up
at least every few months.
 



That's because there is nothing there to download. See instead: 
http://gborg.postgresql.org/project/pguuid/projdisplay.php


Personally I don't buy the misuse objection - we already have plenty of 
things that can be misused. As long as there is a reasonable valid use 
and we can make it portable enough, I think there is a good case for 
including it.


cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> 1. Find a way to make the processing order consistent (eg by driving it
> >> off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.
> 
> > Hmm, what about
> 
> > 1. get the complete list of tables to truncate, AccessShareLock'ed, get
> > their names
> > 2. release locks
> > 3. sort the list lexicographically (or by Oid, whatever)
> > 4. acquire the stronger locks, in list order, taking care of not
> > aborting if a table is no longer there
> > 5. truncate
> 
> Releasing locks is no good ... what if someone adds/drops FK constraints
> while you've not got any lock?

Recheck after acquiring the stronger locks, unlock and drop from list.

> One thing I was toying with was to add an index to pg_constraint on,
> say, (confrelid, conrelid), and to replace the existing seqscans for FK
> constraints with scans using this index.  The second-column ordering
> would guarantee everybody visits the entries in the same order.  Not
> sure about overall performance implications ... in a small database,
> several indexscans might take more time than one seqscan.

I think there is more than one place that would benefit from such an
index.  Probably turn into a syscache as well?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:

> I've downloaded the version off pgfoundry.org. It is broken. It leaks
> memory, and if memory is correct it can cause the client to core dump.

Also it couldn't possibly be included in core, since it's based on a
GPL'ed UUID library.  If you have a more appropiately licensed package,
it could be considered for inclusion.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> If this were a significant risk wouldn't we have seen many such failures 
> before now?

Hard to tell.  It's possibly architecture-dependent, for one thing
(MAXALIGN will affect space availability).  Since this happened in a
parallel regression run, it could also be a matter of timing relative to
the concurrent tests.  I've often thought that we are not getting as much
mileage out of the parallel-testing facility as we could, because it's
really not exercising variations in timing all that much.  It'd be
interesting to throw in a small random delay at the start of each member
of a concurrent set of tests.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Tom Lane wrote:
> > Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > > Tom Lane wrote:
> > >> 1. Find a way to make the processing order consistent (eg by driving it
> > >> off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.
> > 
> > > Hmm, what about
> > 
> > > 1. get the complete list of tables to truncate, AccessShareLock'ed, get
> > > their names
> > > 2. release locks
> > > 3. sort the list lexicographically (or by Oid, whatever)
> > > 4. acquire the stronger locks, in list order, taking care of not
> > > aborting if a table is no longer there
> > > 5. truncate
> > 
> > Releasing locks is no good ... what if someone adds/drops FK constraints
> > while you've not got any lock?
> 
> Recheck after acquiring the stronger locks, unlock and drop from list.

Oops, this doesn't cover the "add FK constraints" case, only drop.  I
think it would work to keep the locks on the tables initially mentioned
in the command (i.e. those not followed by CASCADE).  Hmm, but it fails
if it cascades more than once, so scratch that.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> One thing I was toying with was to add an index to pg_constraint on,
>> say, (confrelid, conrelid), and to replace the existing seqscans for FK
>> constraints with scans using this index.

> I think there is more than one place that would benefit from such an
> index.  Probably turn into a syscache as well?

Yeah, that was in the back of my mind too, but I haven't looked through
the code to see.  A syscache wouldn't work because it's not a unique key.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[HACKERS] Single Index Tuple Chain (SITC) method

2006-06-28 Thread Bruce Momjian
bruce wrote:
> Greg Stark wrote:
> > 
> > Bruce Momjian <[EMAIL PROTECTED]> writes:
> > 
> > > PFC wrote:
> > > > 
> > > > > My idea is that if an UPDATE places the new tuple on the same page as
> > > > > the old tuple, it will not create new index entries for any indexes
> > > > > where the key doesn't change.
> > > > 
> > > > Basically the idea behind preventing index bloat by updates is 
> > > > to have  
> > > > one index tuple point to several actual tuples having the same value.
> > > > 
> > > 
> > > The idea is not to avoid index bloat, but to allow heap reuse, and having
> > > one index entry for multiple versions of an UPDATEd row is merely an
> > > implementation detail.
> > 
> > It sort of sounds like you're describing a whole new index type that stores
> > only the page, not the precise record of any tuple it indexes. If your table
> 
> Background, indexes point to page item pointers, not to actual offsets
> in the page.  This is how vacuum can move around tuples without modifying the
> indexes.  The index points to a page item pointer that is a chain of
> tuples with the same indexed columns.

Here is an overview of the SITC method:

http://momjian.us/cgi-bin/pgsitc

Anyone want to start coding?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread Jim C. Nasby
On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
> Personally I don't buy the misuse objection - we already have plenty of 
> things that can be misused. As long as there is a reasonable valid use 
> and we can make it portable enough, I think there is a good case for 
> including it.

Well, since Mark has one, how about we consider adding it in?

If nothing else, can you please put your stuff on pgFoundry so others
can find it, Mark?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Single Index Tuple Chain (SITC) method

2006-06-28 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Here is an overview of the SITC method:
>   http://momjian.us/cgi-bin/pgsitc

A pretty fundamental problem is that the method assumes it's OK to
change the CTID of a live tuple (by swapping its item pointer with some
expired version).  It is not --- this will break:
* active UPDATEs and DELETEs that may have fetched the CTID
  but not yet completed processing to decide whether to change
  the tuple;
* pending AFTER ROW triggers, such as foreign key checks;
* ODBC as well as other applications that assume CTID is a
  usable unique row identifier within transactions.
VACUUM FULL can get away with moving tuples to new CTIDs because it takes
AccessExclusiveLock, so there can be no open transactions with knowledge
of current CTIDs in the table.  This is not OK for something that's
supposed to happen in plain UPDATEs, though.

Another problem is you can't recycle tuples, nor item ids, without
taking a VACUUM-style lock on the page (LockBufferForCleanup).  If
anyone else is holding a pin on the page they risk getting totally
confused --- for instance, a seqscan will either miss a tuple or scan it
twice depending on which direction you're juggling item ids around it.
The concurrency loss involved in LockBufferForCleanup is OK for
background-maintenance operations like VACUUM, but I seriously doubt
anyone will find it acceptable for UPDATE.  It could easily create
application-level deadlocks, too.  (VACUUM is safe against that because
it only holds one lock.)

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Buildfarm member platypus is showing a regression failure that I'm
>> surprised we have not seen before:
>> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=platypus&dt=2006-06-28%2014:05:01

> If this were a significant risk wouldn't we have seen many such failures 
> before now?

mongoose just failed with almost the exact same symptoms:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoose&dt=2006-06-28%2021:30:02

It's not quite the same diffs, which is unsurprising given the presumed
mechanism behind the failure, but that probably shoots down the "add
another expected file" response.

I imagine some recent change has made the probability of this behavior
much higher than it was before; perhaps there's more pg_constraint
update traffic in concurrent tests?  Anyway, it's now up to "must fix"
in my estimation.  I'll look into the new-pg_constraint-index idea.

I think someone should also take a hard look at that idea of introducing
more timing variability into the parallel tests.  Any volunteers?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
I wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> One thing I was toying with was to add an index to pg_constraint on,
>>> say, (confrelid, conrelid), and to replace the existing seqscans for FK
>>> constraints with scans using this index.

>> I think there is more than one place that would benefit from such an
>> index.  Probably turn into a syscache as well?

> Yeah, that was in the back of my mind too, but I haven't looked through
> the code to see.  A syscache wouldn't work because it's not a unique key.

Having looked through the code, the only two places that currently seem
to have any need for an index on confrelid are the two paths in TRUNCATE
that find/check for FK relationships.  So I'm hesitant to add an index
just for that; seems like too much overhead to put onto all other
updates of pg_constraint.

What we can perhaps do instead is pull out the related OIDs (ie, a
function that given a rel OID returns a list of rels that have FK
dependencies on that rel) and then sort that list into OID order before
acting on it.

Note: the OID-sort-order concept is not perfect; if the OID counter were
to wrap around while the regression tests are running, you could get a
bogus failure of this type.  That seems low enough probability to live
with, though.  Anyway it'll never happen in the buildfarm's usage, since
buildfarm only runs the tests in freshly-initdb'd databases.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-28 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Martijn van Oosterhout  writes:
> > The input functions get it, the output functions (bpcharout,
> > bpcharsend, etc) don't. Which makes it kind of hard to print a raw
> > value if you don't know how long it's going to be. They used to, but
> > that was removed some time back.

> Even back then you couldn't rely on the typmod value to be supplied;
> it was quite likely to be passed as -1.  The issue is not actually
> with on-disk storage, it is with function/operator arguments and
> results.  Those have never been identified any more closely than by
> giving a type OID.  So for any value that came from a function,
> you won't have a typmod, and you'd better be able to find out all
> you need to know just by inspecting the value itself.  Hence, length
> words.

Hm, so it could be stored on disk without the length header as long as the
length header is added to the in-memory representation? I don't think the type
system has hooks for reading and storing data to disk though.

> This is all pretty off-topic for pgsql-general, isn't it?

[moved to -hackers]

-- 
greg


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] optimizing constant quals within outer joins

2006-06-28 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Phil Frost <[EMAIL PROTECTED]> writes:
> > The planner in fact can move the function around without changing the
> > output.
> 
> Not when it's within the nullable side of an outer join --- moving a
> WHERE clause up out of that would make the difference between no row
> out, and a null-extended row out, which are certainly not the same.
> 
> I'm not sure why it's not pulling up from the left side of the left join
> though.  That might be a bug.  What PG version is this exactly?

In fact it doesn't even pull it up out of a regular join. I looked into this
when it was first brought up on IRC and as near as I can tell it is trying to
do so and somehow just failing.


postgres=# create function foo(text) returns bool as 'select case when $1 = 
''foo'' then true else false end' language sql stable strict ;


postgres=# explain select 1 from a,a as b where foo('foo') ;
   QUERY PLAN
-
 Result  (cost=31.34..75332.74 rows=3763600 width=0)
   One-Time Filter: foo('foo'::text)
   ->  Nested Loop  (cost=31.34..75332.74 rows=3763600 width=0)
 ->  Seq Scan on a  (cost=0.00..29.40 rows=1940 width=0)
 ->  Materialize  (cost=31.34..50.74 rows=1940 width=0)
   ->  Seq Scan on a b  (cost=0.00..29.40 rows=1940 width=0)
(6 rows)


postgres=# explain select 1 from (select * from a where foo('foo')) as x, a;
   QUERY PLAN
-
 Nested Loop  (cost=31.34..25169.19 rows=1255180 width=0)
   ->  Seq Scan on a  (cost=0.00..34.25 rows=647 width=0)
 Filter: foo('foo'::text)
   ->  Materialize  (cost=31.34..50.74 rows=1940 width=0)
 ->  Seq Scan on a  (cost=0.00..29.40 rows=1940 width=0)
(5 rows)


-- 
greg


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-28 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Hm, so it could be stored on disk without the length header as long as
> the length header is added to the in-memory representation? I don't
> think the type system has hooks for reading and storing data to disk
> though.

No, it doesn't, and we'd pay a nonzero price for allowing that.
Currently the executor doesn't have to care (much) about whether a
tuple is on-disk or in-memory --- the individual datums look the same
either way.  Allowing them to be different would force a lot of
format conversion steps that currently need not happen.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Wed, Jun 28, 2006 at 01:12:17PM -0500, Jim C. Nasby wrote:
> On Wed, Jun 28, 2006 at 01:49:55PM -0400, Andrew Dunstan wrote:
> > Personally I don't buy the misuse objection - we already have plenty of 
> > things that can be misused. As long as there is a reasonable valid use 
> > and we can make it portable enough, I think there is a good case for 
> > including it.
> Well, since Mark has one, how about we consider adding it in?
> If nothing else, can you please put your stuff on pgFoundry so others
> can find it, Mark?

It was written by Nathan Wagner <[EMAIL PROTECTED]> and myself, and
is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
I'm not an expert on the license, but it seems acceptable to me:

 "Permission to use, copy, modify, and distribute this software for
  any purpose with or without fee is hereby granted, provided that
  the above copyright notice and this permission notice appear in all
  copies."

I haven't tested to see how portable the OSSP UUID implementation is.
This is their words:

 "OSSP uuid was already written with maximum portability in mind, so
  there should be no great effort required to get it running on any Unix
  platform with a reasonable POSIX API. Additionally, the portability
  was tested by successfully building and running it on the following
  particular Unix platforms (syntax is "- ()"):

  alpha-tru644.0 (cc)
  alpha-tru645.1 (gcc, cc)
  hppa-hpux11.11 (cc)
  ia64-hpux11.23 (cc)
  ix86-debian2.2 (gcc, icc)
  ix86-debian3.0 (gcc)
  ix86-debian3.1 (gcc)
  ix86-freebsd4.9 (gcc)
  ix86-freebsd5.2 (gcc, icc)
  ix86-netbsd1.6 (gcc)
  ix86-qnx6.2 (gcc)
  ix86-solaris10 (gcc)
  ix86-unixware7.1.3 (cc)
  mips64-irix6.5 (gcc)
  sparc64-solaris8 (gcc, forte)
  sparc64-solaris9 (gcc)"

I've put it through a fair amount of testing, including using it
within compound indexes, expecting the index to be used for at
least '=', constructing many UUIDs quickly, in a sequence, and
converting it to and from string form. We chose to implement our
own encode / decode routines for performance reasons. With the
exception of testing it on a wider range of platforms, I would
call the module stable.

If there is interest - I'm sure Nathan and I would be willing to put
it on pgfoundry, and at some point give it up for inclusion into
PostgreSQL.

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Fixed length datatypes. WAS [GENERAL] UUID's as

2006-06-28 Thread mark
On Thu, Jun 29, 2006 at 02:02:32AM -0400, [EMAIL PROTECTED] wrote:
> It was written by Nathan Wagner <[EMAIL PROTECTED]> and myself, and
> is based off the OSSP ( http://www.ossp.org/ ) UUID implementation.
> I'm not an expert on the license, but it seems acceptable to me:
> ...
> If there is interest - I'm sure Nathan and I would be willing to put
> it on pgfoundry, and at some point give it up for inclusion into
> PostgreSQL.

This might require a little bit of research. It appears that the
development version of OSSP UUID may provide its own PostgreSQL
'bindings'. I may try and contact the author of the OSSP UUID and
see whether any changes we have that he does not, can be rolled
into his version...

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Fixed length datatypes.

2006-06-28 Thread J. Andrew Rogers


On Jun 28, 2006, at 10:14 AM, [EMAIL PROTECTED] wrote:

All the geometric types that I'll never use in core,
with few or no uses, including functions to operate on these types,
and no UUID type... Hehe... To me, that's irony... :-)



Interestingly, the superior geometry capability is driving a lot of  
recent migration from MySQL to PostgreSQL in my own experience,  
especially with PostGIS.  The geometry parts may not get as much love  
as other parts, but they still get to leverage the very solid  
foundation they are built on top of.  The geometry capability of  
MySQL is basically checklist in nature, as it lacks the more  
sophisticated indexing and query execution that is really required to  
get passable performance from queries with geometry in them.  MySQL  
has similar geometry capability to PostgreSQL in theory if you don't  
look too closely, but in practice the engine is not up to the more  
rigorous demands of that kind of work.


With the nascent rise of the geospatial web, it is going to become a  
lot more important than it has been.



J. Andrew Rogers
[EMAIL PROTECTED]





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] [GENERAL] UUID's as primary keys

2006-06-28 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > Hm, so it could be stored on disk without the length header as long as
> > the length header is added to the in-memory representation? I don't
> > think the type system has hooks for reading and storing data to disk
> > though.
> 
> No, it doesn't, and we'd pay a nonzero price for allowing that.
> Currently the executor doesn't have to care (much) about whether a
> tuple is on-disk or in-memory --- the individual datums look the same
> either way.  Allowing them to be different would force a lot of
> format conversion steps that currently need not happen.

Is there ever a case where an entire tuple is passed around without knowing
the typmod of an attribute in the tuple?

The conversion would only really have to happen when the attribute is fetched
or stored, not when the tuple is being passed around wholesale. But I have a
feeling that would be more intrusive than just making the entire system typmod
aware.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly