Re: [HACKERS] build environment: a different makefile

2008-03-28 Thread Paul van den Bogaard

Peter,

finally I had a chance to check it out. One word: perfect!

Thanks
Paul

On 25-feb-2008, at 19:09, Peter Eisentraut wrote:


On Wednesday, 6 February 2008, Paul van den Bogaard wrote:

I was hoping someone in the community already has a makefile that
"just" creates object files from C sources directly, which I can use to
try out the effect of inlining on the performance of Postgres.


This is now the default in 8.4devel.  Let us know what you find out.

--  
Peter Eisentraut

http://developer.postgresql.org/~petere/

---(end of  
broadcast)---

TIP 5: don't forget to increase your free space map settings


 
-
Paul van den Bogaard
[EMAIL PROTECTED]

ISV-E  -- ISV Engineering, Opensource Engineering group

Sun Microsystems, Inc.     phone: +31 334 515 918
Saturnus 1                 extension: x (70)15918
3824 ME Amersfoort         mobile: +31 651 913 354
The Netherlands            fax: +31 334 515 001



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash Join Optimization

2008-03-28 Thread ITAGAKI Takahiro

"Gokulakannan Somasundaram" <[EMAIL PROTECTED]> wrote:

> I think the creation of minimal_tuple in the middle is an overhead which can
> be avoided by creating a mem-map and directly creating the minimal_tuple in
> the mem-map.

Many implementations of mem-map disallow extending the size.
Do you have any solution for extending the mmap-ed region?

> Since Hash join is used mainly to join huge tables, this might
> benefit those warehouse customers of postgres.

If we use mmap, we will be restricted by the virtual memory size.
That means we would need to drop huge tempspace support on 32-bit platforms, no?


Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





[HACKERS] Commitfest patches

2008-03-28 Thread Gregory Stark

Bruce, you seem to have removed one of my three patches from the queue. I
would actually prefer you remove the other two and put back that one. It's the
one I most urgently need feedback on to continue.

The patch I'm so interested in receiving feedback on is the patch to preread
pages in bitmap index scans using posix_fadvise. This is basically complete
modulo documentation and autoconf tests.

However there is a larger patch struggling to get out here. The patch I
submitted just adds a new low level routine to preread blocks and doesn't
change the buffer manager routines at all. That means we have to make two
trips through the buffer manager to check if the block is present and then
subsequently to assign a buffer and read it in (from cache) synchronously.

A more invasive form of this patch would be to assign and pin a buffer when
the preread is done. That would mean we would subsequently have a pinned buffer
ready to go and not need to go back to the buffer manager a second time. We
would instead just "complete" the i/o by issuing a normal read call.

The up-side would be fewer trips through the buffer manager and attendant
locking. The down-side would be stealing buffers from the shared buffers which
could be holding other pages.

I don't feel like writing the more invasive patch only to throw it out and
take the one I've already written, but I don't know what other people's
feelings are between the two. I'm leaning towards the more invasive buffer
manager changes myself.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] pg_standby for 8.2 (with last restart point)

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 00:54 -0400, Tom Lane wrote:
> Greg Smith <[EMAIL PROTECTED]> writes:
> > ... That was a small change in a utility that should never be run on a 
> > production system.  You're trying to get a change made to the code path 
> > people rely on for their *backups*.  Good luck with that.
> 
> While I quite agree with Greg's comments about not changing stable
> release branches unnecessarily, it seems that there's another
> consideration in this case.  If we don't back-patch %r then users
> will have to rely on hacky scripts like the one posted upthread.
> Is that really a net gain in reliability?
> 
> (I'm honestly not sure of the answer; I'm just thinking it might
> be open to debate.  In particular I don't remember how complicated
> the patch to add %r was.)

Here's the original patch, edited to remove pg_standby changes.

I've not even checked whether it will apply, but it seems fairly simple.

Gurjeet, would you like to freshen that up and test it for applying to 8.2?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
Index: doc/src/sgml/backup.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/backup.sgml,v
retrieving revision 2.97
diff -c -r2.97 backup.sgml
*** doc/src/sgml/backup.sgml	1 Feb 2007 00:28:16 -	2.97
--- doc/src/sgml/backup.sgml	9 Feb 2007 23:05:26 -
***
*** 526,531 
--- 526,536 
  archive, while any %f is replaced by the file name only.
  (The path name is relative to the working directory of the server,
  i.e., the cluster's data directory.)
+ 	A %r will be replaced by the name of the file containing
+ 	the last valid restartpoint. This is the earliest file that must be
+ 	kept to allow a restore to be restartable, so this information can
+ 	allow the archive to be truncated to just the minimum required
+ 	to support restart of the current restore.
  Write %% if you need to embed an actual %
  character in the command.  The simplest useful command is something
  like:
***
*** 922,927 
--- 927,937 
  which is replaced by the path name to copy the log file to.
  (The path name is relative to the working directory of the server,
  i.e., the cluster's data directory.)
+ 	A %r will be replaced by the name of the file containing
+ 	the last valid restartpoint. This is the earliest file that must be
+ 	kept to allow a restore to be restartable, so this information can
+ 	allow the archive to be truncated to just the minimum required
+ 	to support restart of the current restore.
  Write %% if you need to embed an actual %
  character in the command.  The simplest useful command is
  something like:
***
*** 1008,1020 

 
  The shell command to execute to retrieve an archived segment of
! the WAL file series. This parameter is required.
! Any %f in the string is
! replaced by the name of the file to retrieve from the archive,
! and any %p is replaced by the path name to copy
! it to on the server.
! (The path name is relative to the working directory of the server,
! i.e., the cluster's data directory.)
  Write %% to embed an actual % character
  in the command. 
 
--- 1018,1033 

 
  The shell command to execute to retrieve an archived segment of
! the WAL file series. This parameter is required. Any %f 
! 		in the string is replaced by the name of the file to retrieve from
! 		the archive, and any %p is replaced by the path name to
! 		copy it to on the server. (The path name is relative to the working
! 		directory of the server, i.e., the cluster's data directory.)
! 		A %r will be replaced by the name of the file containing
! 		the last valid restartpoint. This is the earliest file that must be
! 		kept to allow a restore to be restartable, so this information can
! 		allow the archive to be truncated to just the minimum required
! 		to support restart of the current restore.
  Write %% to embed an actual % character
  in the command. 
 
***
*** 1024,1031 
  names that are not present in the archive; it must return nonzero
  when so asked.  Examples:
  
! restore_command = 'cp /mnt/server/archivedir/%f "%p"'
! restore_command = 'copy /mnt/server/archivedir/%f "%p"'  # Windows
  
 

--- 1037,1044 
  names that are not present in the archive; it must return nonzero
  when so asked.  Examples:
  
! restore_command = 'cp /mnt/server/archivedir/%f "%p" "%r"'
! restore_command = 'copy /mnt/server/archivedir/%f "%p" "%r"'  # Windows
  
 

***
*** 1374,1380 
  contact between the two database servers is the archive of WAL files
  that both share: primary writing

Re: [HACKERS] Commitfest patches

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 09:08 +, Gregory Stark wrote:

> A more invasive form of this patch would be to assign and pin a buffer when
> the preread is done. That would mean subsequently we would have a pinned buffer
> ready to go and not need to go back to the buffer manager a second time. We
> would instead just "complete" the i/o by issuing a normal read call.

So if posix_fadvise did nothing or there was a longer than optimal
delay, this would be a net loss.

You'd need reasonable evidence that the posix_fadvise facility was a win
on all platforms and recent release levels before we could agree with
that.

I think we need a more thorough examination of this area before we
commit anything. Maybe you've done this, but I haven't seen the
analysis. Can you say more, please? Or at least say what you don't know,
so other experts listening can fill in the blanks.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [HACKERS] [BUGS] Problem identifying constraints which should not be inherited

2008-03-28 Thread NikhilS
Hi Alex,

On Fri, Mar 28, 2008 at 4:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:

> On Thu, Mar 27, 2008 at 5:14 AM, NikhilS <[EMAIL PROTECTED]> wrote:
> > * Add logic to disallow ADD CONSTRAINT ONLY to parent of an inheritance
> > hierarchy
> >
> > * Add logic to mark inherited constraints in the children:
> > This can be achieved by introducing a new bool "coninherited" attribute
> in
> > pg_constraint. This will be set to true on only those check constraints
> that
> > are added to children via the inheritance mechanism
> >
> > * Add logic to disallow dropping inherited constraints directly on
> children
> > Obviously they will get dropped if a DROP CONSTRAINT is fired on the
> parent.
> > with recurse set to true (this is the default behaviour)
> >
> >  * Modify the pg_dump logic to use the new pg_constraint based attribute
> > logic for version 80300 (or should it be PG_VERSION_NUM 80400?) onwards.
> > In fact the current logic to determine if a check constraint is inherited
> is
> > to compare its name with the parent constraint name and the comment
> already
> > mentions that we should make changes in pg_constraint to avoid this
> > rudimentary way of determining the inheritance :).
> >
> >
> Attached is a WIP patch I have been playing with in my spare time.  It
> should take care of the first 2.  It does nothing for pg_dump or set
> (not) null/set default.
>
> Note it has some gross points (see comment in
> src/backend/catalog/heap.c AddRelationRawConstraints) and the
> regression tests I added are not quite up to par (and probably a bit
> redundant).  But in the interest of saving work I thought i would post
> it.
>
>
I took a quick look and it seems to be along the lines of attislocal and
attinhcount, which is a good thing. I am not sure about your syscache-related
changes, though, nor about using enums for pg_constraint attributes, which
deviates from the other catalog specifications. It's good that you posted your
WIP here immediately, or it would have caused duplication of effort on my
side :)

I will take a look at the pg_dump-related changes if you want. We will need
changes in flagInhAttrs() and in getTableAttrs() to query the backend for
these two attributes for post-80300 versions.

P.S. Alvaro, I think this patch did not reach the mailing list and was
stalled due to size restrictions or something.

Regards,
Nikhils
-- 
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Surfacing qualifiers

2008-03-28 Thread Tino Wildenhain

Tom Lane wrote:

David Fetter <[EMAIL PROTECTED]> writes:

You mentioned in an earlier mail that the information exposed was
inadequate.  Could you sketch out what information would really be
needed and where to find it?


The main problem with what you suggest is that it'll fail utterly
on join queries.

AFAICS any real improvement in the situation will require exposing
remote tables as a concept understood by the planner, complete
with ways to obtain index and statistical information at plan time.
After suitable decisions about join strategy and so forth, we'd
wind up with a plan containing a "RemoteTableScan" node which


I'd like to point out that Remote* might be a bit too narrow, because
it's also a general potential for SRF functions (e.g. any virtual table
construction). It would certainly be nice if we had as general an approach
as possible.

Cheers
Tino



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Gregory Stark
"Simon Riggs" <[EMAIL PROTECTED]> writes:

> On Fri, 2008-03-28 at 09:08 +, Gregory Stark wrote:
>
>> A more invasive form of this patch would be to assign and pin a buffer when
>> the preread is done. That would mean subsequently we would have a pinned
>> buffer
>> ready to go and not need to go back to the buffer manager a second time. We
>> would instead just "complete" the i/o by issuing a normal read call.
>
> So if posix_fadvise did nothing or there was a longer than optimal
> delay, this would be a net loss.
>
> You'd need reasonable evidence that the posix_fadvise facility was a win
> on all platforms and recent release levels before we could agree with
> that.

I posted a test program and asked for others to test it on various platforms
and various sized raid arrays. [EMAIL PROTECTED] was the only person who
bothered to run it and he only found a 16x speedup on his hardware.

 http://archives.postgresql.org/pgsql-performance/2008-01/msg00285.php

(unfortunately our mail archives seem to have failed to archive his message,
this is my followup to it)

> I think we need a more thorough examination of this area before we
> commit anything. Maybe you've done this, but I haven't seen the
> analysis. Can you say more, please? Or at least say what you don't know,
> so other experts listening can fill in the blanks.

For heaven's sake. I've been posting about this for months. Search the
archives for "RAID arrays and performance", "There's random access and then
there's random access", and "Bitmap index scan preread using posix_fadvise".

I described which interfaces worked on Linux and Solaris based on empirical
tests. I posted source code for synthetic benchmarks so we could test it on a
wide range of hardware. I posted graphs based on empirical results. I posted
mathematical formulas analysing just how much preread would be expected to
exercise a raid array fully. I'm not sure what else I can do to effect a more
thorough examination.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 11:26 +, Gregory Stark wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:

> For heaven's sake. I've been posting about this for months. 

Any chance of getting all that together on a single Wiki page, so we can
review everything? We'll need those docs after it's committed anyway.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




[HACKERS] Re: [BUGS] Problem identifying constraints which should not be inherited

2008-03-28 Thread Alvaro Herrera
NikhilS escribió:

> P.S Alvaro, I think this patch did not reach the mailing list and was
> stalled due to size restrictions or something.

Argh, you are right.  We don't have this on the archives anywhere :-(
Probably it got stuck on Maia ...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] Re: [BUGS] Problem identifying constraints which should not be inherited

2008-03-28 Thread Alvaro Herrera
NikhilS escribió:

> P.S Alvaro, I think this patch did not reach the mailing list and was
> stalled due to size restrictions or something.

OK, it's archived now:
http://archives.postgresql.org/pgsql-patches/2008-03/msg00392.php
Thanks.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Alvaro Herrera
Tom Lane wrote:

> What I'm envisioning is that we lose the notion of "this is a
> serializable snapshot" that that function currently has, and just
> give it the rule "if MyProc->xmin is currently zero, then set it".
> Then the only additional mechanism needed is for the snapshot
> manager to detect when all snapshots are gone and zero out
> MyProc->xmin --- that would happen sometime during command shutdown,
> and per current discussion it shouldn't need a lock.

This is all easily done -- it's just a couple of extra lines.

However I am now having a definitional problem.  Perhaps it is so
obvious to everyone else that nobody bothered mentioning it.  I know I
wasn't aware until I tried a simple test and found that the Xmin wasn't
advancing as I was expecting.

The problem is that we always consider every transaction's PGPROC->xid
in calculating MyProc->xmin.  So if you have a long running transaction,
it doesn't matter how far beyond the snapshots are -- the value returned
by GetOldestXmin will always be at most the old transaction's Xid.  Even
if that transaction cannot see the old rows because all of its snapshots
are way in the future.

As far as I can see, for the purposes of VACUUM we can remove any tuple
that was deleted after the old transaction's Xid but before that
transaction's Xmin (i.e. all of its live snapshots).  This means we get
to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
in GetSnapshotData.  It would not surprise me, however, to find out that
I am overlooking something and this is incorrect.

Am I blind?

It is quite possible that for the other purposes that we're using Xmins
for, this is not so.  If that's the case, I would argue that we would
need to introduce a separate TransactionId to keep track of, which would
retain the current semantics of Xmin, and let VACUUM use what I am
proposing.  I haven't examined those other uses though.

Thoughts?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Heikki Linnakangas

Gregory Stark wrote:

I described which interfaces worked on Linux and Solaris based on empirical
tests. I posted source code for synthetic benchmarks so we could test it on a
wide range of hardware. I posted graphs based on empirical results. I posted
mathematical formulas analysing just how much preread would be expected to
exercise a raid array fully. I'm not sure what else I can do to effect a more
thorough examination.


I'm sure posix_fadvise is a win in the case where it's supposed to help: 
a scan that does a lot of random reads on a RAID array. And you've posted 
results demonstrating that. What we need to make sure is that there's no 
significant loss when it's not helping.


It seems that the worst case for this patch is a scan on a table that 
doesn't fit in shared_buffers, but is fully cached in the OS cache. In 
that case, the posix_fadvise calls would be a certain waste of time.


That could be alleviated by deciding at plan time whether to preread or 
not, based on effective_cache_size.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 10:35 -0300, Alvaro Herrera wrote:

> The problem is that we always consider every transaction's PGPROC->xid
> in calculating MyProc->xmin.  So if you have a long running
> transaction, it doesn't matter how far beyond the snapshots are -- the
> value returned by GetOldestXmin will always be at most the old
> transaction's Xid.  Even if that transaction cannot see the old rows
> because all of its snapshots are way in the future.

It may not have a TransactionId yet.

So we should have the capability to prevent long running read-only
transactions from causing a build up of dead row versions. But long
running write transactions would still be a problem.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Alvaro Herrera
Simon Riggs wrote:
> On Fri, 2008-03-28 at 10:35 -0300, Alvaro Herrera wrote:
> 
> > The problem is that we always consider every transaction's PGPROC->xid
> > in calculating MyProc->xmin.  So if you have a long running
> > transaction, it doesn't matter how far beyond the snapshots are -- the
> > value returned by GetOldestXmin will always be at most the old
> > transaction's Xid.  Even if that transaction cannot see the old rows
> > because all of its snapshots are way in the future.
> 
> It may not have a TransactionId yet.

How is this a problem?  If it ever gets one, it will be in the future.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: Status of GIT mirror (Was [HACKERS] having problem in rsync'ing cvs)

2008-03-28 Thread Aidan Van Dyk
* Brendan Jurd <[EMAIL PROTECTED]> [080327 16:36]:
 
> Ah, the old "post-maintenance-disabled-cron" gaffe.  One of my personal
> favourites. =)
> 
> I'm not sure that the git repos has fully recovered.  There seems to be a
> block of commits missing, between 2008-03-25 13:09 and 2008-03-27
> 17:24 UTC.
> 
> Looking at the CVS logs, there was definitely commit action in that
> timeframe, but none of it is showing up on the git shortlog.

OK, so it should all be valid again.  

Sorry for any issues I caused to those trying to use that mirror
regularly...

a.
-- 
Aidan Van Dyk Create like a god,
[EMAIL PROTECTED]   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 11:26 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
> > On Fri, 2008-03-28 at 10:35 -0300, Alvaro Herrera wrote:
> > 
> > > The problem is that we always consider every transaction's PGPROC->xid
> > > in calculating MyProc->xmin.  So if you have a long running
> > > transaction, it doesn't matter how far beyond the snapshots are -- the
> > > value returned by GetOldestXmin will always be at most the old
> > > transaction's Xid.  Even if that transaction cannot see the old rows
> > > because all of its snapshots are way in the future.
> > 
> > It may not have a TransactionId yet.
> 
> How is this a problen?  If it ever gets one, it will be in the future.

Yeh, that was my point. So the problem you mention mostly goes away.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




[HACKERS] segfault in locking code

2008-03-28 Thread Suresh
Hello,

I have custom code in Postgres which runs properly on some occasions and
segfaults at other times. The trace is below:

Program received signal SIGSEGV, Segmentation fault.
0x081ae8c4 in LWLockRelease (lockid=664)
at ../../../../src/include/storage/s_lock.h:128
128 __asm__ __volatile__(


(gdb) where
#0  0x081ae8c4 in LWLockRelease (lockid=664)
at ../../../../src/include/storage/s_lock.h:128
#1  0x0808f820 in heap_fetch_tuple (relation=0xb5d986d8, snapshot=0xa298aa0, 
buffer=305, tid=0xa23f600, tuple=0xa29db0c, pgstat_info=0xa29db30, 
tupStat=0xbfac9374) at heapam.c:3404
#2  0x08144df2 in ExecNestLoop (node=0xa298f30) at nodeNestloop.c:452
#3  0x08136840 in ExecProcNode (node=0xa298f30) at execProcnode.c:352
#4  0x08135ba1 in ExecutorRun (queryDesc=0xa298ac8, 
direction=ForwardScanDirection, count=0) at execMain.c:1162
#5  0x081b7e60 in PortalRunSelect (portal=0xa296a98, 
forward=, count=0, dest=0x82d3308) at pquery.c:794
#6  0x081b8a88 in PortalRun (portal=0xa296a98, count=2147483647, 
dest=0x82d3308, altdest=0x82d3308, completionTag=0xbfac9608 "")
at pquery.c:646
#7  0x081b48fc in exec_simple_query (
query_string=0xa275b58 "select l_orderkey as a from tpcd.orders, 
tpcd.lineitem where o_orderkey=l_orderkey ;\n") at postgres.c:1003
#8  0x081b6371 in PostgresMain (argc=1, argv=0xa2379f0, 
username=0xa238398 "suresh") at postgres.c:3221
#9  0x081532e3 in main (argc=2, argv=Cannot access memory at address 0xfffd
) at main.c:411

It segfaults in the locking __asm__ code. What could be the reason for this
variable behavior?

Thanks and regards,
Suresh

   

Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> As far as I can see, for the purposes of VACUUM we can remove any tuple
> that was deleted after the old transaction's Xid but before that
> transaction's Xmin (i.e. all of its live snapshots).  This means we get
> to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
> in GetSnapshotData.  It would not surprise me, however, to find out that
> I am overlooking something and this is incorrect.

This seems entirely off-base to me.  In particular, if a transaction
has an XID then its XMIN will never be greater than that, so I don't
even see how you figure the case will arise.

regards, tom lane



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > As far as I can see, for the purposes of VACUUM we can remove any tuple
> > that was deleted after the old transaction's Xid but before that
> > transaction's Xmin (i.e. all of its live snapshots).  This means we get
> > to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
> > in GetSnapshotData.  It would not surprise me, however, to find out that
> > I am overlooking something and this is incorrect.
> 
> This seems entirely off-base to me.  In particular, if a transaction
> has an XID then its XMIN will never be greater than that, so I don't
> even see how you figure the case will arise.

My point exactly -- can we let the Xmin go past its Xid?  You imply we
can't, but why?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] segfault in locking code

2008-03-28 Thread Tom Lane
Suresh <[EMAIL PROTECTED]> writes:
> I have custom code in Postgres which runs properly on some occasions and
> segfaults at other times. The trace is below:

The traceback you show appears to lead through code that doesn't exist
in any public version of Postgres.  So I think it's your own bug to
solve.

regards, tom lane



Re: [HACKERS] [BUGS] Problem identifying constraints which should not be inherited

2008-03-28 Thread Alex Hunsaker
On Fri, Mar 28, 2008 at 4:07 AM, NikhilS <[EMAIL PROTECTED]> wrote:
> Hi Alex,
>
>
> On Fri, Mar 28, 2008 at 4:58 AM, Alex Hunsaker <[EMAIL PROTECTED]> wrote:
> > Attached is a WIP patch I have been playing with in my spare time.  It
> > should take care of the first 2.  It does nothing for pg_dump or set
> > (not) null/set default.
> >

Note that should say first 3 items...

>
> I took a quick look and it seems to be on the lines of attislocal and
> attinhcount which is a good thing. I am not sure about your syscache related
> changes though and also about using enums for pg_constraint attributes which
> deviates from other catalog specifications. Its good that you posted your
> WIP here immediately or it would have caused duplication of efforts from my
> side :)
>

Yeah, I was planning on taking the enums out, or at least the one I
did in pg_constraint.h.  Like I said, it was a WIP :)  The syscache
stuff is mainly for ease of use, but can be taken out easily enough if
anyone objects.  (I added a unique index on conrelid, constrname and
SearchSysCache(CONSTRNAME...) for the curious who did not look at the
patch.)

I'll take out the enums and see about getting a real fix for
create table (x int check constraint ...) inherits ((some table that
has an x int check constraint))

Expect v2 sometime later today.

> I will take a look at the pg_dump related changes if you want. We will need
> changes in flagInhAttrs() and in getTableAttrs() to query the backend for
> these 2 attributes for post 80300 versions.
>

That would be great!

> P.S Alvaro, I think this patch did not reach the mailing list and was
> stalled due to size restrictions or something.
>
>
> Regards,
> Nikhils
> --
> EnterpriseDB http://www.enterprisedb.com



Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Zeugswetter Andreas OSB SD

Heikki wrote:
> It seems that the worst case for this patch is a scan on a table that
> doesn't fit in shared_buffers, but is fully cached in the OS cache. In
> that case, the posix_fadvise calls would be a certain waste of time.

I think this is a misunderstanding, the fadvise is not issued to read
the
whole table and is not issued for table scans at all (and if it were it
would 
only advise for the next N pages).

So it has nothing to do with table size. The fadvise calls need to be
(and are) limited by what can be used in the near future, and not for
the whole statement.

E.g. the N next-level index pages that are relevant, or the N relevant
heap pages one index leaf page points at. Maybe in the index case N does
not need to be limited, since we have a natural limit on how many
pointers fit on one page.

In general I think separate reader processes (or threads :-) that
directly read into the buffer pool would be a more portable and
efficient implementation; e.g. they could use scatter/gather IO. So we
should make sure that the fadvise solution does not obstruct that path,
but I think it does not.

Gregory wrote:
>> A more invasive form of this patch would be to assign and pin a buffer
>> when the preread is done. That would mean subsequently we would have a
>> pinned buffer ready to go and not need to go back to the buffer manager
>> a second time. We would instead just "complete" the i/o by issuing a
>> normal read call.

I guess you would rather need to mark the buffer for use by this page,
but let whichever backend needs it first pin it and issue the read.
I think the fadviser should not pin it in advance, since it cannot
guarantee to actually read the page [soon]. Rather, remember the buffer
and later check and pin it for the read; else you might be blocking the
buffer. But I think doing something like this might be good, since it
avoids issuing duplicate fadvises.

Andreas



Re: Status of GIT mirror (Was [HACKERS] having problem in rsync'ing cvs)

2008-03-28 Thread Brendan Jurd
On 29/03/2008, Aidan Van Dyk <[EMAIL PROTECTED]> wrote:
> * Brendan Jurd <[EMAIL PROTECTED]> [080327 16:36]:
>  >
>  > Looking at the CVS logs, there was definitely commit action in that
>  > timeframe, but none of it is showing up on the git shortlog.
>
> OK, so it should all be valid again.
>

Looks good to me.  Thanks Aidan.

BJ



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Heikki Linnakangas

Alvaro Herrera wrote:
> Tom Lane wrote:
>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>> As far as I can see, for the purposes of VACUUM we can remove any tuple
>>> that was deleted after the old transaction's Xid but before that
>>> transaction's Xmin (i.e. all of its live snapshots).  This means we get
>>> to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
>>> in GetSnapshotData.  It would not surprise me, however, to find out that
>>> I am overlooking something and this is incorrect.
>>
>> This seems entirely off-base to me.  In particular, if a transaction
>> has an XID then its XMIN will never be greater than that, so I don't
>> even see how you figure the case will arise.
>
> My point exactly -- can we let the Xmin go past its Xid?  You imply we
> can't, but why?


Everything < xmin is considered to be not running anymore. Other 
transactions would consider the still-alive transaction as aborted, and 
start setting hint bits etc.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Heikki Linnakangas

Zeugswetter Andreas OSB SD wrote:
> Heikki wrote:
>> It seems that the worst case for this patch is a scan on a table that
>> doesn't fit in shared_buffers, but is fully cached in the OS cache. In
>> that case, the posix_fadvise calls would be a certain waste of time.
>
> I think this is a misunderstanding, the fadvise is not issued to read the
> whole table and is not issued for table scans at all (and if it were, it
> would only advise for the next N pages).
>
> So it has nothing to do with table size. The fadvise calls need to be
> (and are) limited by what can be used in the near future, and not for
> the whole statement.


Right, I was sloppy. Instead of table size, what matters is the amount 
of data the scan needs to access. The point remains that if the data is 
already in OS cache, the posix_fadvise calls are a waste of time, 
regardless of how many pages ahead you advise.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Alvaro Herrera
Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
 As far as I can see, for the purposes of VACUUM we can remove any tuple
 that was deleted after the old transaction's Xid but before that
 transaction's Xmin (i.e. all of its live snapshots).  This means we get
 to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
 in GetSnapshotData.  It would not surprise me, however, to find out that
 I am overlooking something and this is incorrect.
>>> This seems entirely off-base to me.  In particular, if a transaction
>>> has an XID then its XMIN will never be greater than that, so I don't
>>> even see how you figure the case will arise.
>>
>> My point exactly -- can we let the Xmin go past its Xid?  You imply we
>> can't, but why?
>
> Everything < xmin is considered to be not running anymore. Other  
> transactions would consider the still-alive transaction as aborted, and  
> start setting hint bits etc.

Okay.  So let's say we invent another TransactionId counter -- we keep
Xmin for the current purposes, and the other counter keeps track of
snapshots ignoring Xid.  This new counter could be used by VACUUM to
trim dead tuples.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Bruce Momjian
Heikki Linnakangas wrote:
> > So it has nothing to do with table size. The fadvise calls need to be
> > (and are) 
> > limited by what can be used in the near future, and not for the whole
> > statement.
> 
> Right, I was sloppy. Instead of table size, what matters is the amount 
> of data the scan needs to access. The point remains that if the data is 
> already in OS cache, the posix_fadvise calls are a waste of time, 
> regardless of how many pages ahead you advise.

I now understand what posix_fadvise() is allowing us to do. 
posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will
need a certain block in the future --- this seems much cheaper than a
background reader.

We know we will need the blocks, and telling the kernel can't hurt,
except that there is overhead in telling the kernel.  Has anyone
measured how much overhead?  I would be interested in a test program
that reads the same page over and over again from the kernel, with and
without a posix_fadvise() call.

Should we consider only telling the kernel X pages ahead, meaning when
we are on page 10 we tell it about page 16?
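For what it's worth, a minimal sketch of such a test (in Python rather than C for brevity; `os.posix_fadvise` needs Python 3.3+ on a POSIX system, and the scratch file is purely hypothetical):

```python
import os
import tempfile
import time

# Scratch file to read from (setup is just for the demo).
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 8192)
os.fsync(fd)

N = 10000

start = time.perf_counter()
for _ in range(N):
    os.pread(fd, 8192, 0)  # read the same page over and over
plain = time.perf_counter() - start

start = time.perf_counter()
for _ in range(N):
    # Same read, preceded by the WILLNEED advice whose cost we want to see.
    os.posix_fadvise(fd, 0, 8192, os.POSIX_FADV_WILLNEED)
    os.pread(fd, 8192, 0)
advised = time.perf_counter() - start

print("plain:   %.4fs" % plain)
print("advised: %.4fs" % advised)

os.close(fd)
os.remove(path)
```

The difference between the two loop timings approximates the per-call cost of the advice on a fully cached page.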

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

2008-03-28 Thread Zubkovsky, Sergey

It seems I've found the cause and the workaround of the problem.
MSVC's stat() is implemented by using FindNextFile().
MSDN contains the following suspicious paragraph about FindNextFile():

"In rare cases, file attribute information on NTFS file systems may not be 
current at the time you call this function. To obtain the current NTFS file 
system file attributes, call GetFileInformationByHandle."

Since we generally cannot open an examined file, we need another way.

In the prepared custom build of PG 8.3.1 the native MSVC stat() was rewritten 
to use GetFileAttributesEx() to correct stat's st_size value.
I had seen that the results of MSVC's stat() and of GetFileAttributesEx() 
may differ, at least in the file size values.

The most important thing is that the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistency now.
All works fine.

This was tested on my WinXP SP2 platform but I suppose it will work on any 
NT-based OS.


Thanks,
Sergey Zubkovsky

-Original Message-
From: Andrew Dunstan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 27, 2008 3:54 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; 
Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT



Zubkovsky, Sergey wrote:
> Maybe this helps:
>
> "It is not an error to set a file pointer to a position beyond the end
> of the file. The size of the file does not increase until you call the
> SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
> increases the size of the file to the file pointer position plus the
> size of the buffer written, which results in the intervening bytes
> uninitialized."
>
> http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx
>
> According to Windows' lseek implementation (attached) SetEndOfFile()
> isn't called for this case.
>
>   
>

Yes, but we immediately follow the lseek by a write(). See 
src/backend/storage/smgr/md.c:mdextend().

cheers

andrew



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Bruce Momjian
Gregory Stark wrote:
> 
> Bruce, you seem to have removed one of my three patches from the queue. I
> would actually prefer you remove the other two and put back that one. It's the
> one I most urgently need feedback on to continue.

I talked to Greg on IM.  The complaint was that his posix_fadvise()
patch was added to the TODO list as a URL, rather than getting him
feedback.  He is getting feedback now in another thread.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Prereading using posix_fadvise

2008-03-28 Thread Gregory Stark
"Heikki Linnakangas" <[EMAIL PROTECTED]> writes:

> Right, I was sloppy. Instead of table size, what matters is the amount of data
> the scan needs to access. The point remains that if the data is already in OS
> cache, the posix_fadvise calls are a waste of time, regardless of how many
> pages ahead you advise.

I'm not sure that's really fair to the scan. The typical bitmap index scan
will probably be accessing far fewer pages than what fits in cache. And
prereading them will usually help unless you're always reading the same pages.
The usual case will be bitmap scans for different pages each time.

I'm picturing something like our mail archive search. Each user who uses it
retrieves just 10 hits or so. But there's no reason to think that because
those 10 hits fit in effective_cache_size that they'll actually be found
there.

It might make more sense to compare the table size. If the table size fits in
cache then any random bunch of pages is likely to be in cache somewhere.
Except we have no idea how much of the database this table represents. If the
user has a schema like ones we've seen posted to the lists many times with
hundreds of tables then deductions based on the size of a single table will be
questionable.

I'm also leery about scaling back the prereading for systems with large
effective_cache_size since those are precisely the systems which are likely to
have raid arrays and be helped the most by this on queries where it's helpful.

I feel like we're probably stuck optimizing for the case where i/o is
happening and hoping that the filesystem cache is efficient. If the entire
database fits in cache then the user has to adjust random_page_cost and should
probably set preread_pages to 0 (or effective_spindle_count to 1 depending on
which version of the patch you're reading) as well.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Martijn van Oosterhout
On Fri, Mar 28, 2008 at 11:41:58AM -0400, Bruce Momjian wrote:
> Should we consider only telling the kernel X pages ahead, meaning when
> we are on page 10 we tell it about page 16?

It's not so interesting for sequential reads, the kernel can work that
out for itself. Disk reads are usually in blocks of at least 128K
anyway, so there's a real good chance you have block 16 already.

The interesting case is an index scan, where you do a posix_fadvise() on
the next block *before* returning the items in the current block. Then
by the time you've processed these tuples, the next block will
hopefully have been read in and we can proceed without delay.

Or fadvising all the tuples referred to from an index page at once, so
the kernel can determine the optimal order to fetch them. The read()s
will still be in tuple order, but the delay will (hopefully) be
less.
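As a sketch of that second idea (Python; `fetch_heap_blocks`, `BLCKSZ` and the scratch file are hypothetical illustration, not PostgreSQL code, and `os.posix_fadvise` assumes a POSIX system):

```python
import os
import tempfile

BLCKSZ = 8192  # PostgreSQL's default block size, assumed for the demo

def fetch_heap_blocks(fd, block_nos):
    # Advise the kernel about every heap block the index page points at...
    for blk in block_nos:
        os.posix_fadvise(fd, blk * BLCKSZ, BLCKSZ, os.POSIX_FADV_WILLNEED)
    # ...then read them in tuple order; the kernel is free to fetch them
    # in whatever order is optimal for the disk.
    return [os.pread(fd, BLCKSZ, blk * BLCKSZ) for blk in block_nos]

# Demo on a scratch "heap" of four blocks, each filled with its block number.
fd, path = tempfile.mkstemp()
for i in range(4):
    os.pwrite(fd, bytes([i]) * BLCKSZ, i * BLCKSZ)
blocks = fetch_heap_blocks(fd, [2, 0, 3])
os.close(fd)
os.remove(path)
```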

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Heikki Linnakangas

Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>>> So it has nothing to do with table size. The fadvise calls need to be
>>> (and are) limited by what can be used in the near future, and not for
>>> the whole statement.
>>
>> Right, I was sloppy. Instead of table size, what matters is the amount
>> of data the scan needs to access. The point remains that if the data is
>> already in OS cache, the posix_fadvise calls are a waste of time,
>> regardless of how many pages ahead you advise.
>
> I now understand what posix_fadvise() is allowing us to do.
> posix_fadvise(POSIX_FADV_WILLNEED) allows us to tell the kernel we will
> need a certain block in the future --- this seems much cheaper than a
> background reader.

Yep.

> We know we will need the blocks, and telling the kernel can't hurt,
> except that there is overhead in telling the kernel.  Has anyone
> measured how much overhead?  I would be interested in a test program
> that reads the same page over and over again from the kernel, with and
> without a posix_fadvise() call.

Agreed, that needs to be benchmarked next. There's also some overhead in
doing the buffer manager hash table lookup to check whether the page is
in shared_buffers. We could reduce that by the more complex approach
Greg mentioned of allocating a buffer in shared_buffers when we do
posix_fadvise.

> Should we consider only telling the kernel X pages ahead, meaning when
> we are on page 10 we tell it about page 16?


Yes. You don't want to fire off thousands of posix_fadvise calls 
upfront. That'll just flood the kernel, and it will most likely ignore 
any advice after the first few hundred or so. I'm not sure what the 
appropriate amount of read ahead would be, though. Probably depends a 
lot on the OS and hardware, and needs to be adjustable.


In some cases we can't easily read ahead more than a certain number of 
pages. For example, in a regular index scan, we can easily fire off 
posix_fadvise calls for all the heap pages referenced by a single index 
page, but reading ahead more than that becomes much more complex.
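The fixed-distance variant suggested upthread (on page 10, tell the kernel about page 16) could be sketched like so (Python; the helper and scratch file are hypothetical, and `os.posix_fadvise` assumes a POSIX system):

```python
import os
import tempfile

BLCKSZ = 8192  # block size used by the demo (PostgreSQL's default)

def scan_with_readahead(fd, nblocks, distance=6):
    """Read blocks 0..nblocks-1 in order, advising the kernel about the
    block `distance` ahead of the one currently being returned."""
    for blk in range(nblocks):
        ahead = blk + distance
        if ahead < nblocks:
            os.posix_fadvise(fd, ahead * BLCKSZ, BLCKSZ,
                             os.POSIX_FADV_WILLNEED)
        yield os.pread(fd, BLCKSZ, blk * BLCKSZ)

# Demo on a ten-block scratch file, each block filled with its block number.
fd, path = tempfile.mkstemp()
for i in range(10):
    os.pwrite(fd, bytes([i]) * BLCKSZ, i * BLCKSZ)
pages = list(scan_with_readahead(fd, 10))
os.close(fd)
os.remove(path)
```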


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT

2008-03-28 Thread Zubkovsky, Sergey

We are not alone ;-)

http://sourceforge.net/project/shownotes.php?group_id=129038&release_id=557649

Version 1.02
2007-01-25
* Fix the stat function (on Windows) to use GetFileAttributesEx instead of 
FindFirstFile



-Original Message-
From: Zubkovsky, Sergey 
Sent: Friday, March 28, 2008 6:43 PM
To: Andrew Dunstan
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; 
Magnus Hagander
Subject: RE: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT


It seems I've found the cause and the workaround of the problem.
MSVC's stat() is implemented by using FindNextFile().
MSDN contains the following suspicious paragraph about FindNextFile():

"In rare cases, file attribute information on NTFS file systems may not be 
current at the time you call this function. To obtain the current NTFS file 
system file attributes, call GetFileInformationByHandle."

Since we generally cannot open an examined file, we need another way.

In the prepared custom build of PG 8.3.1 the native MSVC stat() was rewritten 
to use GetFileAttributesEx() to correct stat's st_size value.
I had seen that the results of MSVC's stat() and of GetFileAttributesEx() 
may differ, at least in the file size values.

The most important thing is that the test in the original post
( http://archives.postgresql.org/pgsql-docs/2008-03/msg00041.php )
doesn't reproduce any inconsistency now.
All works fine.

This was tested on my WinXP SP2 platform but I suppose it will work on any 
NT-based OS.


Thanks,
Sergey Zubkovsky

-Original Message-
From: Andrew Dunstan [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 27, 2008 3:54 PM
To: Zubkovsky, Sergey
Cc: Tom Lane; Alvaro Herrera; Gregory Stark; pgsql-hackers@postgresql.org; 
Magnus Hagander
Subject: Re: [HACKERS] [DOCS] pg_total_relation_size() and CHECKPOINT



Zubkovsky, Sergey wrote:
> Maybe this helps:
>
> "It is not an error to set a file pointer to a position beyond the end
> of the file. The size of the file does not increase until you call the
> SetEndOfFile, WriteFile, or WriteFileEx function. A write operation
> increases the size of the file to the file pointer position plus the
> size of the buffer written, which results in the intervening bytes
> uninitialized."
>
> http://msdn2.microsoft.com/en-us/library/aa365541(VS.85).aspx
>
> According to Windows' lseek implementation (attached) SetEndOfFile()
> isn't called for this case.
>
>   
>

Yes, but we immediately follow the lseek by a write(). See 
src/backend/storage/smgr/md.c:mdextend().

cheers

andrew



Re: [PATCHES] [HACKERS] Text <-> C string

2008-03-28 Thread Brendan Jurd
On 26/03/2008, Tom Lane <[EMAIL PROTECTED]> wrote:
>  There are no textout/textin calls left, but I may have missed some
>  places that were doing it the hard way with direct palloc/memcpy
>  manipulations.  It might be worth trolling all the VARDATA() references
>  to see if any more are easily replaceable.
>

I think there are perhaps a dozen such sites, and I hope to submit a
cleanup patch that gets rid of these soon.

I did come across one site I'm not sure about in utils/adt/xml.c:291

/*
 * We need a null-terminated string to pass to parse_xml_decl().  Rather
 * than make a separate copy, make the temporary result one byte bigger
 * than it needs to be.
 */
result = palloc(nbytes + 1 + VARHDRSZ);
SET_VARSIZE(result, nbytes + VARHDRSZ);
memcpy(VARDATA(result), str, nbytes);
str = VARDATA(result);
str[nbytes] = '\0';

The author seemed pretty sure he didn't want to make a separate copy
of the string, so I'm thinking there's not a whole lot I can do to
improve this site.

>  I notice in particular that xfunc.sgml contains sample C functions to
>  copy and concatenate text.  While these aren't directly replaceable
>  with the new functions, I wonder whether we ought to change the examples
>  to make them less certain to break if we ever change text's
>  representation.
>

Yes, these sample C functions are shown in tutorial/funcs.c and
funcs_new.c as well.  I agree that the examples could do with
changing, but don't have any keen insight on what to change them to.



Re: [HACKERS] advancing snapshot's xmin

2008-03-28 Thread Heikki Linnakangas

Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
>> Alvaro Herrera wrote:
>>> Tom Lane wrote:
>>>> Alvaro Herrera <[EMAIL PROTECTED]> writes:
>>>>> As far as I can see, for the purposes of VACUUM we can remove any tuple
>>>>> that was deleted after the old transaction's Xid but before that
>>>>> transaction's Xmin (i.e. all of its live snapshots).  This means we get
>>>>> to ignore Xid in GetOldestXmin and in the TransactionXmin calculations
>>>>> in GetSnapshotData.  It would not surprise me, however, to find out that
>>>>> I am overlooking something and this is incorrect.
>>>> This seems entirely off-base to me.  In particular, if a transaction
>>>> has an XID then its XMIN will never be greater than that, so I don't
>>>> even see how you figure the case will arise.
>>> My point exactly -- can we let the Xmin go past its Xid?  You imply we
>>> can't, but why?
>> Everything < xmin is considered to be not running anymore. Other
>> transactions would consider the still-alive transaction as aborted, and
>> start setting hint bits etc.
>
> Okay.  So let's say we invent another TransactionId counter -- we keep
> Xmin for the current purposes, and the other counter keeps track of
> snapshots ignoring Xid.  This new counter could be used by VACUUM to
> trim dead tuples.


Hmm. So if we call that counter VacuumXmin for now, you could remove 
deleted rows with xmax < VacuumXmin, as long as that xmax is not in the 
set of running transactions? I guess that would work.


In general: VACUUM can remove any tuple that's not visible to any 
snapshot in the system. We don't want to keep all snapshots in shared 
memory, so we use some conservative approximation of that.
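A toy sketch of that conservative rule (Python; `can_vacuum`, `vacuum_xmin` and `running_xids` are illustrative stand-ins for the proposed counter and the shared-memory machinery, not actual PostgreSQL code):

```python
def can_vacuum(xmax, vacuum_xmin, running_xids):
    """A deleted tuple is reclaimable once its deleting transaction (xmax)
    precedes every snapshot's xmin (approximated by the proposed
    VacuumXmin-style counter) and is no longer in the running set."""
    return xmax is not None and xmax < vacuum_xmin and xmax not in running_xids

assert can_vacuum(90, 100, {95})        # deleter is old and finished
assert not can_vacuum(90, 100, {90})    # deleter still running
assert not can_vacuum(105, 100, set())  # deleter too recent
```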


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Transaction Snapshot Cloning

2008-03-28 Thread Chris Browne
[EMAIL PROTECTED] (Bruce Momjian) writes:
> Added to TODO:
>
> * Allow one transaction to see tuples using the snapshot of another
>   transaction
>
>   This would assist multiple backends in working together.
>   http://archives.postgresql.org/pgsql-hackers/2008-01/msg00400.php

FYI, code for this is presently available on pgFoundry, at:
   .

I have some benchmarking scripts to commit to it, once I get added to
the project. (Simon?  :-))

FYI, preliminary testing, on a machine hooked up to an EMC CX700 disk
array is showing that, if I do concurrent dumps of 4 tables, it gives
a bit better than a 2x speedup over dumping the four tables serially,
so there's definitely some fruit here.
-- 
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/rdbms.html
Talk a lot, don't you?



Re: Prereading using posix_fadvise (was Re: [HACKERS] Commitfest patches)

2008-03-28 Thread Bruce Momjian
Heikki Linnakangas wrote:
> > Should we consider only telling the kernel X pages ahead, meaning when
> > we are on page 10 we tell it about page 16?
> 
> Yes. You don't want to fire off thousands of posix_fadvise calls 
> upfront. That'll just flood the kernel, and it will most likely ignore 
> any advice after the first few hundred or so. I'm not sure what the 
> appropriate amount of read ahead would be, though. Probably depends a 
> lot on the OS and hardware, and needs to be adjustable.
> 
> In some cases we can't easily read ahead more than a certain number of 
> pages. For example, in a regular index scan, we can easily fire off 
> posix_fadvise calls for all the heap pages referenced by a single index 
> page, but reading ahead more than that becomes much more complex.

And if you read-ahead too far the pages might get pushed out of the
kernel cache before you ask to read them.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Is psql command line interface broken on HEAD?

2008-03-28 Thread Zdenek Kotala

When I try to run psql with an option on HEAD, I get the following message:

-bash-3.2$ /var/tmp/pg84_upg/bin/psql template1 -t
psql: FATAL:  role "-t" does not exist

When I change the order to

/var/tmp/pg84_upg/bin/psql -t template1

then everything is OK.

Is this intentional or is it a bug? The current behavior corresponds with the 
psql help, but the old order works fine on 8.1 and 8.2 (8.3 not tested).


Thanks Zdenek




Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Bruce Momjian
Neil Conway wrote:
> On Mon, 2007-07-05 at 19:48 +0100, Tomas Doran wrote:
> > As suggested in the TODO list (and as I need the functionality  
> > myself), I have implemented the current_query interface to  
> > debug_query_string.

It actually has been removed from the TODO list since you saw it last.

> Comments:
> 
...
> * AFAIK debug_query_string() still does the wrong thing when the user
> submits multiple queries in a single protocol message (separated by
> semi-colons). Not sure there's a way to fix that that is both easy and
> efficient, though...

The problem with the last bullet is pretty serious.  It can be
illustrated with psql:

$ psql -c 'set log_statement="all";select 1;select 2;' test

Server log shows:

STATEMENT:  set log_statement=all;select 1;select 2;

Obviously this is what current_query() would return if we commit this
patch, and it probably isn't 100% accurate.  I see dblink exposes this:

http://www.postgresql.org/docs/8.3/static/contrib-dblink-current-query.html

Returns the currently executing interactive command string of the
local database session, or NULL if it can't be determined.  Note
that this function is not really related to dblink's
other functionality.  It is provided since it is sometimes useful
in generating queries to be forwarded to remote databases.

but making it more widely available with a possible inaccurate result is
a problem.  We can't think of any way to fix this cleanly --- it would
require a separate parser pass to split queries by semicolons (which
psql does by default in interactive mode).  Right now the parser does
the splitting as part of its normal single-parse operation and just
creates parse trees that don't have string representations.
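For illustration, a naive quote-aware splitter (Python; a sketch only: it ignores dollar quoting, comments and doubled quotes, which is exactly why a clean fix needs a real parser pass):

```python
def split_statements(sql):
    """Split a multi-statement string on top-level semicolons, skipping
    semicolons inside single- or double-quoted runs."""
    stmts, buf, in_quote = [], [], None
    for ch in sql:
        if in_quote:
            buf.append(ch)
            if ch == in_quote:  # closing quote
                in_quote = None
        elif ch in ("'", '"'):
            in_quote = ch
            buf.append(ch)
        elif ch == ";":
            stmts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    if "".join(buf).strip():
        stmts.append("".join(buf).strip())
    return stmts

stmts = split_statements("set log_statement='all';select 1;select 2;")
```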

Perhaps we could name it received_query() to indicate it is what the
backend received and is not necessarily the _current_ query.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Prereading using posix_fadvise

2008-03-28 Thread Gregory Stark


Someone wrote:
>>>
>>> Should we consider only telling the kernel X pages ahead, meaning when
>>> we are on page 10 we tell it about page 16?

The patch I posted specifically handles bitmap heap scans. It does in fact
prefetch only a limited number of pages from the bitmap stream based on a guc,
but it tries to be a bit clever about ramping up gradually.

The real danger here, imho, is doing read-ahead for blocks the client never
ends up reading. By ramping up the read-ahead gradually as the client reads
records we protect against that. 
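The ramp-up idea can be sketched as a tiny generator (Python; `prefetch_ramp` and the cap are hypothetical stand-ins for the patch's actual logic):

```python
def prefetch_ramp(max_target):
    """Yield the prefetch distance to use before each successive read:
    start at zero and double while the client keeps consuming rows, so a
    client that stops after a few rows never triggers deep read-ahead."""
    target = 0
    while True:
        yield target
        target = min(max_target, target * 2 if target else 1)

ramp = prefetch_ramp(16)
first = [next(ramp) for _ in range(8)]  # [0, 1, 2, 4, 8, 16, 16, 16]
```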

> Heikki Linnakangas wrote:
>> 
>> Yes. You don't want to fire off thousands of posix_fadvise calls 
>> upfront. That'll just flood the kernel, and it will most likely ignore 
>> any advice after the first few hundred or so. I'm not sure what the 
>> appropriate amount of read ahead would be, though. Probably depends a 
>> lot on the OS and hardware, and needs to be adjustable.

"Bruce Momjian" <[EMAIL PROTECTED]> writes:
>
> And if you read-ahead too far the pages might get pushed out of the
> kernel cache before you ask to read them.

While these concerns aren't entirely baseless, the actual experiments seem to
show the point of diminishing returns is pretty far out there. Look at the
graphs below, keeping in mind that the X axis is the number of blocks
prefetched.

http://archives.postgresql.org/pgsql-hackers/2007-12/msg00088.php

The pink case is analogous to a bitmap index scan where the blocks are read in
order. In that case the point of diminishing returns is reached around 64
pages. But performance doesn't actually dip until around 512 pages. And even
prefetching 8,192 blocks the negative impact on performance is still much less
severe than using a smaller-than-optimal prefetch size.

This is on a piddly little 3-way raid. On a larger raid you would want even
larger prefetch sizes.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Gregory Stark

"Bruce Momjian" <[EMAIL PROTECTED]> writes:

> Gregory Stark wrote:
>> 
>> Bruce, you seem to have removed one of my three patches from the queue. I
>> would actually prefer you remove the other two and put back that one. It's 
>> the
>> one I most urgently need feedback on to continue.
>
> I talked to Greg on IM.  The complaint was that his posix_fadvise()
> patch was added to the TODO list as a URL, rather than getting him
> feedback.  He is getting feedback now in another thread.

Sort of. The feedback I'm getting is from people who want to discuss the
"easy" parameters of the patch like how much prefetching to do and when
without actually reading to see what's already been done. I'm happy to discuss
them as I find these things interesting too.

But what I really need is someone to read the patch and say "looks good" or
point out things they don't like. In particular, what I really, really want is
some guidance on the singular key question I asked.

I want to know if we're interested in the more invasive patch restructuring
the buffer manager. My feeling is that we probably are eventually. But I
wonder if people wouldn't feel more comfortable taking baby steps at first
which will have less impact in cases where it's not being heavily used.

It's a lot more work to do the invasive patch and there's not much point in
doing it if we're not interested in taking it when it's ready.

Here's an updated patch. It's mainly just a CVS update but it also includes
some window dressing: an autoconf test, some documentation, and some fancy
arithmetic to auto-tune the amount of prefetching based on a more real-world
parameter "effective_spindle_count". It also includes printing out how much
the prefetching target got ramped up to in the explain output -- though I'm
not sure we really want that in the tree, but it's nice for performance
testing.



bitmap-preread-v9.diff.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!



Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Tomas Doran


On 28 Mar 2008, at 17:23, Bruce Momjian wrote:

Neil Conway wrote:

On Mon, 2007-07-05 at 19:48 +0100, Tomas Doran wrote:

As suggested in the TODO list (and as I need the functionality
myself), I have implemented the current_query interface to
debug_query_string.


It actually has been removed from the TODO list since you saw it last.


I submitted a patch to make it do that a while ago :)


Comments:


...

* AFAIK debug_query_string() still does the wrong thing when the user
submits multiple queries in a single protocol message (separated by
semi-colons). Not sure there's a way to fix that that is both easy and
efficient, though...


The problem with the last bullet is pretty serious.  It can be
illustrated with psql:

$ psql -c 'set log_statement="all";select 1;select 2;' test

Server log shows:

STATEMENT:  set log_statement=all;select 1;select 2;

Obviously this is what current_query() would return if we commit this
patch, and it probably isn't 100% accurate.


Yeah, this was pointed out to me at the time.

Fortunately, for what I wanted to do, 'Don't do that then' was a very  
viable answer..



I see dblink exposes this:

http://www.postgresql.org/docs/8.3/static/contrib-dblink-current-query.html


Returns the currently executing interactive command string of the
local database session, or NULL if it can't be determined.  Note
that this function is not really related to dblink's
other functionality.  It is provided since it is sometimes useful
in generating queries to be forwarded to remote databases.


My patch provided this functionality in core, and made dblink's current
function simply delegate to the one that I provided (for backwards
compatibility reasons)



but making it more widely available with a possibly inaccurate result is
a problem.  We can't think of any way to fix this cleanly --- it would
require a separate parser pass to split queries by semicolons (which
psql does by default in interactive mode).  Right now the parser does
the splitting as part of its normal single-parse operation and just
creates parse trees that don't have string representations.
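A toy sketch (Python, nothing from the tree) of why the splitting is hard: a naive split on semicolons breaks as soon as a string literal contains one, so anything correct ends up re-doing part of the lexer. Even this version still ignores dollar quoting, comments, and E'' escapes.

```python
def naive_split(sql):
    """The tempting one-liner: wrong whenever a literal contains ';'."""
    return [s.strip() for s in sql.split(";") if s.strip()]


def split_statements(sql):
    """Split on top-level semicolons, respecting single-quoted literals.

    Still ignores dollar quoting, comments, and escape strings -- a
    fully correct version re-implements a good chunk of the lexer.
    """
    stmts, buf, in_quote, i = [], [], False, 0
    while i < len(sql):
        ch = sql[i]
        if in_quote:
            buf.append(ch)
            if ch == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    buf.append("'")  # '' escapes a quote inside a literal
                    i += 1
                else:
                    in_quote = False
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == ";":
            stmts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    tail = "".join(buf).strip()
    if tail:
        stmts.append(tail)
    return stmts


msg = "set log_statement='all;not a separator';select 1"
print(naive_split(msg))       # three bogus pieces
print(split_statements(msg))  # two statements, literal intact
```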

Perhaps we could name it received_query() to indicate it is what the
backend received and it not necessarily the _current_ query.


received_query() sounds like a very sane name to me, and documenting
it as such would allow you to expose the functionality without the
possible complaints...


In a lot of environments where you actually want this, constraining to
one query per statement (outside the DB level) is very doable... I
wouldn't like to see the functionality skipped over just because it
only solves 80% of cases.



In the particular application that I wrote the patch for, we needed  
to audit 'all access to encrypted credit card numbers' for PCI  
requirements..


Our solution was to put all cc number containing tables into their  
own schema / with no general permissions, and to use SECURITY DEFINER  
stored procedures to access them (and log the access).. However that  
wasn't quite good enough, so we got our DB access layer to iterate up  
the call stack (till outside our SQL abstraction), and add a comment  
to every query such that it took the form:


/* CodeFile-LineNo-UserId */ SELECT stored_procedure(arg1, arg2);

for all queries - so the caller information was encoded in the query  
info... Therefore, inside 'stored_procedure', logging the value of  
current_query() was perfect to satisfy our audit requirements, and we  
can just log the current query when we enter 'stored_procedure'.
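A hypothetical sketch of that access-layer trick (module and function names invented for illustration; the real layer was not Python): walk up the call stack past the SQL abstraction and prepend the /* file-line-user */ comment, so that current_query() inside the SECURITY DEFINER procedure captures who called it and from where.

```python
import inspect

# Modules considered "inside" the SQL abstraction layer (made-up name).
ABSTRACTION_MODULES = {"dblayer"}


def tag_query(sql, user_id):
    """Prepend /* file-lineno-userid */ using the first caller frame
    that lies outside the SQL abstraction layer."""
    for frame_info in inspect.stack()[1:]:  # [0] is tag_query itself
        module = frame_info.frame.f_globals.get("__name__", "")
        if module.split(".")[0] not in ABSTRACTION_MODULES:
            return "/* %s-%d-%d */ %s" % (
                frame_info.filename, frame_info.lineno, user_id, sql)
    return sql  # no caller outside the abstraction found


# tag_query("SELECT get_cc_number(42)", user_id=7) yields something like
# "/* <caller file>-<lineno>-7 */ SELECT get_cc_number(42)"
```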


Hope this helps to clarify that, whilst the current mechanism isn't
in any way perfect, there are a number of use cases for including
the functionality as-is.


Cheers
Tom






Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Alvaro Herrera
Tomas Doran wrote:

> On 28 Mar 2008, at 17:23, Bruce Momjian wrote:

>> Perhaps we could name it received_query() to indicate it is what the
>> backend received and it not necessarily the _current_ query.
>
> received_query() sounds like a very sane name to me, and documenting it 
> as such would allow you to expose the functionality without the possible 
> complaints...

client_query perhaps?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Bruce Momjian
Alvaro Herrera wrote:
> Tomas Doran wrote:
> 
> > On 28 Mar 2008, at 17:23, Bruce Momjian wrote:
> 
> >> Perhaps we could name it received_query() to indicate it is what the
> >> backend received and it not necessarily the _current_ query.
> >
> > received_query() sounds like a very sane name to me, and documenting it 
> > as such would allow you to expose the functionality without the possible 
> > complaints...
> 
> client_query perhaps?

Yea, that is consistent with what we do with other functions.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Heikki Linnakangas

Gregory Stark wrote:

I want to know if we're interested in the more invasive patch restructuring
the buffer manager. My feeling is that we probably are eventually. But I
wonder if people wouldn't feel more comfortable taking baby steps at first
which will have less impact in cases where it's not being heavily used.

It's a lot more work to do the invasive patch and there's not much point in
doing it if we're not interested in taking it when it's ready.


I like the simplicity of this patch. My biggest worry is the impact on 
performance when the posix_fadvise calls are not helping. I'd like to 
see some testing of that. How expensive are the extra bufmgr hash table 
lookups, compared to all the other stuff that's involved in a bitmap 
heap scan?


Another thing I wonder is whether this approach can easily be extended 
to other types of scans than bitmap heap scans. Normal index scans seem 
most interesting; they can generate a lot of random I/O. I don't see any 
big problems there, except again the worst-case performance: If you 
issue AdviseBuffer calls for all the heap tuples in an index page, in 
the naive way, you can issue hundreds of posix_fadvise calls. But what 
if they're all actually on the same page? Surely you only want to go 
through the buffer manager once (like we do in the scan, thanks to 
ReleaseAndReadBuffer()) in that case, and only call posix_fadvise once. 
But again, maybe you can convince me that it's a non-issue by some 
benchmarking.
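A sketch of the deduplication being asked for here: collapse the heap TIDs from one index page down to distinct block numbers before issuing any prefetch requests, so N tuples on the same page cost one posix_fadvise() call, not N. BLCKSZ and the TID layout follow Postgres conventions, but the file-descriptor plumbing is invented for illustration.

```python
import os

BLCKSZ = 8192  # default Postgres block size


def prefetch_heap_blocks(fd, tids):
    """tids are (block_number, line_pointer) pairs from an index page.
    Issue at most one advisory prefetch per distinct heap block."""
    seen = set()
    issued = []
    for blkno, _off in tids:
        if blkno in seen:
            continue
        seen.add(blkno)
        # POSIX_FADV_WILLNEED is advisory; skip where unsupported.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, blkno * BLCKSZ, BLCKSZ,
                             os.POSIX_FADV_WILLNEED)
        issued.append(blkno)
    return issued
```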



Here's an updated patch. It's mainly just a CVS update but it also includes
some window dressing: an autoconf test, some documentation, and some fancy
arithmetic to auto-tune the amount of prefetching based on a more real-world
parameter "effective_spindle_count". 


I like the "effective_spindle_count". That's very intuitive.

The formula to turn that into preread_pages looks complex, though. I can 
see the reasoning behind it, but I doubt things behave that beautifully 
in the real world. (That's not important right now, we have plenty of 
time to tweak that after more testing.)
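For what it's worth, one plausible shape for that arithmetic (and the harmonic-series estimate PostgreSQL's effective_io_concurrency machinery later used) is: to keep n spindles busy you want, in expectation, n * H(n) pages in flight, where H(n) = 1 + 1/2 + ... + 1/n. Whether this matches the patch exactly can't be told from the discussion alone.

```python
def prefetch_pages(effective_spindle_count):
    """Expected prefetch distance to keep n spindles busy: n * H(n)."""
    n = effective_spindle_count
    return round(sum(n / i for i in range(1, n + 1)))


for n in (1, 2, 4, 10):
    print(n, prefetch_pages(n))
```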



It also includes printing out how much
the prefetching target got ramped up to in the explain output -- though I'm
not sure we really want that in the tree, but it's nice for performance
testing.


I don't understand the ramp-up logic. Can you explain what that's for 
and how it works?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Commitfest patches

2008-03-28 Thread Martijn van Oosterhout
On Fri, Mar 28, 2008 at 05:34:30PM +, Gregory Stark wrote:
> But what I really need is someone to read the patch and say "looks good" or
> point out things they don't like. In particular, what I really, really want is
> some guidance on the singular key question I asked.

I was going to write all sorts of stuff, till I noticed Heikki said
basically everything I was going to say:

- I think normal index scans could benefit from this (it was measurable
when I was playing with AIO in postgres a few years back).
- The integration with the bitmap scan is good, neat even
- I think the preread_count limit is far too small, given you get a
benefit even if you only have one spindle.
- I don't understand the ramp-up method either.

People spend a lot of time worrying about hundreds of posix_fadvise()
calls but you don't need anywhere near that much to be effective. With
AIO I limited the number of outstanding requests to a dozen and it was
still useful. You lose nothing by capping the number of requests at any
point.
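The capping described above can be sketched as a bounded sliding window (the I/O calls are stubbed out as callables; the point is that a window capped at a dozen outstanding requests loses nothing versus an unbounded one):

```python
from collections import deque

MAX_IN_FLIGHT = 12  # cap on outstanding prefetch requests


def scan_with_prefetch(blocks, prefetch, read):
    """Read blocks in order, keeping at most MAX_IN_FLIGHT prefetches
    outstanding; a new request is issued only as an old one is consumed."""
    in_flight = deque()
    it = iter(blocks)
    for blk in it:
        in_flight.append(blk)
        prefetch(blk)
        if len(in_flight) >= MAX_IN_FLIGHT:
            break
    results = []
    for blk in blocks:
        # head of the window is (hopefully) already in the OS cache
        assert in_flight[0] == blk
        in_flight.popleft()
        results.append(read(blk))
        nxt = next(it, None)
        if nxt is not None:
            in_flight.append(nxt)
            prefetch(nxt)
    return results
```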

> I want to know if we're interested in the more invasive patch restructuring
> the buffer manager. My feeling is that we probably are eventually. But I
> wonder if people wouldn't feel more comfortable taking baby steps at first
> which will have less impact in cases where it's not being heavily used.

I think the way it is now is neat and simple and enough for now.

Have a nice day,
-- 
Martijn van Oosterhout   <[EMAIL PROTECTED]>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while 
> boarding. Thank you for flying nlogn airlines.




Re: [HACKERS] Transaction Snapshot Cloning

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 11:33 -0400, Chris Browne wrote:
> [EMAIL PROTECTED] (Bruce Momjian) writes:
> > Added to TODO:
> >
> > * Allow one transaction to see tuples using the snapshot of another
> >   transaction
> >
> >   This would assist multiple backends in working together.
> >   http://archives.postgresql.org/pgsql-hackers/2008-01/msg00400.php
> 
> FYI, code for this is presently available on pgFoundry, at:
>.
> 
> I have some benchmarking scripts to commit to it, once I get added to
> the project. (Simon?  :-))
> 
> FYI, preliminary testing, on a machine hooked up to an EMC CX700 disk
> array is showing that, if I do concurrent dumps of 4 tables, it gives
> a bit better than a 2x speedup over dumping the four tables serially,
> so there's definitely some fruit here.

Oh, very cool. Thanks for testing. 

I'll add you now - must have missed that, sorry.

My view is that the TODO item is still needed because we want to work
this into the backend more fully.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Simon Riggs
On Fri, 2008-03-28 at 14:32 -0400, Bruce Momjian wrote:
> Alvaro Herrera wrote:
> > Tomas Doran wrote:
> > 
> > > On 28 Mar 2008, at 17:23, Bruce Momjian wrote:
> > 
> > >> Perhaps we could name it received_query() to indicate it is what the
> > >> backend received and it not necessarily the _current_ query.
> > >
> > > received_query() sounds like a very sane name to me, and documenting it 
> > > as such would allow you to expose the functionality without the possible 
> > > complaints...
> > 
> > client_query perhaps?
> 
> Yea, that is consistent with what we do with other functions.

How about client_request()

It's then clear that a request can be made up of many statements, which
will be executed in turn.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk




Re: [PATCHES] [HACKERS] Text <-> C string

2008-03-28 Thread Tom Lane
"Brendan Jurd" <[EMAIL PROTECTED]> writes:
> On 26/03/2008, Tom Lane <[EMAIL PROTECTED]> wrote:
>> There are no textout/textin calls left, but I may have missed some
>> places that were doing it the hard way with direct palloc/memcpy
>> manipulations.  It might be worth trolling all the VARDATA() references
>> to see if any more are easily replaceable.

> I think there are perhaps a dozen such sites, and I hope to submit a
> cleanup patch that gets rid of these soon.
> I did come across one site I'm not sure about in utils/adt/xml.c:291

I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.

regards, tom lane



Re: [PATCHES] [HACKERS] Text <-> C string

2008-03-28 Thread Brendan Jurd
On 29/03/2008, Tom Lane <[EMAIL PROTECTED]> wrote:
> I intentionally didn't touch xml.c, nor anyplace that is not dealing
>  in text, even if it happens to be binary-compatible with text.
>

Hmm, okay.  My original submission did include a few such changes; for
example, in xml_in and xml_out_internal I saw that the conversion
between xmltype and cstring was identical to the text conversion, so I
went ahead and used the text functions.  Feedback upthread suggested
that it was okay to go ahead with casting in identical cases. [1]

I saw that these changes made it into the commit, so I assumed that it
was the right call.

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.

Regards,
BJ

[1] http://archives.postgresql.org/pgsql-hackers/2007-09/msg01094.php



Re: [HACKERS] Is psql command line interface broken on HEAD?

2008-03-28 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> When I try run psql with a option on HEAD I get following message:
> -bash-3.2$ /var/tmp/pg84_upg/bin/psql template1 -t
> psql: FATAL:  role "-t" does not exist

That has never been considered supported.  Some versions of getopt,
on some platforms, will rearrange the order of words on the command
line in order to make it "work" (for small values of "work" IMHO).

regards, tom lane



Re: [HACKERS] [PATCHES] Implemented current_query

2008-03-28 Thread Bruce Momjian
Simon Riggs wrote:
> On Fri, 2008-03-28 at 14:32 -0400, Bruce Momjian wrote:
> > Alvaro Herrera wrote:
> > > Tomas Doran wrote:
> > > 
> > > > On 28 Mar 2008, at 17:23, Bruce Momjian wrote:
> > > 
> > > >> Perhaps we could name it received_query() to indicate it is what the
> > > >> backend received and it not necessarily the _current_ query.
> > > >
> > > > received_query() sounds like a very sane name to me, and documenting it
> > > > as such would allow you to expose the functionality without the possible
> > > > complaints...
> > > 
> > > client_query perhaps?
> > 
> > Yea, that is consistent with what we do with other functions.
> 
> How about client_request()
> 
> It's then clear that a request can be made up of many statements, which
> will be executed in turn.

The problem with client_request() is that it is not clear it is a query
--- it could be a disconnection or cancel request, for example.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



[HACKERS] Third thoughts about the DISTINCT MAX() problem

2008-03-28 Thread Tom Lane
I just realized that the patch I applied here
http://archives.postgresql.org/pgsql-committers/2008-03/msg00531.php
for Taiki Yamaguchi's bug report here
http://archives.postgresql.org/pgsql-bugs/2008-03/msg00275.php
really doesn't work.  It assumes that an ungrouped aggregate
query can't return more than one row, which is true in straight
SQL ... but it's not true if you consider SRFs in the target list.
CVS HEAD gives the wrong answer for this example in the regression
database:

regression=# select max(unique1), generate_series(1,3) as g from tenk1 order by g desc;
 max  | g 
--+---
  | 1
  | 2
  | 3
(3 rows)

because it wrongly supposes it can discard the ORDER BY.

So, back to the drawing board.  I had thought of two approaches to
fixing the problem instead of just dodging it.  Plan A was to
apply planagg.c's Aggref->Param substitution inside EquivalenceClasses,
as in the draft patch here:
http://archives.postgresql.org/pgsql-patches/2008-03/msg00388.php
which I didn't entirely like for reasons mentioned in that post.
Plan B was to try to revert to the way sort clause matching was
done pre-8.3, that is have make_sort_from_pathkeys check first
for a matching ressortgroupref tag before it goes looking for equal()
expressions.  I had actually tried to do that first but got hung
up on the problem of identifying the correct sort operator ---
just taking the exposed type of the targetlist entry doesn't always
work, because of binary-compatible cases (eg, tlist entry may say
it yields varchar but we need to find the text opclass).  Perhaps
thinking a bit harder would yield a solution though.

Does anyone have comments for or against either of these approaches,
or perhaps a Plan C to consider?

regards, tom lane



Re: [HACKERS] Third thoughts about the DISTINCT MAX() problem

2008-03-28 Thread Tom Lane
I wrote:
> Plan B was to try to revert to the way sort clause matching was
> done pre-8.3, that is have make_sort_from_pathkeys check first
> for a matching ressortgroupref tag before it goes looking for equal()
> expressions.  I had actually tried to do that first but got hung
> up on the problem of identifying the correct sort operator ---
> just taking the exposed type of the targetlist entry doesn't always
> work, because of binary-compatible cases (eg, tlist entry may say
> it yields varchar but we need to find the text opclass).  Perhaps
> thinking a bit harder would yield a solution though.

Ah, got it.  I had previously attached a sortref to EquivalenceClasses
(ec_sortref) to deal properly with multiple textually-identical but
volatile expressions.  But that's really the wrong thing: sortrefs
should be attached to individual EquivalenceMembers, instead.  If
we do that then the existing logic in make_sort_from_pathkeys can be
made to use the sortref as a preferred (and faster) way of identifying
which tlist member matches a pathkey.
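An illustrative toy model (Python, not planner code) of that lookup order: prefer matching a pathkey to a targetlist entry via its sortref tag, which cleanly distinguishes textually identical volatile expressions, and only fall back to expression equality for sort keys carrying no sortref. The tuple representation here is invented for illustration.

```python
def find_tlist_entry(pathkey_member, tlist):
    """pathkey_member is (sortref, expr); tlist entries are
    (ressortgroupref, expr).  Try the sortref first, then equal()."""
    sortref, expr = pathkey_member
    if sortref is not None:
        for entry in tlist:
            if entry[0] == sortref:
                return entry
    # fall back to equality: needed for sort keys without a sortref,
    # but ambiguous for identical volatile expressions
    for entry in tlist:
        if entry[1] == expr:
            return entry
    return None
```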

This implies a change in the EquivalenceClass data structures,
but those are never dumped to disk so it's not a problem to back-patch.

Note: we still need to be able to match on equal() since we may have
to deal with sort keys that are not any of the explicit ORDER BY
expressions and hence don't have a sortref.  But that's not a problem
for the MIN/MAX optimization since the only things left to do after
planagg.c are apply explicit ORDER BY or DISTINCT operations.

regards, tom lane



Re: [HACKERS] Third thoughts about the DISTINCT MAX() problem

2008-03-28 Thread Gurjeet Singh
On Sat, Mar 29, 2008 at 3:21 AM, Tom Lane <[EMAIL PROTECTED]> wrote:

> I just realized that the patch I applied here
> http://archives.postgresql.org/pgsql-committers/2008-03/msg00531.php
> for Taiki Yamaguchi's bug report here
> http://archives.postgresql.org/pgsql-bugs/2008-03/msg00275.php
> really doesn't work.  It assumes that an ungrouped aggregate
> query can't return more than one row, which is true in straight
> SQL ... but it's not true if you consider SRFs in the target list.
> CVS HEAD gives the wrong answer for this example in the regression
> database:
>
> regression=# select max(unique1), generate_series(1,3) as g from tenk1
> order by g desc;
>  max  | g
> --+---
>   | 1
>   | 2
>   | 3
> (3 rows)
>
> because it wrongly supposes it can discard the ORDER BY.
>
> So, back to the drawing board.  I had thought of two approaches to
> fixing the problem instead of just dodging it.  Plan A was to
> apply planagg.c's Aggref->Param substitution inside EquivalenceClasses,
> as in the draft patch here:
> http://archives.postgresql.org/pgsql-patches/2008-03/msg00388.php
> which I didn't entirely like for reasons mentioned in that post.
> Plan B was to try to revert to the way sort clause matching was
> done pre-8.3, that is have make_sort_from_pathkeys check first
> for a matching ressortgroupref tag before it goes looking for equal()
> expressions.  I had actually tried to do that first but got hung
> up on the problem of identifying the correct sort operator ---
> just taking the exposed type of the targetlist entry doesn't always
> work, because of binary-compatible cases (eg, tlist entry may say
> it yields varchar but we need to find the text opclass).  Perhaps
> thinking a bit harder would yield a solution though.
>
> Does anyone have comments for or against either of these approaches,
> or perhaps a Plan C to consider?

In the past I had seen suggestions (perhaps from you) that we should
disallow SRFs in the target list... Not for 8.3, obviously, but would 8.4 be
a good time to deprecate the usage of SRFs in the targetlist?

Best regards,
-- 
[EMAIL PROTECTED]
[EMAIL PROTECTED] gmail | hotmail | indiatimes | yahoo }.com

EnterpriseDB http://www.enterprisedb.com

Mail sent from my BlackLaptop device