date:20081029

Re: [HACKERS] pre-MED

2008-10-29 Thread Joshua D. Drake


Tom Lane wrote:

Hannu Krosing <[EMAIL PROTECTED]> writes:

On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:

Also, PL/Perl shouldn't be the only language to have this capability.
How might we add similar capabilities to PL/PythonU 



I'll look at adding this to pl/pythonu.


I would argue that it's already designed wrong if there's need for
PL-specific implementation effort.


I would agree. This should be a generic API.

Joshua D. Drake



regards, tom lane




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Pavel Stehule

2008/10/30 Robert Haas <[EMAIL PROTECTED]>:
>> All of this is pie-in-the-sky for PL functions, and I think properly so:
>> the whole reason for supporting PLs is to enable doing things that SQL
>> does poorly or not at all.  So expecting SQL to interoperate very
>> closely with them seems impossible, or at least unreasonably limiting.
>> The real issue at hand is what to do with SQL-language functions.
>>
>> I'm currently going to have a look at just what it would take to support
>> both lazy and eager evaluation in functions.c (independently of what
>> syntax, if any, we settle on to expose the choice to the user).  If it's
>> either really awful or really easy we should know that before arguing
>> further.
>
> It occurs to me that for PL/perl and similar one could design an
> interface that is similar to the one that is used for C functions -
> that is, function is invoked multiple times, returns one value per
> call, and is given a place to stash its state across calls.  For
> example, for PL/perl, you could pass a mutable empty hash reference on
> the first call and then pass the same hash reference back on each
> subsequent call.  That wouldn't require being able to freeze/thaw the
> whole state, just being able to maintain the contents of that hash
> reference across calls.
>
> It would probably be a lot more difficult to make something like this
> work usefully for PL/pgsql, which as a language is rather underpowered
> (nonetheless I use it heavily; it's awesome for the things it is good
> at), but I suspect it could be applied to Python, PHP, etc. pretty
> easily.
>
> So that's at least three ways you can evaluate the function: generate
> the whole thing in one fell swoop, single function call but with lazy
> execution, or value-per-call mode.  I'm guessing someone could dream
> up other possibilities as well.  Now, who's volunteering to implement?
>  :-)
>

With session variables we could implement srf function in plpgsql like
current C srf function. Like

create or replace function foo()
returns record as $$
#option with_srf_context(datatype of srf context)
begin
   return row(...);
end;
$$ language plpgsql;

I thing it is implementable, but It's not very efective. There are lot
of initialisation code. But this technique is used for agregation
functions without problems. I belive, so it's should not be fast, but
it could be usefull for very large datasets, where current srf
functions should fail.

regards
Pavel Stehule



> ...Robert
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread David Fetter

On Wed, Oct 29, 2008 at 10:23:36PM -0400, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:
> >> Also, PL/Perl shouldn't be the only language to have this
> >> capability.  How might we add similar capabilities to PL/PythonU 
> 
> > I'll look at adding this to pl/pythonu.
> 
> I would argue that it's already designed wrong if there's need for
> PL-specific implementation effort.

I'm not sure how else to do this.  The current implementation returns
char *, which doesn't translate uniformly into the PLs.  More
sophisticated implementations--a tree or forest structure including
ANDs and ORs, etc.--are even less uniform to translate into PLs, at
least as far as I can tell.

The way I'm looking at it, this could be added onto SPI at varying
degrees of sophistication, but there would still be PL-specific
bindings for it, each of which would involve a PL-specific
implementation effort.

What big things have I missed here?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] recovery_target_time and last completed transaction log time

2008-10-29 Thread Tom Lane

"Kevin Grittner" <[EMAIL PROTECTED]> writes:
> Regarding the LOG message, it is confusing to have a recovery.conf
> file which contains this:
> recovery_target_time = '2008-10-16 17:00:00.0'
> and see this in the log:
> [2008-10-16 23:04:26.006 CDT] 19951 LOG:  last completed transaction
> was at log time 2008-10-16 17:00:23.205347-05

Actually, I think this *is* a bug: recoveryLastXTime is being set in
entirely the wrong place, ie, where we are *considering* whether to
apply a commit rather than after we have decided to do so.  The
result is just a misleading log message, so not too terrible, but
we should fix it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Please make sure your patches are on the wiki page

2008-10-29 Thread Josh Berkus

On Wednesday 29 October 2008 20:12, Tom Lane wrote:
> Earlier today I had a different thought about how to sort things early
> in the fest.  I think that there is a strong temptation to finish off
> the "simple" patches quickly so as to reduce the size of the list ---
> I know I've done that and I think others have too.

Actually, I'd really like it if you and the other advanced hackers ignored the 
simple patches.  I've got a list of 6 new reviewers, and I'd like to be able 
to give them those patches to review -- they generally can't help with the 
hard ones.

-- 
Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread Tom Lane

"David Blewett" <[EMAIL PROTECTED]> writes:
> Here's a vote for allowing this in plain SQL.

> I use the tablefunc contrib module as a way to build a view of a
> specific questionnaire's responses (using Elein's nice model here
> [1]). Currently, if I then write queries against these views that
> include WHERE clauses they don't perform very well as the underlying
> data size grows. I was using the afore-mentioned large view that casts
> everything to text, but recently I started using separate calls to the
> crosstab function for each underlying table, then joining them
> together based on their response ID. This seems to work much better
> for more complex queries, but I think it would still be beneficial to
> have access to these qualifiers so I could push down to each subquery
> the list of response ID's to pull. I don't have access to sample SQL
> at the moment, but if it is wanted I can try to get that this week.

Please.  Some real use-cases would be very helpful here.  I'm
particularly wondering whether the proposed deparse call actually yields
anything that's useful without extensive additional knowledge about
the query ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I think I like this best of all the suggestions - 
> suppress_redundant_updates_trigger() is what I have now.

> If there's no further discussion, I'll go ahead and commit this in a day 
> or two.

The documentation seems a bit lacking: it gives neither a hint of why
you might want to use this nor why it's not the built-in behavior.
Suggest expending a sentence or two pointing out that the trigger takes
nonzero execution time to do its comparisons, and that this may or may
not be repaid by eliminated updates, depending on whether the client
applications are actually in the habit of issuing useless update
commands.

I think you're missing an  item for the function name, also.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Please make sure your patches are on the wiki page

2008-10-29 Thread Tom Lane

Josh Berkus <[EMAIL PROTECTED]> writes:
>> (2) sorting the pending patches by complexity or subject matter

> Sorting them by complexity would be great, if I thought I could do it.  I'm 
> not sure I can.

We organized them by subject matter (or code area, really) in a couple
of the earlier fests.  I thought that was helpful then.  On the other
hand, the September fest didn't seem to break down that way.  Until we
see the final list it's hard to say how November will shake out.

Earlier today I had a different thought about how to sort things early
in the fest.  I think that there is a strong temptation to finish off
the "simple" patches quickly so as to reduce the size of the list ---
I know I've done that and I think others have too.  The trouble with
simple-first is that problematic patches get left till later, which
means that their authors don't have as much time to respond to any
criticisms that may ultimately be forthcoming.  I think it'd be a good
idea to intentionally try to focus on difficult patches early, so that
they can be bounced back to their authors with useful criticism while
there's still time to do something in response.  Not sure about details
of this, but seems like a process issue that we ought to consider.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)

2008-10-29 Thread KaiGai Kohei


Simon Riggs wrote:

On Wed, 2008-10-29 at 17:42 +0900, KaiGai Kohei wrote:


I've updated my patches, these are ready for CommitFest:Nov.

[1/6] 
http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch
[2/6] 
http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch
[3/6] 
http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch
[4/6] 
http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch
[5/6] 
http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch
[6/6] 
http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch

The comprehensive documentation for SE-PostgreSQL is here:
   http://wiki.postgresql.org/wiki/SEPostgreSQL (it is now under reworking.)

List of updates:
- Patches are rebased to the latest CVS HEAD.
- bugfix: TRUNCATE checks assumed SECCLASS_DB_TUPLE object class
- bugfix: sepgsqlCopyFile assumed SECCLASS_FILE object class, but it has to be
   adjusted by st_mode.

Request for Comments:
- The 4th patch is actually needed? It can be replaced by wiki page.
- Do you think anything remained towards the final CommitFest?
- Do you have any reviewing comment? Most of patches are unchanged from
   the previous vesion. If you can comment anything, I can fix them without
   waiting for the final commit fest.



I'm copying some general comments from my contact here, verbatim. Other
comments have been requested and may be forthcoming:

By way of background "Common Criteria" (ISO Standard 15408) are in
effect pre-defined security requirements that have been agreed between
multiple friendly governments so that they can share the results from
independent lab work in each country and avoid the costs and duplication
of effort.   The published lab work results in two outputs: 
- a "Target of Evaluation" (TOE) i.e. tight definition of the software

version, configuration and environment (hardware, external controls)
which was the subject of the evaluation 
- an "Evaluation Report" which, in the "happy case" has assigns an

"Evaluation Assurance Level" (EAL) number to the product (which needless
to say is only valid if the product is used in its TOE 


If you're interested in reading more about formal Government security
evaluation schemes, these are some good sites: 


Thanks for your information.
However, I've also followed the Common Criteria for a few years, and
some of facilities came from its requirements. The "security_context"
system column reflects the requirement of labeled import/export, for
example. Don't worry.

Let's move our discussion into its implementation in the upcoming
CommitFest. It's a good time now.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Tom Lane

Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Not sure that's appropriate, but I can't see anything else that is 
>> very appropriate either.

> The plpgsql code uses errcode(ERRCODE_FEATURE_NOT_SUPPORTED) for this 
> situation, so I guess we should be consistent with that.

TRIGGERED_DATA_CHANGE_VIOLATION is most certainly NOT an appropriate
code here --- it's talking about invalid database content states.

The RI triggers use ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED for these
sorts of conditions, and I think that's probably best practice.  See
ri_CheckTrigger() in particular.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Please make sure your patches are on the wiki page

2008-10-29 Thread Josh Berkus

Robert,

> (1) moving all of the patches committed prior to 11/1 to a separate
> section or page

Why?

> (2) sorting the pending patches by complexity or subject matter

Sorting them by complexity would be great, if I thought I could do it.  I'm 
not sure I can.

-- 
Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread Tom Lane

Hannu Krosing <[EMAIL PROTECTED]> writes:
> On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:
>> Also, PL/Perl shouldn't be the only language to have this capability.
>> How might we add similar capabilities to PL/PythonU 

> I'll look at adding this to pl/pythonu.

I would argue that it's already designed wrong if there's need for
PL-specific implementation effort.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BufferAccessStrategy for bulk insert

2008-10-29 Thread Simon Riggs


On Wed, 2008-10-29 at 21:58 -0400, Robert Haas wrote:
> > If you say its a loss you should publish timings to support that. Using
> > a BAS for VACUUM was a performance gain, not a loss.
> 
> Well, I can dig up and publish the timings from my laptop, but I'm not
> sure where that will get us.  Trust me, the numbers were higher with
> BAS, otherwise I wouldn't be worrying about this.  But I pretty much
> doubt anyone cares how my laptop runs PostgreSQL anyway, which is why
> I think someone should test this on good hardware and see what happens
> there.  The only change I made to disable the BAS was a one-line
> change in GetBulkInsertState to replace BAS_BULKWRITE with BAS_NORMAL,
> so it should be easy for someone to try it both ways.
> 
> Not at any point in the development of this patch was I able to match
> the 15-17% copy speedup, 20% CTAS speedup that you cited with your
> original email.  I did get speedups, but they were considerably
> smaller.  So either my testing methodology is different, or my
> hardware is different, or there is something wrong with my patch.  I
> don't think we're going to find out which it is until someone other
> than me looks at this.
> 
> In any event, VACUUM is a read-write workload, and specifically, it
> tends to write pages that have been written by other writers, and are
> therefore potentially already in shared buffers.  COPY and CTAS are
> basically write-only workloads, though with COPY on an existing table
> the FSM might guide you to free space on a page already in shared
> buffers, or you might find an index page you need there.  Still, if
> you are doing a large bulk data load, those effects are probably
> pretty small.  So, the profile is somewhat.
> 
> I'm not really trying to argue that the BAS is a bad idea, but it is
> certainly true that I do not have the data to prove that it is a good
> idea.

You should try profiling the patch. You can count the invocations of the
buffer access routines to check its all working in the right ratios.

Whatever timings you have are worth publishing. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Please make sure your patches are on the wiki page

2008-10-29 Thread Robert Haas

I wonder if we should consider:

(1) moving all of the patches committed prior to 11/1 to a separate
section or page
(2) sorting the pending patches by complexity or subject matter

...Robert

On Wed, Oct 29, 2008 at 5:26 PM, Josh Berkus <[EMAIL PROTECTED]> wrote:
> Patch submitters,
>
> Please make sure your patches are on the November CommitFest wiki page, with
> correct and updated links.
>
> http://wiki.postgresql.org/wiki/CommitFest_2008-11
>
> --
> Josh Berkus
> PostgreSQL
> San Francisco
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BufferAccessStrategy for bulk insert

2008-10-29 Thread Robert Haas

> If you say its a loss you should publish timings to support that. Using
> a BAS for VACUUM was a performance gain, not a loss.

Well, I can dig up and publish the timings from my laptop, but I'm not
sure where that will get us.  Trust me, the numbers were higher with
BAS, otherwise I wouldn't be worrying about this.  But I pretty much
doubt anyone cares how my laptop runs PostgreSQL anyway, which is why
I think someone should test this on good hardware and see what happens
there.  The only change I made to disable the BAS was a one-line
change in GetBulkInsertState to replace BAS_BULKWRITE with BAS_NORMAL,
so it should be easy for someone to try it both ways.

Not at any point in the development of this patch was I able to match
the 15-17% copy speedup, 20% CTAS speedup that you cited with your
original email.  I did get speedups, but they were considerably
smaller.  So either my testing methodology is different, or my
hardware is different, or there is something wrong with my patch.  I
don't think we're going to find out which it is until someone other
than me looks at this.

In any event, VACUUM is a read-write workload, and specifically, it
tends to write pages that have been written by other writers, and are
therefore potentially already in shared buffers.  COPY and CTAS are
basically write-only workloads, though with COPY on an existing table
the FSM might guide you to free space on a page already in shared
buffers, or you might find an index page you need there.  Still, if
you are doing a large bulk data load, those effects are probably
pretty small.  So, the profile is somewhat.

I'm not really trying to argue that the BAS is a bad idea, but it is
certainly true that I do not have the data to prove that it is a good
idea.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal of PITR performance improvement for 8.4.

2008-10-29 Thread Simon Riggs


On Thu, 2008-10-30 at 09:46 +0900, Koichi Suzuki wrote:

> I'm not sure if blocks reffered from all WAL records in single WAL
> segment can fit kernel cache.   This is why current pg_readahead
> returns the last LSN and require starting LSN.   So far, with FPW, it
> seems that we can prefetch all the pages in a WAL segment.   So it
> will be okay for archive log compressed with pg_compresslog.   I'll
> test if it's okay in the case full_page_writes=off.

I'd prefer to be able to specify max_readahead_pages than have to
control things at a micro level like that. If you have lots of memory
you can set that higher.

> Anyway, I'd like to keep my proposal for 8.4 and continue the test and
> evaluation to report to the mailing list.
> 
> I'll also change the whole code to run in the core.

OK, I quite liked the idea of a separate program. That allows it to work
with 8.3 as well as 8.4. No problem with it being in core at all.

As ever, good thinking, good patch.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal of PITR performance improvement for 8.4.

2008-10-29 Thread Koichi Suzuki

Hi,


2008/10/29 Simon Riggs <[EMAIL PROTECTED]>:
>
> On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:
>
>> 1. You should avoid useless posix_fadvise() calls. In the naive
>> implementation, where you simply call posix_fadvise() for every page
>> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls
>> per WAL record, and that's a lot of overhead. We face the same design
>> question as with Greg's patch to use posix_fadvise() to prefetch index
>> and bitmap scans: what should the interface to the buffer manager look
>> like? The simplest approach would be a new function call like
>> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the
>> page if it's not in the buffer cache, but is a no-op otherwise. But that
>> means more overhead, since for every page access, we need to find the
>> page twice in the buffer cache; once for the AdviseBuffer() call, and
>> 2nd time for the actual ReadBuffer().
>
> That's a much smaller overhead than waiting for an I/O. The CPU overhead
> isn't really a problem if we're I/O bound.

As disccused last year about parallel recovery and random read
problem,  recovery is really I/O bound, especially when FPW is not
available.   And it is not practical to ask all the archive logs to
include huge FPWs.

>
>> It would be more efficient to pin
>> the buffer in the AdviseBuffer() call already, but that requires much
>> more changes to the callers.
>
> That would be hard to cleanup safely, plus we'd have difficulty with
> timing: is there enough buffer space to allow all the prefetched blocks
> live in cache at once? If not, this approach would cause problems.

I'm not positive to AdviseBuffer() adea.   If we do this, we need all
the pages reffered from a WAL segment in the shared buffer.   This may
be several GB and will compete with kernel cache.   Current
PostgreSQL highly relies on kernel cache (and kernel I/O schedule) and
it is not a good idea to have much shared buffer.   The worst case is
to spare half of the physical memory to the shared buffer.   The
performance will be very bad. Rather, I prefer to ask kernel to
prefetch.

>
>> 2. The format of each WAL record is different, so you need a "readahead
>> handler" for every resource manager, for every record type. It would be
>> a lot simpler if there was a standardized way to store that information
>> in the WAL records.
>
> I would prefer a new rmgr API call that returns a list of blocks. That's
> better than trying to make everything fit one pattern. If the call
> doesn't exist then that rmgr won't get prefetch.

Yes, I'd like this idea.   Could you let me try this API through
prefetch implementation in the core (if it is agreed)?

>
>> 3. IIRC I tried to handle just a few most important WAL records at
>> first, but it turned out that you really need to handle all WAL records
>> (that are used at all) before you see any benefit. Otherwise, every time
>> you hit a WAL record that you haven't done posix_fadvise() on, the
>> recovery "stalls", and you don't need much of those to diminish the gains.
>>
>> Not sure how these apply to your approach, it's very different. You seem
>> to handle 1. by collecting all the page references for the WAL file, and
>> sorting and removing the duplicates. I wonder how much CPU time is spent
>> on that?
>
> Removing duplicates seems like it will save CPU.

If we invoke posix_fadvise() to the blocks already in the kernel
cache, this call will just do nothing but consume some overhead in the
kernel.   I think duplicate removal saves more.

>
> --
>  Simon Riggs   www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>



-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal of PITR performance improvement for 8.4.

2008-10-29 Thread Koichi Suzuki

I'm not sure if blocks reffered from all WAL records in single WAL
segment can fit kernel cache.   This is why current pg_readahead
returns the last LSN and require starting LSN.   So far, with FPW, it
seems that we can prefetch all the pages in a WAL segment.   So it
will be okay for archive log compressed with pg_compresslog.   I'll
test if it's okay in the case full_page_writes=off.

Anyway, I'd like to keep my proposal for 8.4 and continue the test and
evaluation to report to the mailing list.

I'll also change the whole code to run in the core.

---
Koichi Suzuki

2008/10/29 Simon Riggs <[EMAIL PROTECTED]>:
>
> On Wed, 2008-10-29 at 09:55 +0900, Koichi Suzuki wrote:
>
>> I'd like to hear some more about these.   I'm more than happy to write
>> all the code inside PG core to avoid overhead to create another
>> process.
>
> Having an external program can help earlier releases also, so I think
> this is the right approach for now.
>
> In next PG release we should bring this into core, along with streaming.
>
> Interface would be better if it accepted
>   pg_readahead filename
> or pg_readahead filename start-lsn
>
> We don't always need a starting lsn.
>
> --
>  Simon Riggs   www.2ndQuadrant.com
>  PostgreSQL Training, Services and Support
>
>

-- 
--
Koichi Suzuki

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Robert Haas

> All of this is pie-in-the-sky for PL functions, and I think properly so:
> the whole reason for supporting PLs is to enable doing things that SQL
> does poorly or not at all.  So expecting SQL to interoperate very
> closely with them seems impossible, or at least unreasonably limiting.
> The real issue at hand is what to do with SQL-language functions.
>
> I'm currently going to have a look at just what it would take to support
> both lazy and eager evaluation in functions.c (independently of what
> syntax, if any, we settle on to expose the choice to the user).  If it's
> either really awful or really easy we should know that before arguing
> further.

It occurs to me that for PL/perl and similar one could design an
interface that is similar to the one that is used for C functions -
that is, function is invoked multiple times, returns one value per
call, and is given a place to stash its state across calls.  For
example, for PL/perl, you could pass a mutable empty hash reference on
the first call and then pass the same hash reference back on each
subsequent call.  That wouldn't require being able to freeze/thaw the
whole state, just being able to maintain the contents of that hash
reference across calls.

It would probably be a lot more difficult to make something like this
work usefully for PL/pgsql, which as a language is rather underpowered
(nonetheless I use it heavily; it's awesome for the things it is good
at), but I suspect it could be applied to Python, PHP, etc. pretty
easily.

So that's at least three ways you can evaluate the function: generate
the whole thing in one fell swoop, single function call but with lazy
execution, or value-per-call mode.  I'm guessing someone could dream
up other possibilities as well.  Now, who's volunteering to implement?
 :-)

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Please make sure your patches are on the wiki page

2008-10-29 Thread Josh Berkus

Patch submitters,

Please make sure your patches are on the November CommitFest wiki page, with 
correct and updated links.

http://wiki.postgresql.org/wiki/CommitFest_2008-11

-- 
Josh Berkus
PostgreSQL
San Francisco

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread Hannu Krosing

On Wed, 2008-10-29 at 10:33 -0700, Joshua D. Drake wrote:
> On Wed, 2008-10-29 at 19:17 +0200, Hannu Krosing wrote:
> > On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:
> > > Folks,
> > > 
> > > Please find enclosed a WIP patch to add the ability for functions to
> > > see the qualifiers of the query in which they're called.  It's not
> > > working just yet, and I'm not sure how best to get it working, but I'd
> > > like to see this as part of 8.4, as SQL/MED is just way too ambitious
> > > given the time frame.
> > > 
> > > Any tips, hints, pointers, etc. would be much appreciated.
> > > 
> > > Also, PL/Perl shouldn't be the only language to have this capability.
> > > How might we add similar capabilities to PL/PythonU 
> > 
> > I'll look at adding this to pl/pythonu. I have to finish some stuff
> > there before freeze anyway.
> 
> Have we tested plpython with version 3?

If you mean python 3.0 , then no, at least I have not tested it yet. not
even 2.6

-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BufferAccessStrategy for bulk insert

2008-10-29 Thread Simon Riggs


On Tue, 2008-10-28 at 23:45 -0400, Robert Haas wrote:

> One concern that I have about this approach is that the situation in
> which people are probably most concerned about COPY performance is
> restoring a dump.  In that case, the COPY will be the only thing
> running, and using a BufferAccessStrategy is an anti-optimization.  I
> don't think it's a very big effect (any testing anyone can do on real
> hardware rather than what I have would be appreciated) but I'm sort of
> unsold of optimizing for what I believe to be the less-common use
> case.  If the consensus is to reverse course on this point I'm happy
> to rip those changes back out and resubmit; they are a relatively
> small proportion of the patch.

Having COPY use a BAS is mainly to ensure it doesn't swamp the cache.
Which is a gain in itself.

If you say its a loss you should publish timings to support that. Using
a BAS for VACUUM was a performance gain, not a loss.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Block-level CRC checks

2008-10-29 Thread Alvaro Herrera

Alvaro Herrera wrote:

> 3. the bgwriter is seen writing WAL entries at checkpoint.  At shutdown,
> this might cause an error to be reported on how there was not supposed
> to be activity on the log.  I didn't save the exact error report and I
> can't find it in the source :-(

LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating connection due to administrator command
LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  INSERT @ 0/67F05F0: prev 0/67F05C0; xid 0: Heap2 - hintbits: rel 
1663/16384/1259; blk 4
CONTEXT:  writing block 4 of relation 1663/16384/1259
LOG:  xlog flush request 0/67F06C0; write 0/0; flush 0/0
CONTEXT:  writing block 4 of relation 1663/16384/1259
LOG:  INSERT @ 0/67F06C0: prev 0/67F05F0; xid 0: Heap2 - hintbits: rel 
1663/16384/2608; blk 40
CONTEXT:  writing block 40 of relation 1663/16384/2608
LOG:  xlog flush request 0/67F0708; write 0/67F06C0; flush 0/67F06C0
CONTEXT:  writing block 40 of relation 1663/16384/2608
LOG:  INSERT @ 0/67F0708: prev 0/67F06C0; xid 0: Heap2 - hintbits: rel 
1663/16384/1249; blk 29
CONTEXT:  writing block 29 of relation 1663/16384/1249
LOG:  xlog flush request 0/67F0808; write 0/67F0708; flush 0/67F0708
CONTEXT:  writing block 29 of relation 1663/16384/1249
LOG:  INSERT @ 0/67F0808: prev 0/67F0708; xid 0: XLOG - checkpoint: redo 
0/67F05F0; tli 1; xid 0/9093; oid 90132; multi 1; offset 0; shutdown
LOG:  xlog flush request 0/67F0850; write 0/67F0808; flush 0/67F0808
PANIC:  concurrent transaction log activity while database system is shutting 
down
LOG:  background writer process (PID 17411) was terminated by signal 6: Aborted

I am completely at a loss what to do here.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL network discovery

2008-10-29 Thread Tom Lane

"Gevik Babakhani" <[EMAIL PROTECTED]> writes:
> I asked this question because I have a situation where a service like this
> would be very useful. If such a functionality would be accepted by the core
> team, I am willing to work on it.

The existing code uses APIs that Apple has deprecated, so it's going to
need to be fixed sooner or later.  The issue isn't whether to fix it,
it's how to not introduce any unwanted dependencies (like threading...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Andrew Dunstan




Andrew Dunstan wrote:



David Fetter wrote:


Maybe we should fix our C sample trigger, from which this was taken.



Yes :)

Does the attached have the right error code?

-elog(ERROR, "trigf: not called by trigger manager");
+ereport(ERROR,
+(error(TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("trigf: not called by trigger manager")));

  


Not sure that's appropriate, but I can't see anything else that is 
very appropriate either.



The plpgsql code uses errcode(ERRCODE_FEATURE_NOT_SUPPORTED) for this 
situation, so I guess we should be consistent with that.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)

2008-10-29 Thread Simon Riggs


On Wed, 2008-10-29 at 17:42 +0900, KaiGai Kohei wrote:

> I've updated my patches, these are ready for CommitFest:Nov.
> 
> [1/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch
> [2/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch
> [3/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch
> [4/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch
> [5/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch
> [6/6] 
> http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch
> 
> The comprehensive documentation for SE-PostgreSQL is here:
>http://wiki.postgresql.org/wiki/SEPostgreSQL (it is now under reworking.)
> 
> List of updates:
> - Patches are rebased to the latest CVS HEAD.
> - bugfix: TRUNCATE checks assumed SECCLASS_DB_TUPLE object class
> - bugfix: sepgsqlCopyFile assumed SECCLASS_FILE object class, but it has to be
>adjusted by st_mode.
> 
> Request for Comments:
> - The 4th patch is actually needed? It can be replaced by wiki page.
> - Do you think anything remained towards the final CommitFest?
> - Do you have any reviewing comment? Most of patches are unchanged from
>the previous vesion. If you can comment anything, I can fix them without
>waiting for the final commit fest.
> 

I'm copying some general comments from my contact here, verbatim. Other
comments have been requested and may be forthcoming:

By way of background "Common Criteria" (ISO Standard 15408) are in
effect pre-defined security requirements that have been agreed between
multiple friendly governments so that they can share the results from
independent lab work in each country and avoid the costs and duplication
of effort.   The published lab work results in two outputs: 
- a "Target of Evaluation" (TOE) i.e. tight definition of the software
version, configuration and environment (hardware, external controls)
which was the subject of the evaluation 
- an "Evaluation Report" which, in the "happy case" has assigns an
"Evaluation Assurance Level" (EAL) number to the product (which needless
to say is only valid if the product is used in its TOE 

If you're interested in reading more about formal Government security
evaluation schemes, these are some good sites: 

General 
http://www.commoncriteriaportal.org/ 

UK 
http://www.cesg.gov.uk/ 

Australia 
Defence Signals Directorate www.dsd.gov.au/infosec/ 

Canada 
Communications Security Establishment www.cse.dnd.ca 

France 
Direction Centrale de la Sécurité des Systèmes d'Information
www.ssi.gouv.fr/en/ 

Germany 
Bundesamt fur Sicherheit in der Informationstechnik www.bsi.bund.de 

Japan 
Japan Information Technology Security Evaluation and Certification
Scheme (JISEC) www.ipa.go.jp/security/jisec/jisec_e/index.html 

USA 
National Institute of Standards and Technologywww.nist.gov 
National Information Assurance Partnership (NIAP)
 www.nsa.gov/ia/industry/niap.cfm 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Andrew Dunstan




David Fetter wrote:


Maybe we should fix our C sample trigger, from which this was taken.



Yes :)

Does the attached have the right error code?

-elog(ERROR, "trigf: not called by trigger manager");
+ereport(ERROR,
+(error(TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("trigf: not called by trigger manager")));

  


Not sure that's appropriate, but I can't see anything else that is very 
appropriate either.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread David Fetter

On Wed, Oct 29, 2008 at 03:48:09PM -0400, Andrew Dunstan wrote:
>
>>> + /* make sure it's called as a trigger */
>>> + if (!CALLED_AS_TRIGGER(fcinfo))
>>> + elog(ERROR, "suppress_redundant_updates_trigger: must be called 
>>> as trigger");
>>
>> Shouldn't these all be ereport()?
>
> Good point.
>
> I'll fix them.
>
> Maybe we should fix our C sample trigger, from which this was taken.

Yes :)

Does the attached have the right error code?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/doc/src/sgml/trigger.sgml b/doc/src/sgml/trigger.sgml
index a3f17c9..69430ea 100644
--- a/doc/src/sgml/trigger.sgml
+++ b/doc/src/sgml/trigger.sgml
@@ -581,7 +581,9 @@ trigf(PG_FUNCTION_ARGS)
 
 /* make sure it's called as a trigger at all */
 if (!CALLED_AS_TRIGGER(fcinfo))
-elog(ERROR, "trigf: not called by trigger manager");
+ereport(ERROR,
+(error(TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("trigf: not called by trigger manager")));
 
 /* tuple to return to executor */
 if (TRIGGER_FIRED_BY_UPDATE(trigdata->tg_event))

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL network discovery

2008-10-29 Thread Gevik Babakhani

I asked this question because I have a situation where a service like this
would be very useful. If such a functionality would be accepted by the core
team, I am willing to work on it.

> -Original Message-
> From: Alvaro Herrera [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 29, 2008 8:50 PM
> To: Gevik Babakhani
> Cc: 'PGSQL Hackers'
> Subject: Re: [HACKERS] PostgreSQL network discovery
> 
> Gevik Babakhani wrote:
> > Hi,
> > 
> > There is a nice little feature within MSSQL where it is possible to 
> > enumerate all MSSQL servers on the local network.
> > I wonder how this can be made possible with PG. Pinging every 
> > IP:PGPORT within the current subnet is one of the options 
> but I guess 
> > that would be just a bad solution.  Any thoughts?
> 
> We have rendezvous support too.  We need to update it to use 
> the newer Avahi library, but the one person who proposed 
> using the thread interface got scolded for that and fled :-)  
> The interface we'd need to use is complex and the patch would 
> be a lot bigger.
> 
> If you want to work on it, you're welcome to do so :-)
> 
> -- 
> Alvaro Herrera
> http://www.CommandPrompt.com/
> PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL network discovery

2008-10-29 Thread Gevik Babakhani

If I am not mistaken, it only works fine on OSX environments. (I might be
very wrong here) 

> -Original Message-
> From: Magnus Hagander [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, October 29, 2008 8:47 PM
> To: Gevik Babakhani
> Cc: 'PGSQL Hackers'
> Subject: Re: [HACKERS] PostgreSQL network discovery
> 
> Gevik Babakhani wrote:
> > Hi,
> > 
> > There is a nice little feature within MSSQL where it is possible to 
> > enumerate all MSSQL servers on the local network.
> > I wonder how this can be made possible with PG. Pinging every 
> > IP:PGPORT within the current subnet is one of the options 
> but I guess 
> > that would be just a bad solution.  Any thoughts?
> 
> Isn't the bonjour support supposed to do this?
> (Never used it myself though)
> 
> //Magnus


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Block-level CRC checks

2008-10-29 Thread Alvaro Herrera

Alvaro Herrera wrote:

> There are some gotchas in this patch:
> 
> 1. it does not consider hint bits other than the ones defined in htup.h.
> Some index AMs use hint bits to "kill" tuples (LP_DEAD mostly, I think).
> This means that CRCs will be broken for such pages when pages are torn.

The "other hint bits" are:

- LP_DEAD as used by the various callers of ItemIdMarkDead.
- PD_PAGE_FULL
- BTPageOpaque->btpo_flags and btpo_cycleid

All of them are changed with only SetBufferCommitInfoNeedsSave being
called afterwards.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL network discovery

2008-10-29 Thread Alvaro Herrera

Gevik Babakhani wrote:
> Hi,
> 
> There is a nice little feature within MSSQL where it is possible to
> enumerate all MSSQL servers on the local network. 
> I wonder how this can be made possible with PG. Pinging every IP:PGPORT
> within the current subnet is one of the options but I guess that would be
> just a bad solution.  Any thoughts?

We have rendezvous support too.  We need to update it to use the newer
Avahi library, but the one person who proposed using the thread
interface got scolded for that and fled :-)  The interface we'd need to
use is complex and the patch would be a lot bigger.

If you want to work on it, you're welcome to do so :-)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Andrew Dunstan




Alvaro Herrera wrote:

Andrew Dunstan escribió:

  

+ /* make sure it's called as a trigger */
+ if (!CALLED_AS_TRIGGER(fcinfo))
+ elog(ERROR, "suppress_redundant_updates_trigger: must be called as 
trigger");



Shouldn't these all be ereport()?

  


Good point.

I'll fix them.

Maybe we should fix our C sample trigger, from which this was taken.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL network discovery

2008-10-29 Thread Magnus Hagander

Gevik Babakhani wrote:
> Hi,
> 
> There is a nice little feature within MSSQL where it is possible to
> enumerate all MSSQL servers on the local network. 
> I wonder how this can be made possible with PG. Pinging every IP:PGPORT
> within the current subnet is one of the options but I guess that would be
> just a bad solution.  Any thoughts?

Isn't the bonjour support supposed to do this?
(Never used it myself though)

//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] PostgreSQL network discovery

2008-10-29 Thread Gevik Babakhani

Hi,

There is a nice little feature within MSSQL where it is possible to
enumerate all MSSQL servers on the local network. 
I wonder how this can be made possible with PG. Pinging every IP:PGPORT
within the current subnet is one of the options but I guess that would be
just a bad solution.  Any thoughts?

Regards,
Gevik
http://www.truesoftware.net/gevik/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Alvaro Herrera

Andrew Dunstan escribió:

> + /* make sure it's called as a trigger */
> + if (!CALLED_AS_TRIGGER(fcinfo))
> + elog(ERROR, "suppress_redundant_updates_trigger: must be called as 
> trigger");

Shouldn't these all be ereport()?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] autovacuum: I need some explanation

2008-10-29 Thread Noah Freire

Hi Alvaro,
On Wed, Oct 29, 2008 at 3:46 PM, Alvaro Herrera
<[EMAIL PROTECTED]>wrote:

> Noah Freire escribió:
>
> > please check the first log message: the vacuum threshold is 6,000,050
> rows
> > and the number of dead tuples is 16,697,969. Even though the number of
> > dead_tuples is greater than the threshold the autovacuum is not being
> > triggered for this table. So, besides this condition (dead_tuples >
> > threshold) what else is taken into account by autovacuum?
>
> That there's no other process doing the same.  Did you check
> pg_stat_activity to ensure that there's really no autovacuum worker
> processing this table?


 datid | datname | procpid | usesysid | usename | current_query | waiting |
xact_start | query_start | backend_start | client_addr | client_port
---+-+-+--+--+-+-+---+---+---+-+-
 45974 | pgbench | 14660 | 10 | postgres | autovacuum: VACUUM
public.accounts | f | 2008-10-29 11:09:05.610857-07 | 2008-10-29
11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 | |

indeed; autovacuum is currently running for accounts. It is running for 1
hour but the number of dead rows keeps increasing. Apparently autovacuum
cannot pace the rate of updates on this table (please check the two
snapshots of pg_stat_user_tables below taken with a 2 seconds interval
between them). It would be better to run vacuum manually than using
autovacuum in this case?
pgbench=# select relname, n_dead_tup from pg_stat_user_tables where
relname='accounts';
 relname | n_dead_tup
--+
 accounts | 19917490
(1 row)

pgbench=# select relname, n_dead_tup from pg_stat_user_tables where
relname='accounts';
 relname | n_dead_tup
--+
 accounts | 19923767
(1 row)
Thanks,
-Noah

Re: [HACKERS] minimal update

2008-10-29 Thread Magnus Hagander

Andrew Dunstan wrote:
> 
> 
> Kenneth Marshall wrote:
>> On Wed, Oct 22, 2008 at 06:05:26PM -0400, Tom Lane wrote:
>>  
>>> Simon Riggs <[EMAIL PROTECTED]> writes:
>>>
> On Wed, Oct 22, 2008 at 3:24 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>
>> "Minimal" really fails to convey the point here IMHO.  How about
>> something like "suppress_no_op_updates_trigger"?
>>   
 I think it means something to us, but "no op" is a very technical
 phrase
 that probably doesn't travel very well.
   
>>> Agreed --- I was hoping someone could improve on that part.  The only
>>> other words I could come up with were "empty" and "useless", neither of
>>> which seem quite le mot juste ...
>>>
>>> regards, tom lane
>>>
>>> 
>> redundant?
>>
>>
>>   
> 
> I think I like this best of all the suggestions -
> suppress_redundant_updates_trigger() is what I have now.
> 
> If there's no further discussion, I'll go ahead and commit this in a day
> or two.

Nitpicking, but you have:
+
+   Currently PostgreSQL provides one built in trigger
+ function, suppress_redundant_updates_trigger,


Should we perhaps mention the fulltext triggers (with the appropriate
links) here? If it's intended to be an authoritative list of the
"userspace" triggers we ship, I think that may be a good idea.


//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] autovacuum: I need some explanation

2008-10-29 Thread Alvaro Herrera

Noah Freire escribió:

>  datid | datname | procpid | usesysid | usename | current_query | waiting |
> xact_start | query_start | backend_start | client_addr | client_port
> ---+-+-+--+--+-+-+---+---+---+-+-
>  45974 | pgbench | 14660 | 10 | postgres | autovacuum: VACUUM
> public.accounts | f | 2008-10-29 11:09:05.610857-07 | 2008-10-29
> 11:09:05.610857-07 | 2008-10-29 11:09:03.45083-07 | |
> 
> indeed; autovacuum is currently running for accounts. It is running for 1
> hour but the number of dead rows keeps increasing. Apparently autovacuum
> cannot pace the rate of updates on this table (please check the two
> snapshots of pg_stat_user_tables below taken with a 2 seconds interval
> between them).

Maybe you just need to decrease the vacuum_cost_delay, so that it goes a
bit faster.

Keep in mind that the number of dead tuples only decreases when vacuum
finishes, not while it is working.

> It would be better to run vacuum manually than using
> autovacuum in this case?

It would be exactly the same (unless it used different cost_limit/delay
settings, but then you can configure them via pg_autovacuum)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] minimal update

2008-10-29 Thread Andrew Dunstan




Kenneth Marshall wrote:

On Wed, Oct 22, 2008 at 06:05:26PM -0400, Tom Lane wrote:
  

Simon Riggs <[EMAIL PROTECTED]> writes:


On Wed, Oct 22, 2008 at 3:24 PM, Tom Lane <[EMAIL PROTECTED]> wrote:


"Minimal" really fails to convey the point here IMHO.  How about
something like "suppress_no_op_updates_trigger"?
  

I think it means something to us, but "no op" is a very technical phrase
that probably doesn't travel very well.
  

Agreed --- I was hoping someone could improve on that part.  The only
other words I could come up with were "empty" and "useless", neither of
which seem quite le mot juste ...

regards, tom lane



redundant?


  


I think I like this best of all the suggestions - 
suppress_redundant_updates_trigger() is what I have now.


If there's no further discussion, I'll go ahead and commit this in a day 
or two.


cheers

andrew
? GNUmakefile
? config.log
? config.status
? contrib/spi/.deps
? src/Makefile.global
? src/backend/postgres
? src/backend/access/common/.deps
? src/backend/access/gin/.deps
? src/backend/access/gist/.deps
? src/backend/access/hash/.deps
? src/backend/access/heap/.deps
? src/backend/access/index/.deps
? src/backend/access/nbtree/.deps
? src/backend/access/transam/.deps
? src/backend/bootstrap/.deps
? src/backend/catalog/.deps
? src/backend/catalog/postgres.bki
? src/backend/catalog/postgres.description
? src/backend/catalog/postgres.shdescription
? src/backend/commands/.deps
? src/backend/executor/.deps
? src/backend/lib/.deps
? src/backend/libpq/.deps
? src/backend/main/.deps
? src/backend/nodes/.deps
? src/backend/optimizer/geqo/.deps
? src/backend/optimizer/path/.deps
? src/backend/optimizer/plan/.deps
? src/backend/optimizer/prep/.deps
? src/backend/optimizer/util/.deps
? src/backend/parser/.deps
? src/backend/port/.deps
? src/backend/postmaster/.deps
? src/backend/regex/.deps
? src/backend/rewrite/.deps
? src/backend/snowball/.deps
? src/backend/snowball/snowball_create.sql
? src/backend/storage/buffer/.deps
? src/backend/storage/file/.deps
? src/backend/storage/freespace/.deps
? src/backend/storage/ipc/.deps
? src/backend/storage/large_object/.deps
? src/backend/storage/lmgr/.deps
? src/backend/storage/page/.deps
? src/backend/storage/smgr/.deps
? src/backend/tcop/.deps
? src/backend/tsearch/.deps
? src/backend/utils/.deps
? src/backend/utils/probes.h
? src/backend/utils/adt/.deps
? src/backend/utils/cache/.deps
? src/backend/utils/error/.deps
? src/backend/utils/fmgr/.deps
? src/backend/utils/hash/.deps
? src/backend/utils/init/.deps
? src/backend/utils/mb/.deps
? src/backend/utils/mb/conversion_procs/conversion_create.sql
? src/backend/utils/mb/conversion_procs/ascii_and_mic/.deps
? src/backend/utils/mb/conversion_procs/cyrillic_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_cn_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_jis_2004_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/euc_jp_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/euc_kr_and_mic/.deps
? src/backend/utils/mb/conversion_procs/euc_tw_and_big5/.deps
? src/backend/utils/mb/conversion_procs/latin2_and_win1250/.deps
? src/backend/utils/mb/conversion_procs/latin_and_mic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_ascii/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_big5/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_cyrillic/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_cn/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_jp/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_kr/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_euc_tw/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gb18030/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_gbk/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_johab/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_shift_jis_2004/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_sjis/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_uhc/.deps
? src/backend/utils/mb/conversion_procs/utf8_and_win/.deps
? src/backend/utils/misc/.deps
? src/backend/utils/mmgr/.deps
? src/backend/utils/resowner/.deps
? src/backend/utils/sort/.deps
? src/backend/utils/time/.deps
? src/bin/initdb/.deps
? src/bin/initdb/initdb
? src/bin/pg_config/.deps
? src/bin/pg_config/pg_config
? src/bin/pg_controldata/.deps
? src/bin/pg_controldata/pg_controldata
? src/bin/pg_ctl/.deps
? src/bin/pg_ctl/pg_ctl
? src/bin/pg_dump/.deps
? src/bin/pg_dump/pg_dump
? src/bin/pg_dump/pg_dumpall
? src/bin/pg_dump/pg_restore
? src/bin/pg_resetxlog/.deps
? src/bin/pg_resetxlog/pg_resetxlog
? src/bin/psql/.deps
? src/bin/psql/psql
? src/bin/scripts/.deps
? src/bin/scripts/clusterdb
? src/bin/scripts/createdb
? src/bin/scripts/cr

Re: [HACKERS] Block-level CRC checks

2008-10-29 Thread Alvaro Herrera

Alvaro Herrera wrote:

> So this is what I ended up doing; attached.

Oh, another thing.  The contents for the WAL log message here is very
simplistic; just store all the t_infomask and t_infomask2 relevant bits,
for all the tuples in the table.  A possible optimization to reduce the
WAL traffic is to add another infomask bit which indicates whether a
hint bit has been set since the last time we visited the page.  I'm
unsure if this is worth the pain.  (Another possibility, even more
painful, is to choose at runtime between the two formats, depending on
the number of tuples that need hint bits logged.)

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] autovacuum: I need some explanation

2008-10-29 Thread Alvaro Herrera

Noah Freire escribió:

> please check the first log message: the vacuum threshold is 6,000,050 rows
> and the number of dead tuples is 16,697,969. Even though the number of
> dead_tuples is greater than the threshold the autovacuum is not being
> triggered for this table. So, besides this condition (dead_tuples >
> threshold) what else is taken into account by autovacuum?

That there's no other process doing the same.  Did you check
pg_stat_activity to ensure that there's really no autovacuum worker
processing this table?

> <2008-10-29 11:09:03.453 PDT>DEBUG: 0: accounts: vac: 16697969
> (threshold 650), anl: 16697969 (threshold 12048)
> <2008-10-29 11:09:05.610 PDT>DEBUG: 0: accounts: vac: 16699578
> (threshold 650), anl: 16699578 (threshold 12048)
> <2008-10-29 11:10:03.563 PDT>DEBUG: 0: accounts: vac: 16735906
> (threshold 650), anl: 16735906 (threshold 12048)

Are these log entries by the same process?  Please add %p to
log_line_prefix to see what's going on.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread David Blewett

On Wed, Oct 29, 2008 at 12:40 PM, David Fetter <[EMAIL PROTECTED]> wrote:
> Also, PL/Perl shouldn't be the only language to have this capability.
> How might we add similar capabilities to PL/PythonU and PL/Tcl?  To
> the rest of the PLs?  Would it make any sense to have it in SQL
> language functions?

Here's a vote for allowing this in plain SQL.

I use the tablefunc contrib module as a way to build a view of a
specific questionnaire's responses (using Elein's nice model here
[1]). Currently, if I then write queries against these views that
include WHERE clauses they don't perform very well as the underlying
data size grows. I was using the afore-mentioned large view that casts
everything to text, but recently I started using separate calls to the
crosstab function for each underlying table, then joining them
together based on their response ID. This seems to work much better
for more complex queries, but I think it would still be beneficial to
have access to these qualifiers so I could push down to each subquery
the list of response ID's to pull. I don't have access to sample SQL
at the moment, but if it is wanted I can try to get that this week.

David Blewett

1. http://www.varlena.com/GeneralBits/110.php

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] recovery_target_time and last completed transaction log time

2008-10-29 Thread Kevin Grittner

I'm posting on hackers because I think that there is a problem in one
of three places, and I'm not sure which:
 
(1)  The documentation needs clarification.
(2)  A LOG message needs clarification.
(3)  There is a bug in recovery.  (unlikely)
 
This issue was previously posted here with no response:
 
http://archives.postgresql.org/pgsql-admin/2008-10/msg00123.php
 
Regarding the documentation -- even if I'm interpreting it correctly,
it might be good to specifically state that it is the commit times of
database transactions which matter for recovery_target_time; that
transactions started but not committed by that time will be omitted,
regardless of the recovery_target_inclusive setting.
 
Regarding the LOG message, it is confusing to have a recovery.conf
file which contains this:
 
recovery_target_time = '2008-10-16 17:00:00.0'
 
and see this in the log:
 
[2008-10-16 23:04:26.006 CDT] 19951 LOG:  last completed transaction
was at log time 2008-10-16 17:00:23.205347-05
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] autovacuum: I need some explanation

2008-10-29 Thread Noah Freire

Hello,

I have a table (accounts) with 600,000,000 rows. A heavy high-concurrent
workload that makes mostly updates on this table generates a lot of dead
tuples in its run, which is expected due to MVCC.
The problem is that even though autovacuum is enabled, the autovacuum worker
does not vacuum this table (I entered custom autovacuum settings for this
table in pg_autovacuum to try to force a situation). Autovacuum is working
for other smaller tables but not for accounts.

<2008-10-29 11:09:03.453 PDT>DEBUG: 0: accounts: vac: 16697969
(threshold 650), anl: 16697969 (threshold 12048)
<2008-10-29 11:09:05.610 PDT>DEBUG: 0: accounts: vac: 16699578
(threshold 650), anl: 16699578 (threshold 12048)
<2008-10-29 11:10:03.563 PDT>DEBUG: 0: accounts: vac: 16735906
(threshold 650), anl: 16735906 (threshold 12048)


please check the first log message: the vacuum threshold is 6,000,050 rows
and the number of dead tuples is 16,697,969. Even though the number of
dead_tuples is greater than the threshold the autovacuum is not being
triggered for this table. So, besides this condition (dead_tuples >
threshold) what else is taken into account by autovacuum?

Thank you,

-Noah

Re: [HACKERS] Block-level CRC checks

2008-10-29 Thread Alvaro Herrera

Simon Riggs wrote:

> But perhaps writing a single WAL record if you scan whole page and set
> all bits at once. Then it makes sense in some cases.

So this is what I ended up doing; attached.

There are some gotchas in this patch:

1. it does not consider hint bits other than the ones defined in htup.h.
Some index AMs use hint bits to "kill" tuples (LP_DEAD mostly, I think).
This means that CRCs will be broken for such pages when pages are torn.

2. some parts of the code could be considered modularity violations.
For example, tqual.c is setting a bit in a Page structure; bufmgr.c is
later checking that bit to determine when to log.

3. the bgwriter is seen writing WAL entries at checkpoint.  At shutdown,
this might cause an error to be reported on how there was not supposed
to be activity on the log.  I didn't save the exact error report and I
can't find it in the source :-(


So it "mostly works" at this time.  I very much welcome opinions to
improve the weak points.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Index: src/backend/access/heap/heapam.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/access/heap/heapam.c,v
retrieving revision 1.266
diff -c -p -r1.266 heapam.c
*** src/backend/access/heap/heapam.c	27 Oct 2008 21:50:12 -	1.266
--- src/backend/access/heap/heapam.c	29 Oct 2008 17:56:25 -
*** log_newpage(RelFileNode *rnode, ForkNumb
*** 4007,4012 
--- 4007,4064 
  	return recptr;
  }
  
+ XLogRecPtr
+ log_hintbits(RelFileNode *rnode, ForkNumber forkNum, BlockNumber blkno,
+ 			 Page page)
+ {
+ 	xl_heap_hintbits xlrec;
+ 	OffsetNumber	i;
+ 	XLogRecPtr		recptr;
+ 	XLogRecData		rdata[2];
+ 	uint16			bits[MaxHeapTuplesPerPage * 4];
+ 	intpos = 0;
+ 
+ 	for (i = FirstOffsetNumber; i <= PageGetMaxOffsetNumber(page);
+ 		 i = OffsetNumberNext(i))
+ 	{
+ 		HeapTupleHeader	htup;
+ 		ItemId		lp = PageGetItemId(page, i);
+ 
+ 		if (!ItemIdHasStorage(lp))
+ 			continue;
+ 
+ 		htup = (HeapTupleHeader) PageGetItem(page, lp);
+ 
+ 		bits[pos++] = htup->t_infomask & HEAP_XACT_MASK;
+ 		bits[pos++] = htup->t_infomask2 & HEAP2_XACT_MASK;
+ 	}
+ 
+ 	/* NO ELOG(ERROR) from here till hint bits are logged */
+ 	START_CRIT_SECTION();
+ 
+ 	xlrec.node = *rnode;
+ 	xlrec.block = blkno;
+ 
+ 	rdata[0].data = (char *) &xlrec;
+ 	rdata[0].len = SizeOfHeapHintbits;
+ 	rdata[0].buffer = InvalidBuffer;
+ 	rdata[0].next = &(rdata[1]);
+ 
+ 	rdata[1].data = (char *) bits;
+ 	rdata[1].len = sizeof(uint16) * pos;
+ 	rdata[1].buffer = InvalidBuffer;
+ 	rdata[1].next = NULL;
+ 
+ 	recptr = XLogInsert(RM_HEAP2_ID, XLOG_HEAP2_HINTBITS, rdata);
+ 
+ 	PageSetLSN(page, recptr);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 
+ 	END_CRIT_SECTION();
+ 
+ 	return recptr;
+ }
+ 
  /*
   * Handles CLEAN and CLEAN_MOVE record types
   */
*** heap_xlog_freeze(XLogRecPtr lsn, XLogRec
*** 4113,4118 
--- 4165,4230 
  }
  
  static void
+ heap_xlog_hintbits(XLogRecPtr lsn, XLogRecord *record)
+ {
+ 	xl_heap_hintbits *xlrec = (xl_heap_hintbits *) XLogRecGetData(record);
+ 	Buffer		buffer;
+ 	Page		page;
+ 
+ 	buffer = XLogReadBuffer(xlrec->node, xlrec->block, false);
+ 	if (!BufferIsValid(buffer))
+ 		return;
+ 	page = (Page) BufferGetPage(buffer);
+ 
+ 	if (XLByteLE(lsn, PageGetLSN(page)))
+ 	{
+ 		UnlockReleaseBuffer(buffer);
+ 		return;
+ 	}
+ 
+ 	if (record->xl_len > SizeOfHeapHintbits)
+ 	{
+ 		uint16 *infomask;
+ 		uint16 *infomask_end;
+ 		OffsetNumber offset = FirstOffsetNumber;
+ 
+ 		infomask = (uint16 *) ((char *) xlrec + SizeOfHeapHintbits);
+ 		infomask_end = (uint16 *) ((char *) xlrec + record->xl_len);
+ 
+ 		while (infomask < infomask_end)
+ 		{
+ 			for (;;)
+ 			{
+ HeapTupleHeader	htup;
+ ItemId		lp = PageGetItemId(page, offset);
+ 
+ if (!ItemIdHasStorage(lp))
+ {
+ 	offset++;
+ 	continue;
+ }
+ 
+ htup = (HeapTupleHeader) PageGetItem(page, lp);
+ 
+ htup->t_infomask |= *infomask;
+ infomask++;
+ htup->t_infomask2 |= *infomask;
+ infomask++;
+ 
+ offset++;
+ 
+ break;
+ 			}
+ 		}
+ 	}
+ 
+ 	PageSetLSN(page, lsn);
+ 	PageSetTLI(page, ThisTimeLineID);
+ 	MarkBufferDirty(buffer);
+ 	UnlockReleaseBuffer(buffer);
+ }
+ 
+ static void
  heap_xlog_newpage(XLogRecPtr lsn, XLogRecord *record)
  {
  	xl_heap_newpage *xlrec = (xl_heap_newpage *) XLogRecGetData(record);
*** heap2_redo(XLogRecPtr lsn, XLogRecord *r
*** 4614,4619 
--- 4726,4734 
  		case XLOG_HEAP2_CLEAN_MOVE:
  			heap_xlog_clean(lsn, record, true);
  			break;
+ 		case XLOG_HEAP2_HINTBITS:
+ 			heap_xlog_hintbits(lsn, record);
+ 			break;
  		default:
  			elog(PANIC, "heap2_redo: unknown op code %u", info);
  	}
*** heap2_desc(StringInfo buf, uint8 xl_info
*** 4755,4760 
--- 4870,4883 
  		 xlrec->node.spcNode, xlrec->node.dbNode,
  		 xlrec->node.relNode, xlrec->block);

Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!

2008-10-29 Thread Dann Corbit

>Hi Simon,
>
>He is going to do some investigation in the methods and
>write down the possibilities and then he is going to implement
>something from that for PostgreSQL.
>
>When will this work be complete? We are days away from completing main
>work on 8.4, so you won't get much discussion on this for a few months
>yet. Will it be complete in time for 8.5? Or much earlier even?
> 
>The first guess is that the work will be done for 8.6. 
>Dano is supposed to finish the work and defend his thesis in something
a bit more than a year.
>
>Julius, you don't mention what your role is in this. In what sense is
>Dano's master's thesis a "we" thing?
> 
>I am Dano's mentor and we have a closed contact with Zdenek as well. 
>We would like the project to become a "we" thing as another reason why
to work on the project. 
>It seems to be better to research some ideas at the begging and discuss
the stuff
>during development than just individually writing some piece of code
which could 
>be published afterwards. Especially, when this area seems to be of
interest of 
>more people.

Threads are where future performance is going to come from:
General purpose->
http://www.setup32.com/hardware/cpuchipset/32core-processors-intel-reach
e.php

GPU->
http://wwwx.cs.unc.edu/~lastra/Research/GPU_performance.html
http://www.cs.unc.edu/~geom/GPUSORT/results.html

Database engines that want to exploit the ultimate in performance will
utilize multiple threads of execution.
True, the same thing can be realized by multiple processes, but a
process is more expensive than a thread.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Multi CPU Queries - Feedback and/or suggestions wanted!

2008-10-29 Thread Julius Stroffek


Hi Simon,


He is going to do some investigation in the methods and
write down the possibilities and then he is going to implement
something from that for PostgreSQL.



When will this work be complete? We are days away from completing main
work on 8.4, so you won't get much discussion on this for a few months
yet. Will it be complete in time for 8.5? Or much earlier even?
  
The first guess is that the work will be done for 8.6. Dano is supposed 
to finish the work and defend his thesis in something a bit more than a 
year.

Julius, you don't mention what your role is in this. In what sense is
Dano's master's thesis a "we" thing?
  
I am Dano's mentor and we have a closed contact with Zdenek as well. We 
would like the project to become a "we" thing as another reason why to 
work on the project. It seems to be better to research some ideas at the 
begging and discuss the stuff during development than just individually 
writing some piece of code which could be published afterwards. 
Especially, when this area seems to be of interest of more people.


Cheers

Julo

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Joshua D. Drake

On Wed, 2008-10-29 at 19:15 +0200, Hannu Krosing wrote:
>  
> > SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> > ( I would have to check the docs)
> 
> And the only other GUC slave needs for replication is MCP/Slave connect
> info ?

https://projects.commandprompt.com/public/replicator/wiki/Documentation


> 
> > > > Well actually that was my decision years ago and it was made for two 
> > > > reasons:
> > > > 
> > > > 1. Speed
> > > 
> > > Is this some MySQL kind of speed, achieved by compromising on ACID ?
> > > 
> > 
> > No this is the kind that realizes that throwing triggers on 300
> > relations is a slow way to go about replication.
> 
> It may be slow-ish to set up initially, but in your (old) tips wikipage
> you say that replicating more than 1000 relations would be prohibitively
> slow.
> 

Depends on the workload I would assume.


> > > Actually I see trigger based replication as a power tool. You can make it 
> > > (almost) 
> > > as fast as any source level hack, even if the latter compromises on on 
> > > ACID, plus 
> > > you can be much more flexible on what/when/how you replicate.
> > 
> > Like I said, really not interested in the trigger versus not discussion.
> 
> Why not ? Because you are actually doing it with triggers, just not
> standard postgreSQL table-level triggers, but your own ones patched into
> backend ?

Because it isn't productive. I am happy to answer all questions (that I
can) about usage, implementation etc... but hashing through decisions
that were made 6 years ago isn't helpful, imo.

> One more question about triggers - what happens to triggers, foreign
> keys and other constraints on slave ? 

Alvaro?

>  
> > One slave doesn't effect the load on the master any more than fifty.
> 
> But have you measured, what is impact of replication when going from 0
> to 1 slaves ? 

Not anytime recently no.

> 
> You can't replicate a table on Replicator by just one command:

Actually you can. We provide stored procedures you can use if you like.

> 
> ALTER TABLE public.accounts ENABLE REPLICATION;
> ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;
> 
> You first have to set things up by telling master and slave about 
> themselves ("Thou Art the Master") and about each other.

Actually you have to tell the MCP but yes there is configuration to be
had. It takes about 15 minutes from start to finish for a current
master,mcp,two slave scenario.

> Is Replicator always replicating from one single master to one or more
> slaves ?

Yes.

> 
> Or is it possible to set it up so, that some tables are replicated from
> one and some others from some other master.

We do not support cascading like that but we do support partial
replication. E.g;

ALTER TABLE foo ENABLE REPLICATION;
ALTER TABLE foo ENABLE REPLICATION ON SLAVE 1;
ALTER TABLE BAR ENABLE REPLICATION;
ALTER TABLE BAR ENABLE REPLICATION ON SLAVE 2;

> 
> I don't mean real multi-master, just that the single master is not the
> same for all tables - A good example would be a sales system where each
> office has its own sales table for inserting/updating data, but has a
> read-only fresh copy of all other offices data via replication .
> 

Yes this you could do. What you can't do is this:

MASTER->[SLAVE0|MASTER]->SLAVE1

Sincerely,

Joshua D. Drake


-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread Joshua D. Drake

On Wed, 2008-10-29 at 19:17 +0200, Hannu Krosing wrote:
> On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:
> > Folks,
> > 
> > Please find enclosed a WIP patch to add the ability for functions to
> > see the qualifiers of the query in which they're called.  It's not
> > working just yet, and I'm not sure how best to get it working, but I'd
> > like to see this as part of 8.4, as SQL/MED is just way too ambitious
> > given the time frame.
> > 
> > Any tips, hints, pointers, etc. would be much appreciated.
> > 
> > Also, PL/Perl shouldn't be the only language to have this capability.
> > How might we add similar capabilities to PL/PythonU 
> 
> I'll look at adding this to pl/pythonu. I have to finish some stuff
> there before freeze anyway.

Have we tested plpython with version 3?

> 
> --
> Hannu
> 
> 
> 
-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Hannu Krosing

On Wed, 2008-10-29 at 11:58 -0400, Tom Lane wrote:
> Dimitri Fontaine <[EMAIL PROTECTED]> writes:
> > And I fail to see how the user would control which behavior will get 
> > chosen, 
> 
> Oh, I'm sorry, I didn't realize you misunderstood my syntax example.
> I was suggesting that the SQL function manager recognize some optional
> non-SQL keywords at the start of a SQL function body, along the lines of
> 
> CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
> #option eager
> SELECT ... $$ LANGUAGE SQL;
> 
> versus
> 
> CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
> #option lazy
> SELECT ... $$ LANGUAGE SQL;
> 
> (I'm not wedded to this particular spelling of it, but there is
> precedent in plpgsql.)
> 
> Now of course the bigger problem with either this syntax or yours is
> that attaching such a property to a function is arguably the Wrong Thing
> in the first place.  Which one is the best way is likely to depend on
> the calling query more than it does on the function.  However, I see no
> solution to that problem except function inlining; and if the function
> gets inlined then all this discussion is moot anyhow.

I have some vague ideas about extending SET-returning functions to
NODE-returning functions, which will have some extra methods (for OO
languages like python) or extra functions (for C, somewhat similar to
how AGGREGATE functions are defined) to interact with planner/optimiser,
so that planner can ask the function instance things like "can you do
fast start" or "how many rows for theses args" during planning and also
can advise function about strategies once the plan is chosen.

That would be something which could be very useful for SQL/MED
implementation as well.

-
Hannu

-- 
--
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pre-MED

2008-10-29 Thread Hannu Krosing

On Wed, 2008-10-29 at 09:40 -0700, David Fetter wrote:
> Folks,
> 
> Please find enclosed a WIP patch to add the ability for functions to
> see the qualifiers of the query in which they're called.  It's not
> working just yet, and I'm not sure how best to get it working, but I'd
> like to see this as part of 8.4, as SQL/MED is just way too ambitious
> given the time frame.
> 
> Any tips, hints, pointers, etc. would be much appreciated.
> 
> Also, PL/Perl shouldn't be the only language to have this capability.
> How might we add similar capabilities to PL/PythonU 

I'll look at adding this to pl/pythonu. I have to finish some stuff
there before freeze anyway.

--
Hannu



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Tom Lane

Dimitri Fontaine <[EMAIL PROTECTED]> writes:
> Le mercredi 29 octobre 2008, Tom Lane a Ã©critÂ :
>> However, I see no 
>> solution to that problem except function inlining; and if the function
>> gets inlined then all this discussion is moot anyhow.

> How to inline PLs functions?

All of this is pie-in-the-sky for PL functions, and I think properly so:
the whole reason for supporting PLs is to enable doing things that SQL
does poorly or not at all.  So expecting SQL to interoperate very
closely with them seems impossible, or at least unreasonably limiting.
The real issue at hand is what to do with SQL-language functions.

I'm currently going to have a look at just what it would take to support
both lazy and eager evaluation in functions.c (independently of what
syntax, if any, we settle on to expose the choice to the user).  If it's
either really awful or really easy we should know that before arguing
further.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Hannu Krosing

On Wed, 2008-10-29 at 09:01 -0700, Joshua D. Drake wrote:
> On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> > > Alvaro Herrera wrote:
> > > Case in point. To replicate a table currently you do this:
> > > 
> > > ALTER TABLE foo ENABLE REPLICATION;
> > > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> > 
> > How do you define SLAVE 0 ?
> 
> SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
> ( I would have to check the docs)

And the only other GUC slave needs for replication is MCP/Slave connect
info ?

> > > Well actually that was my decision years ago and it was made for two 
> > > reasons:
> > > 
> > > 1. Speed
> > 
> > Is this some MySQL kind of speed, achieved by compromising on ACID ?
> > 
> 
> No this is the kind that realizes that throwing triggers on 300
> relations is a slow way to go about replication.

It may be slow-ish to set up initially, but in your (old) tips wikipage
you say that replicating more than 1000 relations would be prohibitively
slow.

I don't think this would be true for trigger-based replication.

And FK  checks implemented using with triggers, and I have not heard
much complaining about it being a hack, or unsuitable upwards of some
N-hundred tables.

> > Actually I see trigger based replication as a power tool. You can make it 
> > (almost) 
> > as fast as any source level hack, even if the latter compromises on on 
> > ACID, plus 
> > you can be much more flexible on what/when/how you replicate.
> 
> Like I said, really not interested in the trigger versus not discussion.

Why not ? Because you are actually doing it with triggers, just not
standard postgreSQL table-level triggers, but your own ones patched into
backend ?

When doing it without triggers (or to be more exact, with triggers
hardwired into backend) you still have to solve most of the same
problems that trigger-based replication does, just with less visibility
and less help from existing infrastructure.

One more question about triggers - what happens to triggers, foreign
keys and other constraints on slave ? 

> > One of the core requirements of the original replicator which can still 
> > > be seen today AND is a requirement of 1.9 as well is:
> > > 
> > > Though shall not cause grief to thy master even if thy replicates many 
> > > children.
> > 
> > IOW, one slave and many slaves cause similar load on master.
> 
> One slave doesn't effect the load on the master any more than fifty.

But have you measured, what is impact of replication when going from 0
to 1 slaves ? 

...

> > > and our user 
> > > experience is far more friendly than anything else.
> > 
> > Do you mean that you have a friendly UI for beginners ?
> > 
> 
> Not just beginners. People don't like complicated software. Generally
> speaking, if you go to a DBA of other systems, someone with 10 years
> experience and you show them Slony and they look at you like your nuts.

I wonder why. To me Slony's UI (slonik) seems quite simplistic, the only
thing they have above what Replicator seems to have is grouping tables
so that tables inside one group are always guaranteed to be in a
consistent state on the slave.

Otherways I can't see how 

--
create set (id=1, origin=1, comment="Master");
set add table (set id=1, origin=1, id=1, fully qualified name = 
'public.accounts');
---
store node (id=2, comment = 'Slave node');
store path (server = 1, client = 2, conninfo='dbname=$MASTERDBNAME 
host=$MASTERHOST user=$REPLICATIONUSER');
--
subscribe set ( id = 1, provider = 1, receiver = 2,
--

can be made much simpler

You can't replicate a table on Replicator by just one command:

ALTER TABLE public.accounts ENABLE REPLICATION;
ALTER TABLE public.accounts ENABLE REPLICATION ON SLAVE 2;

You first have to set things up by telling master and slave about 
themselves ("Thou Art the Master") and about each other.

> It doesn't matter that Slony works great. It doesn't matter that it is
> proven in the field. 

Actually slony works great for simple cases, and not-so-great for more
complex topologies or higher loads. It is relatively easy to push Slony
over by accident.

And it was Slony's apparent _simplicity_ that fooled me into believing
that it is a robust system, similar to postgreSQL proper. A belief, the
consequences of which caused me a lot of grief and frustration for about
two years

> > Or is it proved to be robust in continued use, with changes going smoothly 
> > in and automatic recovery from network outages and random reboots on live 
> > systems in 24/7 environment ?
> > 
> 
> Well we have had plenty of people use it although I admit over the last
> year we have been weening them off in preparation for 1.8. 1.8 is a big
> step up from previous releases and yes it has some issues all software
> does but we are working on them.

Is Replicator always replicating from one single master to one or more
slaves ?

Or is it possib

Re: [HACKERS] pre-MED

2008-10-29 Thread Jonah H. Harris

On Wed, Oct 29, 2008 at 12:40 PM, David Fetter <[EMAIL PROTECTED]> wrote:
> Please find enclosed a WIP patch to add the ability for functions to
> see the qualifiers of the query in which they're called.  It's not
> working just yet, and I'm not sure how best to get it working, but I'd
> like to see this as part of 8.4, as SQL/MED is just way too ambitious
> given the time frame.

To be more specific, SQL/MED is going to be 8.5.  This is an overall
improvement for accessing the predicate.

-- 
Jonah H. Harris, Senior DBA
myYearbook.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Dimitri Fontaine

Le mercredi 29 octobre 2008, Tom Lane a écrit :
> Now of course the bigger problem with either this syntax or yours is
> that attaching such a property to a function is arguably the Wrong Thing
> in the first place.  Which one is the best way is likely to depend on
> the calling query more than it does on the function.

Let the planner figure this out, and add in some starting cost considerations 
too maybe? That sounds even better, yes.

> However, I see no 
> solution to that problem except function inlining; and if the function
> gets inlined then all this discussion is moot anyhow.

How to inline PLs functions?
-- 
dim


signature.asc
Description: This is a digitally signed message part.

[HACKERS] pre-MED

2008-10-29 Thread David Fetter

Folks,

Please find enclosed a WIP patch to add the ability for functions to
see the qualifiers of the query in which they're called.  It's not
working just yet, and I'm not sure how best to get it working, but I'd
like to see this as part of 8.4, as SQL/MED is just way too ambitious
given the time frame.

Any tips, hints, pointers, etc. would be much appreciated.

Also, PL/Perl shouldn't be the only language to have this capability.
How might we add similar capabilities to PL/PythonU and PL/Tcl?  To
the rest of the PLs?  Would it make any sense to have it in SQL
language functions?

Cheers,
David.
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: [EMAIL PROTECTED]

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 7a9ddcd..c5b35be 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -728,6 +728,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool 
is_async, bool do_get)
char   *conname = NULL;
remoteConn *rconn = NULL;
boolfail = true;/* default to backward 
compatible */
+   ReturnSetInfo   *rsi;   /* set up for qual-pushing */
 
/* create a function context for cross-call persistence */
funcctx = SRF_FIRSTCALL_INIT();
@@ -802,6 +803,17 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool 
is_async, bool do_get)
elog(ERROR, "wrong number of arguments");
}
 
+   if (sql && rsi->qual) /* add qualifiers if available. */
+   {
+   char *quals = rsinfo_get_qual_str(rsi);
+   char *qualifiedQuery = palloc(strlen(sql) + strlen(" 
WHERE ") +
+   
  strlen(quals) + 1);
+
+   sprintf(qualifiedQuery, "%s WHERE %s", sql, quals);
+
+   sql = qualifiedQuery;
+   }
+
if (!conn)
DBLINK_CONN_NOT_AVAIL;
 
diff --git a/doc/src/sgml/plperl.sgml b/doc/src/sgml/plperl.sgml
index 2f2e53b..f0c6587 100644
--- a/doc/src/sgml/plperl.sgml
+++ b/doc/src/sgml/plperl.sgml
@@ -873,6 +873,41 @@ CREATE TRIGGER test_valid_id_trig
   
  
 
+ 
+  PL/Perl Qualifiers
+
+  
+   PL/Perl exposes qualifiers in the current query.  In a function,
+   the hash reference $_QUAL contains information
+   about the currently executing query.  $_QUAL is
+   a global variable, which gets a separate local value for each query.
+   The fields (currently just one) of $_QUAL are:
+
+
+ 
+  $_TD->{qual_string}
+  
+   
+A string containing all the qualifiers for the current query.
+   
+  
+ 
+
+  
+  
+   Here is an example of a function using $_QUAL.
+
+CREATE OR REPLACE FUNCTION show_quals()
+RETURNS TEXT
+LANGUAGE plperl
+AS $$
+return $_QUAL->{qual_string};
+$$;
+
+  
+
+ 
+
  
   Limitations and Missing Features
 
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 4c4742d..d6f7ef8 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -45,6 +45,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "optimizer/clauses.h"
 #include "optimizer/planmain.h"
 #include "pgstat.h"
 #include "utils/acl.h"
@@ -1693,6 +1694,27 @@ ExecMakeFunctionResultNoSets(FuncExprState *fcache,
return result;
 }
 
+/*
+ *
+ * rsinfo_get_qual_str
+ *
+ * Get either an empty string or a batch of qualifiers.
+ *
+ */
+char *
+rsinfo_get_qual_str(ReturnSetInfo *rsinfo)
+{
+   Node*qual;
+   List*context;
+
+   if (rsinfo->qual == NIL)
+   return pstrdup("");
+
+   qual = (Node *) make_ands_explicit(rsinfo->qual);
+   context = deparse_context_for_plan(NULL, NULL, rsinfo->rtable, NULL);
+
+   return deparse_expression(qual, context, false, false);
+}
 
 /*
  * ExecMakeTableFunctionResult
@@ -1703,6 +1725,7 @@ ExecMakeFunctionResultNoSets(FuncExprState *fcache,
 Tuplestorestate *
 ExecMakeTableFunctionResult(ExprState *funcexpr,
ExprContext *econtext,
+   List *qual, List 
*rtable,
TupleDesc expectedDesc,
bool randomAccess)
 {
@@ -1736,6 +1759,8 @@ ExecMakeTableFunctionResult(ExprState *funcexpr,
InitFunctionCallInfoData(fcinfo, NULL, 0, NULL, (Node *) &rsinfo);
rsinfo.type = T_ReturnSetInfo;
rsinfo.econtext = econtext;
+   rsinfo.qual = qual;
+   rsinfo.rtable = rtable;
rsinfo.expectedDesc =

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Joshua D. Drake

On Wed, 2008-10-29 at 09:54 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> > > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:

> it was in subsection "mcp_server mysteriously dies"
> http://www.commandprompt.com/products/mammothreplicator/tips , 
> 

Ehh, need to remove that. That is very old.

Joshua D. Drake

-- 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Joshua D. Drake

On Wed, 2008-10-29 at 12:02 +0200, Hannu Krosing wrote:
> On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> > Alvaro Herrera wrote:
> > Case in point. To replicate a table currently you do this:
> > 
> > ALTER TABLE foo ENABLE REPLICATION;
> > ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;
> 
> How do you define SLAVE 0 ?

SLAVE 0 is defined by a GUC on the slave, replication_slave_no I think.
( I would have to check the docs)

> > Well actually that was my decision years ago and it was made for two 
> > reasons:
> > 
> > 1. Speed
> 
> Is this some MySQL kind of speed, achieved by compromising on ACID ?
> 

No this is the kind that realizes that throwing triggers on 300
relations is a slow way to go about replication.

> Actually I see trigger based replication as a power tool. You can make it 
> (almost) 
> as fast as any source level hack, even if the latter compromises on on ACID, 
> plus 
> you can be much more flexible on what/when/how you replicate.
> 

Like I said, really not interested in the trigger versus not discussion.

> One of the core requirements of the original replicator which can still 
> > be seen today AND is a requirement of 1.9 as well is:
> > 
> > Though shall not cause grief to thy master even if thy replicates many 
> > children.
> 
> IOW, one slave and many slaves cause similar load on master.

One slave doesn't effect the load on the master any more than fifty.

> 
> And you do it by initially replicating to a single "distributor" slave ?
> 

Essentially.

> > That is where the original idea of the MCP came from. They theory was, 
> > we could have 50 slaves and the master wouldn't care.
> 
> A good goal. But why would anybody _need_ 50 slaves ? 
> 
> Perhaps some application with small data amount (meaning it fits on one 
> server) 
> and massively (100:1) skewed towards reading, like dynamically generated 
> mostly-static-content web pages ?

Well it was just a number. It could be any number but yes your example
would fit.

> Why not just make a new kind of wal-logged heap, which stores your version of 
> trx log ?
> 

That would be a question for Alvaro or Alexey not I :)

> > and our user 
> > experience is far more friendly than anything else.
> 
> Do you mean that you have a friendly UI for beginners ?
> 

Not just beginners. People don't like complicated software. Generally
speaking, if you go to a DBA of other systems, someone with 10 years
experience and you show them Slony and they look at you like your nuts.
It doesn't matter that Slony works great. It doesn't matter that it is
proven in the field. 

> Or is it proved to be robust in continued use, with changes going smoothly 
> in and automatic recovery from network outages and random reboots on live 
> systems in 24/7 environment ?
> 

Well we have had plenty of people use it although I admit over the last
year we have been weening them off in preparation for 1.8. 1.8 is a big
step up from previous releases and yes it has some issues all software
does but we are working on them.

Sincerely,

Joshua D. Drake

-- 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Tom Lane

Dimitri Fontaine <[EMAIL PROTECTED]> writes:
> And I fail to see how the user would control which behavior will get chosen, 

Oh, I'm sorry, I didn't realize you misunderstood my syntax example.
I was suggesting that the SQL function manager recognize some optional
non-SQL keywords at the start of a SQL function body, along the lines of

CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
#option eager
SELECT ... $$ LANGUAGE SQL;

versus

CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
#option lazy
SELECT ... $$ LANGUAGE SQL;

(I'm not wedded to this particular spelling of it, but there is
precedent in plpgsql.)

Now of course the bigger problem with either this syntax or yours is
that attaching such a property to a function is arguably the Wrong Thing
in the first place.  Which one is the best way is likely to depend on
the calling query more than it does on the function.  However, I see no
solution to that problem except function inlining; and if the function
gets inlined then all this discussion is moot anyhow.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Dimitri Fontaine

Le mercredi 29 octobre 2008, Tom Lane a écrit :
> Well, call-per-value is *necessary* for lazy evaluation, but it's not
> *sufficient*.  You need a function implementation that can suspend and
> resume execution, and that's difficult in general.

Ok, I think I begin to understand how things are tied together. Thanks again 
for your patience explaining :)

> Well, we'd just tweak how the executor gets called inside functions.c.
> The main problem is that we'd have to have two different sets of
> behavior there, depending on whether we are trying to evaluate commands
> a row at a time or all at once, plus interlocks to disallow cases like
> using LAZY with a RETURNING query.  It's certainly possible but I
> believe it will make functions.c a lot longer and uglier than it would
> be without it.

And I fail to see how the user would control which behavior will get chosen, 
which I think was part of the "going further with your ideas" sub thread.

> Nobody has refuted the argument that sticking a LIMIT into the function
> would accomplish the same result.

Fair enough.

> > What's against PLpgSQL implementing a YIELD statement?
>
> Feel free to try it, if you want.

Hehe, not this year.
But being able to ask questions and get clarifications from hackers certainly 
is a step in this direction. Feeling ready and organizing one's time around 
it is the next :)

> When you get done you might have some 
> grasp of why it'll be nearly impossible for PLs that we don't control
> the entire implementation of.

Hence the YIELD / new API idea, with the LAZY property which would be optional 
for PLs and only implemented in plpgsql (and maybe plpython, as python 
supports the generator functions concept) first.
Maybe having optional features for PLs has not yet been done?

But again, I was started in this only by misunderstanding your call here:

 > I think the PL side of the problem is the hard part --- if we knew how
 > to solve these issues for plpgsql then SQL functions would surely be
 > easy.

I'm not being sticky on the feature request, just struggling to understand 
correctly the issues at hand, recognizing that easy choice of EAGER or LAZY 
function evaluation would be great as a user, even if unsupported in a number 
of PLs.

Regards,
-- 
dim


signature.asc
Description: This is a digitally signed message part.

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Tom Lane

Dimitri Fontaine <[EMAIL PROTECTED]> writes:
> Le mercredi 29 octobre 2008, Tom Lane a Ã©critÂ :
>> So the fact that it's possible for SQL-language functions is an
>> idiosyncrasy of that language, not something we should cram into the
>> general CREATE FUNCTION syntax in the vain hope that having syntax
>> might cause an implementation to appear someday.

> Ok, that confirms that lazy evaluation and call-per-value are distinct 
> things, 
> for once, and that what you where after was not an easy syntax bit. :)

Well, call-per-value is *necessary* for lazy evaluation, but it's not
*sufficient*.  You need a function implementation that can suspend and
resume execution, and that's difficult in general.

>> Therefore, if we were going to expose a knob to the user to control this
>> behavior, I'd be inclined to make it a part of the language-specific
>> syntax of SQL function bodies.

> How would we support the option for SQL functions?

Well, we'd just tweak how the executor gets called inside functions.c.
The main problem is that we'd have to have two different sets of
behavior there, depending on whether we are trying to evaluate commands
a row at a time or all at once, plus interlocks to disallow cases like
using LAZY with a RETURNING query.  It's certainly possible but I
believe it will make functions.c a lot longer and uglier than it would
be without it.

>> Mind you, I'm not exactly *for* this, because I think it will result
>> in making functions.c a whole lot more complex and hard to maintain
>> than it needs to be, in exchange for a behavior that I don't believe
>> is especially useful in most cases, and can easily be worked around
>> when it is useful.

> From what I understand, the lazy evaluation of functions is not seen as easy 
> to be worked around by people asking for it.

Nobody has refuted the argument that sticking a LIMIT into the function
would accomplish the same result.

> What's against PLpgSQL implementing a YIELD statement?

Feel free to try it, if you want.  When you get done you might have some
grasp of why it'll be nearly impossible for PLs that we don't control
the entire implementation of.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] WIP parallel restore patch

2008-10-29 Thread Andrew Dunstan



Attached is my latest parallel restore patch. I think it's functionally 
complete for Unix.


Many bugs have been fixed since the last patch, and the hardcoded 
limitation to two table dependencies is removed. It seems fairly robust 
in my recent testing.


Remaining to be done:

. code cleanup
. better error checking in a few places
. final decision re command line option names/defaults
. documentation
. Windows support.

cheers

andrew



parallel_restore_10.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Dimitri Fontaine

Le mercredi 29 octobre 2008, Tom Lane a écrit :
> So the fact that it's possible for SQL-language functions is an
> idiosyncrasy of that language, not something we should cram into the
> general CREATE FUNCTION syntax in the vain hope that having syntax
> might cause an implementation to appear someday.

Ok, that confirms that lazy evaluation and call-per-value are distinct things, 
for once, and that what you where after was not an easy syntax bit. :)

> Therefore, if we were going to expose a knob to the user to control this
> behavior, I'd be inclined to make it a part of the language-specific
> syntax of SQL function bodies.

How would we support the option for SQL functions?

> Mind you, I'm not exactly *for* this, because I think it will result
> in making functions.c a whole lot more complex and hard to maintain
> than it needs to be, in exchange for a behavior that I don't believe
> is especially useful in most cases, and can easily be worked around
> when it is useful.

From what I understand, the lazy evaluation of functions is not seen as easy 
to be worked around by people asking for it.

> But if people are going to be sticky about the 
> point, something like this might be a workable compromise.

What's against PLpgSQL implementing a YIELD statement? Wouldn't it be simpler 
to integrate for both hackers and users?

This would maybe even allow to have a new API in the executor for this, and 
each PL would be free to add support for it when best suits them. Maybe 
that's exactly what you're calling "a whole lot more complex and hard to 
maintain than it needs to be", though.

Regards,
-- 
dim


signature.asc
Description: This is a digitally signed message part.

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Tom Lane

Dimitri Fontaine <[EMAIL PROTECTED]> writes:
> Done this way, the user could also choose for the function to be lazy or to 
> use a tuplestore whatever the language in which it's written.

The odds of this ever happening for any of the PLs are not
distinguishable from zero.  It might be nice to have, but the amount of
work involved would be incredibly out of proportion to the benefits ---
even assuming that it's technically possible at all, which I rather
doubt for the PLs that depend on language interpreters that aren't under
our control.

So the fact that it's possible for SQL-language functions is an
idiosyncrasy of that language, not something we should cram into the
general CREATE FUNCTION syntax in the vain hope that having syntax
might cause an implementation to appear someday.

Therefore, if we were going to expose a knob to the user to control this
behavior, I'd be inclined to make it a part of the language-specific
syntax of SQL function bodies.  We could take a hint from the
(underdocumented) #option syntax in plpgsql, something like

CREATE FUNCTION foo(...) RETURNS SETOF something AS $$
#option lazy
SELECT ... $$ LANGUAGE SQL;

Mind you, I'm not exactly *for* this, because I think it will result
in making functions.c a whole lot more complex and hard to maintain
than it needs to be, in exchange for a behavior that I don't believe
is especially useful in most cases, and can easily be worked around
when it is useful.  But if people are going to be sticky about the
point, something like this might be a workable compromise.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Andrew Sullivan

On Wed, Oct 29, 2008 at 12:02:20PM +0200, Hannu Krosing wrote:
 
> A good goal. But why would anybody _need_ 50 slaves ? 

They might have a contractual responsibility for extremely wide
geographic distribution.  Or they might be building an application
that needs extremely wide network-topological distribution to avoid
large loads on any one network.  For instance, I can imagine building
a network of nameservers in which you peered the nameservers,
colocated in every ISP you could think of.  If you were backing the
nameserver with Postgres, this would work.  To be clear, this is _not_
the case with any product I've ever built, but it is a design I have
seen deployed.  That design was supposed to be on top of Oracle.
There were well over 50 slaves.  I don't really believe they had that
many Oracle-using slaves, though.

A


-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updating FSM on recovery

2008-10-29 Thread Robert Haas

> Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
> Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber
> blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
> Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, ForkNumber
> forkNum, BlockNumber blockNum, ReadBufferMode mode);
>
> Thoughts?

I'm not sure why we would abbreviate Extended to Ext when nothing else
in here is abbreviated.  Seems needlessly inconsistent.

We may also want to rethink our approach to BufferAccessStrategy a
bit.  Right now, we don't admit that
GetBufferAccessStrategy(BAS_NORMAL) just returns a NULL pointer - we
expect the caller to get that strategy and later call
FreeBufferAccessStrategy it just as if it were a real object.
Particularly in light of this API change, I think we should just give
up on that.  Otherwise, a caller that wants to specify a fork number
or ReadBufferMode has to get and free an access strategy that doesn't
amount to anything.  Perhaps it would be sufficient to do this:

#define NormalBufferAccessStrategy NULL

That way, it would be easy to grep for any place where we used this to
get around a useless pair of get/free calls if we ever need to go back
and make a normal buffer access strategy into a real object.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Robert Haas

> So my idea would be to have the SQL function behavior choose to return values
> either via tuplestore or via value-per-call, depending on the user
> setting "generator" or "lazy".
> Done this way, the user could also choose for the function to be lazy or to
> use a tuplestore whatever the language in which it's written.

The problem is not the general PostgreSQL executor, but whatever body
of code executes PL/pgsql functions (and other PL languages).  It does
not, as I understand it, support freezing execution of the function
midway through and picking up again later.  I haven't looked at the
code, but based on previous experience, that could turn out to be a
pretty major refactoring. I suspect it would be worthwhile and quite
welcome to many users - but I doubt very much that it would be easy.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some problem with casting unknown to smallint

2008-10-29 Thread Pavel Stehule

2008/10/29 Tom Lane <[EMAIL PROTECTED]>:
> "Pavel Stehule" <[EMAIL PROTECTED]> writes:
>> I understand. So every smallint should be call with casting?
>
> A long time ago we tried to make small integer literals be interpreted
> as int2 initially, instead of int4, and the attempt failed rather
> spectacularly.  (It broke most of the regression tests, IIRC, in ways
> that suggested that many client applications would have problems too.)
> Perhaps PG's type system has matured to the point where it'd work better
> now, but I'm not really interested in trying it.  I don't see very much
> point in declaring functions to take smallint rather than int anyway...

I found this question on one czech it specialized site. It's mostly
beginner's problem.

regards
Pavel Stehule


>
>regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some problem with casting unknown to smallint

2008-10-29 Thread Tom Lane

"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> I understand. So every smallint should be call with casting?

A long time ago we tried to make small integer literals be interpreted
as int2 initially, instead of int4, and the attempt failed rather
spectacularly.  (It broke most of the regression tests, IIRC, in ways
that suggested that many client applications would have problems too.)
Perhaps PG's type system has matured to the point where it'd work better
now, but I'm not really interested in trying it.  I don't see very much
point in declaring functions to take smallint rather than int anyway...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updating FSM on recovery

2008-10-29 Thread Tom Lane

Heikki Linnakangas <[EMAIL PROTECTED]> writes:
> The FSM would be perfectly happy to just initialize torn or otherwise 
> damaged pages, so I think we should add yet another mode to ReadBuffer() 
> to allow that. We could also treat read() errors as merely warnings in 
> that mode, effectively the same as with zero_damaged_pages=on.

> The ReadBuffer() interface is already pretty complex, with all the 
> different variants. We should probably keep the good old ReadBuffer() 
> the same, for the sake of simplicity in the callers, but try to reduce 
> the number of other variatns.

Indeed.  Did you see the discussion about the similarly-too-complex
heap_insert API a couple days ago in connection with bulk-write
scenarios?  The conclusion there was to try to shift stuff into a
bitmask options argument, in hopes that future additions might not
require touching every caller.  Can we do it similarly here?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some problem with casting unknown to smallint

2008-10-29 Thread Pavel Stehule

2008/10/29 Tom Lane <[EMAIL PROTECTED]>:
> "Pavel Stehule" <[EMAIL PROTECTED]> writes:
>> I am not sure, it's probably bug.
>
>> postgres=# create function t1(smallint) returns smallint as $$select
>> $1$$ language sql;
>> CREATE FUNCTION
>> postgres=# select t1(10);
>> ERROR:  function t1(integer) does not exist
>
> That's not "unknown to smallint", that's "integer to smallint",
> which is not an implicit cast.
>
> unknown to smallint would be t1('10')
>

I understand. So every smallint should be call with casting?

regards
Pavel Stehule


>regards, tom lane
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] some problem with casting unknown to smallint

2008-10-29 Thread Tom Lane

"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> I am not sure, it's probably bug.

> postgres=# create function t1(smallint) returns smallint as $$select
> $1$$ language sql;
> CREATE FUNCTION
> postgres=# select t1(10);
> ERROR:  function t1(integer) does not exist

That's not "unknown to smallint", that's "integer to smallint",
which is not an implicit cast.

unknown to smallint would be t1('10')

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Updating FSM on recovery

2008-10-29 Thread Heikki Linnakangas


Tom Lane wrote:

Heikki Linnakangas <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

As far as the ugliness in XLogRecordPageWithFreeSpace goes: couldn't you
just call XLogReadBufferWithFork with init = true, and then initialize
the page if PageIsNew?


With init=true, we don't even try to read the page from the disk (since 
8.3), so all FSM pages accessed during recovery would be zeroed out. I 
don't think that's what you intended.


Ah, right.  Maybe the API change you suggested in the comment is the
way to go.


Done, patch attached. But while I was hacking that, I realized another 
problem:


Because changes to FSM pages are not WAL-logged, they can be "torn" if 
at crash, one part of the page is flushed to disk, but another is not. 
The FSM code will recover from internally inconsistent pages, caused by 
torn pages or other errors, but we still have a problem if the FSM file 
is extended, and the new page is torn. It can happen that the first part 
of the page, containing the page header, doesn't make it to disk, but 
other parts of the page do. ReadBuffer() checks that the page header is 
valid, so it will throw an error on a torn page like that. ReadBuffer() 
doesn't complain about a page that is all-zeros, but it's not in this 
scenario.


The FSM would be perfectly happy to just initialize torn or otherwise 
damaged pages, so I think we should add yet another mode to ReadBuffer() 
to allow that. We could also treat read() errors as merely warnings in 
that mode, effectively the same as with zero_damaged_pages=on.


The ReadBuffer() interface is already pretty complex, with all the 
different variants. We should probably keep the good old ReadBuffer() 
the same, for the sake of simplicity in the callers, but try to reduce 
the number of other variatns.


The current API is this:

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferWithFork(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum);
Buffer ReadBufferWithStrategy(Relation reln, BlockNumber blockNum, 
BufferAccessStrategy strategy);
Buffer ReadOrZeroBuffer(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, 
ForkNumber forkNum, BlockNumber blockNum, bool zeroPage);


Here's my proposal for new API:

typedef enum
{
  RBM_NORMAL,   /* checks header, ereport(ERROR) on errors */
  RBM_INIT,		/* just allocate a buffer, don't read from disk. Caller 
must initialize the page */
  RBM_INIT_ON_ERROR	/* read, but instead of ERRORing, return an 
all-zeros page */

} ReadBufferMode;

Buffer ReadBuffer(Relation reln, BlockNumber blockNum);
Buffer ReadBufferExt(Relation reln, ForkNumber forkNum, BlockNumber 
blockNum, BufferAccessStrategy strategy, ReadBufferMode mode);
Buffer ReadBufferWithoutRelcache(RelFileNode rnode, bool isTemp, 
ForkNumber forkNum, BlockNumber blockNum, ReadBufferMode mode);


Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/access/heap/heapam.c
--- src/backend/access/heap/heapam.c
***
*** 54,59 
--- 54,60 
  #include "miscadmin.h"
  #include "pgstat.h"
  #include "storage/bufmgr.h"
+ #include "storage/freespace.h"
  #include "storage/lmgr.h"
  #include "storage/procarray.h"
  #include "storage/smgr.h"
***
*** 4029,4034  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
--- 4030,4036 
  	int			nredirected;
  	int			ndead;
  	int			nunused;
+ 	Size		freespace;
  
  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;
***
*** 4060,4065  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
--- 4062,4069 
  			nowunused, nunused,
  			clean_move);
  
+ 	freespace = PageGetHeapFreeSpace(page); /* needed to update FSM below */
+ 
  	/*
  	 * Note: we don't worry about updating the page's prunability hints.
  	 * At worst this will cause an extra prune cycle to occur soon.
***
*** 4069,4074  heap_xlog_clean(XLogRecPtr lsn, XLogRecord *record, bool clean_move)
--- 4073,4087 
  	PageSetTLI(page, ThisTimeLineID);
  	MarkBufferDirty(buffer);
  	UnlockReleaseBuffer(buffer);
+ 
+ 	/*
+ 	 * Update the FSM as well.
+ 	 *
+ 	 * XXX: We don't get here if the page was restored from full page image.
+ 	 * We don't bother to update the FSM in that case, it doesn't need to be
+ 	 * totally accurate anyway.
+ 	 */
+ 	XLogRecordPageWithFreeSpace(xlrec->node, xlrec->block, freespace);
  }
  
  static void
***
*** 4212,4226  heap_xlog_insert(XLogRecPtr lsn, XLogRecord *record)
  	HeapTupleHeader htup;
  	xl_heap_header xlhdr;
  	uint32		newlen;
  
  	if (record->xl_info & XLR_BKP_BLOCK_1)
  		return;
  
  	if (record->xl_info & XLOG_HEAP_INIT_PAGE)
  	{
! 		buffer = XLogReadBuffer(xlrec->target.node,
! 			 ItemPointerGetBlockNumber(&(xlrec->target.tid)),
! true);
  		Assert(BufferIsValid(buffer));
  		page = (Page) BufferGetPage(buffer);
  
--- 4225,4241

Re: [HACKERS] WIP patch: convert SQL-language functions to return tuplestores

2008-10-29 Thread Dimitri Fontaine

Le mardi 28 octobre 2008, Pavel Stehule a écrit :
> 2008/10/28 Dimitri Fontaine <[EMAIL PROTECTED]>:
> > Hi,
> >
> > In the python language, functions that lazily return collections are
> > called generators and use the yield keyword instead of return.
> > http://www.python.org/doc/2.5.2/tut/node11.html#SECTION001110
> >0
> >
> > Maybe having such a concept in PostgreSQL would allow the user to choose
> > between current behavior (materializing) and lazy computing, with a new
> > internal API to get done in the executor maybe.
>
> lazy computing is good idea, but I am afraid so it should be really
> wery hard implemented. You should to store somewhere current state,
> stop execution, return back from node, and you should be able restore
> PL state and continue in process. I can't to see it without thread
> support.

I'm not sure to understand what is the current situation then. By reading this 
Tom's commit message
  Extend ExecMakeFunctionResult() to support set-returning functions that
  return via a tuplestore instead of value-per-call
  ...
  For the moment, SQL functions still do things  the old way.
  
http://git.postgresql.org/?p=postgresql.git;a=commit;h=6d5301be5ece6394433d73288e0fafaed6326485


I had the impression we already have a lazy implementation, this 
value-per-call returning code path, which still exists for SQL functions.

> > CREATE FUNCTION my_generator_example(integer, integer)
> >  returns setof integer
> >  generator
> >  language SQL
> > $f$
> >  SELECT generate_series($1, $2);
> > $f$;

So my idea would be to have the SQL function behavior choose to return values 
either via tuplestore or via value-per-call, depending on the user 
setting "generator" or "lazy".
Done this way, the user could also choose for the function to be lazy or to 
use a tuplestore whatever the language in which it's written.

Current behaviour would then mean the default depends on the language, lazy 
for SQL and tuplestore for PL/pgSQL. Well, it will have to be documented, 
whatever the final choice is.

Is it possible? A good idea?
-- 
dim


signature.asc
Description: This is a digitally signed message part.

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Hannu Krosing

On Tue, 2008-10-28 at 22:16 -0700, Joshua D. Drake wrote: 
> Alvaro Herrera wrote:
> > Hannu Krosing wrote:
> >> On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> 
> >> Will there be an helper application for setting up and configuring
> >> changes in replication. or will it all be done using added SQL
> >> commands ?
> > 
> > Well, the interface I work on is all SQL commands :-)
> >
> 
> Case in point. To replicate a table currently you do this:
> 
> ALTER TABLE foo ENABLE REPLICATION;
> ALTER TABLE foo ENABLE REPLICATION ON SLAVE 0;

How do you define SLAVE 0 ?

That is, do you use something similar to SQL/MED , discussed currently
on other thread on pgsql-hackers, or just tables, or external conf
files, or ... ?

> >>> Hmm I am not sure. We are pretty deep into the core and only use
> >>> triggers for GRANT/REVOKE/CREATE ROLE .
> >> By the way, why did you choose "pretty deep into the core" approach
> >> instead of triggers ?
> > 
> > Speed maybe?  I don't know.
> 
> Well actually that was my decision years ago and it was made for two 
> reasons:
> 
> 1. Speed

Is this some MySQL kind of speed, achieved by compromising on ACID ?

> 2. Lots of people think trigger replication is a hack. (not interested 
> in arguing just stating an observation).

And making the illogical conclusion that non-trigger replication is not a hack ?

Probably the same people who think that "expensive" and "closed source" 
automatically means high quality :)

Actually I see trigger based replication as a power tool. You can make it 
(almost) 
as fast as any source level hack, even if the latter compromises on on ACID, 
plus 
you can be much more flexible on what/when/how you replicate.

> One of the core requirements of the original replicator which can still 
> be seen today AND is a requirement of 1.9 as well is:
> 
> Though shall not cause grief to thy master even if thy replicates many 
> children.

IOW, one slave and many slaves cause similar load on master.

And you do it by initially replicating to a single "distributor" slave ?

> That is where the original idea of the MCP came from. They theory was, 
> we could have 50 slaves and the master wouldn't care.

A good goal. But why would anybody _need_ 50 slaves ? 

Perhaps some application with small data amount (meaning it fits on one server) 
and massively (100:1) skewed towards reading, like dynamically generated 
mostly-static-content web pages ?

> >> I mean, you probably end up duplicating (or missing) lots of
> >> postgreSQL-s internal goodness instead of just using what is already
> >> available ?
> > 
> 
> Well yes and no. We have made more and more use of the internal 
> postgresql code through 1.7, 1.8. I expect that trend will continue. A 
> hairy wart would be the use of own log, but hey its BSD if someone can 
> figure out how to make it work with WAL and not lose the feature set we 
> have 

Why not just make a new kind of wal-logged heap, which stores your version of 
trx log ?

> I would not vote against it and would happily except the patch 
> (assuming technical sign off by Alvaro and Alexey).
>
> >> I hope something useful will come out of this too, though I hoped that
> >> it already had some advantages over trigger-based replication, like
> 
> Well it does. We can replicate large objects for example 

I guess we get more benefit to general public in long run by making it possible 
to have triggers on large objects (and  system objects or DLL statements).

They could be some kind of restricted or "system" triggers, at least for DDL, 
but 
exposing the mechanisms to power users without them having to start patching 
backend 
code would be a good thing.

> and our user 
> experience is far more friendly than anything else.

Do you mean that you have a friendly UI for beginners ?

Or is it proved to be robust in continued use, with changes going smoothly 
in and automatic recovery from network outages and random reboots on live 
systems in 24/7 environment ?

> >> ability to replicate DDL .
> > 
> 
> That is what 1.9 is all about. Remember that the "hope" (no laughing 
> now) is that 1.9 will hit for 8.3 and 8.4 around the time 8.4 releases. 
> So this isn't a year away.
> 
> > I fear that our approach to replication is so ad-hoc that there's not
> > much to be gained from elsewhere.  Replicator is pretty much a fork
> > that's not likely to yield anything useful to upstream.
> 
> That is probably true.

If/when you get DDL replication done (and tested, and robust, ... :) ), it 
at least solves the following two problems for others as well

1) where to get at DDL statements at right time

2) how to put them in replication stream in right place/ right order

Having these solved is a big thing in itself, even if no actual code reuse is 
possible.

It may be easier to get nr 2) right in a more flexible trigger-based 
environment, 
so I try to talk Marko into looking at it for pgQ as well ;)

-- 
--
Hannu Krosing

Re: [HACKERS] Proposal of PITR performance improvement for 8.4.

2008-10-29 Thread Simon Riggs


On Wed, 2008-10-29 at 09:55 +0900, Koichi Suzuki wrote:

> I'd like to hear some more about these.   I'm more than happy to write
> all the code inside PG core to avoid overhead to create another
> process.

Having an external program can help earlier releases also, so I think
this is the right approach for now.

In next PG release we should bring this into core, along with streaming.

Interface would be better if it accepted
   pg_readahead filename
or pg_readahead filename start-lsn

We don't always need a starting lsn.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Updates of SE-PostgreSQL 8.4devel patches (r1155)

2008-10-29 Thread KaiGai Kohei


I've updated my patches, these are ready for CommitFest:Nov.

[1/6] 
http://sepgsql.googlecode.com/files/sepostgresql-sepgsql-8.4devel-3-r1155.patch
[2/6] 
http://sepgsql.googlecode.com/files/sepostgresql-pg_dump-8.4devel-3-r1155.patch
[3/6] 
http://sepgsql.googlecode.com/files/sepostgresql-policy-8.4devel-3-r1155.patch
[4/6] 
http://sepgsql.googlecode.com/files/sepostgresql-docs-8.4devel-3-r1155.patch
[5/6] 
http://sepgsql.googlecode.com/files/sepostgresql-tests-8.4devel-3-r1155.patch
[6/6] 
http://sepgsql.googlecode.com/files/sepostgresql-row_acl-8.4devel-3-r1155.patch

The comprehensive documentation for SE-PostgreSQL is here:
  http://wiki.postgresql.org/wiki/SEPostgreSQL (it is now under reworking.)

List of updates:
- Patches are rebased to the latest CVS HEAD.
- bugfix: TRUNCATE checks assumed SECCLASS_DB_TUPLE object class
- bugfix: sepgsqlCopyFile assumed SECCLASS_FILE object class, but it has to be
  adjusted by st_mode.

Request for Comments:
- The 4th patch is actually needed? It can be replaced by wiki page.
- Do you think anything remained towards the final CommitFest?
- Do you have any reviewing comment? Most of patches are unchanged from
  the previous vesion. If you can comment anything, I can fix them without
  waiting for the final commit fest.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <[EMAIL PROTECTED]>

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Proposal of PITR performance improvement for 8.4.

2008-10-29 Thread Simon Riggs


On Tue, 2008-10-28 at 14:21 +0200, Heikki Linnakangas wrote:

> 1. You should avoid useless posix_fadvise() calls. In the naive 
> implementation, where you simply call posix_fadvise() for every page 
> referenced in every WAL record, you'll do 1-2 posix_fadvise() syscalls 
> per WAL record, and that's a lot of overhead. We face the same design 
> question as with Greg's patch to use posix_fadvise() to prefetch index 
> and bitmap scans: what should the interface to the buffer manager look 
> like? The simplest approach would be a new function call like 
> AdviseBuffer(Relation, BlockNumber), that calls posix_fadvise() for the 
> page if it's not in the buffer cache, but is a no-op otherwise. But that 
> means more overhead, since for every page access, we need to find the 
> page twice in the buffer cache; once for the AdviseBuffer() call, and 
> 2nd time for the actual ReadBuffer(). 

That's a much smaller overhead than waiting for an I/O. The CPU overhead
isn't really a problem if we're I/O bound.

> It would be more efficient to pin 
> the buffer in the AdviseBuffer() call already, but that requires much 
> more changes to the callers.

That would be hard to cleanup safely, plus we'd have difficulty with
timing: is there enough buffer space to allow all the prefetched blocks
live in cache at once? If not, this approach would cause problems.

> 2. The format of each WAL record is different, so you need a "readahead 
> handler" for every resource manager, for every record type. It would be 
> a lot simpler if there was a standardized way to store that information 
> in the WAL records.

I would prefer a new rmgr API call that returns a list of blocks. That's
better than trying to make everything fit one pattern. If the call
doesn't exist then that rmgr won't get prefetch.

> 3. IIRC I tried to handle just a few most important WAL records at 
> first, but it turned out that you really need to handle all WAL records 
> (that are used at all) before you see any benefit. Otherwise, every time 
> you hit a WAL record that you haven't done posix_fadvise() on, the 
> recovery "stalls", and you don't need much of those to diminish the gains.
> 
> Not sure how these apply to your approach, it's very different. You seem 
> to handle 1. by collecting all the page references for the WAL file, and 
> sorting and removing the duplicates. I wonder how much CPU time is spent 
> on that?

Removing duplicates seems like it will save CPU.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Feature Request - Table Definition query

2008-10-29 Thread Svenne Krap



It could be useful to have a command that returns the table definition 
(like pg_dump -st) from within the query interface. This could be 
particularly useful if one doesn't have access to or prefers not to 
manipulate results coming from a shell. If I have an API from which to 
query the database, it seems silly to have to spawn a shell and run a 
command to query the very same database.
Yes. That functionality would be really appreciated by myself and a few 
pgsql-users I know.
(I'm not exactly familiar with how pg_dump manipulates or extracts 
that data, so this could be a very, very stupid question. If so, 
please forgive me.)
Me neither, but I think one of the problems is, that during upgrade you 
are supposed to use pg_dump from the new version on the existing 
database instance.
The server could most probably only generate its own format (not the 
new) which might be a problem.


Perhaps it could be implemented by stored function (autoinstalled and 
available in contrib or perhaps installed by a psql on first run) in a 
namespace like pg_dump_84 (i.e. tagged with server version). If you want 
to upgrade you could then install the new set of functions..

Psql could then be a thin wrapper (which calls the needed functions).

/Svenne


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] some problem with casting unknown to smallint

2008-10-29 Thread Pavel Stehule

Hello

I am not sure, it's probably bug.

postgres=# create function t1(smallint) returns smallint as $$select
$1$$ language sql;
CREATE FUNCTION
postgres=# create function t2(bigint) returns bigint as $$select $1$$
language sql;
CREATE FUNCTION
postgres=# select t1(10);
ERROR:  function t1(integer) does not exist
LINE 1: select t1(10);
   ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

but
postgres=# select t2(10);
 t2

 10
(1 row)

Regards
Pavel Stehule

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL + Replicator developer meeting 10/28

2008-10-29 Thread Hannu Krosing

On Tue, 2008-10-28 at 22:37 -0300, Alvaro Herrera wrote:
> Hannu Krosing wrote:
> > On Tue, 2008-10-28 at 15:18 -0700, Joshua Drake wrote:
> 
> > > The two obvious problems with the existing MCP architecture is:
> > > 
> > >  1. Single point of failure
> > 
> > For async replication there is always SPoF, at least the master until
> > first slave has aquired log is a SPoF, or do you plan that both Master
> > and "MCP|Slave" to keep the log and be able to step in for each other if
> > the other fails?
> 
> Yeah, with the new architecture there is still going to be a bit of a
> SPoF in the master->MCP but it's a lot smaller than the current setup,
> in which if you lose the MCP you basically lose everything.
> 
> > >  2. Portability
> > 
> > Portability to where ? Other DBMS's ? Other PG versions ?
> 
> Other operating systems mainly.  The trouble is we never got around to
> porting the MCP to any OS beyond Linux; I think it should work on
> Solaris and BSDs, but surely not Windows.  We want to just get rid of
> what I consider a (crappy) reimplementation of postmaster; instead we
> should just let postmaster do the job.
> 
> Additionally we would get rid of the ugly way we "import" backend code
> into the MCP server.
> 
> 
> > for me there was also two more problems:
> > 
> > 3. separate "replication log", which at least seems to be able to get
> > out of sync with main DB. 
> > 
> > Why don't you just use a DB table, WAL-logged and all
> 
> The whole replication log thing is a topic of dissent in the team ;-)

I see. To work reliably, the replication log should work very similar to
WAL, so why just not use a table + WAL, or if you want extra performance
from storing it on a separate disk, then work on having multiple WAL's
in backend ;)

> > 4. Also, again from reading Replicator FAQ, it seems that there is a
> > window of corruption/data loss when rotating the Replicators transaction
> > log. I think that doing it with copy/truncate either needs locking the
> > logfile (== bad performance, during copy/truncate) or is just a
> > data-eating failure waiting to happen.
> 
> Hmm, what Replicator FAQ?  We used to have this copy/truncate problem,
> and we rearchitected the log to avoid this (we use a rotating setup
> now)

it was in subsection "mcp_server mysteriously dies"
http://www.commandprompt.com/products/mammothreplicator/tips , 

> > > Master->MCP|Slave ->Slave1
> > >   ->Slave2
> > >   ->Slave3
> > > 
> > > The process being, Master sends data to MCP|Slave, MCP|Slave writes it
> > > to disk (optionally restores it)
> > 
> > Will this first send be sync or async ? Or have you planned to have it
> > be configurable among several robustness vs. performance levels, similar
> > to the planned integrated WAL-shipping.
> 
> It is async, and we haven't talked about sync.
> 
> > if async, will it also use MVCC for keeping log on Master (l.ike Slony
> > and pgQ do), just to be at least as reliable as postgreSQL core itself
> > and not require a full resync at server crash.
> 
> You mean WAL?  We don't currently.

So hopw do you cope with possible loss of sync on master crash ?

> > > Alvaro or Alexey can speak more technically about implementation than I
> > > can.
> > 
> > Alvaro - I guess you already have discussed most of it, but basically
> > you need to solve all the same problems that WAL-shipping based Hot
> > Standby is solving and Slony/pgQ/Londiste have solved.
> 
> If you mean that we're duplicating the effort that's already going
> elsewhere, my opinion is yes, we are.

duplicating the effort is not always a bad thing. I was mostly
suggesting to watch discussions and dig around in materials and/or
asking people who have been working on these same issues.

And of course to _think_ deeply about design before writing lots of
duplicate code which ends up being an often inferior implementation of
something that already exists, ( see:
http://thedailywtf.com/Articles/The_Complicator_0x27_s_Gloves.aspx )
;-)


> > Hopefully you get it more robust than Slony when making changes under
> > high load :)
> 
> Hmm, I don't know about lack of robustness in Slony, so I don't know.

Slony is brittle once you start using it under high load and tends to
display all kinds of frustrating qualities 

 1) it has not enough controls put in for conf changes  to guarantee
either success or clean rollback, do if something goes wrong (like some
conf change has not propagated to all nodes, in right order, you end up
with no working replication.

 2) you usually can't test for 1) on your test setup, as it happens only
under really high loads, which most test setups don't provide.

there are/were other warts (like forcing an index scan covering the
whole table, or being unable to continue replication after some slonik
downtime because postgreSQL would give query too complex errors on
generated 700kb lobg query), some of which are fixed in 1.x, some are
maybe fixed in 2.0. 

I was a heavy user (at Sky

82 matches

Mail list logo