date:20140415

[HACKERS] BGWorkers, shared memory pointers, and postmaster restart

2014-04-15 Thread Craig Ringer

Hi all

I've been using the dynamic BGWorker support for some recent work, and I
think I've found an issue with how postmaster restarts are handled.

TL;DR: I don't think there's a safe way to use a BGWorker (static or
dynamic) with bgw_restart_time != BGW_NEVER_RESTART and a bgw_main_arg
Datum that points into shared memory, and think we might need a API
change to fix that.



If the postmaster restarts due to a crash, it calls shared memory
startup hooks and sets the found pointer to false, so they're re-inited.
This makes sense as shm is presumed to be corrupt.

Dynamic BGWorkers are killed as a part of restart. However, they're not
unregistered, and are relaunched if there's a bgw_restart_time set.
They're called with the same bgw_main_arg Datum as with the original launch.

If this Datum is a pointer into the shared memory that just got zeroed
and re-inited, exciting things happen.

In my case the worker restart generally happens after shm is re-inited,
and since it gets reinitialized with the same contents the dynamic
bgworker registered during postmaster startup restarts normally. I only
noticed the problem because my shared memory init hook launches
bgworkers once shm is set up, and I was getting two copies of a bgworker
after postmaster restart.

This also affects static bgworkers that use a pointer into shared memory
to pass data on EXEC_BACKEND platforms (Windows).

For dynamic bgworkers, it seems like it'd be OK to just require
extensions to re-register them after a postmaster restart, or add a
BGW_UNREGISTER_ON_POSTMASTER_RESTART flag to let that be controlled so
bgworkers that didn't receive pointers into shm didn't have to deal with
it. So any pointer into shared memory that's being re-initialized would
be discarded when the bgworker was unregistered during postmater
restart. Dynamic bgworkers are new in 9.4, so we still have the freedom
to change behaviour for them.

But that won't fix static bgworkers that have a pointer into shm as an
argument. In non-EXEC_BACKEND platforms we can just pass a pointer to
palloc'd postmaster memory, but that won't work if we have to exec()
after fork(). I don't think it's reasonable to say "well, don't do that"
- if you can only send a single pass-by-value Datum to a bgworker's main
function, their utility is greatly reduced.


I could always set up an exit hook / SIGQUIT handler that tries to
unregister dynamic bgworkers using their BGWorkerHandle s, from the
worker that initially registered them. If the worker that registered
them is the one that crashed and triggered a postmaster restart this
won't do any good, so it's a half-solution at best.

I can't stash the BGWorkerHandle s for the launched workers in shm and
unregister them during postmaster restart either. For one thing, shm is
presumed corrupt. For another, BGWorkerHandle is an incomplete struct
with no exposed size, so I can't copy it into shm anyway.

This seems like a bit of a pickle. Am I missing something obvious?

The only way around it that I can currently see is to have a single
static bgworker with no pointer argument launch and manage all the other
workers required for the extension. It would launch them all with
bgw_restart_time = BGW_NEVER_RESTART and set its self as the
bgw_notify_pid . If they die, it unregisters them then registers a new
one. Effectively, it's doing the work the bgworker code does already,
except that it makes sure the bgworkers don't get relaunched on
postmaster restart. Since it doesn't depend on shm being valid, this
should work, but it's messy and seems like there should be a better way.

Do we need to change how we define BGWorkers to require that a BGWorker
with a Datum that points to shared memory be automatically unregistered
by the postmaster on restart? This would have to apply to static
BGWorkers too, so we'd want to think about adding a flag for it and
maybe even backpatching the flag into 9.3; it'd only affect extensions
that actually used the combination described above.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-15 Thread Craig Ringer

On 04/15/2014 10:17 PM, Tom Lane wrote:
>> > I actually think we should *add* a LIBPQEXPORT that handles this for
>> > libpq, much like PGDLLEXPORT does for postgres(.exe). And in the
>> > process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or
>> > something.
> My reaction to that is "not bloody likely".  I remarked on this upthread
> already, but there is absolutely no way that I want to clutter our source
> code with platform-specific markings like that.
> 
> Perhaps somebody could try a Windows build with PGDLLEXPORT defined to
> empty, and verify that it works, and if so do a pgbench comparison
> against a build done the existing way?

Good idea.

Personally, I don't care about Windows enough. I want it to work, but
performance optimisation is beyond what I'm bothered with.

Another useful test would be to modify libpq as described above, so its
headers set __declspec(dllexport) on its exports during compilation and
its headers set __declspec(dllimport) when included while compiling
other binaries that will link to libpq. Then use *that* in pgbench too
and see if it makes any meaningful difference.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?

2014-04-15 Thread Michael Paquier

On Wed, Apr 16, 2014 at 12:35 PM, Amit Langote  wrote:
> Hi,
>
> Attached adds CREATE MATERIALIZED VIEW AS to the list of statements
> that can be EXPLAINed.
Now that you mention that, REFRESH MATERIALIZED VIEW can be EXPLAIN'ed
as well, except that it returns that and does not error out:
=# explain refresh materialized view aam;
QUERY PLAN
---
 Utility statements have no plan structure
(1 row)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] How can we make beta testing better?

2014-04-15 Thread Rod Taylor

On Tue, Apr 15, 2014 at 5:47 PM, Josh Berkus  wrote:

> Hackers,
>
> I think 9.3 has given us evidence that our users aren't giving new
> versions of PostgreSQL substantial beta testing, or if they are, they
> aren't sharing the results with us.
>
> How can we make beta testing better and more effective?  How can we get
> more users to actually throw serious workloads at new versions and share
> the results?
>
> I've tried a couple of things over the last two years and they haven't
> worked all that well.  Since we're about to go into another beta testing
> period, we need something new.  Ideas?
>

I think it boils down to making it really easy to create a workload
generator. Most companies have simple single-threaded regression tests for
functionality but very few companies have good parallel workload generators
which reflect activities in their production environment.

A documented beta test process/toolset which does the following would help:
1) Enables full query logging
2) Creates a replica of a production DB, record $TIME when it stops.
3) Allow user to make changes (upgrade to 9.4, change hardware, change
kernel settings, ...)
4) Plays queries from the CSV logs starting from $TIME mimicking actual
timing and transaction boundaries

If Pg can make it easy to duplicate activities currently going on in
production inside another environment, I would be pleased to fire a couple
billion queries through it over the next few weeks.

#4 should include reporting useful to the project, such as a sampling of
queries which performed significantly worse and a few relative performance
stats for overall execution time.

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Robert Haas

On Mon, Apr 14, 2014 at 1:11 PM, Peter Geoghegan  wrote:
> In the past, various hackers have noted problems they've observed with
> this scheme. A common pathology is to see frantic searching for a
> victim buffer only to find all buffer usage_count values at 5. It may
> take multiple revolutions of the clock hand before a victim buffer is
> found, as usage_count is decremented for each and every buffer.  Also,
> BufFreelistLock contention is considered a serious bottleneck [1],
> which is related.

I think that the basic problem here is that usage counts increase when
buffers are referenced, but they decrease when buffers are evicted,
and those two things are not in any necessary way connected to each
other.  In particular, if no eviction is happening, reference counts
will converge to the maximum value.  I've read a few papers about
algorithms that attempt to segregate the list of buffers into "hot"
and "cold" lists, and an important property of such algorithms is that
they mustn't be allowed to make everything hot.  It's easy to be too
simplistic, here: an algorithm that requires that no more than half
the list be hot will fall over badly on a workload where the working
set exceeds the available cache and the really hot portion of the
working set is 60% of the available cache.  So you need a more
sophisticated algorithm than that.  But that core property that not
all buffers can be hot must somehow be preserved, and our algorithm
doesn't.

This isn't a fundamental property of the usage-count idea; it's an
artifact of the fact that usage count decreases are tied to eviction
pressure rather than access pressure.  For example, suppose we made a
rule that if the total usage counts of all buffers exceed 3 *
NBuffers, then every time you bump the usage count of a buffer from N
to N+1, you're required to advance the clock sweep far enough to
decrease the reference count of a buffer by one.  When you want to
reclaiim a buffer, you advance a separate clock sweep until you find a
buffer with a zero usage count; if you circle the whole ring without
finding one, then you reclaim the buffer you saw with the lowest usage
count.  There are obvious scalability problems here (everyone fighting
over the right to advance the clock sweep) but ignore that for the
sake of the thought experiment: now you have an algorithm where not
all buffers can be hot.  If some buffers are hotter than others, then
whenever their usage count is decreased it will immediately get pushed
back up again, but some other buffer then has to absorb the decrease.
Only the buffers that are really hot can maintain high usage counts,
because *somebody* has to have a low usage count.

Even ignoring scalability concerns, this might not be (and probably
isn't) exactly what we want to implement, but I think it illustrates
an important control principle all the same: buffer "cooling" needs to
be driven by the same underlying phenomenon - probably buffer access -
as buffer "heating".  If they're driven by unrelated phenomena, then
the rates may be wildly incomparable, and you'll end up with
everything hot or everything cold.  If that happens, you lose, because
with everything the same, there's no principled way to decide which
things are actually best to evict.

If we come up with some good solution for shared buffers, we should
also consider it applying it to SLRU eviction.  I believe that the
current situation around CLOG eviction is none too pretty.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

On 04/16/2014 06:12 AM, Hannu Krosing wrote:
> On 04/16/2014 01:25 AM, Tom Lane wrote:
>> Hannu Krosing  writes:
>>> Is there a way to force it to prefer a plan where the results of (select
>>> id from onemillion where data > '0.9' limit 100)
>>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
>>> all at once ?
>> You could write the query like that:
>>
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
> My actual use-case was about a join between a local and a remote table
> and without rewriting the query (they come from ORM)
>
> I was hoping to be able to nudge postgresql towards a better plan via some
> tuning of table/fdw options or GUCs.
>
> for example, would postgresql use the WHERE id IN (...) query on remote
> side for a query like
>
> select r.data, l.data
>   from onemillion_pgfdw r
>   join onemillion l
> on r.id = l.id and l.data > '0.999';
>
> if it recognizes that the local side returns only 1000 rows ?
>
> or would it still use 1000 individual WHERE id = $1 queries.
>
> Is getting the foreign data via IN and then turning the data into a hash
> for joining one of the plans it considers at all ?
It sees that could we need an extra tuning parameter for choosing the

ID IN (...) + HASH plan over individual SELECT .. WHERE ID = $1

something between

`fdw_startup_cost` and `fdw_tuple_cost`

to signify that an IN query returning 1000 rows runs faster than 1000 =
queries

as I understan currently they both would be estimated as

fdw_startup_cost + 1000 * fdw_tuple_cost

the new parameter could be fdw_call_cost or fdw_query_cost and would
estimate
how much each individual call to fdw costs, thus favouring calls which
return more
data in one call

Cheers
Hannu
>
> Best
> Hannu
>
>> Or at least you should be able to, except when I try it I get
>>
>> explain analyze
>> select * from onemillion_pgfdw where id = any (array(select id from
>> onemillion where data > '0.9' limit 100));
>> ERROR:  operator does not exist: integer = integer[]
>> HINT:  No operator matches the given name and argument type(s). You might 
>> need to add explicit type casts.
>> CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
>> public.onemillion WHERE ((id = ANY ((SELECT null::integer[]
>>
>> so there's something the remote-estimate code is getting wrong here.
>> (It seems to work without remote_estimate, though.)
>>
>>  regards, tom lane
>


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Amit Kapila

On Wed, Apr 16, 2014 at 5:00 AM, Peter Geoghegan  wrote:
> On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma  wrote:
>> There's a paper on a non blocking GCLOCK algorithm, that does lock
>> free clock sweep and buffer pinning[7]. If we decide to stay with
>> GCLOCK it may be interesting, although I still believe that some
>> variant of buffer nailing[8] is a better idea, my experience shows
>> that most of the locking overhead is cache line bouncing ignoring the
>> extreme cases where our naive spinlock implementation blows up.
>
> You might be right about that, but lets handle one problem at a time.
> Who knows what the bottleneck will end up being if and when we address
> the naivety around frequency? I want to better characterize that
> problem first.

Just to summarize you about the previous discussion and the
improvements that we decided to do in this area based on feedback
are as follows:

1. Bgwriter needs to be improved so that it can help in reducing
usage count and finding next victim buffer (run the clock sweep
and add buffers to the free list).
2. SetLatch for bgwriter (wakeup bgwriter) when elements in freelist
are less.
3. Split the workdone globallock (Buffreelist) in StrategyGetBuffer
(a spinlock for the freelist, and an lwlock for the clock sweep).
Here we can try to make it lock free based on atomic ops as
well.
4. Bgwriter needs to be more aggressive, logic based on which it
calculates how many buffers it needs to process needs to be
improved.
5. Contention around buffer mapping locks.
6. Cacheline bouncing around the buffer header spinlocks, is there
anything we can do to reduce this?
7. Choose Optimistically used buffer in StrategyGetBuffer().
8. Don't bump the usage count every time buffer is pinned.

I have already addressed some of these improvements in patch[1]
and for other's, I have plan to work on them for 9.5.

I think here you want to address the improvements related to usage
count and see if it can get us win in some of commonly used scenario's,
without affecting any other commonly used scenario.  I feel this is good
idea to pursue and see if we can get good benefits with it.

Infact few days back, I had ran some tests manually to see the
problems around BufFreeListLock (currently I don't have script ready)
and more recently Jason Petersen has done some benchmarking
in this area which you can refer it here[2].

I wonder if we can work together to improve things in this area.

[1]
http://www.postgresql.org/message-id/006e01ce926c$c7768680$56639380$@kap...@huawei.com
[2]
https://googledrive.com/host/0Bx33JCTmOADOeTIwaE9KX21yWEk/Concurrency%20Limits%20with%20Large%20Working%20Sets

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

On 04/16/2014 01:25 AM, Tom Lane wrote:
> Hannu Krosing  writes:
>> Is there a way to force it to prefer a plan where the results of (select
>> id from onemillion where data > '0.9' limit 100)
>> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
>> all at once ?
> You could write the query like that:
>
> select * from onemillion_pgfdw where id = any (array(select id from
> onemillion where data > '0.9' limit 100));
My actual use-case was about a join between a local and a remote table
and without rewriting the query (they come from ORM)

I was hoping to be able to nudge postgresql towards a better plan via some
tuning of table/fdw options or GUCs.

for example, would postgresql use the WHERE id IN (...) query on remote
side for a query like

select r.data, l.data
  from onemillion_pgfdw r
  join onemillion l
on r.id = l.id and l.data > '0.999';

if it recognizes that the local side returns only 1000 rows ?

or would it still use 1000 individual WHERE id = $1 queries.

Is getting the foreign data via IN and then turning the data into a hash
for joining one of the plans it considers at all ?

Best
Hannu

>
> Or at least you should be able to, except when I try it I get
>
> explain analyze
> select * from onemillion_pgfdw where id = any (array(select id from
> onemillion where data > '0.9' limit 100));
> ERROR:  operator does not exist: integer = integer[]
> HINT:  No operator matches the given name and argument type(s). You might 
> need to add explicit type casts.
> CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
> public.onemillion WHERE ((id = ANY ((SELECT null::integer[]
>
> so there's something the remote-estimate code is getting wrong here.
> (It seems to work without remote_estimate, though.)
>
>   regards, tom lane


-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Kouhei Kaigai

> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane  wrote:
> >>> Yeah.  After a fast review of the custom-scan and cache-scan
> >>> patches, it seems to me that my original fears are largely
> >>> confirmed: the custom scan patch is not going to be sufficient to
> >>> allow development of any truly new plan type.  Yeah, you can plug in
> >>> some new execution node types, but actually doing anything
> >>> interesting is going to require patching other parts of the system.
> >
> >> Without prejudice to the rest of what you said, this argument doesn't
> >> hold much water with me.  I mean, anything that our extensibility
> >> mechanism doesn't support today will require new hooks, but does that
> >> mean we're never going to add any more hooks?  I sure hope not.
> >
> > No, that's not what I said.  ISTM that the argument for the
> > custom-scan API is that it allows interesting new things to be done
> > *without further modifying the core code*.  But the example
> > application (cache_scan) fails to demonstrate that, and indeed seems
> > to be a counterexample.  Whether we'd accept cache_scan on its own
> > merits is a separate question.  The problem for me is that custom-scan
> > isn't showing that it can support what was claimed without doing
> > serious damage to modularity and maintainability of the core code.
> 
> I think there's two separate things in there, one of which I agree with
> and one of which I disagree with.  I agree that we must avoid damaging the
> modularity and maintainability of the core code; I don't agree that
> custom-scan needs to be able to do interesting things with zero additional
> changes to the core code.  If we come up with three interesting applications
> for custom scan that require 5 new hooks between them, I'll consider that
> a major success - assuming those hooks don't unduly limit future changes
> we may wish to make in the core code.  I think your concern about exposing
> APIs that may not be terribly stable is well-founded, but I don't think
> that means we shouldn't expose *anything*.
> 
I agree 100%.

We usually change hook definition release-by-release, and it is author's
responsibility to follow the newer interface if he continues to maintain
his extension on the newer release also.
Probably, it is a gray stuff neither black nor white. If we can design
a perfect interface, it might be good but has no evolution further.
Of course, it does not justify poor designed interface, but an important
stuff is to find out a best way at this moment. It may take core
refactoring, not just exposing static functions. What I tried to implement
is the only way to implement it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Kouhei Kaigai

> Andres Freund  writes:
> > What I think this discussion shows that this patch isn't ready for
> > 9.4. The first iteration of the patch came in 2013-11-06. Imo that's
> > pretty damn late for a relatively complex patch. And obviously we
> > don't have agreement on the course forward.
> > I don't think we need to stop discussing, but I think it's pretty
> > clear that this isn't 9.4 material. And that it's far from "Ready for
> Committer".
> 
Yep, today is the expected feature freeze date towards v9.4.
It is little bit late to include v9.4 features, unfortunately.

> Yeah.  I'm still not exactly convinced that custom-scan will ever allow
> independent development of new plan types (which, with all due respect to
> Robert, is what it was being sold as last year in Ottawa).  But I'm not
> opposed in principle to committing it, if we can find a way to have a cleaner
> API for things like setrefs.c.  It seems like late-stage planner processing
> in general is an issue for this patch (createplan.c and subselect.c are
> also looking messy).  EXPLAIN isn't too great either.
> 
> I'm not sure exactly what to do about those cases, but I wonder whether
> things would get better if we had the equivalent of
> expression_tree_walker/mutator capability for plan nodes.  The state of
> affairs in setrefs and subselect, at least, is a bit reminiscent of the
> bad old days when we had lots of different bespoke code for traversing
> expression trees.
>
Hmm. If we have something like expression_tree_walker/mutator for plan nodes,
we can pass a walker/mutator function's pointer instead of exposing static
functions that takes recursive jobs.
If custom-plan provider (that has sub-plans) got a callback with walker/
mutator pointer, all it has to do for sub-plans are calling this new
plan-tree walking support routine with supplied walker/mutator.
It seems to me more simple design than what I did.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-15 Thread Tom Lane

Peter Eisentraut  writes:
> Let me point out again that my patch doesn't actually do anything about
> PGDLLEXPORT or the like.  It just adds automatic prototypes into
> PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and
> reduce some boilerplate in general.

Hmm ... for some reason I had gotten it in my head that you were adding
PGDLLEXPORT to the autogenerated extern declarations, but at least the
version of the patch in <1389762012.24046.2.ca...@vanquo.pezone.net>
doesn't do that, so the point is moot.

I still object to the aspect of the patch that moves the externs for
_PG_init/_PG_fini into fmgr.h: that is conceptually wrong and will create
more confusion than the trivial code savings is worth.  But I won't
complain if you commit the PG_FUNCTION_INFO_V1 changes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] [doc] EXPLAIN CREATE MATERIALIZED VIEW AS?

2014-04-15 Thread Amit Langote

Hi,

Attached adds CREATE MATERIALIZED VIEW AS to the list of statements
that can be EXPLAINed.

--
Amit


explain-create-materialized-view-as.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-15 Thread Peter Eisentraut

On 4/14/14, 3:28 PM, Peter Eisentraut wrote:
> On 4/4/14, 10:07 AM, Andres Freund wrote:
>> If
>> somebody previously tried to do the correct thing and attached
>> PGDLLEXPORT to their own *function* prototoype, it would cause problems
>> now.
> 
> What is the difference (on affected platforms) between
> 
> Datum funcname(PG_FUNCTION_ARGS);
> 
> and writing (effectively)
> 
> PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
> Datum funcname(PG_FUNCTION_ARGS);
> 
> or for that matter
> 
> Datum funcname(PG_FUNCTION_ARGS);
> PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS);
> 
> 
> If there isn't a difference, then my patch is fine.  Otherwise, it might
> be good to document the issues for extension authors.

Let me point out again that my patch doesn't actually do anything about
PGDLLEXPORT or the like.  It just adds automatic prototypes into
PG_FUNCTION_INFO_V1, to reduce compiler warnings in extensions and
reduce some boilerplate in general.

If it turns out that this might help someone optimize the Windows build,
then great.  But I gather it won't, so so what.  Or maybe it can be made
to work, in which case extension authors will get compiler errors about
places they need to clean up.  So it could still help that way, but who
knows, that's not the point.

If there are still concerns in this area, we could commit just the part
that adds the prototype to PG_FUNCTION_INFO_V1 and let that sit for a
while, and then remove the (now redundant) explicit prototypes in a
later release, so if there is a need to revert it, it won't be so big.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-15 Thread Peter Eisentraut

On 4/15/14, 11:11 PM, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
>>> Yes. It probably won't be a huge change, but it will need a bit of code.
> 
>> It might be more future-proof if the build farm just called make
>> check-world and used some other way to identify the individual tests in
>> that output.  Otherwise, we'll need a new build farm release every time
>> a test suite is added.
> 
> That argument would be more convincing if "make check-world" worked
> on Windows ...

What about it doesn't work on Windows?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The question about the type numeric

2014-04-15 Thread sure.postgres

Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
 * In the NumericShort format, the remaining 14 bits of the header word
 * (n_short.n_header) are allocated as follows: 1 for sign (positive or
 * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
 * commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before the 
decimal point.
So the sign of the number 12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567 
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

2014-04-15



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website：www.highgo.com
Mobile：18766416137

Re: [HACKERS] New option in pg_basebackup to exclude pg_log files during base backup

2014-04-15 Thread Peter Eisentraut

On 4/9/14, 10:57 AM, Magnus Hagander wrote:
> So it'd be an array, and by default you'd have something like:
> basebackup_skip_path = $log_directory
> ?
> 
> Maybe use it to skip backup labels by default as well.
> basebackup_skip_path = $log_directory, $backup_label_files
> 
> 
> I hadn't considered any details, but yes, someting along that line. And
> then you could also include arbitrary filenames or directories should
> you want.

What are the use cases for excluding anything else?

pg_basebackup ought to have some intelligence about what files are
appropriate to include or exclude, depending on what the user is trying
to do.  It shouldn't become a general file copying tool.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-15 Thread Tom Lane

Peter Eisentraut  writes:
> On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
>> Yes. It probably won't be a huge change, but it will need a bit of code.

> It might be more future-proof if the build farm just called make
> check-world and used some other way to identify the individual tests in
> that output.  Otherwise, we'll need a new build farm release every time
> a test suite is added.

That argument would be more convincing if "make check-world" worked
on Windows ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add TAP tests for client programs

2014-04-15 Thread Peter Eisentraut

On 4/14/14, 10:30 PM, Andrew Dunstan wrote:
> 
> On 04/14/2014 10:17 PM, Tom Lane wrote:
>> Peter Eisentraut  writes:
>>> Add TAP tests for client programs
>> I assume the buildfarm would need to be taught about this?
>>
>>
> 
> 
> Yes. It probably won't be a huge change, but it will need a bit of code.

It might be more future-proof if the build farm just called make
check-world and used some other way to identify the individual tests in
that output.  Otherwise, we'll need a new build farm release every time
a test suite is added.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Amit Kapila

On Tue, Apr 15, 2014 at 11:53 PM, Christian Ullrich
 wrote:
> * From: Robert Haas
>> Why not just pass a command-line switch?
>
> Because, as I wrote in the message you are quoting, I did not think that
> having a command-line option for the sole purpose of telling the
> postmaster who its parent is was a suitable solution.
>
> I had already given up on that idea based on Amit's advice, and I will
> create a patch based on a command-line option.
>
> While I have you here, though, any suggestions on what the name of that
> option should be? I think --background is about right.

--background as switch name seems to be okay.

> Also, how should
> I treat the option on non-Windows platforms? Should it just not be there
> (= error), or be ignored if present?

I think ignored for non-windows is better way to proceed.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-15 Thread Amit Kapila

On Wed, Apr 16, 2014 at 3:01 AM, Robert Haas  wrote:
> On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila  wrote:
>> On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas  wrote:
>>> For the create case, I'm wondering if we should put the block that
>>> tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
>>> is your opinion?
>>
>> Either way is okay, but I think the way you are suggesting is better as it
>> will make code consistent with other place (PGSharedMemoryCreate()).
>
> OK, can you prepare a patch?

Please find attached patch to address this issue.
One minor point to note is that now we have to call GetLastError() twice,
once inside error path and once to check EEXIST, but I think that is okay
as existing code in PGSharedMemoryCreate() does it that way.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

fix_dsm_invalid_errcode_issue-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The question about the type numeric

2014-04-15 Thread David G Johnston

sure.postgres wrote
> Hi hackers,
> 
> I am learning about numeric .
> The comment of NumericShort format is:
>  * In the NumericShort format, the remaining 14 bits of the header word
>  * (n_short.n_header) are allocated as follows: 1 for sign (positive or
>  * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
>  * commonly-encountered values can be represented this way.
> 
> So the Max of the NumericShort format should be up to 508 digits before
> the decimal point.
> So the sign of the number
> 12345678901234567890123456789012345678901234567890
> 12345678901234567890123456789012345678901234567890123456789012345678901234567890
> 12345678901234567890123456789012345678901234567890123456789012345678901234567890
> 12345678901234567890123456789012345678901234567 
> should be 0x807F.
> The number is 257 digits before the decimal point.
> But the sign is 0.
> So is there anything wrong?

[1000  00][0][00111 0][001]

I appreciate that you got no responses on the original e-mail but if you are
going to re-raise the question at least have the courtesy to respond to your
original thread and not go and start a new one.  And maybe trying rephrasing
the question since most likely your original question was not worded in such
a way to garner a response.

I may have this totally wrong but I don't see why the sign of your number
should be anything but "zero" since that is, I presume, the value of the
specific bit for a positive number - which yours is.

So, in short, nothing seems to "be wrong".  If you think something is wrong
you should probably state what that is explicitly and ask someone to explain
what is happening.

I would have said all this when I saw the first e-mail but I wasn't (and
still am not) totally clear on what you are asking and was hoping someone
more familiar could make better sense of it.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/The-question-about-the-type-numeric-tp5800173p5800174.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The question about the type numeric

2014-04-15 Thread sure.postgres

Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
 * In the NumericShort format, the remaining 14 bits of the header word
 * (n_short.n_header) are allocated as follows: 1 for sign (positive or
 * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
 * commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before the 
decimal point.
So the sign of the number 12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567 
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

Thank you!

2014-04-15



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website：www.highgo.com
Mobile：18766416137

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread David G Johnston

Thom Brown-2 wrote
> On 15 April 2014 23:19, Andreas 'ads' Scherbaum <

> adsmail@

> > wrote:
>>
>> Hi,
>>
>> stumbled over a number of "iff" in the source where "if" is meant - not
>> sure
>> what the real story behind this is, but attached is a patch to fix the
>> about
>> 80 occurrences.
>>
>> This only appears in comments, not in any code path.
> 
> Yeah, apparently those are intentional, and mean "if and only if" (i.e.
> <=>)

Just looking at the first few items someone's good intention is being ruined
by bad execution...especially:

"Add...references to the section...a block is in [iff] chapters aren't
autolabelled..."

Many of these are not mathematical propositions but flow-control logic for
which "IF" is indeed the correct term-of-art; though re-reading the example
above that is probably one of the more logical ones...

Now:

"...new tuple was inserted, and its HEAP_ONLY_TUPLE flag is set [if/iff] a
HOT update was done"

needs to be evaluated on its merits; namely does HEAP_ONLY_TUPLE ever get
set if a HOT update was not done?  If not then "IFF" is proper and necessary
to convey that fact (though even this one is marginal and both versions are
likely unambiguous in reality).

In short "returns BOOLEAN" can reasonably be said to properly use "IF" in
almost all cases - especially for those functions that are check-oriented
(and thus obviously if the opposite condition is present the opposite result
would be returned).  "IFF" is not incorrect here but for me is misplaced and
too-much-information.

Given that it is unlikely the mis/over-use of IFF in the comments is a
meaningful problem I would vote for letting the imprecision go and avoid the
code churn.  If the surrounding code was being altered anyway then having
the corresponding comment updated seems acceptable; but a blanket
find/replace doesn't seem helpful.

David J.

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Patch-iff-if-tp5800153p5800172.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread Christopher Browne

On Tue, Apr 15, 2014 at 6:19 PM, Andreas 'ads' Scherbaum <
adsm...@wars-nicht.de> wrote:

>
> Hi,
>
> stumbled over a number of "iff" in the source where "if" is meant - not
> sure what the real story behind this is, but attached is a patch to fix the
> about 80 occurrences.
>
>
"IFF" is a common idiom in mathematics (and logic, which means it will be
of interest in computer science, which shares boundaries with both) which
stands for "If And Only If".

Unless it has very recently fallen out of fashion in mathematics, it's
almost certainly intentional to use "iff" in the comments, and not merely a
typo.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread Tom Lane

Steve Crawford  writes:
> On 04/15/2014 05:36 PM, Andrew Dunstan wrote:
>> On 04/15/2014 06:26 PM, Thom Brown wrote:
>>> Yeah, apparently those are intentional, and mean "if and only if" 

>> This is a reasonably common idiom, or used to be.

> If it has fallen into disuse the news has failed to reach me:
> http://en.wikipedia.org/wiki/If_and_only_if
> http://www.mathwords.com/i/if_and_only_if.htm
> http://mathworld.wolfram.com/Iff.html

The last discussion about this suggested that non-English speakers
might be familiar with other abbreviations for the concept.
But anyway, yes, those are not typos.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread Steve Crawford


On 04/15/2014 05:36 PM, Andrew Dunstan wrote:


On 04/15/2014 06:26 PM, Thom Brown wrote:
On 15 April 2014 23:19, Andreas 'ads' Scherbaum 
 wrote:

Hi,

stumbled over a number of "iff" in the source where "if" is meant - 
not sure
what the real story behind this is, but attached is a patch to fix 
the about

80 occurrences.

This only appears in comments, not in any code path.
Yeah, apparently those are intentional, and mean "if and only if" 
(i.e. <=>)




This is a reasonably common idiom, or used to be.


If it has fallen into disuse the news has failed to reach me:

http://en.wikipedia.org/wiki/If_and_only_if
http://www.mathwords.com/i/if_and_only_if.htm
http://mathworld.wolfram.com/Iff.html
...

Cheers,
Steve


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread Andrew Dunstan



On 04/15/2014 06:26 PM, Thom Brown wrote:

On 15 April 2014 23:19, Andreas 'ads' Scherbaum  wrote:

Hi,

stumbled over a number of "iff" in the source where "if" is meant - not sure
what the real story behind this is, but attached is a patch to fix the about
80 occurrences.

This only appears in comments, not in any code path.

Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>)



This is a reasonably common idiom, or used to be.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Peter Geoghegan

On Tue, Apr 15, 2014 at 3:59 PM, Ants Aasma  wrote:
> PostgreSQL replacement algorithm is more similar to Generalized CLOCK
> or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm
> that approximates LIRS[3]. LIRS is what MySQL implements[4] and
> CLOCK-Pro is implemented by NetBSD [5] and there has been some work on
> trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double
> the cache size metadata entries and detect pages that have been
> recently referenced. Basically they provide an adaptive tradeoff
> between LRU and LFU.

That's good to know.

> There's a paper on a non blocking GCLOCK algorithm, that does lock
> free clock sweep and buffer pinning[7]. If we decide to stay with
> GCLOCK it may be interesting, although I still believe that some
> variant of buffer nailing[8] is a better idea, my experience shows
> that most of the locking overhead is cache line bouncing ignoring the
> extreme cases where our naive spinlock implementation blows up.

You might be right about that, but lets handle one problem at a time.
Who knows what the bottleneck will end up being if and when we address
the naivety around frequency? I want to better characterize that
problem first.

> There has been some research that indicates that for TPC-A workloads
> giving index pages higher weights increases hitrates[1].

Frankly, there doesn't need to be any research on this, because it's
just common sense that probabilistically, leaf pages are much more
useful than heap pages in servicing index scan queries if we assume a
uniform distribution. If we don't assume that, then they're still more
useful on average.

> I think the hardest hurdle for any changes in this area will be
> showing that we don't have any nasty regressions. I think the best way
> to do that would be to study separately the performance overhead of
> the replacement algorithm and optimality of the replacement choices.
> If we capture a bunch of buffer reference traces by instrumenting
> PinBuffer, we can pretty accurately simulate the behavior of different
> algorithm and tuning choices with different shared buffer sizes.
> Obviously full scale tests are still needed due to interactions with
> OS, controller and disk caches and other miscellaneous influences. But
> even so, simulation would get us much better coverage of various
> workloads and at least some confidence that it's a good change
> overall. It will be very hard and time consuming to gather equivalent
> evidence with full scale tests.

I think I agree with all of that. The fact that we as a community
don't appear to have too much to say about what workloads to
prioritize somewhat frustrates this. The other problem is that sizing
shared_buffers appropriately involves a surprising amount of deference
to rules of thumb that in practice no one is quite prepared to
rigorously defend - who is to say what apportionment of memory to
Postgres is appropriate here? I too was hopeful that we could evaluate
this work purely in terms of observed improvements to hit rate (at
least initially), but now I doubt even that. It would be great to be
able to say "here are the parameters of this discussion", and have
everyone immediately agree with that, but in this instance that's
legitimately not possible.

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Tom Lane

Hannu Krosing  writes:
> Is there a way to force it to prefer a plan where the results of (select
> id from onemillion where data > '0.9' limit 100)
> are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
> all at once ?

You could write the query like that:

select * from onemillion_pgfdw where id = any (array(select id from
onemillion where data > '0.9' limit 100));

Or at least you should be able to, except when I try it I get

explain analyze
select * from onemillion_pgfdw where id = any (array(select id from
onemillion where data > '0.9' limit 100));
ERROR:  operator does not exist: integer = integer[]
HINT:  No operator matches the given name and argument type(s). You might need 
to add explicit type casts.
CONTEXT:  Remote SQL command: EXPLAIN SELECT id, inserted, data FROM 
public.onemillion WHERE ((id = ANY ((SELECT null::integer[]

so there's something the remote-estimate code is getting wrong here.
(It seems to work without remote_estimate, though.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Ants Aasma

On Mon, Apr 14, 2014 at 8:11 PM, Peter Geoghegan  wrote:
> PostgreSQL implements a clock sweep algorithm, which gets us something
> approaching an LRU for the buffer manager in trade-off for less
> contention on core structures. Buffers have a usage_count/"popularity"
> that currently saturates at 5 (BM_MAX_USAGE_COUNT). The classic CLOCK
> algorithm only has one bit for what approximates our "usage_count" (so
> it's either 0 or 1). I think that at its core CLOCK is an algorithm
> that has some very desirable properties that I am sure must be
> preserved. Actually, I think it's more accurate to say we use a
> variant of clock pro, a refinement of the original CLOCK.

PostgreSQL replacement algorithm is more similar to Generalized CLOCK
or GCLOCK, as described in [1]. CLOCK-Pro [2] is a different algorithm
that approximates LIRS[3]. LIRS is what MySQL implements[4] and
CLOCK-Pro is implemented by NetBSD [5] and there has been some work on
trying it on Linux [6]. Both LIRS and CLOCK-Pro work by keeping double
the cache size metadata entries and detect pages that have been
recently referenced. Basically they provide an adaptive tradeoff
between LRU and LFU.

> In the past, various hackers have noted problems they've observed with
> this scheme. A common pathology is to see frantic searching for a
> victim buffer only to find all buffer usage_count values at 5. It may
> take multiple revolutions of the clock hand before a victim buffer is
> found, as usage_count is decremented for each and every buffer.  Also,
> BufFreelistLock contention is considered a serious bottleneck [1],
> which is related.

There's a paper on a non blocking GCLOCK algorithm, that does lock
free clock sweep and buffer pinning[7]. If we decide to stay with
GCLOCK it may be interesting, although I still believe that some
variant of buffer nailing[8] is a better idea, my experience shows
that most of the locking overhead is cache line bouncing ignoring the
extreme cases where our naive spinlock implementation blows up.

> Let's leave aside inner/root pages though, because they're so
> dramatically useful when in a primary index on a tpb-b table that
> they'll always be cached by any non-terrible algorithm. It beggars
> belief that the still relatively dense (and consequently *popular*)
> B+Tree leaf pages get so little credit for being of such long-term
> utility (in the view of our clock sweep algorithm as implemented). The
> algorithm has what could be loosely described as an excessively
> short-term perspective. There is clearly a better balance to be had
> here. I don't think the answer is that we have the B-Tree code give
> its pages some special treatment that makes them harder to evict,
> although I will admit that I briefly entertained the idea.

There has been some research that indicates that for TPC-A workloads
giving index pages higher weights increases hitrates[1].

I think the hardest hurdle for any changes in this area will be
showing that we don't have any nasty regressions. I think the best way
to do that would be to study separately the performance overhead of
the replacement algorithm and optimality of the replacement choices.
If we capture a bunch of buffer reference traces by instrumenting
PinBuffer, we can pretty accurately simulate the behavior of different
algorithm and tuning choices with different shared buffer sizes.
Obviously full scale tests are still needed due to interactions with
OS, controller and disk caches and other miscellaneous influences. But
even so, simulation would get us much better coverage of various
workloads and at least some confidence that it's a good change
overall. It will be very hard and time consuming to gather equivalent
evidence with full scale tests.

[1] http://www.csd.uoc.gr/~hy460/pdf/p35-nicola.pdf
[2] http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-05-3.pdf
[3] http://www.ece.eng.wayne.edu/~sjiang/pubs/papers/jiang02_LIRS.pdf
[4] http://lists.mysql.com/commits/28601
[5] http://fxr.watson.org/fxr/source/uvm/uvm_pdpolicy_clockpro.c?v=NETBSD
[6] http://lwn.net/Articles/147879/
[7] 
http://derby-nb.googlecode.com/svn-history/r41/trunk/derby-nb/ICDE10_conf_full_409.pdf
[8] 
http://www.postgresql.org/message-id/ca+tgmozypeyhwauejvyy9a5andoulcf33wtnprfr9sycw30...@mail.gmail.com

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Need Multixact Freezing Docs

2014-04-15 Thread Josh Berkus

On 04/15/2014 02:25 PM, Josh Berkus wrote:
> Hackers,
> 
> We need documentation on how users should intelligently set the
> multixact freeze settings.  I'm happy to write the actual text, but I
> definitely don't have any idea how to set these myself.  Under what
> circumstances should they be different from freeze_max_age?  How?
> 

Also: how do I check the multixact age of a table?  There doesn't seem
to be any data for this ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Patch: iff -> if

2014-04-15 Thread Thom Brown

On 15 April 2014 23:19, Andreas 'ads' Scherbaum  wrote:
>
> Hi,
>
> stumbled over a number of "iff" in the source where "if" is meant - not sure
> what the real story behind this is, but attached is a patch to fix the about
> 80 occurrences.
>
> This only appears in comments, not in any code path.

Yeah, apparently those are intentional, and mean "if and only if" (i.e. <=>)

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Get more from indices.

2014-04-15 Thread Tom Lane

Kyotaro HORIGUCHI  writes:
> [ pathkey_and_uniqueindx_v10_20130411.patch ]

I thought some more about this patch, and realized that it's more or less
morally equivalent to allowing references to ungrouped variables when the
query has a GROUP BY clause listing all the columns of the primary key.
In that case the parser is effectively pretending that the GROUP BY list
contains additional implicit entries that are functionally dependent on
the entries that are actually there.  In this patch, what we want to do
is recognize that trailing entries in an ORDER BY list are semantically
no-ops and can be ignored because they are functionally dependent on
earlier entries.

Now, the reason that the parser restricts the functional dependency
deduction to a primary key is that it wants to be able to identify a
constraint OID that the query is dependent on to be semantically valid.
In this case, we don't need such an OID, so just finding any old unique
index on not-null columns is good enough.  (If someone drops the index,
the optimization might become incorrect, but that would force replanning
anyway.)

However, this way of thinking about it shows that the patch is missing
possible optimizations.  If we have "ORDER BY a, b, c" and (a,b) is the
primary key, then including c in the ORDER BY list is semantically
redundant, *whether or not we use an indexscan on the pkey index at all*.
More: if we have "ORDER BY a, b, c" and the primary key is (b,a), we
can still discard c from the sort requirement, even though the pkey
index as such isn't helpful for producing the required order.

So hacking up the pathkeys attributed to the indexscan is the wrong thing.
Rather, what we should be looking to do is decide that c is a useless
pathkey and remove it from the query_pathkeys, much as we'd do if we found
"c = constant" in WHERE.  That would allow optimization of other query
plans besides scan-the-pkey-index plans.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Question about optimising (Postgres_)FDW

2014-04-15 Thread Hannu Krosing

Hi

I am playing around with postgres_fdw and found that the following code ...

--
CREATE EXTENSION postgres_fdw;

CREATE SERVER loop foreign data wrapper postgres_fdw
  OPTIONS (port '5432', dbname 'testdb');
 
CREATE USER MAPPING FOR PUBLIC SERVER loop;

create table onemillion (
id serial primary key,
inserted timestamp default clock_timestamp(),
data text
);

insert into onemillion(data) select random() from
generate_series(1,100);

CREATE FOREIGN TABLE onemillion_pgfdw (
id int,
inserted timestamp,
data text
) SERVER loop
OPTIONS (table_name 'onemillion',
 use_remote_estimate 'true');

testdb=# explain analyse
select * from onemillion_pgfdw where id in (select id from onemillion
where data > '0.9' limit 100);
   QUERY
PLAN   
-
 Nested Loop  (cost=122.49..10871.06 rows=50 width=44) (actual
time=4.269..93.444 rows=100 loops=1)
   ->  HashAggregate  (cost=22.06..23.06 rows=100 width=4) (actual
time=1.110..1.263 rows=100 loops=1)
 ->  Limit  (cost=0.00..20.81 rows=100 width=4) (actual
time=0.038..1.026 rows=100 loops=1)
   ->  Seq Scan on onemillion  (cost=0.00..20834.00
rows=100115 width=4) (actual time=0.036..0.984 rows=100 loops=1)
 Filter: (data > '0.9'::text)
 Rows Removed by Filter: 805
   ->  Foreign Scan on onemillion_pgfdw  (cost=100.43..108.47 rows=1
width=29) (actual time=0.772..0.773 rows=1 loops=100)
 Total runtime: 93.820 ms
(8 rows)

Time: 97.283 ms
--

... actually performs 100 distinct "SELECT * FROM onemillion WHERE id =
$1" calls on "remote" side.

Is there a way to force it to prefer a plan where the results of (select
id from onemillion where data > '0.9' limit 100)
are passed to FDW as a single IN ( = ANY(...)) query and are retrieved
all at once ?

If not, how hord would it be to add this feature ?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] How can we make beta testing better?

2014-04-15 Thread Josh Berkus

Hackers,

I think 9.3 has given us evidence that our users aren't giving new
versions of PostgreSQL substantial beta testing, or if they are, they
aren't sharing the results with us.

How can we make beta testing better and more effective?  How can we get
more users to actually throw serious workloads at new versions and share
the results?

I've tried a couple of things over the last two years and they haven't
worked all that well.  Since we're about to go into another beta testing
period, we need something new.  Ideas?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Dynamic Shared Memory stuff

2014-04-15 Thread Robert Haas

On Tue, Apr 15, 2014 at 12:33 AM, Amit Kapila  wrote:
> On Mon, Apr 14, 2014 at 10:03 PM, Robert Haas  wrote:
>> On Sat, Apr 12, 2014 at 1:32 AM, Amit Kapila  wrote:
>>> I have checked that other place in code also check handle to
>>> decide if API has failed.  Refer function PGSharedMemoryIsInUse().
>>> So I think fix to call GetLastError() after checking handle is okay.
>>> Attached patch fixes this issue.  After patch, the server shows below
>>> log which is exactly what is expected from test_shm_mq
>>
>> In PostgreSQL code, hmap == NULL, rather than !hmap, is the preferred
>> way to test for a NULL pointer.  I notice that the !hmap style is used
>> throughout this code, so I guess cleaning that up is a matter for a
>> separate commit.
>
> I think in that case we might want to cleanup some other similar usage
> (PGSharedMemoryCreate) of !hmap.

Ah.  Well, in that case maybe we should just leave it alone, since
it's been like that forever and nobody's cared until now.

>> For the create case, I'm wondering if we should put the block that
>> tests for !hmap *before* the _dosmaperr() and check for EEXIST.  What
>> is your opinion?
>
> Either way is okay, but I think the way you are suggesting is better as it
> will make code consistent with other place (PGSharedMemoryCreate()).

OK, can you prepare a patch?


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Need Multixact Freezing Docs

2014-04-15 Thread Josh Berkus

Hackers,

We need documentation on how users should intelligently set the
multixact freeze settings.  I'm happy to write the actual text, but I
definitely don't have any idea how to set these myself.  Under what
circumstances should they be different from freeze_max_age?  How?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/15/2014 02:15 PM, Heikki Linnakangas wrote:
> You're counting XLogRecData structs, not backup blocks. Each
> backup block typically consists of three XLogRecData structs, one
> to record a BkpBlock struct, one to record the data before the
> unused "hole" in the middle of the page, and one for after.
> Sometimes just two, if there is no hole to skip. See the loop just
> before the CRC calculation, that's where the XLogRecData entries
> for backup blocks are created.
> 
> - Heikki

Ah, thanks for the explanation!

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTTaLxAAoJEDfy90M199hlwuQP/3Tuea1TUe+4L21ZdProNIIF
fUtejFNUwqhkyWNcnePlubgSyTEIfHGEG9hatrB5/MWdzpiyEvXdDkvV1ODakEhJ
CVKZbnQ4dmnrevypy2f2YdhlbB9du/DDFhFcPOZGbn+vLywwM9oMPS8tQmsol37e
aITe2GnD5LpEcmCSqzz04OL+xAxKLe8fXaI9dDsTRWXb9qdj4pDHI706CeixwSFb
sGsGcIHXmnWieMby9qfWc0WGpc38iMRRkE+LeaEULhsycFP/2x09irXdhbl5T1SH
4PItwX0/ZgLskklG2gaD4HpNe75+Emj1i22PHDYhXSoAzpykHUf+kZZwMUr0AbaF
5QVCer071jHaMacpaVC7/qwUt8zISx4/1wtJuQzfk5H3P2q4L+b/xPmod5/cqs9z
/wFp+9kjMT4349sSMe1eDPTDoIZKgRh8Eiag5IfJtrOAjoK+FN+k8uWNikiyFDMu
z/3l+6mbfrl7FAmfeXLFC9fqhhGOiGLHoZufB/4qFgEikj4S94Hx9Q0nHqkMsFvM
Fcd3qcpLI06Xku7LmBPRvdZ8OVFGWAirH1jBlrdsvC9E5VoZxgByxg90EaTlwjAQ
1ZaGOsbQoXdOPOwe/rGx2ONGwgZp8uFwHSXzUY+CJucvfYQh2AD67AlEhS7Jb9NC
ummpulzJ6arce0815KaT
=Y+m4
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Heikki Linnakangas


On 04/15/2014 11:53 PM, Joe Conway wrote:

One more question before I get to that. I had applied the following
patch to XLogInsert

8<--
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f71590..e39cd37 100644
- --- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData
*rdata)
 uint32  len,
 write_len;
 unsignedi;
+   unsignedinorm;
 boolupdrqst;
 booldoPageWrites;
 boolisLogSwitch = (rmid == RM_XLOG_ID && info ==
XLOG_SWITCH);
 uint8   info_orig = info;
+   uint32  xl_tot_len;

 /* cross-check on whether we should be here or not */
 if (!XLogInsertAllowed())
@@ -924,8 +926,23 @@ begin:;
  * header.
  */
 INIT_CRC32(rdata_crc);
+   i = 0;
+   inorm = 0;
 for (rdt = rdata; rdt != NULL; rdt = rdt->next)
+   {
 COMP_CRC32(rdata_crc, rdt->data, rdt->len);
+
+   if (rdt_lastnormal == rdt)
+   {
+   inorm = i;
+   i = 0;
+   }
+   else
+   i++;
+   }
+   xl_tot_len = SizeOfXLogRecord + write_len;
+   if ((inorm + i) > 4 || xl_tot_len > 2000)
+   elog(LOG,
"XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm,
i, xl_tot_len);

 START_CRIT_SECTION();
8<--

The idea was to record number of normal and backup blocks, and total
size of the record. I have quite a few entries in the log from the
test run which are like:

8<--
2014-04-11 08:42:06.904 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168
2014-04-11 09:03:12.790 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172
2014-04-11 10:16:57.949 PDT;LOG:
XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150
8<--

and

8<--
2014-04-11 11:17:08.313 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
2014-04-11 11:17:08.338 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020
2014-04-11 11:17:08.389 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
8<--


In other words, based on my inserted logic, it appears that there are
5 and 6 backup blocks on a fairly regular basis.

However in xlog.h it says:
8<--
  * If we backed up any disk blocks with the XLOG record, we use flag
  * bits in xl_info to signal it.  We support backup of up to 4 disk
  * blocks per XLOG record.
8<--

So is my logic to record number of backup blocks wrong, or is the
comment wrong, or am I otherwise misunderstanding something?


You're counting XLogRecData structs, not backup blocks. Each backup 
block typically consists of three XLogRecData structs, one to record a 
BkpBlock struct, one to record the data before the unused "hole" in the 
middle of the page, and one for after. Sometimes just two, if there is 
no hole to skip. See the loop just before the CRC calculation, that's 
where the XLogRecData entries for backup blocks are created.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Tom Lane

Joe Conway  writes:
> In other words, based on my inserted logic, it appears that there are
> 5 and 6 backup blocks on a fairly regular basis.

> However in xlog.h it says:
> 8<--
>  * If we backed up any disk blocks with the XLOG record, we use flag
>  * bits in xl_info to signal it.  We support backup of up to 4 disk
>  * blocks per XLOG record.
> 8<--

> So is my logic to record number of backup blocks wrong, or is the
> comment wrong, or am I otherwise misunderstanding something?

The comment is correct, so you did something wrong.  From memory,
there's a goto-label retry loop in that function; maybe you need
to zero your counters after the retry label?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Excessive WAL generation and related performance issue

2014-04-15 Thread Joe Conway

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 04/14/2014 04:34 PM, Joe Conway wrote:
> On 04/14/2014 04:25 PM, Andres Freund wrote:
>> On 2014-04-14 16:22:48 -0700, Joe Conway wrote:
>>> That'll help performance, but lets say I generally keep WAL
>>> files for PITR and don't turn that off before starting --
>>> shouldn't I be very surprised to need over 3TB of archive
>>> storage when loading a 50GB table with a couple of indexes?
> 
>> The point is that more frequent checkpoints will increase the
>> WAL volume *significantly* because more full page writes will
>> have to be generated.
> 
> OK, I'll see how much it can be brought down through checkpoint
> tuning and report back.

One more question before I get to that. I had applied the following
patch to XLogInsert

8<--
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 2f71590..e39cd37 100644
- --- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -737,10 +737,12 @@ XLogInsert(RmgrId rmid, uint8 info, XLogRecData
*rdata)
uint32  len,
write_len;
unsignedi;
+   unsignedinorm;
boolupdrqst;
booldoPageWrites;
boolisLogSwitch = (rmid == RM_XLOG_ID && info ==
XLOG_SWITCH);
uint8   info_orig = info;
+   uint32  xl_tot_len;

/* cross-check on whether we should be here or not */
if (!XLogInsertAllowed())
@@ -924,8 +926,23 @@ begin:;
 * header.
 */
INIT_CRC32(rdata_crc);
+   i = 0;
+   inorm = 0;
for (rdt = rdata; rdt != NULL; rdt = rdt->next)
+   {
COMP_CRC32(rdata_crc, rdt->data, rdt->len);
+
+   if (rdt_lastnormal == rdt)
+   {
+   inorm = i;
+   i = 0;
+   }
+   else
+   i++;
+   }
+   xl_tot_len = SizeOfXLogRecord + write_len;
+   if ((inorm + i) > 4 || xl_tot_len > 2000)
+   elog(LOG,
"XLogInsert;tot_nml_blks;%d;tot_bkp_blks;%d;tot_Xlog_Len;%d", inorm,
i, xl_tot_len);

START_CRIT_SECTION();
8<--

The idea was to record number of normal and backup blocks, and total
size of the record. I have quite a few entries in the log from the
test run which are like:

8<--
2014-04-11 08:42:06.904 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16168
2014-04-11 09:03:12.790 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;5;tot_Xlog_Len;16172
2014-04-11 10:16:57.949 PDT;LOG:
XLogInsert;tot_nml_blks;3;tot_bkp_blks;5;tot_Xlog_Len;16150
8<--

and

8<--
2014-04-11 11:17:08.313 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
2014-04-11 11:17:08.338 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;16020
2014-04-11 11:17:08.389 PDT;LOG:
XLogInsert;tot_nml_blks;4;tot_bkp_blks;6;tot_Xlog_Len;12332
8<--


In other words, based on my inserted logic, it appears that there are
5 and 6 backup blocks on a fairly regular basis.

However in xlog.h it says:
8<--
 * If we backed up any disk blocks with the XLOG record, we use flag
 * bits in xl_info to signal it.  We support backup of up to 4 disk
 * blocks per XLOG record.
8<--

So is my logic to record number of backup blocks wrong, or is the
comment wrong, or am I otherwise misunderstanding something?

Thanks,

Joe

- -- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJTTZxJAAoJEDfy90M199hl15MQAKTcv9BoZTsXDleSu9JrU1ha
UhHUnDALRmxWLgyPYsgtifxMQ3jLp5eLrkMHGnQbVD17619OgHckuOiEphc2bdQp
MfZlv3jrEqxnmsh6qKhK1J23mHj0cohWXQ9EUoyjE6tlZueLPyMigaIV662KP1d2
pUXCh6IEJYMMaPfqhR5Mxi62s+HMkpAULhafWeEeAwcU1eYNijFWlyxJWlsv7D6X
9ZuDSmRtqnAP0g23GcbxNkL/I9Yv090Uxar7um2Rw5SEUV+Uv1kMY0GVCjHluE0k
qZhSF1tE2jypThhSnv5xRHT3ZzdKoJtNmfLekjws7+dFZbSBLgNOj4EdV0H/wUgf
NqO71kkeRhd44uMRzii0cr03LwBiwqC2apCYoZy7s0X3rl10hZfKgVEKkyhaZ4VJ
QdfR1WdY/hC7mKW7NPnkycF+Es1ykEfuPnKHHsyJ3fHeFGxkKD3I6A8jGnNnS6VL
ba+jx+t3qnrcKQAW8lqQ3rAij5Jkb97Ljibc7o6w8cgnGA4S0tqsE6jDrdDR1FO4
ns5uULTs4REU8clFwiKNZnQfINRUUfqY1mtlRneJMANeafm0j2CyIzvqLqB2mdOH
YL9SS2lIngQlVSfgpu7EiSS7sJx8XGe3a3YFE9DoTBpq009scrscH40+kuN823wp
yruufkzaBN6lyAjo3zoR
=GQDN
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-15 Thread Tom Lane

David Fetter  writes:
> On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
>> Fixed.  I added a personal script option that allows me to test contrib,
>> but forgot to run it.

> Is that script of general utility for committers?  If so, it might be
> good to include it in the distribution.  I'd be happy to go through
> and perl-ify it, document it, etc.  Or maybe it could be a new make
> target...

I'm pretty sure "make check-world" would've covered this already.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-15 Thread Andres Freund

On 2014-04-15 12:32:36 -0700, David Fetter wrote:
> On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
> > On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote:
> > > Bruce Momjian  writes:
> > > > psql: conditionally display oids and replication identity
> > > 
> > > Buildfarm isn't terribly pleased with this --- looks like you missed
> > > contrib/test_decoding/
> > 
> > Fixed.  I added a personal script option that allows me to test contrib,
> > but forgot to run it.
> 
> Is that script of general utility for committers?  If so, it might be
> good to include it in the distribution.  I'd be happy to go through
> and perl-ify it, document it, etc.  Or maybe it could be a new make
> target...

"make check-world"

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] test script, was Re: [COMMITTERS] pgsql: psql: conditionally display oids and replication identity

2014-04-15 Thread David Fetter

On Tue, Apr 15, 2014 at 02:46:34PM -0400, Bruce Momjian wrote:
> On Tue, Apr 15, 2014 at 02:32:53PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > psql: conditionally display oids and replication identity
> > 
> > Buildfarm isn't terribly pleased with this --- looks like you missed
> > contrib/test_decoding/
> 
> Fixed.  I added a personal script option that allows me to test contrib,
> but forgot to run it.

Is that script of general utility for committers?  If so, it might be
good to include it in the distribution.  I'd be happy to go through
and perl-ify it, document it, etc.  Or maybe it could be a new make
target...

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something flaky in the "relfilenode mapping" infrastructure

2014-04-15 Thread Tom Lane

Andres Freund  writes:
>> On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
>>> Buildfarm member prairiedog thinks there's something unreliable about
>>> commit f01d1ae3a104019d6d68aeff85c4816a275130b3:

> So I had made a notice to recheck on
> this. 
> http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
> indicates there haven't been any further failures... So, for now I
> assume this was caused by some problem fixed elsewhere.

Hard to say.  In any case, I agree we can't make any progress unless we
see it again.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Something flaky in the "relfilenode mapping" infrastructure

2014-04-15 Thread Andres Freund

On 2014-03-28 21:36:11 +0100, Andres Freund wrote:
> Hi,
> 
> On 2014-03-27 08:02:35 -0400, Tom Lane wrote:
> > Buildfarm member prairiedog thinks there's something unreliable about
> > commit f01d1ae3a104019d6d68aeff85c4816a275130b3:
> 
> > *** 
> > /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/expected/alter_table.out
> >  Thu Mar 27 04:12:40 2014
> > --- 
> > /Users/buildfarm/bf-data/HEAD/pgsql.13462/src/test/regress/results/alter_table.out
> >   Thu Mar 27 04:52:02 2014
> > ***
> > *** 2333,2339 
> >   ) mapped;
> >incorrectly_mapped | have_mappings 
> >   +---
> > !   0 | t
> >   (1 row)
> 
> That's rather odd. It has survived for a couple of months on the other
> buildfarm animals now... Could one of you apply the attached patch
> adding more details to eventual failures?

So I had made a notice to recheck on
this. 
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=prairiedog&br=HEAD
indicates there haven't been any further failures... So, for now I
assume this was caused by some problem fixed elsewhere.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Amit Kapila

> On Mon, Apr 14, 2014 at 11:46 AM, Christian Ullrich
>  wrote:


> > * From: Amit Kapila
> >> Do you mean to say use some existing environment variable?
> >> Introducing an environment variable to solve this issue or infact
> >> using some existing environ variable doesn't seem to be the best way
> >> to pass such information.
> >
> > I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was
> > set, the postmaster etc. would ignore the events.
> 
> Do you plan to reset it and if yes when?
> I think there is chance that after setting this environment variable,
> some other instance of server (postmaster) can read it and missed the
> signal which it should have otherwise processed.

We have decided not to go this way, but just for completeness:

Environment inheritance works the same way on Windows as on Unix. When
a process is started with a modified environment (one of the plentiful
arguments of CreateProcess() et al.), only that process and its 
descendants see the modification. I had not planned to set a system-level
or user-level variable.

-- 
Christian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Bruce Momjian

> On Mon, Apr 14, 2014 at 09:34:14AM +0530, Amit Kapila wrote:
> > The problem can be solved this way, but the only question here is
> > whether it is acceptable for users to have a new console window for
> > server.
> >
> > Can others also please share their opinion if this fix (start server
> > in new console) seems acceptable or shall we try by passing some
> > information from pg_ctl and then ignore CTRL+C && CTRL+BREAK for it?
> 
> I wanted to point out that I think this patch is only appropriate for
> head, not backpatching.  Also, on Unix we have to handle signals that

Yes, of course.

> come from the kill command.  Can you send CTRL+C from other applications
> on Windows?

Yes again, using GenerateConsoleCtrlEvent() you can send these events to
any (console-attached) process you have the required permissions for, 
but that is not an issue for the same reason it isn't one on Unix. All
the target process sees is the event, it cannot determine (nor does it 
care) where the event came from.

-- 
Christian



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Christian Ullrich

* From: Robert Haas

> On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich
>  wrote:

> > I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was
> > set, the postmaster etc. would ignore the events.
> 
> Why not just pass a command-line switch?

Because, as I wrote in the message you are quoting, I did not think that
having a command-line option for the sole purpose of telling the 
postmaster who its parent is was a suitable solution. 

I had already given up on that idea based on Amit's advice, and I will 
create a patch based on a command-line option.

While I have you here, though, any suggestions on what the name of that
option should be? I think --background is about right. Also, how should
I treat the option on non-Windows platforms? Should it just not be there
(= error), or be ignored if present?

-- 
Christian

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Peter Geoghegan

On Tue, Apr 15, 2014 at 9:30 AM, Merlin Moncure  wrote:
> There are many reports of improvement from lowering shared_buffers.
> The problem is that it tends to show up on complex production
> workloads and that there is no clear evidence pointing to problems
> with the clock sweep; it could be higher up in the partition locks or
> something else entirely (like the O/S).  pgbench is also not the
> greatest tool for sniffing out these cases: it's too random and for
> large database optimization is generally an exercise in de-randomizing
> i/o patterns.  We really, really need a broader testing suite that
> covers more usage patterns.

I find it quite dissatisfying that we know so little about this.

I'm finding that my patch helps much less when shared_buffers is sized
large enough to fit the index entirely (although there are still some
localized stalls on master, where there are none with patched).
shared_buffers is still far too small to fit the entire heap. With
shared_buffers=24GB (which still leaves just under 8GB of memory for
the OS to use as cache, since this system has 32GB of main memory),
the numbers are much less impressive relative to master with the same
configuration. Both sets of numbers are still better than what you've
already seen with shared_buffers=8GB, since of course the "no more
than 8GB" recommendation is not an absolute, and as you say its
efficacy seemingly cannot be demonstrated with pgbench.

My guess is that the patch doesn't help because once there is more
than enough room to cache the entire index (slightly over twice as
many buffers as would be required to do so), even on master it becomes
virtually impossible to evict those relatively popular index pages,
since they still have an early advantage. It doesn't matter that
master's clock sweep has what I've called an excessively short-term
perspective, because there is always enough pressure relative to the
number of leaf pages being pinned to prefer to evict heap pages. There
is still a lot of buffers that can fit some moderate proportion of all
heap pages even after buffering the entire index (something like
~13GB).

You might say that with this new shared_buffers setting, clock sweep
doesn't need to have a "good memory", because it can immediately
observe the usefulness of B-Tree leaf pages.

There is no need to limit myself to speculation here, of course. I'll
check it out using pg_buffercache.
-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] PostgreSQL in Windows console and Ctrl-C

2014-04-15 Thread Robert Haas

On Mon, Apr 14, 2014 at 2:16 AM, Christian Ullrich  wrote:
> I meant creating a new one, yes. If, say, PGSQL_BACKGROUND_JOB was set,
> the postmaster etc. would ignore the events.

Why not just pass a command-line switch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql \d+ and oid display

2014-04-15 Thread Bruce Momjian

On Thu, Apr 10, 2014 at 08:05:11PM -0400, Bruce Momjian wrote:
> On Thu, Apr 10, 2014 at 07:58:55PM -0400, Tom Lane wrote:
> > Bruce Momjian  writes:
> > > It also has changed the OID status to only display if it exists.  One
> > > question that came up with Robert is whether OID status should appear
> > > for \d as well, now that is only shows up when present.
> > 
> > Yeah, I was wondering about that too.  If part of the argument here is
> > to make these two displays act more alike, it seems inconsistent that
> > one is emitted by \d while the other only comes out with \d+.
> > 
> > Of course, there are two ways to fix that: maybe the replica info
> > also only belongs in \d+?
> 
> OK, I changed my patch to only show replica info for \d+.  If we decide
> to change them to both display for \d, I will update it again.

OK, hearing only quiet, I have applied the patch.  I like that we now
document the replication identity default, which should go a long way to
making this clearer.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Clock sweep not caching enough B-Tree leaf pages?

2014-04-15 Thread Merlin Moncure

On Mon, Apr 14, 2014 at 7:45 PM, Peter Geoghegan  wrote:
> On Mon, Apr 14, 2014 at 5:30 PM, Bruce Momjian  wrote:
>> I am glad you are looking at this.  You are right that it requires a
>> huge amount of testing, but clearly our code needs improvement in this
>> area.
>
> Thanks.
>
> Does anyone recall the original justification for the recommendation
> that shared_buffers never exceed 8GiB? I'd like to revisit the test
> case, if such a thing exists.

There are many reports of improvement from lowering shared_buffers.
The problem is that it tends to show up on complex production
workloads and that there is no clear evidence pointing to problems
with the clock sweep; it could be higher up in the partition locks or
something else entirely (like the O/S).  pgbench is also not the
greatest tool for sniffing out these cases: it's too random and for
large database optimization is generally an exercise in de-randomizing
i/o patterns.  We really, really need a broader testing suite that
covers more usage patterns.

I was suspicious for a while that spinlock contention inside the
clocksweep was causing stalls and posted a couple of different patches
to try and reduce the chance of that.  I basically gave up when I
couldn't demonstrate that case in simulated testing.

I still think there is no good reason for the clock to pedantically
adjust usage count on contented buffers...better to throw a single
TTAS and bail to the next buffer if either 'T' signals a lock.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Tom Lane

Andres Freund  writes:
> What I think this discussion shows that this patch isn't ready for
> 9.4. The first iteration of the patch came in 2013-11-06. Imo that's
> pretty damn late for a relatively complex patch. And obviously we don't
> have agreement on the course forward.
> I don't think we need to stop discussing, but I think it's pretty clear
> that this isn't 9.4 material. And that it's far from "Ready for Committer".

Yeah.  I'm still not exactly convinced that custom-scan will ever allow
independent development of new plan types (which, with all due respect to
Robert, is what it was being sold as last year in Ottawa).  But I'm not
opposed in principle to committing it, if we can find a way to have a
cleaner API for things like setrefs.c.  It seems like late-stage planner
processing in general is an issue for this patch (createplan.c and
subselect.c are also looking messy).  EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder
whether things would get better if we had the equivalent of
expression_tree_walker/mutator capability for plan nodes.  The state
of affairs in setrefs and subselect, at least, is a bit reminiscent
of the bad old days when we had lots of different bespoke code for 
traversing expression trees.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Andres Freund

Hi,

On 2014-04-15 11:07:11 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane  wrote:
> > Robert Haas  writes:
> [ discussion ]

What I think this discussion shows that this patch isn't ready for
9.4. The first iteration of the patch came in 2013-11-06. Imo that's
pretty damn late for a relatively complex patch. And obviously we don't
have agreement on the course forward.
I don't think we need to stop discussing, but I think it's pretty clear
that this isn't 9.4 material. And that it's far from "Ready for Committer".

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Robert Haas

On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane  wrote:
>>> Yeah.  After a fast review of the custom-scan and cache-scan patches, it
>>> seems to me that my original fears are largely confirmed: the custom scan
>>> patch is not going to be sufficient to allow development of any truly new
>>> plan type.  Yeah, you can plug in some new execution node types, but
>>> actually doing anything interesting is going to require patching other
>>> parts of the system.
>
>> Without prejudice to the rest of what you said, this argument doesn't
>> hold much water with me.  I mean, anything that our extensibility
>> mechanism doesn't support today will require new hooks, but does that
>> mean we're never going to add any more hooks?  I sure hope not.
>
> No, that's not what I said.  ISTM that the argument for the custom-scan
> API is that it allows interesting new things to be done *without further
> modifying the core code*.  But the example application (cache_scan) fails
> to demonstrate that, and indeed seems to be a counterexample.  Whether
> we'd accept cache_scan on its own merits is a separate question.  The
> problem for me is that custom-scan isn't showing that it can support what
> was claimed without doing serious damage to modularity and maintainability
> of the core code.

I think there's two separate things in there, one of which I agree
with and one of which I disagree with.  I agree that we must avoid
damaging the modularity and maintainability of the core code; I don't
agree that custom-scan needs to be able to do interesting things with
zero additional changes to the core code.  If we come up with three
interesting applications for custom scan that require 5 new hooks
between them, I'll consider that a major success - assuming those
hooks don't unduly limit future changes we may wish to make in the
core code.  I think your concern about exposing APIs that may not be
terribly stable is well-founded, but I don't think that means we
shouldn't expose *anything*.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Stephen Frost

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> A concrete example here is setrefs.c, whose responsibilities tend to
> change from release to release.  I think if we committed custom-scan
> as is, we'd have great difficulty changing setrefs.c's transformations
> ever again, at least if we hoped to not break users of the custom-scan
> API.  I'm not sure what the solution is --- but turning setrefs into
> a white box instead of a black box isn't it.

Yeah, this was my (general) complaint as well and the answer that I kept
getting back is "well, it's ok, you can still break it between major
releases and the custom scan users will just have to deal with it".

I'm a bit on the fence about that, itself, but the other half of that
coin is that we could end up with parts of the *core* code that think
it's ok to go pulling in these functions, once they're exposed, and that
could end up making things quite ugly and difficult to maintain going
forward.

Thanks,

Stephen

signature.asc
Description: Digital signature

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Tom Lane

Robert Haas  writes:
> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane  wrote:
>> Yeah.  After a fast review of the custom-scan and cache-scan patches, it
>> seems to me that my original fears are largely confirmed: the custom scan
>> patch is not going to be sufficient to allow development of any truly new
>> plan type.  Yeah, you can plug in some new execution node types, but
>> actually doing anything interesting is going to require patching other
>> parts of the system.

> Without prejudice to the rest of what you said, this argument doesn't
> hold much water with me.  I mean, anything that our extensibility
> mechanism doesn't support today will require new hooks, but does that
> mean we're never going to add any more hooks?  I sure hope not.

No, that's not what I said.  ISTM that the argument for the custom-scan
API is that it allows interesting new things to be done *without further
modifying the core code*.  But the example application (cache_scan) fails
to demonstrate that, and indeed seems to be a counterexample.  Whether
we'd accept cache_scan on its own merits is a separate question.  The
problem for me is that custom-scan isn't showing that it can support what
was claimed without doing serious damage to modularity and maintainability
of the core code.

What this may mean is that we need more attention to refactoring of the
core code.  But just removing "static" from any function that looks like
it might be handy isn't my idea of well-considered refactoring.  More the
opposite in fact: if those things turn into APIs that we have to support,
it's going to kill any ability to do such refactoring.

A concrete example here is setrefs.c, whose responsibilities tend to
change from release to release.  I think if we committed custom-scan
as is, we'd have great difficulty changing setrefs.c's transformations
ever again, at least if we hoped to not break users of the custom-scan
API.  I'm not sure what the solution is --- but turning setrefs into
a white box instead of a black box isn't it.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)

2014-04-15 Thread Robert Haas

On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane  wrote:
> Simon Riggs  writes:
>> [ assorted comments about custom-scan patch, but particularly ]
>
>> * The prune hook makes me feel very uneasy. It seems weirdly specific
>> implementation detail, made stranger by the otherwise lack of data
>> maintenance API calls. Calling that for every dirty page sounds like
>> an issue and my patch rejection indicator is flashing red around that.
>
> Yeah.  After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom scan
> patch is not going to be sufficient to allow development of any truly new
> plan type.  Yeah, you can plug in some new execution node types, but
> actually doing anything interesting is going to require patching other
> parts of the system.  Are we going to say to all comers, "sure, we'll put
> a hook call anywhere you like, just ask"?  I can't see this as being the
> way to go.

Without prejudice to the rest of what you said, this argument doesn't
hold much water with me.  I mean, anything that our extensibility
mechanism doesn't support today will require new hooks, but does that
mean we're never going to add any more hooks?  I sure hope not.  When
hooks are proposed here, we evaluate on them on their merits and
attempt to judge the likelihood that a hook in a particular place will
be useful, but generally we're not averse to adding them, and as long
as the paths aren't too performance-critical, I don't think we should
be averse to adding them.

We have a great system today for letting people add new data types and
things of that sort, but anything that penetrates more deeply into the
heart of the system pretty much can't be done; this is why various
companies, such as our respective employers, have developed and
maintained forks of the PostgreSQL code base instead of just hooking
in to the existing code.  We probably can't solve that problem
completely, but that doesn't mean we should throw in the towel.

And in particular, I think it's pretty normal that a new facility like
custom scans might create additional demand for new hooks.  If
something was completely impossible before, and the new facility makes
it almost-possible, then why shouldn't someone ask for a hook there?
A prune hook probably has no business in the custom scan patch proper,
but whether it's a good idea or a bad one should be decided on the
merits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-15 Thread Tom Lane

Craig Ringer  writes:
> On 04/15/2014 03:39 AM, Tom Lane wrote:
>> I still wish we could get rid of this problem by fixing the Windows build
>> recipes so that the PGDLLEXPORT marking wasn't needed.  We proved to
>> ourselves recently that getting rid of PGDLLIMPORT on global variables
>> wouldn't work, but I'm not sure that the function end of it was really
>> investigated.

> My understanding is that we *can* drop PGDLLEXPORT on functions without
> actively breaking anything. But we probably shouldn't.

> If we omit PGDLLEXPORT, the linker of the DLL/executable that imports
> the extern function will generate a thunk from the .LIB file for the
> target DLL during linkage; this thunk within the DLL/EXE with the
> undefined extern then jumps to the real address within the defining DLL/EXE.

TBH, if the only argument for this is a small efficiency difference,
then to my mind it barely requires discussion.  I don't give one hoot
about micro-optimization for the Windows platform; I'm satisfied if
it works at all there.  And I seriously doubt that a couple more cycles to
call any function implemented in a loadable module would matter anyway.

> I actually think we should *add* a LIBPQEXPORT that handles this for
> libpq, much like PGDLLEXPORT does for postgres(.exe). And in the
> process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or
> something.

My reaction to that is "not bloody likely".  I remarked on this upthread
already, but there is absolutely no way that I want to clutter our source
code with platform-specific markings like that.

Perhaps somebody could try a Windows build with PGDLLEXPORT defined to
empty, and verify that it works, and if so do a pgbench comparison
against a build done the existing way?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Create function prototype as part of PG_FUNCTION_INFO_V1

2014-04-15 Thread Craig Ringer

On 04/15/2014 03:39 AM, Tom Lane wrote:
> I still wish we could get rid of this problem by fixing the Windows build
> recipes so that the PGDLLEXPORT marking wasn't needed.  We proved to
> ourselves recently that getting rid of PGDLLIMPORT on global variables
> wouldn't work, but I'm not sure that the function end of it was really
> investigated.

My understanding is that we *can* drop PGDLLEXPORT on functions without
actively breaking anything. But we probably shouldn't.

If we omit PGDLLEXPORT, the linker of the DLL/executable that imports
the extern function will generate a thunk from the .LIB file for the
target DLL during linkage; this thunk within the DLL/EXE with the
undefined extern then jumps to the real address within the defining DLL/EXE.

Reference: http://msdn.microsoft.com/en-us/library/zw3za17w.aspx

So in other words, it makes calls across DLL boundaries less efficient
by adding a layer of indirection. (No idea how this works in the
presence of link time base address randomization either).

I actually think we should *add* a LIBPQEXPORT that handles this for
libpq, much like PGDLLEXPORT does for postgres(.exe). And in the
process, rename PGDLLEXPORT to POSTGRESEXPORT or PGSERVEREXPORT or
something.

PGDLLEXPORT is probably less important overall - it'll mainly impact
extensions (like hstore, intarray, etc) that call into the server.

I wonder if this thunking still really mattres with modern CPU
architecures' smart branch prediction, TLB caches, etc. I haven't found
much info on the real world impact.

It would probably be reasonable to add PGDLLEXPORT within postgres.exe
only on functions we've intentionally exposed for use by extensions,
where those functions are likely to get called a lot and don't have
bigget costs like disk I/O, network I/O, expensive memory allocations,
etc, that make call time overheads irrelevant.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] The question about the type numeric

2014-04-15 Thread sure.postgres

Hi hackers,

I am learning about numeric .
The comment of NumericShort format is:
 * In the NumericShort format, the remaining 14 bits of the header word
 * (n_short.n_header) are allocated as follows: 1 for sign (positive or
 * negative), 6 for dynamic scale, and 7 for weight.  In practice, most
 * commonly-encountered values can be represented this way.

So the Max of the NumericShort format should be up to 508 digits before the 
decimal point.
So the sign of the number 12345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567890123456789012345678901234567890
12345678901234567890123456789012345678901234567 
should be 0x807F.
The number is 257 digits before the decimal point.
But the sign is 0.
So is there anything wrong?

2014-04-15



wangshuo
HighGo Software Co.,Ltd.
Address: A203 Block D QILU Soft Park, High-Tech Zone, Lixia district, Jinan 
Shandong, China(Head Office)
Tel:+86-0531-55701530
Fax:+86-0531-55701544
Website：www.highgo.com
Mobile：18766416137

Re: [HACKERS] UNION ALL on partitioned tables won't use indices.

2014-04-15 Thread Kyotaro HORIGUCHI

Thank you for committing.

At Fri, 28 Mar 2014 11:50:56 -0400, Tom Lane  wrote in 
<21426.1396021...@sss.pgh.pa.us>
tgl> Kyotaro HORIGUCHI  writes:
tgl> > Hello. Attached is the 2nd version of 'pushdown in UNION ALL on
tgl> > partitioned tables' patch type 1 - fix in equiv-member version.
tgl> 
tgl> Committed, thanks.
tgl> 
tgl>regards, tom lane

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

63 matches

Mail list logo