Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN

2014-05-08 Thread Craig Ringer
On 05/08/2014 08:04 AM, Jaime Casanova wrote:
> Maybe that's not the right term... I mean: the time it takes to update
> indexes on an INSERT/UPDATE operation

That'd be a big plus IMO, especially for expensive GiST or GIN index
updates.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Peter Geoghegan
On Thu, May 8, 2014 at 3:39 PM, Tom Lane  wrote:
> Barring objections I'll commit this tomorrow

Looks good to me.

-- 
Peter Geoghegan




Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Gavin Flower

On 09/05/14 15:34, Bruce Momjian wrote:
> On Thu, May  8, 2014 at 06:39:11PM -0400, Tom Lane wrote:
>> I wrote:
>>> I think the idea of hashing only keys/values that are "too long" is a
>>> reasonable compromise.  I've not finished coding it (because I keep
>>> getting distracted by other problems in the code :-() but it does not
>>> look to be very difficult.  I'm envisioning the cutoff as being something
>>> like 128 bytes; in practice that would mean that few if any keys get
>>> hashed, I think.
>>
>> Attached is a draft patch for this.  In addition to the hash logic per se,
>> I made these changes:
>>
>> * Replaced the K/V prefix bytes with a code that distinguishes the types
>> of JSON values.  While this is not of any huge significance for the
>> current index search operators, it's basically free to store the info,
>> so I think we should do it for possible future use.
>>
>> * Fixed the problem with "exists" returning rows it shouldn't.  I
>> concluded that the best fix is just to force recheck for exists, which
>> allows considerable simplification in the consistent functions.
>>
>> * Tried to improve the comments in jsonb_gin.c.
>>
>> Barring objections I'll commit this tomorrow, and also try to improve the
>> user-facing documentation about the jsonb opclasses.
>
> Looks good.  I was thinking the jsonb_ops name could remain unchanged
> and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
> the key and value into a single index entry.

If you have 'jsonb_combo_ops', then surely 'jsonb_ops' should be called
'jsonb_xxx_ops', where the 'xxx' distinguishes it from
'jsonb_combo_ops'?  I guess if any appropriate wording of 'xxx' were too
cumbersome, then it would be worse.



Cheers,
Gavin





Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Amit Kapila
On Thu, May 8, 2014 at 9:47 PM, Christoph Berg  wrote:
> Re: Andres Freund 2014-05-08 <20140508145901.gb1...@awork2.anarazel.de>
>> > Maybe this is nitpicking, but what happens when postgresql.auto.conf also
>> > includes the setting of data_directory? This is possible because we can
>> > set data_directory via ALTER SYSTEM now. Should we just ignore such
>> > problematic setting in postgresql.auto.conf with warning message?
>>
>> I think that's a case of "Doctor, it hurts when I do this. Doctor: don't
>> do that then".
>
> I'd opt to forbid setting data_directory at ALTER SYSTEM time. For the
> other options, I agree with Andres that you should get to keep all
> parts if you manage to break it.

There is no harm in forbidding data_directory, but similarly one can
imagine people setting very large values for some config variables,
which could later produce symptoms similar to this issue.

> Fortunately, ALTER SYSTEM already refuses to change config_file :)

That is because config_file is not a file parameter (it is not present in
postgresql.conf).  We disallow non-file and internal (like wal_segment_size)
parameters in ALTER SYSTEM.
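
To illustrate the behavior being discussed (a sketch against a 9.4-era
server; the specific values are arbitrary and the exact error wording may
differ):

    ALTER SYSTEM SET work_mem = '64MB';          -- ordinary file-settable GUC: accepted
    ALTER SYSTEM SET config_file = '/tmp/x';     -- rejected: not present in postgresql.conf
    ALTER SYSTEM SET wal_segment_size = 2048;    -- rejected: internal parameter
    ALTER SYSTEM SET data_directory = '/srv/pg'; -- currently accepted, which is the concern upthread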

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN

2014-05-08 Thread David G Johnston
Robert Haas wrote:
> On Thu, May 8, 2014 at 2:31 AM, Jaime Casanova <jaime@> wrote:
>> On Wed, May 7, 2014 at 10:52 PM, Amit Kapila <amit.kapila16@> wrote:
>>> On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova <jaime@> wrote:
>>
>>                          QUERY PLAN
>> ----------------------------------------------------------------
>>  Insert on public.t1 (actual time=0.540..0.540 rows=0 loops=1)
>>    ->  Result (actual time=0.046..0.049 rows=1 loops=1)
>>          Output: 
>>  Index uniq_idx_on_text on t1: time=0.421 rows=1
>>  Index t1_pkey on t1: time=0.027 rows=1
>>  Total runtime: 0.643 ms
>> (6 rows)
> 
> I would have expected the information about index maintenance times to
> be associated with the Insert node, not the plan overall.  IIUC, you
> could have more than one such node if, for example, there are
> writeable CTEs involved.

Even with multiple nodes I would think that typically each node would focus
on a different table, and so listing everything, by table, at the bottom of
the explain would not cause that much of an issue.  Even if a table gets hit
from two CTEs, and thus has its calls and times added together, the
relative (per call) numbers overall and the comparative numbers between
indexes on the same table would remain constant.  And if you are testing
indexes you would likely want to keep the queries the same.

Not having to scan through and handle multiple locations for index timings
will make interpretation, parsing, and presentation much easier.  It
wouldn't preclude having total timings and node timings in the future if the
demand is sufficient.

David J.









Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN

2014-05-08 Thread Amit Kapila
On Thu, May 8, 2014 at 12:01 PM, Jaime Casanova  wrote:
> On Wed, May 7, 2014 at 10:52 PM, Amit Kapila  wrote:
>>
>> Why capture this only for Index Insert/Update and not for Read; is it
>> because Read will always be fast, or implementation complexity?
>>
>
> EXPLAIN ANALYZE already shows that on any SELECT that uses an index in
> some way. Or are you thinking on something else?

postgres=# explain analyze select * from t1 where c1 > 5 and c1 <6;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Index Scan using idx1_t1 on t1  (cost=0.29..983.57 rows=10014 width=508) (actual time=0.033..11.826 rows= loops=1)
   Index Cond: ((c1 > 5) AND (c1 < 6))
 Planning time: 2.001 ms
 Execution time: 18.486 ms
(4 rows)

Are you referring to the actual time in the above output?

The actual time is the node execution time, which in cases like the above
will be: scanning the index + scanning the heap.  I think that is not the
same as what you are planning to show for the Insert/Update case.

>> Why not similar timings for heap?
>>
>
> well "actual time" shows us total time of the operation so just
> deducting the time spent on triggers, indexes and planning seems like
> a way to get "heap modification time".

Planning time doesn't include parse time, so the above calculation might
not give the time spent in the heap during statement execution.

>> Why can't we print when only Analyze is used with Explain, the
>> execution time is printed with Analyze option?
>
> i'm not sure the info is useful for everyone, i'm not opposed to show
> it all the time though

Okay, no problem.  I think it can be done based on what most people
expect.

>> Could you please tell in what all kind of scenario's, do you expect it
>> to be useful?
>> One I could think is that if there are multiple indexes on a table and user
>> wants to find out if any particular index is consuming more time.
>>
>
> exactly my use case. consider this plan (we spent 78% of the time
> updating the index uniq_idx_on_text):

I think this will be useful for such cases.
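
In the meantime, one rough way to approximate per-index maintenance cost
without the proposed EXPLAIN output is to compare INSERT timings with and
without the index in question; a sketch (table and index names follow the
example upthread, the inserted value is arbitrary):

    CREATE TABLE t1 (id serial PRIMARY KEY, val text);
    CREATE UNIQUE INDEX uniq_idx_on_text ON t1 (val);

    EXPLAIN (ANALYZE, BUFFERS) INSERT INTO t1 (val) VALUES (md5(random()::text));

    DROP INDEX uniq_idx_on_text;
    EXPLAIN (ANALYZE, BUFFERS) INSERT INTO t1 (val) VALUES (md5(random()::text));

    -- The difference in the Insert node's "actual time" between the two runs
    -- approximates the maintenance cost of uniq_idx_on_text for this statement.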

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Bruce Momjian
On Thu, May  8, 2014 at 06:39:11PM -0400, Tom Lane wrote:
> I wrote:
> > I think the idea of hashing only keys/values that are "too long" is a
> > reasonable compromise.  I've not finished coding it (because I keep
> > getting distracted by other problems in the code :-() but it does not
> > look to be very difficult.  I'm envisioning the cutoff as being something
> > like 128 bytes; in practice that would mean that few if any keys get
> > hashed, I think.
> 
> Attached is a draft patch for this.  In addition to the hash logic per se,
> I made these changes:
> 
> * Replaced the K/V prefix bytes with a code that distinguishes the types
> of JSON values.  While this is not of any huge significance for the
> current index search operators, it's basically free to store the info,
> so I think we should do it for possible future use.
> 
> * Fixed the problem with "exists" returning rows it shouldn't.  I
> concluded that the best fix is just to force recheck for exists, which
> allows considerable simplification in the consistent functions.
> 
> * Tried to improve the comments in jsonb_gin.c.
> 
> Barring objections I'll commit this tomorrow, and also try to improve the
> user-facing documentation about the jsonb opclasses.

Looks good.  I was thinking the jsonb_ops name could remain unchanged
and the jsonb_hash_ops could be called jsonb_combo_ops as it combines
the key and value into a single index entry.
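
For readers following along, the two opclasses being named can be chosen
explicitly at index creation time; a small illustrative sketch (table/index
names and documents are invented, and jsonb_hash_ops is the name as it
currently stands in the development tree):

    CREATE TABLE api (jdoc jsonb);

    -- default opclass (jsonb_ops): separate index entries for keys and values
    CREATE INDEX api_jdoc_gin_idx ON api USING gin (jdoc);

    -- jsonb_hash_ops: key and value combined into a single index entry
    CREATE INDEX api_jdoc_hash_idx ON api USING gin (jdoc jsonb_hash_ops);

    -- both support containment; only the default opclass supports key existence
    SELECT count(*) FROM api WHERE jdoc @> '{"tags": ["qui"]}';
    SELECT count(*) FROM api WHERE jdoc ? 'tags';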

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> I am not sure what your point is here.  Here's mine: if we can strip
> this down to the executor support plus the most minimal planner
> support possible, we might be able to get *something* committed.  Then
> we can extend it in subsequent commits.

I guess my point is that I see this more-or-less being solved already by
FDWs, but that doesn't address the case when it's a local table, so
perhaps there is something useful out of a commit that allows
replacement of a SeqScan node (which presumably would also be costed
differently).

> You seem to be saying there's no value in getting anything committed
> unless it handles the scan-substituting-for-join case.  I don't agree.
>  Incremental commits are good, whether they get you all the way to
> where you want to be or not.

To be honest, I think this is really the first proposal to replace
specific Nodes, rather than provide a way for a generic Node to exist
(which could also replace joins).  While I do think it's an interesting
idea, and if we could push filters down to this new Node it might even
be worthwhile, I'm not sure that it actually moves us down the path to
supporting Nodes which replace joins.

Still, I'm not against it.

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 10:16 PM, Stephen Frost  wrote:
> * Robert Haas (robertmh...@gmail.com) wrote:
>> Well, I consider that somewhat good news, because I think it would be
>> rather nice if we could get by with solving one problem at a time, and
>> if the executor part is close to being well-solved, excellent.
>
> Sadly, I'm afraid the news really isn't all that good in the end..
>
>> My ignorance is probably showing here, but I guess I don't understand
>> why it's so hard to deal with the planner side of things.  My
>> perhaps-naive impression is that a Seq Scan node, or even an Index
>> Scan node, is not all that complicated.  If we just want to inject
>> some more things that behave a lot like those into various baserels, I
>> guess I don't understand why that's especially hard.
>
> That's not what is being asked for here though...

I am not sure what your point is here.  Here's mine: if we can strip
this down to the executor support plus the most minimal planner
support possible, we might be able to get *something* committed.  Then
we can extend it in subsequent commits.

You seem to be saying there's no value in getting anything committed
unless it handles the scan-substituting-for-join case.  I don't agree.
 Incremental commits are good, whether they get you all the way to
where you want to be or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> > So the extension will need to be aware of all custom data types and then
> > installed *after* all other extensions are installed?  That doesn't strike
> > me as workable...
> > 
> I'm not certain why do you think an extension will need to support all
> the data types.

Mostly because we have a very nice extension system which quite a few
different extensions make use of, and it'd be pretty darn unfortunate if
none of them could take advantage of GPUs because we decided that the
right way to support GPUs was through an extension.

This is an argument which might be familiar to some, as it was part of the
reason that json and jsonb were added to core, imv...

> Even if it works only for a particular set of data types, it makes sense
> as long as it covers data types user actually using.

I know quite a few users of PostGIS, ip4r, and hstore...

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Peter Geoghegan
On Thu, May 8, 2014 at 7:13 PM, Stephen Frost  wrote:
> Of course, things will change when we are able to parallelize joins
> across multiple CPUs ourselves..  In a way, the PGStrom approach gets to
> "cheat" us today, since it can parallelize the work where core can't and
> that ends up not being an entirely fair comparison.

I was thinking of SIMD, along similar lines. We might be able to cheat
our way out of having to solve some of the difficult problems of
parallelism that way. For example, if you can build a SIMD-friendly
bitonic mergesort, and combine that with poor man's normalized keys,
that could make merge joins on text faster. That's pure speculation,
but it seems like an interesting possibility.

-- 
Peter Geoghegan




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> > > I didn't ask this before but it's been on my mind for a while- how
> > > will this work for custom data types, ala the 'geometry' type from
> PostGIS?
> > > There's user-provided code that we have to execute to check equality
> > > for those, but they're not giving us CUDA code to run to perform that
> equality...
> > >
> > If custom-plan provider support the user-defined data types such as
> > PostGIS, it will be able to pick up these data types also, in addition
> > to built-in ones. It fully depends on coverage of the extension.
> > If not a supported data type, it is not a show-time of GPUs.
> 
> So the extension will need to be aware of all custom data types and then
> installed *after* all other extensions are installed?  That doesn't strike
> me as workable...
> 
I'm not certain why you think an extension will need to support all
the data types.
Even if it works only for a particular set of data types, it makes sense
as long as it covers the data types users are actually using.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> > I didn't ask this before but it's been on my mind for a while- how will
> > this work for custom data types, ala the 'geometry' type from PostGIS?
> > There's user-provided code that we have to execute to check equality for
> > those, but they're not giving us CUDA code to run to perform that 
> > equality...
> > 
> If custom-plan provider support the user-defined data types such as PostGIS,
> it will be able to pick up these data types also, in addition to built-in
> ones. It fully depends on coverage of the extension.
> If not a supported data type, it is not a show-time of GPUs.

So the extension will need to be aware of all custom data types and then
installed *after* all other extensions are installed?  That doesn't
strike me as workable...

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> GPU has workloads likes and dislikes. It is a reasonable idea to list up
> operators (or something else) that have advantage to run on custom-path.
> For example, numeric calculation on fixed-length variables has greate
> advantage on GPU, but locale aware text matching is not a workload suitable
> to GPUs.

Right- but this points out exactly what I was trying to bring up.

Locale-aware text matching requires running libc-provided code, which
isn't going to happen on the GPU (unless we re-implement it...).
Aren't we going to have the same problem with the 'numeric' type?  Our
existing functions won't be usable on the GPU and we'd have to
re-implement them and then make darn sure that they produce the same
results...

We'll also have to worry about any cases where we have a libc function
and a CUDA function and convince ourselves that there's no difference
between the two..  Not sure exactly how we'd build this kind of
knowledge into the system through a catalog (I tend to doubt that'd
work, in fact), and trying to make it work from an extension in a way
that it would work with *other* extensions strikes me as highly
unlikely.  Perhaps the extension could provide the core types and the
other extensions could provide their own bits to hook into the right
places, but that sure seems fragile.

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> * Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> > I initially intended to allow extensions to add their custom-path
> > based on their arbitrary decision, because the core backend cannot
> > have expectation towards the behavior of custom-plan.
> > However, of course, the custom-path that replaces built-in paths shall
> > have compatible behavior in spite of different implementation.
> 
> I didn't ask this before but it's been on my mind for a while- how will
> this work for custom data types, ala the 'geometry' type from PostGIS?
> There's user-provided code that we have to execute to check equality for
> those, but they're not giving us CUDA code to run to perform that equality...
> 
If the custom-plan provider supports user-defined data types such as PostGIS,
it will be able to pick up these data types as well, in addition to built-in
ones. It fully depends on the coverage of the extension.
If a data type is not supported, it is not show-time for GPUs.

In my case, if PG-Strom also has compatible code for them, runnable on OpenCL,
it will say "yes, I can handle this data type".

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Kouhei Kaigai (kai...@ak.jp.nec.com) wrote:
> I initially intended to allow extensions to add their custom-path based
> on their arbitrary decision, because the core backend cannot have
> expectation towards the behavior of custom-plan.
> However, of course, the custom-path that replaces built-in paths shall
> have compatible behavior in spite of different implementation.

I didn't ask this before but it's been on my mind for a while- how will
this work for custom data types, ala the 'geometry' type from PostGIS?
There's user-provided code that we have to execute to check equality for
those, but they're not giving us CUDA code to run to perform that
equality...

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> Well, I consider that somewhat good news, because I think it would be
> rather nice if we could get by with solving one problem at a time, and
> if the executor part is close to being well-solved, excellent.

Sadly, I'm afraid the news really isn't all that good in the end..

> My ignorance is probably showing here, but I guess I don't understand
> why it's so hard to deal with the planner side of things.  My
> perhaps-naive impression is that a Seq Scan node, or even an Index
> Scan node, is not all that complicated.  If we just want to inject
> some more things that behave a lot like those into various baserels, I
> guess I don't understand why that's especially hard.

That's not what is being asked for here though...

> Now I do understand that part of what KaiGai wants to do here is
> inject custom scan paths as additional paths for *joinrels*.  And I
> can see why that would be somewhat more complicated.  But I also don't
> see why that's got to be part of the initial commit.

I'd say it's more than "part" of what the goal is here- it's more or
less what everything boils down to.  Oh, plus being able to replace
aggregates with a GPU-based operation instead, but that's no trivially
done thing either really (if it is, let's get it done for FDWs
already...).

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Peter Geoghegan (p...@heroku.com) wrote:
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai  wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
> 
> I haven't really paid any attention to PGStrom. Perhaps it's just that
> I missed it, but I would find it useful if you could direct me towards
> a benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling
> compared to traditional strategies already implemented in Postgres and
> other systems.

I agree that some concrete evidence would be really nice.  I
more-or-less took KaiGai's word on it, but having actual benchmarks
would certainly be better.

> If I wanted to make joins faster, personally, I would look at
> opportunities to optimize our existing hash joins to take better
> advantage of modern CPU characteristics.

Yeah, I'm pretty confident we're leaving a fair bit on the table right
there, based on my previous investigation into this area.  There were
easily cases which showed a 3x improvement, as I recall (the trade-off
being increased memory usage for a larger, sparser hash table).  Sadly,
there were also cases which ended up being worse, and it seemed to be
very sensitive to the size of the hash table which ends up being built
and the size of the on-CPU cache.
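
For anyone who wants to reproduce that kind of experiment, EXPLAIN ANALYZE
already reports the hash table's buckets, batches, and memory usage on the
Hash node, and work_mem is the knob that trades memory for fewer batches; a
sketch with made-up table names:

    SET work_mem = '4MB';
    EXPLAIN (ANALYZE, COSTS OFF)
    SELECT *
    FROM orders o
    JOIN customers c ON c.id = o.customer_id;
    -- Look for "Buckets: ... Batches: ... Memory Usage: ..." on the Hash node,
    -- then repeat with a larger work_mem and compare.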

> A lot of the research
> suggests that it may be useful to implement techniques that take
> better advantage of available memory bandwidth through techniques like
> prefetching and partitioning, perhaps even (counter-intuitively) at
> the expense of compute bandwidth.

While I agree with this, one of the big things about GPUs is that they
operate in a highly parallel fashion and across a different CPU/Memory
architecture than what we're used to (for starters, everything is much
"closer").  In a traditional memory system, there's a lot of back and
forth to memory, but a single memory dump over to the GPU's memory where
everything is processed in a highly parallel way and then shipped back
wholesale to main memory is at least conceivably faster.

Of course, things will change when we are able to parallelize joins
across multiple CPUs ourselves..  In a way, the PGStrom approach gets to
"cheat" us today, since it can parallelize the work where core can't and
that ends up not being an entirely fair comparison.

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> > So it seems reasonable to have a way to define/declare what is
> > possible and what is not. But my take is that adding a new column to
> > pg_operator for every CustomJoin node is probably out of the question,
> > hence my suggestion to list the operators we know it can work with.
> 
> It does seem like there should be some work done in this area, as Tom 
> mentioned,
> to avoid needing to have more columns to track how equality can be done.
> I do wonder just how we'd deal with this when it comes to GPUs as, presumably,
> the code to implement the equality for various types would have to be written
> in CUDA-or-whatever.
> 
GPUs have workload likes and dislikes. It is a reasonable idea to list up
the operators (or something else) that have an advantage when run on a
custom path.
For example, numeric calculation on fixed-length variables has a great
advantage on GPUs, but locale-aware text matching is not a workload
suitable for GPUs.
It may be a good hint for the planner when picking candidate paths to be
considered.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 20:40, Robert Haas  wrote:
> > For my money, we'd be better off
> > getting some kind of basic custom scan node functionality committed
> > first, even if the cases where you can actually inject them into real
> > plans are highly restricted.  Then, we could later work on adding more
> > ways to inject them in more places.
> 
> We're past the prototyping stage and into productionising what we know
> works, AFAIK. If that point is not clear, then we need to discuss that
> first.
> 
> At the moment the Custom join hook is called every time we attempt to
> cost a join, with no restriction.
> 
> I would like to highly restrict this, so that we only consider a
> CustomJoin node when we have previously said one might be usable and
> the user has requested this (e.g. enable_foojoin = on)

This is part of what I disagree with- I'd rather not require users to
know and understand when they want to do a HashJoin vs. a MergeJoin vs.
a CustomJoinTypeX.

> We only consider merge joins if the join uses operators with oprcanmerge=true.
> We only consider hash joins if the join uses operators with oprcanhash=true

I wouldn't consider those generally "user-facing" options, and the
enable_X counterparts are intended for debugging and not to be used in
production environments.  To the point you make in the other thread- I'm
fine w/ having similar cost-based enable_X options, but I think we're
doing our users a disservice if we require that they populate or update
a table.  Perhaps an extension needs to do that on installation, but
that would need to enable everything to avoid the user having to mess
around with the table.

> So it seems reasonable to have a way to define/declare what is
> possible and what is not. But my take is that adding a new column to
> pg_operator for every CustomJoin node is probably out of the question,
> hence my suggestion to list the operators we know it can work with.

It does seem like there should be some work done in this area, as Tom
mentioned, to avoid needing to have more columns to track how equality
can be done.  I do wonder just how we'd deal with this when it comes to
GPUs as, presumably, the code to implement the equality for various
types would have to be written in CUDA-or-whatever.
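
For reference, the existing capability flags Simon is pointing at are plain
pg_operator columns and can be inspected directly; an illustrative query
(not tied to any particular CustomJoin design):

    SELECT oprname,
           oprleft::regtype  AS left_type,
           oprright::regtype AS right_type,
           oprcanmerge,
           oprcanhash
    FROM pg_operator
    WHERE oprname = '='
    ORDER BY left_type::text, right_type::text
    LIMIT 10;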

> Given that everything else in Postgres is agnostic and configurable,
> I'm looking to do the same here.

It's certainly a neat idea, but I do have concerns (which appear to be
shared by others) about just how practical it'll be, how much rework
it'd take, and whether it'd really be used in the end..

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai  wrote:
> > Umm... I'm now missing the direction towards my goal.
> > What approach is the best way to glue PostgreSQL and PGStrom?
> 
> I haven't really paid any attention to PGStrom. Perhaps it's just that I
> missed it, but I would find it useful if you could direct me towards a
> benchmark or something like that, that demonstrates a representative
> scenario in which the facilities that PGStrom offers are compelling compared
> to traditional strategies already implemented in Postgres and other
> systems.
> 
Implementation of Hash-Join on the GPU side is still under development.

The only available use case right now is an alternative scan path to a
full table scan, for cases where a table contains a massive number of
records and the qualifiers are complicated enough.

The EXPLAIN command below is a sequential scan on a table that contains
80M records (all of them in memory; no disk accesses during execution).
Nvidia's GT640 takes an advantage over a single-threaded Core i5 4570S,
at least.


postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=10003175757.67..10003175757.68 rows=1 width=0) (actual time=46648.635..46648.635 rows=1 loops=1)
   ->  Seq Scan on t1  (cost=100.00..10003109091.00 rows=2667 width=0) (actual time=0.047..46351.567 rows=2513814 loops=1)
         Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Rows Removed by Filter: 77486186
 Planning time: 0.066 ms
 Total runtime: 46648.668 ms
(6 rows)

postgres=# set pg_strom.enabled = on;
SET
postgres=# explain (analyze) select count(*) from t1 where sqrt((x-20.0)^2 + (y-20.0)^2) < 10;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1274424.33..1274424.34 rows=1 width=0) (actual time=1784.729..1784.729 rows=1 loops=1)
   ->  Custom (GpuScan) on t1  (cost=1.00..1207757.67 rows=2667 width=0) (actual time=179.748..1567.018 rows=2513699 loops=1)
         Host References:
         Device References: x, y
         Device Filter: (sqrt((((x - 20::double precision) ^ 2::double precision) + ((y - 20::double precision) ^ 2::double precision))) < 10::double precision)
         Total time to load: 0.231 ms
         Avg time in send-mq: 0.027 ms
         Max time to build kernel: 1.064 ms
         Avg time of DMA send: 3.050 ms
         Total time of DMA send: 933.318 ms
         Avg time of kernel exec: 5.117 ms
         Total time of kernel exec: 1565.799 ms
         Avg time of DMA recv: 0.086 ms
         Total time of DMA recv: 26.289 ms
         Avg time in recv-mq: 0.011 ms
 Planning time: 0.094 ms
 Total runtime: 1784.793 ms
(17 rows)


> If I wanted to make joins faster, personally, I would look at opportunities
> to optimize our existing hash joins to take better advantage of modern CPU
> characteristics. A lot of the research suggests that it may be useful to
> implement techniques that take better advantage of available memory
> bandwidth through techniques like prefetching and partitioning, perhaps
> even (counter-intuitively) at the expense of compute bandwidth. It's
> possible that it just needs to be explained to me, but, with respect,
> intuitively I have a hard time imagining that offloading joins to the GPU
> will help much in the general case. Every paper on joins from the last decade
> talks a lot about memory bandwidth and memory latency. Are you concerned
> with some specific case that I may have missed? In what scenario might a
> cost-based optimizer reasonably prefer a custom join node implemented by
> PgStrom, over any of the existing join node types? It's entirely possible
> that I simply missed relevant discussions here.
> 
If our purpose were to consume 100% of the GPU device's capacity, memory
bandwidth would be troublesome. But I'm not interested in GPU benchmarking.
What I want to do is accelerate complicated query processing beyond what
existing RDBMSs offer, in a way that is cheap in cost and transparent to
existing applications.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> On Thu, May 8, 2014 at 4:43 PM, Tom Lane  wrote:
> > I thought that the executor side of his patch wasn't in bad shape.
> > The real problems were in the planner, and indeed largely in the "backend"
> > part of the planner where there's a lot of hard-wired logic for fixing
> > up low-level details of the constructed plan tree.  It seems like in
> > principle it might be possible to make that logic cleanly extensible,
> > but it'll likely take a major rewrite.  The patch tried to skate by
> > with just exposing a bunch of internal functions, which I don't think
> > is a maintainable approach, either for the core or for the extensions
> using it.
> 
> Well, I consider that somewhat good news, because I think it would be rather
> nice if we could get by with solving one problem at a time, and if the 
> executor
> part is close to being well-solved, excellent.
> 
> My ignorance is probably showing here, but I guess I don't understand why
> it's so hard to deal with the planner side of things.  My perhaps-naive
> impression is that a Seq Scan node, or even an Index Scan node, is not all
> that complicated.  If we just want to inject some more things that behave
> a lot like those into various baserels, I guess I don't understand why that's
> especially hard.
> 
> Now I do understand that part of what KaiGai wants to do here is inject
> custom scan paths as additional paths for *joinrels*.  And I can see why
> that would be somewhat more complicated.  But I also don't see why that's
> got to be part of the initial commit.
> 
I'd also like to take this approach. Even though we eventually need to take
a graceful approach to join replacement by custom paths, it partially makes
sense to have a minimum functionality set first.

Then, we can focus on how to design planner integration for joining.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> > I also think that there are really two separate problems here: getting
> > the executor to call a custom scan node when it shows up in the plan
> > tree; and figuring out how to get it into the plan tree in the first
> > place.  I'm not sure we've properly separated those problems, and I'm
> > not sure into which category the issues that sunk KaiGai's 9.4 patch
> > fell.
> 
> I thought that the executor side of his patch wasn't in bad shape.  The
> real problems were in the planner, and indeed largely in the "backend"
> part of the planner where there's a lot of hard-wired logic for fixing up
> low-level details of the constructed plan tree.  It seems like in principle
> it might be possible to make that logic cleanly extensible, but it'll likely
> take a major rewrite.  The patch tried to skate by with just exposing a
> bunch of internal functions, which I don't think is a maintainable approach,
> either for the core or for the extensions using it.
> 
(I'm now trying to catch up on the discussion from last night...)

I initially intended to allow extensions to add their custom paths based
on their own arbitrary decisions, because the core backend cannot have
expectations about the behavior of a custom plan.
However, of course, a custom path that replaces built-in paths shall
have compatible behavior in spite of a different implementation.

So, I'm inclined toward the direction that the custom-plan provider will
inform the core backend what it can do, and the planner will give extensions
more practical information to construct custom path nodes.

Let me investigate how to handle join replacement by custom paths in the
planner stage.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 4:43 PM, Tom Lane  wrote:
> I thought that the executor side of his patch wasn't in bad shape.  The
> real problems were in the planner, and indeed largely in the "backend"
> part of the planner where there's a lot of hard-wired logic for fixing up
> low-level details of the constructed plan tree.  It seems like in
> principle it might be possible to make that logic cleanly extensible,
> but it'll likely take a major rewrite.  The patch tried to skate by with
> just exposing a bunch of internal functions, which I don't think is a
> maintainable approach, either for the core or for the extensions using it.

Well, I consider that somewhat good news, because I think it would be
rather nice if we could get by with solving one problem at a time, and
if the executor part is close to being well-solved, excellent.

My ignorance is probably showing here, but I guess I don't understand
why it's so hard to deal with the planner side of things.  My
perhaps-naive impression is that a Seq Scan node, or even an Index
Scan node, is not all that complicated.  If we just want to inject
some more things that behave a lot like those into various baserels, I
guess I don't understand why that's especially hard.

Now I do understand that part of what KaiGai wants to do here is
inject custom scan paths as additional paths for *joinrels*.  And I
can see why that would be somewhat more complicated.  But I also don't
see why that's got to be part of the initial commit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 9.4 release notes

2014-05-08 Thread Michael Paquier
On Fri, May 9, 2014 at 2:50 AM, MauMau  wrote:
> From: "Bruce Momjian" 
>>
>> I have completed the initial version of the 9.4 release notes.  You can
>> view them here:
>>
>> http://www.postgresql.org/docs/devel/static/release-9-4.html
>>
>> Feedback expected and welcomed.  I expect to be modifying this until we
>> release 9.4 final.  I have marked items where I need help with question
>> marks.
>
>
> Could you add the following item, client-only installation on Windows, if
> it's appropriate for release note?  It will be useful for those like
> EnterpriseDB who develop products derived from PostgreSQL.
>
> https://commitfest.postgresql.org/action/patch_view?id=1326
+1. Thanks for pointing that out as well!
-- 
Michael




[HACKERS] Re: popen and pclose redefinitions causing many warning in Windows build

2014-05-08 Thread Noah Misch
On Thu, May 08, 2014 at 12:14:44PM -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > I'm pretty sure we need this on Mingw - this SYSTEMQUOTE stuff dates 
> > back well before 8.3, IIRC, which is when we first got full MSVC support.
> 
> I tried googling for some info on this, and got a number of hits
> suggesting that mingw didn't emulate popen at all till pretty recently.
> For instance this:
> https://lists.fedoraproject.org/pipermail/fedora-mingw/2009-September/002087.html
> Jones is an ex-coworker of mine, and I'm pretty sure that if he said
> it wasn't there then it wasn't there.

I doubt MinGW has overridden popen() at runtime; that would be contrary to its
design criteria.  The headers, however, are MinGW territory.  MinGW declares
both _popen() and popen() as functions.  MinGW-w64, a project more distinct
from MinGW than it sounds, uses "#define popen _popen":

MinGW: 
http://sourceforge.net/p/mingw/mingw-org-wsl/ci/master/tree/include/stdio.h#l467
MinGW-w64: 
http://sourceforge.net/p/mingw-w64/code/HEAD/tree/trunk/mingw-w64-headers/crt/stdio.h#l496

Building with any recent MinGW-w64, 32-bit or 64-bit, gets the reported
warnings; building with MinGW proper does not.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Compilation errors with mingw build caused by undefined optreset

2014-05-08 Thread Michael Paquier
On Fri, May 9, 2014 at 12:30 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> Since commit 60ff2fd introducing the centralizated getopt-related
>> things in a global header file, build on Windows with mingw is failing
>
> Hm, buildfarm member narwhal doesn't seem to be having any such problem.
> (It's got its own issues, but not this.)
>
> Do you think you could put up a buildfarm critter using whatever version
> of mingw you're using?  Without buildfarm feedback, things *will* break
> regularly; Windows is just too weird to expect otherwise.
Yeah, after sleeping on this I had the same thought. I'll try to get
one working properly.
-- 
Michael




Re: [HACKERS] Compilation errors with mingw build caused by undefined optreset

2014-05-08 Thread Michael Paquier
On Fri, May 9, 2014 at 12:55 AM, Andrew Dunstan  wrote:
> On 05/08/2014 11:30 AM, Tom Lane wrote:
>> Michael Paquier  writes:
>>>
>>> Since commit 60ff2fd introducing the centralizated getopt-related
>>> things in a global header file, build on Windows with mingw is failing
>>
>> Hm, buildfarm member narwhal doesn't seem to be having any such problem.
>> (It's got its own issues, but not this.)
> jacana and frogmouth are also Mingw animals, and are not having issues.
Seems like the version I am using, which is the one actually specified
on the wiki here (
http://www.postgresql.org/docs/devel/static/installation-platform-notes.html#INSTALLATION-NOTES-MINGW)
is able to reproduce that...
-- 
Michael




Re: [HACKERS] Issue with GRANT/COMMENT ON FUNCTION with default

2014-05-08 Thread Jim Nasby
On 5/5/14, 3:22 PM, Alvaro Herrera wrote: 
> Jim Nasby wrote: 
>> Prior to default parameters on functions, GRANT and COMMENT accepted full 
>> parameter syntax. IE: 
>> 
>> GRANT EXECUTE ON test(t text) TO public 
>> 
>> as opposed to regprocedure, which only accepts the data types ( test(text), 
>> not test(t text) ). 
>> 
>> They do not accept DEFAULT though: 
>> 
>> GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; 
>> ERROR:  syntax error at or near "DEFAULT" 
>> LINE 1: GRANT EXECUTE ON FUNCTION test(t text DEFAULT '') to public; 
>> 
>> Presumably this is just an oversight? 
> 
> I have to say that accepting DEFAULT there seems pretty odd to me.  What 
> if you specify the wrong default?  Do you get a "no such function" 
> error?  That would be pretty unhelpful.  But then accepting it ignoring 
> the fact that the default is wrong would be rather strange. 

We already have that exact problem with the name of the argument. 

decibel@decina.cashnetusa=# CREATE FUNCTION test(t text default '') RETURNS text LANGUAGE sql AS 'SELECT $1';
CREATE FUNCTION
decibel@decina.cashnetusa=# GRANT EXECUTE ON FUNCTION test(baz text) to public;
GRANT
decibel@decina.cashnetusa=#

>> Related to that, is it intentional that the regprocedure cast 
>> disallows *any* decorators to the function, other than type? If 
>> regprocedure at least accepted the full function parameter definition 
>> you could use it to get a definitive reference to a function. 
> 
> Does pg_identify_object() give you what you want? 

No, because I'd need an OID, and I have no way of reliably getting that because 
the regprocedure cast won't even accept additional information beyond data type.
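
To make the limitation concrete, a short sketch reusing the test() function
from above (behavior as described in this thread):

    -- regprocedure accepts only the argument types...
    SELECT 'test(text)'::regprocedure;

    -- ...but not parameter names (let alone defaults), so it cannot serve as
    -- a "full parameter syntax" lookup the way GRANT/COMMENT can:
    SELECT 'test(t text)'::regprocedure;   -- fails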
-- 
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Tomas Vondra
On 8.5.2014 23:48, Andrew Dunstan wrote:
> 
> On 05/08/2014 05:21 PM, Alvaro Herrera wrote:
>> Andrew Dunstan wrote:
>>
>>> I really don't get what your objection to the setup is. And no, I
>>> don't want them to run concurrently, I'd rather spread out the
>>> cycles.
>> I wasn't objecting, merely an observation. Note that Tomas
>> mentioned he's okay with running 4 builds at once. My main point
>> here, really, is that having a larger number of animals shouldn't
>> be an impediment for a more complex permutation of configurations,
>> if he's okay with doing that. I assume you wouldn't object to my
>> approving four extra animals running on the same machine, if Tomas
>> wants to go for that.

So, if I get this right, the proposal is to have 7 animals:


1) all branches/locales, frequent builds (every few hours)
  magpie  - gcc
  fulmar  - icc
  treepie - clang

2) single branch/locale, CLOBBER, built once a week
  magpie2 - gcc
  fulmar2 - icc
  treepie2 - clang

3) single branch/locale, recursive CLOBBER, built once a month


I don't particularly mind the number of animals, although I was shooting
for a lower number.

The only question is - should we use 3 animals for the recursive CLOBBER
too? I mean, one for each compiler?

regards
Tomas




Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread David G Johnston
Tom Lane-2 wrote:
> Robert Haas <robertmhaas@> writes:
>> On Thu, May 8, 2014 at 1:51 PM, Tom Lane <tgl@.pa> wrote:
>>> Robert Haas <robertmhaas@> writes:
>>>> OK.  It still seems to me that there's a doc fix needed here, if
>>>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>>>> that if you get a return value of 1, the message is sent, and that's
>>>> just not true.
> 
>>> That's fair.
> 
>> Something like the attached?
> 
>> I don't really know what to say about 0, since it's never actually
>> returned, so I left that out.  But I'm up for suggestions.
> 
> I think you should leave the existing verbiage about zero alone.
> If we take it out, we're painting ourselves into a corner about
> ever fixing the buffer-is-too-full failure case.
> 
> Perhaps the text should be like this:
> 
> The result is 1 if the termination message was sent; or in nonblocking
> mode, this may only indicate that the termination message was successfully
> queued.  (In nonblocking mode, to be certain that the data has been sent,
> you should next wait for write-ready and call PQflush, repeating until it
> returns zero.)  Zero indicates that the function could not queue the
> termination message because of full buffers; this will only happen in
> nonblocking mode.  (In this case, wait for write-ready and try the
> PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
> can use PQerrorMessage to retrieve details.
> 
> If it bothers you to document behavior that's not actually there yet,
> maybe you should go fix the bug ;-).  I had thought this would require
> a lot of refactoring, but it might work to just check if there's enough
> buffer space for the message and return zero immediately if not.

Had trouble comprehending Tom's wording so decided to word-smith a little:

The return value can be one of [-1, 0 , 1] whose meaning depends on whether
you are in blocking or non-blocking mode.  
In blocking mode a 1 means success and -1 means permanent failure; it is not
possible to receive a 0 in blocking mode - all calls will either succeed (1)
or fail (-1).
In non-blocking mode a 1 means that the termination message was placed in
the queue while a -1 means an immediate and permanent failure. A return
value of 0 means that the queue was full and that you need to try again at
the next wait-ready.
[?]While in non-blocking mode, once you have received a return value of 1 you
will need to continually poll PQflush at each wait-ready until it returns 0
(queue empty) or -1 (failure).
The error associated with the -1 return can be retrieved using
PQerrorMessage.

[?]Isn't the whole PQflush comment simply repeating what is written
elsewhere?  The requirement is standard when working in non-blocking mode.

It does seem like the "buffer full" case would be fairly direct to
resolve... and I'm not sure what the explanation would be, in the doc above,
for why 0 is not returned by the current implementation (because the queue
can never be full, or because even in non-blocking mode the system will
block to add to the queue in this function).

David J.









Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Tom Lane
I wrote:
> I think the idea of hashing only keys/values that are "too long" is a
> reasonable compromise.  I've not finished coding it (because I keep
> getting distracted by other problems in the code :-() but it does not
> look to be very difficult.  I'm envisioning the cutoff as being something
> like 128 bytes; in practice that would mean that few if any keys get
> hashed, I think.

Attached is a draft patch for this.  In addition to the hash logic per se,
I made these changes:

* Replaced the K/V prefix bytes with a code that distinguishes the types
of JSON values.  While this is not of any huge significance for the
current index search operators, it's basically free to store the info,
so I think we should do it for possible future use.

* Fixed the problem with "exists" returning rows it shouldn't.  I
concluded that the best fix is just to force recheck for exists, which
allows considerable simplification in the consistent functions.

* Tried to improve the comments in jsonb_gin.c.

Barring objections I'll commit this tomorrow, and also try to improve the
user-facing documentation about the jsonb opclasses.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c
index 592036a..2c4ade2 100644
*** a/src/backend/utils/adt/jsonb_gin.c
--- b/src/backend/utils/adt/jsonb_gin.c
***
*** 14,19 
--- 14,20 
  #include "postgres.h"
  
  #include "access/gin.h"
+ #include "access/hash.h"
  #include "access/skey.h"
  #include "catalog/pg_collation.h"
  #include "catalog/pg_type.h"
*** typedef struct PathHashStack
*** 26,39 
  	struct PathHashStack *parent;
  } PathHashStack;
  
! static text *make_text_key(const char *str, int len, char flag);
! static text *make_scalar_key(const JsonbValue *scalarVal, char flag);
  
  /*
   *
   * jsonb_ops GIN opclass support functions
   *
   */
  Datum
  gin_compare_jsonb(PG_FUNCTION_ARGS)
  {
--- 27,41 
  	struct PathHashStack *parent;
  } PathHashStack;
  
! static Datum make_text_key(char flag, const char *str, int len);
! static Datum make_scalar_key(const JsonbValue *scalarVal, bool is_key);
  
  /*
   *
   * jsonb_ops GIN opclass support functions
   *
   */
+ 
  Datum
  gin_compare_jsonb(PG_FUNCTION_ARGS)
  {
*** gin_extract_jsonb(PG_FUNCTION_ARGS)
*** 65,144 
  {
  	Jsonb	   *jb = (Jsonb *) PG_GETARG_JSONB(0);
  	int32	   *nentries = (int32 *) PG_GETARG_POINTER(1);
- 	Datum	   *entries = NULL;
  	int			total = 2 * JB_ROOT_COUNT(jb);
- 	int			i = 0,
- r;
  	JsonbIterator *it;
  	JsonbValue	v;
  
  	if (total == 0)
  	{
  		*nentries = 0;
  		PG_RETURN_POINTER(NULL);
  	}
  
  	entries = (Datum *) palloc(sizeof(Datum) * total);
  
  	it = JsonbIteratorInit(&jb->root);
  
  	while ((r = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
  	{
  		if (i >= total)
  		{
  			total *= 2;
  			entries = (Datum *) repalloc(entries, sizeof(Datum) * total);
  		}
  
- 		/*
- 		 * Serialize keys and elements equivalently,  but only when elements
- 		 * are Jsonb strings.  Otherwise, serialize elements as values.  Array
- 		 * elements are indexed as keys, for the benefit of
- 		 * JsonbExistsStrategyNumber.  Our definition of existence does not
- 		 * allow for checking the existence of a non-jbvString element (just
- 		 * like the definition of the underlying operator), because the
- 		 * operator takes a text rhs argument (which is taken as a proxy for
- 		 * an equivalent Jsonb string).
- 		 *
- 		 * The way existence is represented does not preclude an alternative
- 		 * existence operator, that takes as its rhs value an arbitrarily
- 		 * internally-typed Jsonb.  The only reason that isn't the case here
- 		 * is that the existence operator is only really intended to determine
- 		 * if an object has a certain key (object pair keys are of course
- 		 * invariably strings), which is extended to jsonb arrays.  You could
- 		 * think of the default Jsonb definition of existence as being
- 		 * equivalent to a definition where all types of scalar array elements
- 		 * are keys that we can check the existence of, while just forbidding
- 		 * non-string notation.  This inflexibility prevents the user from
- 		 * having to qualify that the rhs string is a raw scalar string (that
- 		 * is, naturally no internal string quoting in required for the text
- 		 * argument), and allows us to not set the reset flag for
- 		 * JsonbExistsStrategyNumber, since we know that keys are strings for
- 		 * both objects and arrays, and don't have to further account for type
- 		 * mismatch.  Not having to set the reset flag makes it less than
- 		 * tempting to tighten up the definition of existence to preclude
- 		 * array elements entirely, which would arguably be a simpler
- 		 * alternative. In any case the infrastructure used to implement the
- 		 * existence operator could trivially support this hypothetical,
- 		 * slightly distinct definition of existence.
- 		 */
  

Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 05:21 PM, Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>> I really don't get what your objection to the setup is. And no, I
>> don't want them to run concurrently, I'd rather spread out the
>> cycles.
>
> I wasn't objecting, merely an observation.  Note that Tomas mentioned
> he's okay with running 4 builds at once.  My main point here, really, is
> that having a larger number of animals shouldn't be an impediment for a
> more complex permutation of configurations, if he's okay with doing
> that.  I assume you wouldn't object to my approving four extra animals
> running on the same machine, if Tomas wants to go for that.

No, that's fine.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Peter Geoghegan
On Thu, May 8, 2014 at 6:34 AM, Kouhei Kaigai  wrote:
> Umm... I'm now missing the direction towards my goal.
> What approach is the best way to glue PostgreSQL and PGStrom?

I haven't really paid any attention to PGStrom. Perhaps it's just that
I missed it, but I would find it useful if you could direct me towards
a benchmark or something like that, that demonstrates a representative
scenario in which the facilities that PGStrom offers are compelling
compared to traditional strategies already implemented in Postgres and
other systems.

If I wanted to make joins faster, personally, I would look at
opportunities to optimize our existing hash joins to take better
advantage of modern CPU characteristics. A lot of the research
suggests that it may be useful to implement techniques that take
better advantage of available memory bandwidth through techniques like
prefetching and partitioning, perhaps even (counter-intuitively) at
the expense of compute bandwidth. It's possible that it just needs to
be explained to me, but, with respect, intuitively I have a hard time
imagining that offloading joins to the GPU will help much in the
general case. Every paper on joins from the last decade talks a lot
about memory bandwidth and memory latency. Are you concerned with some
specific case that I may have missed? In what scenario might a
cost-based optimizer reasonably prefer a custom join node implemented
by PgStrom, over any of the existing join node types? It's entirely
possible that I simply missed relevant discussions here.
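
To make that concrete, here is a minimal sketch of the probe-side
prefetching those papers describe (generic C using GCC's
__builtin_prefetch; the bucket layout, batch interface and prefetch
distance are invented for illustration and are not PostgreSQL's actual
hash join code):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct HashBucket
    {
        uint32_t            hash;
        void               *tuple;
        struct HashBucket  *next;
    } HashBucket;

    /* How far ahead to prefetch; a tuning knob, not a magic number. */
    #define PREFETCH_DISTANCE 8

    /*
     * Probe a batch of hash values against the bucket directory, prefetching
     * the directory slot we will need a few iterations from now so it is
     * (hopefully) already in cache by the time we dereference it.
     */
    static int
    probe_batch(HashBucket **buckets, size_t nbuckets,
                const uint32_t *hashes, size_t nkeys, void **matches)
    {
        int     nmatches = 0;

        for (size_t i = 0; i < nkeys; i++)
        {
            if (i + PREFETCH_DISTANCE < nkeys)
                __builtin_prefetch(&buckets[hashes[i + PREFETCH_DISTANCE] % nbuckets]);

            for (HashBucket *b = buckets[hashes[i] % nbuckets]; b; b = b->next)
            {
                if (b->hash == hashes[i])
                    matches[nmatches++] = b->tuple;
            }
        }
        return nmatches;
    }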

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Alvaro Herrera
Andrew Dunstan wrote:

> I really don't get what your objection to the setup is. And no, I
> don't want them to run concurrently, I'd rather spread out the
> cycles.

I wasn't objecting, merely an observation.  Note that Tomas mentioned
he's okay with running 4 builds at once.  My main point here, really, is
that having a larger number of animals shouldn't be an impediment for a
more complex permutation of configurations, if he's okay with doing
that.  I assume you wouldn't object to my approving four extra animals
running on the same machine, if Tomas wants to go for that.

-- 
 Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 21:55, Tom Lane  wrote:

> So I'm not real sure how we move forward.  Maybe something to brainstorm
> about in Ottawa.

I'm just about to go on away for a week, so that's probably the best
place to leave (me out of) the discussion until Ottawa.

I've requested some evidence this hardware route is worthwhile from my
contacts, so we'll see what we get. Presumably Kaigai has something to
share already also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Tom Lane
Robert Haas  writes:
> On Thu, May 8, 2014 at 1:51 PM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> OK.  It still seems to me that there's a doc fix needed here, if
>>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>>> that if you get a return value of 1, the message is sent, and that's
>>> just not true.

>> That's fair.

> Something like the attached?

> I don't really know what to say about 0, since it's never actually
> returned, so I left that out.  But I'm up for suggestions.

I think you should leave the existing verbiage about zero alone.
If we take it out, we're painting ourselves into a corner about
ever fixing the buffer-is-too-full failure case.

Perhaps the text should be like this:

The result is 1 if the termination message was sent; or in nonblocking
mode, this may only indicate that the termination message was successfully
queued.  (In nonblocking mode, to be certain that the data has been sent,
you should next wait for write-ready and call PQflush,
repeating until it returns zero.)  Zero indicates that the function could
not queue the termination message because of full buffers; this will only
happen in nonblocking mode.  (In this case, wait for write-ready and try
the PQputCopyEnd call again.)  If a hard error occurs, -1 is returned; you
can use PQerrorMessage to retrieve details.
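
To illustrate what that wording asks of callers in nonblocking mode,
here is a minimal caller-side sketch (wait_for_write_ready() is a
hypothetical helper around select()/poll(); error handling abbreviated):

    int     rc;

    /* retry if the termination message could not even be queued */
    while ((rc = PQputCopyEnd(conn, NULL)) == 0)
        wait_for_write_ready(PQsocket(conn));       /* hypothetical helper */

    if (rc < 0)
        fprintf(stderr, "%s", PQerrorMessage(conn));

    /* a result of 1 may only mean "queued": flush until libpq reports 0 */
    while ((rc = PQflush(conn)) > 0)
        wait_for_write_ready(PQsocket(conn));

    if (rc < 0)
        fprintf(stderr, "%s", PQerrorMessage(conn));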

If it bothers you to document behavior that's not actually there yet,
maybe you should go fix the bug ;-).  I had thought this would require
a lot of refactoring, but it might work to just check if there's enough
buffer space for the message and return zero immediately if not.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 04:54 PM, Andrew Dunstan wrote:




Why? This was actually discussed when I set this up and Tom opined 
that a once a day run with CLOBBER_CACHE_ALWAYS was plenty. It takes 
about 4 /12 hours. The rest of the time nightjar runs. friarbird runs 
a bit after midnight US East Coast time, which is generally a slowish 
time for commits, so not running nightjar at that time seems perfectly 
reasonable.





er, that's 4 1/2 hours.

cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Tom Lane
Simon Riggs  writes:
> We only consider merge joins if the join uses operators with oprcanmerge=true.
> We only consider hash joins if the join uses operators with oprcanhash=true

> So it seems reasonable to have a way to define/declare what is
> possible and what is not. But my take is that adding a new column to
> pg_operator for every CustomJoin node is probably out of the question,
> hence my suggestion to list the operators we know it can work with.

For what that's worth, I'm not sure that either the oprcanmerge or
oprcanhash columns really pull their weight.  We could dispense with both
at the cost of doing some wasted lookups in pg_amop.  (Perhaps we should
replace them with a single "oprisequality" column, which would amount to
a hint that it's worth looking for hash or merge properties, or for other
equality-ish properties in future.)

So I think something comparable to an operator class is indeed a better
approach than adding more columns to pg_operator.  Other than the
connection to pg_am, you could pretty nearly just use the operator class
infrastructure as-is for a lot of operator-property things like this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Tom Lane
Simon Riggs  writes:
> On 8 May 2014 20:40, Robert Haas  wrote:
>> For my money, we'd be better off
>> getting some kind of basic custom scan node functionality committed
>> first, even if the cases where you can actually inject them into real
>> plans are highly restricted.  Then, we could later work on adding more
>> ways to inject them in more places.

> We're past the prototyping stage and into productionising what we know
> works, AFAIK. If that point is not clear, then we need to discuss that
> first.

OK, I'll bite: what here do we know works?  Not a damn thing AFAICS;
it's all speculation that certain hooks might be useful, and speculation
that's not supported by a lot of evidence.  If you think this isn't
prototyping, I wonder what you think *is* prototyping.

It seems likely to me that our existing development process is not
terribly well suited to developing a good solution in this area.
We need to be able to try some things and throw away what doesn't
work; but the project's mindset is not conducive to throwing features
away once they've appeared in a shipped release.  And the other side
of the coin is that trying these things is not inexpensive: you have
to write some pretty serious code before you have much of a feel for
whether a planner hook API is actually any good.  So by the time
you've built something of the complexity of, say, contrib/postgres_fdw,
you don't really want to throw that away in the next major release.
And that's at the bottom end of the scale of the amount of work that'd
be needed to do anything with the sorts of interfaces we're discussing.

So I'm not real sure how we move forward.  Maybe something to brainstorm
about in Ottawa.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 04:09 PM, Alvaro Herrera wrote:

Andrew Dunstan wrote:


Here is what I do on my FreeBSD VM. I have 2 animals, nightjar and
friarbird. They have the same buildroot. friarbird is set up to
build with CLOBBER_CACHE_ALWAYS, building just HEAD and just testing
C locale; nightjar builds all branches we are interested in and
tests locale cs_CZ.utf8 in addition to C.

So nightjar would build frequently almost all the time, but as soon as
friarbird is doing a CLOBBER run nightjar would just stop running?  That
seems a bit odd, given that the CLOBBER runs take a lot longer than
non-CLOBBER ones.  (I guess it makes sense if you don't want to devote
double CPU time now and then to running the CLOBBER animal, but other
than that there doesn't seem to be a point to setting it up like that.)




Why? This was actually discussed when I set this up and Tom opined that 
a once a day run with CLOBBER_CACHE_ALWAYS was plenty. It takes about 4 
/12 hours. The rest of the time nightjar runs. friarbird runs a bit 
after midnight US East Coast time, which is generally a slowish time for 
commits, so not running nightjar at that time seems perfectly reasonable.


I really don't get what your objection to the setup is. And no, I don't 
want them to run concurrently, I'd rather spread out the cycles.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Tom Lane
Robert Haas  writes:
> I'm pretty skeptical about this whole line of inquiry.  We've only got
> three kinds of joins, and each one of them has quite a bit of bespoke
> logic, and all of this code is pretty performance-sensitive on large
> join nests.  If there's a way to make this work for KaiGai's use case
> at all, I suspect something really lightweight like a hook, which
> should have negligible impact on other workloads, is a better fit than
> something involving system catalog access.  But I might be wrong.

We do a great deal of catalog consultation already during planning,
so I think a few more wouldn't be a problem, especially if the planner
is smart enough to touch the catalogs just once (per query?) and cache
the results.  However, your point about lots of bespoke logic is dead
on, and I'm afraid it's damn near a fatal objection.  As just one example,
if we did not have merge joins then an awful lot of what the planner does
with path keys simply wouldn't exist, or at least would look a lot
different than it does.  Without that infrastructure, I can't imagine
that a plugin approach would be able to plan mergejoins anywhere near as
effectively.  Maybe there's a way around this issue, but it sure won't
just be a pg_am-like API.

> I also think that there are really two separate problems here: getting
> the executor to call a custom scan node when it shows up in the plan
> tree; and figuring out how to get it into the plan tree in the first
> place.  I'm not sure we've properly separated those problems, and I'm
> not sure into which category the issues that sunk KaiGai's 9.4 patch
> fell.

I thought that the executor side of his patch wasn't in bad shape.  The
real problems were in the planner, and indeed largely in the "backend"
part of the planner where there's a lot of hard-wired logic for fixing up
low-level details of the constructed plan tree.  It seems like in
principle it might be possible to make that logic cleanly extensible,
but it'll likely take a major rewrite.  The patch tried to skate by with
just exposing a bunch of internal functions, which I don't think is a
maintainable approach, either for the core or for the extensions using it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 20:10, Stephen Frost  wrote:

>> * A USERSET mechanism to allow users to turn it off for testing or
>> otherwise, at user, database level
>
> If we re-implement our existing components through this ("eat our own
> dogfood" as it were), I'm not sure that we'd be able to have a way to
> turn it on/off..  I realize we wouldn't have to, but then it seems like
> we'd have two very different code paths and likely a different level of
> support / capability afforded to "external" storage systems and then I
> wonder if we're not back to just FDWs again..

We have SET enable_hashjoin = on | off

I would like a way to do the equivalent of SET enable_mycustomjoin =
off so that when it starts behaving weirdly in production, I can turn
it off so we can prove that is not the casue, or keep it turned off if
its a problem. I don't want to have to call a hook and let the hook
decide whether it can be turned off or not.
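
For what it's worth, an extension can already give itself that kind of
switch through the custom-GUC machinery; a minimal sketch, where the
mycustomjoin.enable name and the module are hypothetical:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/guc.h"

    PG_MODULE_MAGIC;

    void        _PG_init(void);

    static bool enable_mycustomjoin = true;     /* hypothetical switch */

    void
    _PG_init(void)
    {
        /* PGC_USERSET: can be set per session, and via ALTER ROLE/DATABASE SET */
        DefineCustomBoolVariable("mycustomjoin.enable",
                                 "Consider the (hypothetical) custom join method.",
                                 NULL,
                                 &enable_mycustomjoin,
                                 true,
                                 PGC_USERSET,
                                 0,
                                 NULL, NULL, NULL);
    }

The path-building code would then simply skip its work while the flag
is off.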

Postgres should be in control of the plugin, not give control to the
plugin every time and hope it gives us control back.

(I'm trying to take the "FDW isn't the right way" line of thinking to
its logical conclusions, so we can decide).

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Alvaro Herrera
Andrew Dunstan wrote:

> Here is what I do on my FreeBSD VM. I have 2 animals, nightjar and
> friarbird. They have the same buildroot. friarbird is set up to
> build with CLOBBER_CACHE_ALWAYS, building just HEAD and just testing
> C locale; nightjar builds all branches we are interested in and
> tests locale cs_CZ.utf8 in addition to C.

So nightjar would build frequently almost all the time, but as soon as
friarbird is doing a CLOBBER run nightjar would just stop running?  That
seems a bit odd, given that the CLOBBER runs take a lot longer than
non-CLOBBER ones.  (I guess it makes sense if you don't want to devote
double CPU time now and then to running the CLOBBER animal, but other
than that there doesn't seem to be a point to setting it up like that.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 20:40, Robert Haas  wrote:

> For my money, we'd be better off
> getting some kind of basic custom scan node functionality committed
> first, even if the cases where you can actually inject them into real
> plans are highly restricted.  Then, we could later work on adding more
> ways to inject them in more places.

We're past the prototyping stage and into productionising what we know
works, AFAIK. If that point is not clear, then we need to discuss that
first.

At the moment the Custom join hook is called every time we attempt to
cost a join, with no restriction.

I would like to highly restrict this, so that we only consider a
CustomJoin node when we have previously said one might be usable and
the user has requested this (e.g. enable_foojoin = on)

We only consider merge joins if the join uses operators with oprcanmerge=true.
We only consider hash joins if the join uses operators with oprcanhash=true

So it seems reasonable to have a way to define/declare what is
possible and what is not. But my take is that adding a new column to
pg_operator for every CustomJoin node is probably out of the question,
hence my suggestion to list the operators we know it can work with.

Given that everything else in Postgres is agnostic and configurable,
I'm looking to do the same here.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Andrew Dunstan




On 05/08/2014 12:21 PM, Tomas Vondra wrote:

On 6.5.2014 23:01, Tomas Vondra wrote:

On 6.5.2014 22:24, Tom Lane wrote:

Tomas Vondra  writes:

I recall there was a call for more animals with CLOBBER_CACHE_ALWAYS
some time ago, so I went and enabled that on all three animals. Let's
see how long that will take.
I see there are more 'clobber' options in the code: CLOBBER_FREED_MEMORY
and CLOBBER_CACHE_RECURSIVELY. Would that be a good idea to enable these
as well?
The time requirements will be much higher (especially for the
RECURSIVELY option), but running that once a week shouldn't be a big
deal - the machine is pretty much dedicated to the buildfarm.

I've never had the patience to run the regression tests to completion
with CLOBBER_CACHE_RECURSIVELY at all, let alone do it on a regular
basis. (I wonder if there's some easy way to run it for just a few
regression tests...)

Now, that's a challenge ;-)


I think testing CLOBBER_FREED_MEMORY would be sensible though.

OK, I've enabled this for now.

Hmmm, with CLOBBER_CACHE_ALWAYS + CLOBBER_FREED_MEMORY the tests take
~20h on a single branch/animal. With a single locale (e.g. "C") it would
take ~4h, but we're testing a bunch of additional czech/slovak locales.

The tests are running in sequence (magpie->treepie->fulmar) so with all
6 branches, this would take ~14 days to complete. I don't mind the
machine is running tests 100% of the time, that's why it's in buildfarm,
but I'd rather see the failures soon after the commit (and two weeks is
well over the "soon" edge, IMHO).

So I'm thinking about how to improve this. I'd like to keep the options
for all the branches (e.g. not just HEAD, as a few other animals do).
But I'm thinking about running the tests in parallel, somehow - the
machine has 4 cores, and most of the time only one of them is used. I
don't expect a perfect ~3x speedup, but getting ~2x would be nice.

Any recommendations how to do that? I see there's 'base_port' in the
config - is it enough to tweak this, or do I need to run separate the
animals using e.g. lxc?



Here is what I do on my FreeBSD VM. I have 2 animals, nightjar and 
friarbird. They have the same buildroot. friarbird is set up to build 
with CLOBBER_CACHE_ALWAYS, building just HEAD and just testing C locale; 
nightjar builds all branches we are interested in and tests locale 
cs_CZ.utf8 in addition to C.


Other than those differences they are pretty similar.

Here is the crontab that drives them:

   27 5-22 * * * cd bf && ./run_branches.pl --run-all --verbose
   --config=nightjarx.conf >> bf.out 2>&1
   20 0 * * * cd bf && ./run_branches.pl --run-all --verbose
   --config=friarbird.conf --skip-steps=install-check >> bf.out 2>&1


The buildfarm code has enough locking smarts to make sure we don't get 
any build collisions doing this.


If you have an animal to do a special type of build (e.g. CLOBBER_foo) 
then it's probably a good idea to set a note for that animal - see the 
buildfarm program setnotes.pl. friarbird has the note set "Uses 
-DCLOBBER_CACHE_ALWAYS".



If you want to do this in parallel, then you will need different 
buildroots and different base ports for each animal. I would not run the 
same animal on different branches concurrently, that is quite likely to 
end up in port collisions.



HTH.

cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 3:10 PM, Stephen Frost  wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> It would seem normal and natural to have
>>
>> * pg_joinam catalog table for "join methods" with a join method API
>> Which would include some way of defining which operators/datatypes we
>> consider this for, so if PostGIS people come up with some fancy GIS
>> join thing, we don't invoke it every time even when its inapplicable.
>> I would prefer it if PostgreSQL also had some way to control when the
>> joinam was called, possibly with some kind of table_size_threshold on
>> the AM tuple, which could be set to >=0 to control when this was even
>> considered.
>
> It seems useful to think about how we would redefine our existing join
> methods using such a structure.  While thinking about that, it seems
> like we would worry more about what the operators provide rather than
> the specific operators themselves (ala hashing / HashJoin) and I'm not
> sure we really care about the data types directly- just about the
> operations which we can do on them..

I'm pretty skeptical about this whole line of inquiry.  We've only got
three kinds of joins, and each one of them has quite a bit of bespoke
logic, and all of this code is pretty performance-sensitive on large
join nests.  If there's a way to make this work for KaiGai's use case
at all, I suspect something really lightweight like a hook, which
should have negligible impact on other workloads, is a better fit than
something involving system catalog access.  But I might be wrong.
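
As a data point on the hook option: the planner already exposes a
lightweight hook in this area, join_search_hook, and the usual
save-and-chain pattern costs next to nothing when the hook declines to
act. A sketch of that boilerplate only (not a proposal for the
custom-join API itself):

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/paths.h"

    PG_MODULE_MAGIC;

    void        _PG_init(void);

    static join_search_hook_type prev_join_search_hook = NULL;

    static RelOptInfo *
    my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
    {
        /*
         * A real implementation would add its custom paths here; this sketch
         * just falls through to whatever would have run without the hook.
         */
        if (prev_join_search_hook)
            return prev_join_search_hook(root, levels_needed, initial_rels);
        return standard_join_search(root, levels_needed, initial_rels);
    }

    void
    _PG_init(void)
    {
        prev_join_search_hook = join_search_hook;
        join_search_hook = my_join_search;
    }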

I also think that there are really two separate problems here: getting
the executor to call a custom scan node when it shows up in the plan
tree; and figuring out how to get it into the plan tree in the first
place.  I'm not sure we've properly separated those problems, and I'm
not sure into which category the issues that sunk KaiGai's 9.4 patch
fell.  Most of this discussion seems like it's about the latter
problem, but we need to solve both.  For my money, we'd be better off
getting some kind of basic custom scan node functionality committed
first, even if the cases where you can actually inject them into real
plans are highly restricted.  Then, we could later work on adding more
ways to inject them in more places.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> It would seem normal and natural to have
> 
> * pg_joinam catalog table for "join methods" with a join method API
> Which would include some way of defining which operators/datatypes we
> consider this for, so if PostGIS people come up with some fancy GIS
> join thing, we don't invoke it every time even when its inapplicable.
> I would prefer it if PostgreSQL also had some way to control when the
> joinam was called, possibly with some kind of table_size_threshold on
> the AM tuple, which could be set to >=0 to control when this was even
> considered.

It seems useful to think about how we would redefine our existing join
methods using such a structure.  While thinking about that, it seems
like we would worry more about what the operators provide rather than
the specific operators themselves (ala hashing / HashJoin) and I'm not
sure we really care about the data types directly- just about the
operations which we can do on them..

I can see a case for sticking data types into this if we feel that we
have to constrain the path possibilities for some reason, but I'd rather
try and deal with any issues around "it doesn't make sense to do X
because we'll know it'll be really expensive" through the cost model
instead of with a table that defines what's allowed or not allowed.
There may be cases where we get the costing wrong and it's valuable
to be able to tweak cost values on a per-connection basis or for
individual queries.

I don't mean to imply that a 'pg_joinam' table is a bad idea, just that
I'd think of it being defined in terms of what capabilities it requires
of operators and a way for costing to be calculated for it, plus the
actual functions which it provides to implement the join itself (to
include some way to get output suitable for explain, etc..).
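
Purely as a strawman to make that shape concrete: the JoinMethodRoutine
struct below is invented (nothing like it exists today), and the
familiar planner/executor types are used loosely, but the descriptor
might reduce to a set of callbacks along these lines:

    /* Hypothetical "join method" descriptor -- illustration only. */
    typedef struct JoinMethodRoutine
    {
        const char *name;

        /* capability test: can this method handle these join clauses at all? */
        bool        (*clauses_usable) (PlannerInfo *root, List *joinclauses);

        /* costing, so the planner can compare it against built-in joins */
        void        (*estimate_cost) (PlannerInfo *root, JoinPath *path);

        /* executor lifecycle */
        Node       *(*begin_join) (EState *estate, Plan *plan, int eflags);
        TupleTableSlot *(*exec_join) (Node *state);
        void        (*end_join) (Node *state);

        /* EXPLAIN support */
        void        (*explain_join) (Node *state, ExplainState *es);
    } JoinMethodRoutine;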

> * pg_scanam catalog table for "scan methods" with a scan method API
> Again, a list of operators that can be used with it, like indexes and
> operator classes

Ditto for this- but there's lots of other things this makes me wonder
about because it's essentially trying to define a pluggable storage
layer, which is great, but also requires some way to deal with all of
things we use our storage system for: cacheing / shared buffers,
locking, visibility, WAL, unique identifier / ctid (for use in indexes,
etc)...

> By analogy to existing mechanisms, we would want
> 
> * A USERSET mechanism to allow users to turn it off for testing or
> otherwise, at user, database level

If we re-implement our existing components through this ("eat our own
dogfood" as it were), I'm not sure that we'd be able to have a way to
turn it on/off..  I realize we wouldn't have to, but then it seems like
we'd have two very different code paths and likely a different level of
support / capability afforded to "external" storage systems and then I
wonder if we're not back to just FDWs again..

> We would also want
> 
> * A startup call that allows us to confirm it is available and working
> correctly, possibly with some self-test for hardware, performance
> confirmation/derivation of planning parameters

Yeah, we'd need this for anything that supports a GPU, regardless of how
we implement it, I'd think.

> * Some kind of trace mode that would allow people to confirm the
> outcome of calls

Seems like this would be useful independently of the rest..

> * Some interface to the stats system so we could track the frequency
> of usage of each join/scan type. This would be done within Postgres,
> tracking the calls by name, rather than trusting the plugin to do it
> for us

This is definitely something I want for core already...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 1:51 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> OK.  It still seems to me that there's a doc fix needed here, if
>> nothing else.  The documentation for PQputCopyEnd() clearly implies
>> that if you get a return value of 1, the message is sent, and that's
>> just not true.
>
> That's fair.

Something like the attached?

I don't really know what to say about 0, since it's never actually
returned, so I left that out.  But I'm up for suggestions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index e10ccec..d685894 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5025,12 +5025,12 @@ int PQputCopyEnd(PGconn *conn,
   
 
   
-   The result is 1 if the termination data was sent, zero if it was
-   not sent because the attempt would block (this case is only possible
-   if the connection is in nonblocking mode), or -1 if an error
-   occurred.  (Use PQerrorMessage to retrieve
-   details if the return value is -1.  If the value is zero, wait for
-   write-ready and try again.)
+   The result is 1 if the termination data was sent, or if the socket
+   is in non-blocking mode and the data was queued but could not be sent
+   because the attempt would block.  (In non-blocking mode, to be certain
+   that the data has been sent, you should call PQflush
+   until it returns zero.)  The result is -1 if an error has occurred.
+   (Use PQerrorMessage to retrieve details.)
   
 
   

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Alvaro Herrera
Tomas Vondra wrote:

> H, with CLOBBER_CACHE_ALWAYS + CLOBBER_FREED_MEMORY the tests take
> ~20h on a single branch/animal. With a single locale (e.g. "C") it would
> take ~4h, but we're testing a bunch of additional czech/slovak locales.
> 
> The tests are running in sequence (magpie->treepie->fulmar) so with all
> 6 branches, this would take ~14 days to complete. I don't mind the
> machine is running tests 100% of the time, that's why it's in buildfarm,
> but I'd rather see the failures soon after the commit (and two weeks is
> well over the "soon" edge, IMHO).
> 
> So I'm thinking about how to improve this. I'd like to keep the options
> for all the branches (e.g. not just HEAD, as a few other animals do).

I think doing the CLOBBER_CACHE_ALWAYS + CLOBBER_FREED_MEMORY runs in a
separate set of animals that does a single locale is enough.  It's
pretty dubious that there is any point in doing the CLOBBER stuff in
multiple locales.

Perhaps run these in HEAD only once a day, and do the other branches
once a week, or something like that.

If you add a seventh animal that does the recursive clobber thingy,
running say once a month, that could be useful too.  Since you have
spare CPUs, it shouldn't matter if it takes a whole week to run.

The advantage of doing this in separate animals is that you can run them
in parallel with your regular non-CLOBBER animals (which would run more
frequently.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 release notes

2014-05-08 Thread Peter Geoghegan
On Thu, May 8, 2014 at 10:50 AM, MauMau  wrote:
> Could you add the following item, client-only installation on Windows, if
> it's appropriate for release note?


I think that it is appropriate.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN

2014-05-08 Thread Jaime Casanova
On Thu, May 8, 2014 at 10:41 AM, Robert Haas  wrote:
> On Thu, May 8, 2014 at 2:31 AM, Jaime Casanova  wrote:
>> On Wed, May 7, 2014 at 10:52 PM, Amit Kapila  wrote:
>>> On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova  
>>> wrote:
 Hi,

 This patch implements $subject only when ANALYZE and VERBOSE are on.
 I made it that way because for years nobody seemed interested in this
 info (at least no one did it) so i decided that maybe is to much
 information for most people (actually btree indexes are normally very
 fast).
>>>
>>>
> I would have expected the information about index maintenance times to
> be associated with the Insert node, not the plan overall.  IIUC, you
> could have more than one such node if, for example, there are
> writeable CTEs involved.
>

I followed how trigger info is shown in EXPLAIN. I can try to change
it, if that is what most people prefer.

                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Insert on pgbench_accounts (actual time=0.249..0.249 rows=0 loops=1)
   CTE results
     ->  Insert on pgbench_accounts pgbench_accounts_1 (actual time=0.152..0.159 rows=1 loops=1)
           ->  Result (actual time=0.003..0.004 rows=1 loops=1)
   ->  CTE Scan on results (actual time=0.165..0.174 rows=1 loops=1)
 Trigger trg1 on pgbench_accounts: time=0.033 calls=1
 Trigger trg1 on pgbench_accounts: time=0.059 calls=1
 Total runtime: 0.377 ms
(8 rows)


-- 
Jaime Casanova www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
Phone: +593 4 5107566 Cell: +593 987171157


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Tom Lane
Robert Haas  writes:
> OK.  It still seems to me that there's a doc fix needed here, if
> nothing else.  The documentation for PQputCopyEnd() clearly implies
> that if you get a return value of 1, the message is sent, and that's
> just not true.

That's fair.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 release notes

2014-05-08 Thread MauMau

From: "Bruce Momjian" 

I have completed the initial version of the 9.4 release notes.  You can
view them here:

http://www.postgresql.org/docs/devel/static/release-9-4.html

Feedback expected and welcomed.  I expect to be modifying this until we
release 9.4 final.  I have marked items where I need help with question
marks.


Could you add the following item, client-only installation on Windows, if 
it's appropriate for release note?  It will be useful for those like 
EnterpriseDB who develop products derived from PostgreSQL.


https://commitfest.postgresql.org/action/patch_view?id=1326

Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 12:53 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> What I'm now thinking I need to do is something like this:
>
>> 1. If PQputCopyEnd returns -1, error.
>> 2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
>> wait for socket to become read-ready or write-ready; } }
>> 3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
>> 4. PQgetResult()
>
>> Does that sound right?
>
> Yeah, more or less --- I think you need a PQconsumeInput there somewhere.
>
> There is a PQflush call in PQconsumeInput that is intended to keep clients
> from having to do that for themselves; but I'm not sure that it helps,
> since clients probably only call PQconsumeInput when the socket is
> read-ready --- and it wouldn't be in this situation.

OK.  It still seems to me that there's a doc fix needed here, if
nothing else.  The documentation for PQputCopyEnd() clearly implies
that if you get a return value of 1, the message is sent, and that's
just not true.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 12:14 PM, Tom Lane wrote:

Andrew Dunstan  writes:

I'm pretty sure we need this on Mingw - this SYSTEMQUOTE stuff dates
back well before 8.3, IIRC, which is when we first got full MSVC support.

I tried googling for some info on this, and got a number of hits
suggesting that mingw didn't emulate popen at all till pretty recently.
For instance this:
https://lists.fedoraproject.org/pipermail/fedora-mingw/2009-September/002087.html
Jones is an ex-coworker of mine, and I'm pretty sure that if he said
it wasn't there then it wasn't there.

So I'm confused about how this worked at all in mingw back-when.



Mingw didn't, possibly, but we didn't rely on that. The MS C runtime 
library (msvcrt.dll) has had _popen() forever, and we had this for
Windows in PG 8.0 and have it still in all the back branches AFAICT:


   #define popen(a,b) _popen(a,b)
   #define pclose(a) _pclose(a)


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Tom Lane
Robert Haas  writes:
> What I'm now thinking I need to do is something like this:

> 1. If PQputCopyEnd returns -1, error.
> 2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
> wait for socket to become read-ready or write-ready; } }
> 3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
> 4. PQgetResult()

> Does that sound right?

Yeah, more or less --- I think you need a PQconsumeInput there somewhere.

There is a PQflush call in PQconsumeInput that is intended to keep clients
from having to do that for themselves; but I'm not sure that it helps,
since clients probably only call PQconsumeInput when the socket is
read-ready --- and it wouldn't be in this situation.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-08 Thread Andres Freund
On 2014-05-05 18:50:59 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> >> Looks all right to me.  Yeah, the right shift might have undefined
> >> high-order bits, but we don't care because we're storing the result
> >> into an int16.
> 
> > Doesn't at the very least
> > rnode.backend = (msg->sm.backend_hi << 16) | (int) 
> > msg->sm.backend_lo;
> > have to be
> > rnode.backend = ((int) msg->sm.backend_hi << 16) | (int) 
> > msg->sm.backend_lo;
> 
> A promotion to int (or wider) is implicit in any arithmetic operation,
> last I checked the C standard.  The "(int)" on the other side isn't
> necessary either.

Done in the attached patch.

> We might actually be better off casting both sides to unsigned int,
> just to enforce that the left shifting is done with unsigned semantics.
> 
> >>> c) The ProcessMessageList() calls access msg->rc.id to test for the type
> >>> of the existing message. That's not nice.
> 
> >> Huh?
> 
> > The checks should access msg->id, not msg->rc.id...
> 
> Meh.  If they're not the same thing, we've got big trouble anyway.
> But if you want to change it as a cosmetic thing, no objection.

Changed as well.
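
For anyone following the promotion/shift subtlety, here is a tiny
standalone round-trip with the explicit unsigned casts; the field widths
and the value are invented for the example:

    #include <assert.h>
    #include <stdint.h>

    int
    main(void)
    {
        int32_t     backend = -1;   /* e.g. an "invalid backend" sentinel */

        /* pack into two 16-bit halves, shifting as unsigned */
        uint16_t    hi = (uint16_t) ((uint32_t) backend >> 16);
        uint16_t    lo = (uint16_t) ((uint32_t) backend & 0xffff);

        /* unpack: widen as unsigned, shift, then convert back to signed */
        int32_t     unpacked = (int32_t) (((uint32_t) hi << 16) | (uint32_t) lo);

        assert(unpacked == backend);
        return 0;
    }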

On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > a) SICleanupQueue() sometimes releases and reacquires the write lock
> >held on the outside. That's pretty damn fragile, not to mention
> >ugly. Even slight reformulations of the code in SIInsertDataEntries()
> >can break this... Can we please extend the comment in the latter that
> >to mention the lock dropping explicitly?
> 
> Want to write a better comment?

Edited the comment and, in a perhaps debatable fashion, moved some
variable declarations + volatile into the retry loop for some added
cozyness. If the compiler inlines the cleanup function it could very
well decide to fetch some of the variables only once.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From a24973508710c965035fca45e933823ce49a5a4f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 8 May 2014 16:13:44 +0200
Subject: [PATCH] Minor cleanups and comment improvements in the cache
 invalidation code.

---
 src/backend/storage/ipc/sinvaladt.c | 11 ---
 src/backend/utils/cache/inval.c |  9 +
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 0328660..3966ccc 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -453,7 +453,6 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
 	while (n > 0)
 	{
 		int			nthistime = Min(n, WRITE_QUANTUM);
-		int			numMsgs;
 		int			max;
 		int			i;
 
@@ -466,11 +465,17 @@ SIInsertDataEntries(const SharedInvalidationMessage *data, int n)
 		 * queue and reset anyone who is preventing space from being freed.
 		 * Otherwise, clean the queue only when it's exceeded the next
 		 * fullness threshold.  We have to loop and recheck the buffer state
-		 * after any call of SICleanupQueue.
+		 * after any call of SICleanupQueue because the cleanup sometimes
+		 * requires releasing the write lock acquired above.
 		 */
 		for (;;)
 		{
-			numMsgs = segP->maxMsgNum - segP->minMsgNum;
+			int numMsgs;
+			/* use volatile pointer to prevent code rearrangement */
+			volatile SISeg *vsegP = segP;
+
+			numMsgs = vsegP->maxMsgNum - vsegP->minMsgNum;
+
 			if (numMsgs + nthistime > MAXNUMMESSAGES ||
 numMsgs >= segP->nextThreshold)
 SICleanupQueue(true, nthistime);
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index dd46e18..4c3197b 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -374,7 +374,7 @@ AddRelcacheInvalidationMessage(InvalidationListHeader *hdr,
 	/* Don't add a duplicate item */
 	/* We assume dbId need not be checked because it will never change */
 	ProcessMessageList(hdr->rclist,
-	   if (msg->rc.id == SHAREDINVALRELCACHE_ID &&
+	   if (msg->id == SHAREDINVALRELCACHE_ID &&
 		   msg->rc.relId == relId)
 	   return);
 
@@ -400,7 +400,7 @@ AddSnapshotInvalidationMessage(InvalidationListHeader *hdr,
 	/* Don't add a duplicate item */
 	/* We assume dbId need not be checked because it will never change */
 	ProcessMessageList(hdr->rclist,
-	   if (msg->sn.id == SHAREDINVALSNAPSHOT_ID &&
+	   if (msg->id == SHAREDINVALSNAPSHOT_ID &&
 		   msg->sn.relId == relId)
 	   return);
 
@@ -580,7 +580,8 @@ LocalExecuteInvalidationMessage(SharedInvalidationMessage *msg)
 		RelFileNodeBackend rnode;
 
 		rnode.node = msg->sm.rnode;
-		rnode.backend = (msg->sm.backend_hi << 16) | (int) msg->sm.backend_lo;
+		rnode.backend = (BackendId) (((uint32) msg->sm.backend_hi << 16)
+	 |(uint32) msg->sm.backend_lo);
 		smgrclosenode(rnode);
 	}
 	els

Re: [HACKERS] Recursive ReceiveSharedInvalidMessages not safe

2014-05-08 Thread Andres Freund
On 2014-05-05 15:41:22 -0400, Tom Lane wrote:
> > After far, far too much confused head scratching, code reading, random
> > elog()s et al I found out that this is just because of a deficiency in
> > valgrind's undefinedness tracking. [...]
> > Unfortunately this cannot precisely be caught by valgrind's
> > suppressions. Thus I'd like to add memset(SharedInvalidationMessage msg,
> > 0) in AddCatcacheInvalidationMessage() et al. to suppress these
> > warnings. Imo we can just add them unconditionally, but if somebody else
> > prefers we can add #ifdef USE_VALGRIND around them.
> 
> I'd be okay with USE_VALGRIND.  I'm not particularly hot on adding a
> memset for everybody just to make valgrind less confused.  Especially
> since that's really going to hide any problems, not fix them.

The attached patch does VALGRIND_MAKE_MEM_DEFINED() on the relevant
structs. That's already defined to empty if USE_VALGRIND isn't defined.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 4da7b0caa2058c8b07ee4a7ab529b9ca0c292bcf Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Wed, 7 May 2014 22:30:05 +0200
Subject: [PATCH] Silence a spurious valgrind warning in inval.c.

For valgrind's sake, explicitly mark all bytes of
SharedInvalidationMessage as defined before sending them. Otherwise
valgrind sometimes treats bytes as undefined that aren't, because
valgrind doesn't understand that the ringbuffer in sinvaladt.c is
accessed by multiple processes. Valgrind remembers that it stored an
undefined byte into parts of a slot in the ringbuffer and warns when
it uses that byte again - not realizing it has been stored by another
process.
---
 src/backend/utils/cache/inval.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index 5971469..dd46e18 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -103,6 +103,7 @@
 #include "storage/smgr.h"
 #include "utils/catcache.h"
 #include "utils/inval.h"
+#include "utils/memdebug.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/relmapper.h"
@@ -332,6 +333,14 @@ AddCatcacheInvalidationMessage(InvalidationListHeader *hdr,
 	msg.cc.id = (int8) id;
 	msg.cc.dbId = dbId;
 	msg.cc.hashValue = hashValue;
+	/*
+	 * Mark all bytes of the message as defined; otherwise the sinvaladt.c
+	 * ringbuffer, which is accessed by multiple processes, will cause false
+	 * positives because valgrind remembers the undefined bytes from this
+	 * process's store, not realizing that another process has written since.
+	 */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	AddInvalidationMessage(&hdr->cclist, &msg);
 }
 
@@ -347,6 +356,9 @@ AddCatalogInvalidationMessage(InvalidationListHeader *hdr,
 	msg.cat.id = SHAREDINVALCATALOG_ID;
 	msg.cat.dbId = dbId;
 	msg.cat.catId = catId;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	AddInvalidationMessage(&hdr->cclist, &msg);
 }
 
@@ -370,6 +382,9 @@ AddRelcacheInvalidationMessage(InvalidationListHeader *hdr,
 	msg.rc.id = SHAREDINVALRELCACHE_ID;
 	msg.rc.dbId = dbId;
 	msg.rc.relId = relId;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	AddInvalidationMessage(&hdr->rclist, &msg);
 }
 
@@ -393,6 +408,9 @@ AddSnapshotInvalidationMessage(InvalidationListHeader *hdr,
 	msg.sn.id = SHAREDINVALSNAPSHOT_ID;
 	msg.sn.dbId = dbId;
 	msg.sn.relId = relId;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	AddInvalidationMessage(&hdr->rclist, &msg);
 }
 
@@ -1242,6 +1260,9 @@ CacheInvalidateSmgr(RelFileNodeBackend rnode)
 	msg.sm.backend_hi = rnode.backend >> 16;
 	msg.sm.backend_lo = rnode.backend & 0xffff;
 	msg.sm.rnode = rnode.node;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	SendSharedInvalidMessages(&msg, 1);
 }
 
@@ -1267,6 +1288,9 @@ CacheInvalidateRelmap(Oid databaseId)
 
 	msg.rm.id = SHAREDINVALRELMAP_ID;
 	msg.rm.dbId = databaseId;
+	/* check AddCatcacheInvalidationMessage() for an explanation */
+	VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
+
 	SendSharedInvalidMessages(&msg, 1);
 }
 
-- 
1.8.5.rc2.dirty


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Fix harmless access to uninitialized memory in ri_triggers.c.

2014-05-08 Thread andres
From: Andres Freund 

When cache invalidations arrive while ri_LoadConstraintInfo() is busy
filling a new cache entry, InvalidateConstraintCacheCallBack()
compares the - not yet initialized - oidHashValue field with the
to-be-invalidated hash value. To fix, check whether the entry is
already marked as invalid.
---
 src/backend/utils/adt/ri_triggers.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/ri_triggers.c b/src/backend/utils/adt/ri_triggers.c
index d30847b..e4d7b2c 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -2934,7 +2934,8 @@ InvalidateConstraintCacheCallBack(Datum arg, int cacheid, uint32 hashvalue)
 	hash_seq_init(&status, ri_constraint_cache);
 	while ((hentry = (RI_ConstraintInfo *) hash_seq_search(&status)) != NULL)
 	{
-		if (hashvalue == 0 || hentry->oidHashValue == hashvalue)
+		if (hentry->valid &&
+			(hashvalue == 0 || hentry->oidHashValue == hashvalue))
 			hentry->valid = false;
 	}
 }
-- 
1.8.5.rc2.dirty



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A couple logical decoding fixes/patches

2014-05-08 Thread Andres Freund
Hi,

Patch 01: Fix a couple of embarrassing typos. Most of them harmless, but
one isn't and can lead to crashing or decoding wrong data.

Patch 02: Don't crash with an Assert() failure if wal_level=logical but
max_replication_slots=0.

Patch 03: Add valgrind suppression for writing out padding bytes. That's
better than zeroing the data from the get-go because uninitialized
accesses are still detected.

Details are in the commit messages of the individual patches.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From 9987ff3fead5cee0b9c7da94c8d8a665015be7c2 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 8 May 2014 17:00:06 +0200
Subject: [PATCH 1/3] Fix a couple of brown paper bag typos in the logical
 decoding code.

Most of these were fairly harmless but the typos in
ReorderBufferSerializeChange() could lead to crashes or wrong data
being decoded.

The ReorderBufferSerializeChange() bug was encountered by Christian
Kruse; the rest were found during a review for similar silliness.
---
 src/backend/replication/logical/reorderbuffer.c | 13 +
 src/backend/replication/logical/snapbuild.c | 11 +--
 2 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 7f2bbca..f96e3e1 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -2064,15 +2064,15 @@ ReorderBufferSerializeChange(ReorderBuffer *rb, ReorderBufferTXN *txn,
 if (snap->xcnt)
 {
 	memcpy(data, snap->xip,
-		   sizeof(TransactionId) + snap->xcnt);
-	data += sizeof(TransactionId) + snap->xcnt;
+		   sizeof(TransactionId) * snap->xcnt);
+	data += sizeof(TransactionId) * snap->xcnt;
 }
 
 if (snap->subxcnt)
 {
 	memcpy(data, snap->subxip,
-		   sizeof(TransactionId) + snap->subxcnt);
-	data += sizeof(TransactionId) + snap->subxcnt;
+		   sizeof(TransactionId) * snap->subxcnt);
+	data += sizeof(TransactionId) * snap->subxcnt;
 }
 break;
 			}
@@ -2168,15 +2168,12 @@ ReorderBufferRestoreChanges(ReorderBuffer *rb, ReorderBufferTXN *txn,
 
 		}
 
-		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
-
-
 		/*
 		 * Read the statically sized part of a change which has information
 		 * about the total size. If we couldn't read a record, we're at the
 		 * end of this file.
 		 */
-
+		ReorderBufferSerializeReserve(rb, sizeof(ReorderBufferDiskChange));
 		readBytes = read(*fd, rb->outbuf, sizeof(ReorderBufferDiskChange));
 
 		/* eof */
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index cb45f90..f00fd7d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -307,8 +307,7 @@ AllocateSnapshotBuilder(ReorderBuffer *reorder,
 	builder->committed.xip =
 		palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
 	builder->committed.includes_all_transactions = true;
-	builder->committed.xip =
-		palloc0(builder->committed.xcnt_space * sizeof(TransactionId));
+
 	builder->initial_xmin_horizon = xmin_horizon;
 	builder->transactions_after = start_lsn;
 
@@ -1691,7 +1690,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 
 	/* restore running xacts information */
 	sz = sizeof(TransactionId) * ondisk.builder.running.xcnt_space;
-	ondisk.builder.running.xip = MemoryContextAlloc(builder->context, sz);
+	ondisk.builder.running.xip = MemoryContextAllocZero(builder->context, sz);
 	readBytes = read(fd, ondisk.builder.running.xip, sz);
 	if (readBytes != sz)
 	{
@@ -1705,7 +1704,7 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 
 	/* restore committed xacts information */
 	sz = sizeof(TransactionId) * ondisk.builder.committed.xcnt;
-	ondisk.builder.committed.xip = MemoryContextAlloc(builder->context, sz);
+	ondisk.builder.committed.xip = MemoryContextAllocZero(builder->context, sz);
 	readBytes = read(fd, ondisk.builder.committed.xip, sz);
 	if (readBytes != sz)
 	{
@@ -1763,10 +1762,10 @@ SnapBuildRestore(SnapBuild *builder, XLogRecPtr lsn)
 	}
 	ondisk.builder.committed.xip = NULL;
 
-	builder->running.xcnt = ondisk.builder.committed.xcnt;
+	builder->running.xcnt = ondisk.builder.running.xcnt;
 	if (builder->running.xip)
 		pfree(builder->running.xip);
-	builder->running.xcnt_space = ondisk.builder.committed.xcnt_space;
+	builder->running.xcnt_space = ondisk.builder.running.xcnt_space;
 	builder->running.xip = ondisk.builder.running.xip;
 
 	/* our snapshot is not interesting anymore, build a new one */
-- 
1.8.5.rc2.dirty

>From 863589478ce185cba02e8ad5a4ff2a8b79588cb0 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Thu, 8 May 2014 18:02:20 +0200
Subject: [PATCH 2/3] Remove overeager assertion in wal_level=logical specific
 code.

The assertion tri

Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 12:09 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> According to the documentation for PQputCopyEnd:
>>> The result is 1 if the termination data was sent, zero if it was not sent 
>>> because the attempt would block (this case is only possible if the 
>>> connection is in
>>> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to 
>>> retrieve details if the return value is -1. If the value is zero, wait for 
>>> write-ready and try again.)
>
>> However, pqPutCopyEnd contains no return statement that can ever
>> possibly return 0.
>
> IIRC, for some of the libpq functions, the API spec was intended to allow
> for future async behavior that wasn't actually implemented.  So the mere
> fact that it can't return zero today is not a problem.  If it might block
> instead, then we have a problem, but that's not what you're saying.
>
> Also, what the API spec says is that clients should retry PQputCopyEnd
> on a zero return.  We do *not* want them to do that here; once we've
> changed asyncStatus, a retry would report an error.  So the API spec
> in this case is intended to require a retry loop that the current
> implementation doesn't actually need, but that conceivably could be
> needed after some future refactoring.  In particular, we might want
> to fix the code so that it returns zero if it fails to queue the
> COPY END message at all.

Yeah, good point.  I think what confused me is the fact that the
documentation says that when PQputCopyEnd() returns 1, that means that
"the termination data was sent".  So my application sat there and
waited for a server response that never showed up, because in fact the
termination data had *not* been sent.

What I'm now thinking I need to do is something like this:

1. If PQputCopyEnd returns -1, error.
2. while ((rc = PQflush(conn)) != 0) { if (rc < 0) { error; } else {
wait for socket to become read-ready or write-ready; } }
3. while (PQisBusy(conn)) { wait for the socket to become read-ready; }
4. PQgetResult()

Does that sound right?
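
Spelled out as a sketch (wait_for_socket() and handle_error() are
hypothetical helpers around select()/poll() and error reporting; the
PQconsumeInput calls let libpq absorb whatever arrives while waiting):

    /* 1. send the COPY termination message */
    if (PQputCopyEnd(conn, NULL) < 0)
        handle_error(PQerrorMessage(conn));

    /* 2. flush until libpq reports its output buffer is empty */
    for (;;)
    {
        int     rc = PQflush(conn);

        if (rc < 0)
            handle_error(PQerrorMessage(conn));
        if (rc == 0)
            break;
        wait_for_socket(conn, true, true);      /* wait for read- or write-ready */
        if (!PQconsumeInput(conn))
            handle_error(PQerrorMessage(conn));
    }

    /* 3. wait until a result can be read without blocking */
    while (PQisBusy(conn))
    {
        wait_for_socket(conn, true, false);     /* wait for read-ready */
        if (!PQconsumeInput(conn))
            handle_error(PQerrorMessage(conn));
    }

    /* 4. collect the COPY result (and eventually the trailing NULL) */
    PGresult   *res = PQgetResult(conn);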

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sending out a request for more buildfarm animals?

2014-05-08 Thread Tomas Vondra
On 6.5.2014 23:01, Tomas Vondra wrote:
> On 6.5.2014 22:24, Tom Lane wrote:
>> Tomas Vondra  writes:
>>> I recall there was a call for more animals with CLOBBER_CACHE_ALWAYS
>>> some time ago, so I went and enabled that on all three animals. Let's
>>> see how long that will take.
>>
>>> I see there are more 'clobber' options in the code: CLOBBER_FREED_MEMORY
>>> and CLOBBER_CACHE_RECURSIVELY. Would that be a good idea to enable these
>>> as well?
>>
>>> The time requirements will be much higher (especially for the
>>> RECURSIVELY option), but running that once a week shouldn't be a big
>>> deal - the machine is pretty much dedicated to the buildfarm.
>>
>> I've never had the patience to run the regression tests to completion
>> with CLOBBER_CACHE_RECURSIVELY at all, let alone do it on a regular 
>> basis. (I wonder if there's some easy way to run it for just a few 
>> regression tests...)
> 
> Now, that's a challenge ;-)
> 
>>
>> I think testing CLOBBER_FREED_MEMORY would be sensible though.
> 
> OK, I've enabled this for now.

Hmm, with CLOBBER_CACHE_ALWAYS + CLOBBER_FREED_MEMORY the tests take
~20h on a single branch/animal. With a single locale (e.g. "C") it would
take ~4h, but we're testing a bunch of additional Czech/Slovak locales.

The tests are running in sequence (magpie->treepie->fulmar), so with all
6 branches this would take ~14 days to complete. I don't mind that the
machine is running tests 100% of the time - that's why it's in the
buildfarm - but I'd rather see the failures soon after the commit (and
two weeks is well over the "soon" edge, IMHO).

So I'm thinking about how to improve this. I'd like to keep the options
for all the branches (e.g. not just HEAD, as a few other animals do).
But I'm thinking about running the tests in parallel, somehow - the
machine has 4 cores, and most of the time only one of them is used. I
don't expect a perfect ~3x speedup, but getting ~2x would be nice.

Any recommendations on how to do that? I see there's 'base_port' in the
config - is it enough to tweak this, or do I need to run the animals
separately using e.g. lxc?

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Christoph Berg
Re: Andres Freund 2014-05-08 <20140508145901.gb1...@awork2.anarazel.de>
> > Maybe this is nitpicking, but what happens when postgresql.auto.conf also
> > includes the setting of data_directory? This is possible because we can
> > set data_directory via ALTER SYSTEM now. Should we just ignore such
> > problematic setting in postgresql.auto.conf with warning message?
> 
> I think that's a case of "Doctor, it hurts when I do this. Doctor: don't
> do that then".

I'd opt to forbid setting data_directory at ALTER SYSTEM time. For the
other options, I agree with Andres that you should get to keep all
parts if you manage to break it.

Fortunately, ALTER SYSTEM already refuses to change config_file :)
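
Something along those lines for data_directory, perhaps (only a sketch of
the idea, modelled on that existing refusal; the real guard in guc.c would
likely look different and be wired into the ALTER SYSTEM code path before
the new value is written to postgresql.auto.conf):

    /* Backend-side sketch; assumes the usual guc.c includes. */
    static void
    reject_in_auto_file(const char *name)
    {
        if (strcmp(name, "data_directory") == 0 ||
            strcmp(name, "config_file") == 0)
            ereport(ERROR,
                    (errcode(ERRCODE_CANT_CHANGE_RUNTIME_PARAM),
                     errmsg("parameter \"%s\" cannot be changed via ALTER SYSTEM",
                            name)));
    }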

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build

2014-05-08 Thread Tom Lane
Andrew Dunstan  writes:
> I'm pretty sure we need this on Mingw - this SYSTEMQUOTE stuff dates 
> back well before 8.3, IIRC, which is when we first got full MSVC support.

I tried googling for some info on this, and got a number of hits
suggesting that mingw didn't emulate popen at all till pretty recently.
For instance this:
https://lists.fedoraproject.org/pipermail/fedora-mingw/2009-September/002087.html
Jones is an ex-coworker of mine, and I'm pretty sure that if he said
it wasn't there then it wasn't there.

So I'm confused about how this worked at all in mingw back then.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Tom Lane
Robert Haas  writes:
> According to the documentation for PQputCopyEnd:
>> The result is 1 if the termination data was sent, zero if it was not sent 
>> because the attempt would block (this case is only possible if the 
>> connection is in
>> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to 
>> retrieve details if the return value is -1. If the value is zero, wait for 
>> write-ready and try again.)

> However, pqPutCopyEnd contains no return statement that can ever
> possibly return 0.

IIRC, for some of the libpq functions, the API spec was intended to allow
for future async behavior that wasn't actually implemented.  So the mere
fact that it can't return zero today is not a problem.  If it might block
instead, then we have a problem, but that's not what you're saying.

Also, what the API spec says is that clients should retry PQputCopyEnd
on a zero return.  We do *not* want them to do that here; once we've
changed asyncStatus, a retry would report an error.  So the API spec
in this case is intended to require a retry loop that the current
implementation doesn't actually need, but that conceivably could be
needed after some future refactoring.  In particular, we might want
to fix the code so that it returns zero if it fails to queue the
COPY END message at all.

> pqFlush() returns 0 if no data is waiting to be sent, or otherwise the
> return value of pqSendSome().  pqSendSome() returns -1 if an error
> occurs, 0 if all data is sent, or 1 if some data was sent but the
> socket is non-blocking and the caller must try again later.  It seems
> to me that when pqSendSome() returns 1, pqPutCopyEnd ought to return 0
> in order to meet its API contract - and then the client, presumably,
> should repeatedly wait for the socket to become write-ready and then
> try PQflush() until PQflush() returns non-zero.

That would be a change in the API spec, not just the implementation,
and it's not clear to me that it would actually be helpful to any
clients.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 7 May 2014 02:05, Kouhei Kaigai  wrote:

> (1) DDL support and system catalog
>
> Simon suggested that DDL command should be supported to track custom-
> plan providers being installed, and to avoid nonsense hook calls
> if it is an obvious case that custom-plan provider can help. It also
> makes sense to give a chance to load extensions once installed.
> (In the previous design, I assumed modules are loaded by LOAD command
> or *_preload_libraries parameters).

I've tried hard to bend my mind to this and it's beginning to sink in.

We've already got pg_am for indexes, and soon to have pg_seqam for sequences.

It would seem normal and natural to have

* pg_joinam catalog table for "join methods" with a join method API,
which would include some way of defining which operators/datatypes we
consider this for, so if PostGIS people come up with some fancy GIS
join thing, we don't invoke it every time even when it's inapplicable.
I would prefer it if PostgreSQL also had some way to control when the
joinam was called, possibly with some kind of table_size_threshold on
the AM tuple, which could be set to >=0 to control when this was even
considered.

* pg_scanam catalog table for "scan methods" with a scan method API.
Again, a list of operators that can be used with it, like indexes and
operator classes

By analogy to existing mechanisms, we would want

* A USERSET mechanism to allow users to turn it off for testing or
otherwise, at the user or database level

We would also want

* A startup call that allows us to confirm it is available and working
correctly, possibly with some self-test for hardware, performance
confirmation/derivation of planning parameters

* Some kind of trace mode that would allow people to confirm the
outcome of calls

* Some interface to the stats system so we could track the frequency
of usage of each join/scan type. This would be done within Postgres,
tracking the calls by name, rather than trusting the plugin to do it
for us


> I tried to implement the following syntax:
>
>   CREATE CUSTOM PLAN  FOR (scan|join|any) HANDLER ;

Not sure if we need that yet

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compilation errors with mingw build caused by undefined optreset

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 11:30 AM, Tom Lane wrote:

Michael Paquier  writes:

Since commit 60ff2fd introducing the centralizated getopt-related
things in a global header file, build on Windows with mingw is failing

Hm, buildfarm member narwhal doesn't seem to be having any such problem.
(It's got its own issues, but not this.)



jacana and frogmouth are also Mingw animals, and are not having issues.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 15:25, Stephen Frost  wrote:
> * Simon Riggs (si...@2ndquadrant.com) wrote:
>> On 8 May 2014 14:40, Stephen Frost  wrote:
>> > Allow me to re-state your suggestion here:
>> >
>> > An FDW is loaded which provides hook for join push-down (whatever those
>> > end up being).
>> >
>> > A query is run which joins *local* table A to *local* table B.  Standard
>> > heaps, standard indexes, all local to this PG instance.
>> >
>> > The FDW which supports join push-down is then passed this join for
>> > planning, with local sub-plans for the local tables.
>>
>> Yes that is correct; thank you for confirming your understanding with me.
>
> I guess for my part, that doesn't look like an FDW any more.

If it works, it works. If it doesn't, we can act otherwise.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] showing index maintenance on EXPLAIN

2014-05-08 Thread Robert Haas
On Thu, May 8, 2014 at 2:31 AM, Jaime Casanova  wrote:
> On Wed, May 7, 2014 at 10:52 PM, Amit Kapila  wrote:
>> On Thu, May 8, 2014 at 5:30 AM, Jaime Casanova  wrote:
>>> Hi,
>>>
>>> This patch implements $subject only when ANALYZE and VERBOSE are on.
>>> I made it that way because for years nobody seemed interested in this
>>> info (at least no one did it), so I decided that maybe it is too much
>>> information for most people (actually btree indexes are normally very
>>> fast).
>>
>>
>> Why capture this only for Index Insert/Update and not for Read; is it
>> because Read will always be fast, or due to implementation complexity?
>>
>
> EXPLAIN ANALYZE already shows that on any SELECT that uses an index in
> some way. Or are you thinking of something else?
>
>> Why not similar timings for heap?
>>
>
> well "actual time" shows us total time of the operation so just
> deducting the time spent on triggers, indexes and planning seems like
> a way to get "heap modification time".
>
> Yes, maybe we still need some additional data. For example, I could
> want to know how much time we spent extending a relation.
>
>> Why can't we print it when only ANALYZE is used with EXPLAIN? The
>> execution time is printed with the ANALYZE option.
>>
>
> I'm not sure the info is useful for everyone; I'm not opposed to showing
> it all the time, though.
>
>> Could you please tell in what kinds of scenarios you expect it
>> to be useful?
>> One I could think of is when there are multiple indexes on a table and the
>> user wants to find out whether any particular index is consuming more time.
>>
>
> Exactly my use case. Consider this plan (we spent 78% of the time
> updating the index uniq_idx_on_text):
>
>QUERY PLAN
> 
>  Insert on public.t1 (actual time=0.540..0.540 rows=0 loops=1)
>->  Result (actual time=0.046..0.049 rows=1 loops=1)
>  Output: 
>  Index uniq_idx_on_text on t1: time=0.421 rows=1
>  Index t1_pkey on t1: time=0.027 rows=1
>  Total runtime: 0.643 ms
> (6 rows)

I would have expected the information about index maintenance times to
be associated with the Insert node, not the plan overall.  IIUC, you
could have more than one such node if, for example, there are
writeable CTEs involved.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build

2014-05-08 Thread Andrew Dunstan


On 05/08/2014 11:19 AM, Tom Lane wrote:

Heikki Linnakangas  writes:

On 05/08/2014 08:01 AM, Michael Paquier wrote:

Since commit a692ee5, code compilation on windows is full of warnings
caused by the re-definitions of popen and pclose:

Hmm. Does the MinGW version of popen() and system() do the quoting for
you? If we just #ifdef the defines, then we will not use the wrappers on
MinGW, which would be wrong if the quoting is needed there. If it's not
needed, then we shouldn't be compiling the wrapper functions in the
first place.

Another problem, if we do need the wrappers on mingw, is that the
"#undef" commands in system.c will presumably result in the wrong
things happening in the wrapper functions, since the platform needs
us to use their macros there.




I'm pretty sure we need this on Mingw - this SYSTEMQUOTE stuff dates 
back well before 8.3, IIRC, which is when we first got full MSVC support.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compilation errors with mingw build caused by undefined optreset

2014-05-08 Thread Tom Lane
Michael Paquier  writes:
> Since commit 60ff2fd introducing the centralizated getopt-related
> things in a global header file, build on Windows with mingw is failing

Hm, buildfarm member narwhal doesn't seem to be having any such problem.
(It's got its own issues, but not this.)

Do you think you could put up a buildfarm critter using whatever version
of mingw you're using?  Without buildfarm feedback, things *will* break
regularly; Windows is just too weird to expect otherwise.

> because of some declarations of HAVE_INT_OPTRESET causing optreset to
> become undefined:
> This failure is new with 9.4, and attached is a patch fixing it...

I'm a bit suspicious of this patch because of the comment in pg_getopt.h
saying that cygwin doesn't want those variables to be declared.
We can try it, but the lack of up-to-date cygwin members in the buildfarm
means we won't be real sure whether it breaks cygwin.

Of course, I guess the response to any complaints can be "please put
up a buildfarm member" ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] popen and pclose redefinitions causing many warning in Windows build

2014-05-08 Thread Tom Lane
Heikki Linnakangas  writes:
> On 05/08/2014 08:01 AM, Michael Paquier wrote:
>> Since commit a692ee5, code compilation on windows is full of warnings
>> caused by the re-definitions of popen and pclose:

> Hmm. Does the MinGW version of popen() and system() do the quoting for 
> you? If we just #ifdef the defines, then we will not use the wrappers on 
> MinGW, which would be wrong if the quoting is needed there. If it's not 
> needed, then we shouldn't be compiling the wrapper functions in the 
> first place.

Another problem, if we do need the wrappers on mingw, is that the
"#undef" commands in system.c will presumably result in the wrong
things happening in the wrapper functions, since the platform needs
us to use their macros there.

The simplest workaround I can think of is to change the stanza in
port.h to be like

   #ifndef DONT_DEFINE_SYSTEM_POPEN
   #undef system
   #define system(a) pgwin32_system(a)
   #undef popen
   #define popen(a,b) pgwin32_popen(a,b)
   #undef pclose
   #define pclose(a) _pclose(a)
   #endif

and then have system.c do #define DONT_DEFINE_SYSTEM_POPEN before
including postgres.h.  This is pretty grotty, but on the other hand
it'd remove the need for the #undef's in system.c.
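
i.e. system.c would then start out roughly like this (a sketch only; the
wrapper body below hand-waves over the real quoting rules):

    /* src/port/system.c -- sketch */
    #define DONT_DEFINE_SYSTEM_POPEN    /* keep port.h from redefining these */
    #include "postgres.h"               /* or c.h, whichever system.c uses */

    #ifdef WIN32
    int
    pgwin32_system(const char *command)
    {
        char       *buf;
        int         rc;

        /* wrap the command in an extra pair of quotes for cmd.exe;
         * the real rules are more involved than this */
        buf = malloc(strlen(command) + 3);
        if (buf == NULL)
            return -1;
        sprintf(buf, "\"%s\"", command);
        rc = system(buf);       /* the native CRT system(), not the macro */
        free(buf);
        return rc;
    }
    #endif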

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PQputCopyEnd doesn't adhere to its API contract

2014-05-08 Thread Robert Haas
According to the documentation for PQputCopyEnd:

> The result is 1 if the termination data was sent, zero if it was not sent 
> because the attempt would block (this case is only possible if the connection 
> is in
> nonblocking mode), or -1 if an error occurred. (Use PQerrorMessage to 
> retrieve details if the return value is -1. If the value is zero, wait for 
> write-ready and try again.)

However, pqPutCopyEnd contains no return statement that can ever
possibly return 0.  I think the problem is approximately here:

/* Try to flush data */
if (pqFlush(conn) < 0)
return -1;

pqFlush() returns 0 if no data is waiting to be sent, or otherwise the
return value of pqSendSome().  pqSendSome() returns -1 if an error
occurs, 0 if all data is sent, or 1 if some data was sent but the
socket is non-blocking and the caller must try again later.  It seems
to me that when pqSendSome() returns 1, pqPutCopyEnd ought to return 0
in order to meet its API contract - and then the client, presumably,
should repeatedly wait for the socket to become write-ready and then
try PQflush() until PQflush() returns non-zero.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Andres Freund
On 2014-05-08 22:21:43 +0900, Fujii Masao wrote:
> On Wed, May 7, 2014 at 4:57 PM, Amit Kapila  wrote:
> > On Tue, May 6, 2014 at 7:04 PM, Christoph Berg  wrote:
> >> Hi,
> >>
> >> if you split configuration and data by setting data_directory,
> >> postgresql.auto.conf is writen to the data directory
> >> (/var/lib/postgresql/9.4/main in Debian), but tried to be read from
> >> the etc directory (/etc/postgresql/9.4/main).
> >>
> >> One place to fix it would be in ProcessConfigFile in
> >> src/backend/utils/misc/guc-file.l:162 by always setting
> >> CallingFileName = NULL in src/backend/utils/misc/guc-file.l:162, but
> >> that breaks later in AbsoluteConfigLocation() when data_directory is
> >> NULL. (As the comment in ProcessConfigFile says.)
> >
> > This problem occurs because we don't have the value of data_directory
> > set in postgresql.conf by the time we want to parse .auto.conf file
> > during server start.  The value of data_directory is only available after
> > processing of config files.  To fix it, we need to store the value of
> > data_directory during parse of postgresql.conf file so that we can use it
> > till data_directory is actually set.  Attached patch fixes the problem.
> > Could you please once confirm if it fixes the problem in your
> > env./scenario.
> 
> Maybe this is nitpicking, but what happens when postgresql.auto.conf also
> includes the setting of data_directory? This is possible because we can
> set data_directory via ALTER SYSTEM now. Should we just ignore such
> problematic setting in postgresql.auto.conf with warning message?

I think that's a case of "Doctor, it hurts when I do this. Doctor: don't
do that then".

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Stephen Frost
* Amit Kapila (amit.kapil...@gmail.com) wrote:
> On Thu, May 8, 2014 at 6:51 PM, Fujii Masao  wrote:
> > Should we just ignore such
> > problematic setting in postgresql.auto.conf with warning message?
> 
> Another way could be to detect this during server start
> (while parsing postgresql.auto.conf) and then allow a reparse of the
> auto conf file, but I think that will be a bit more complicated.  So let's
> try to solve it in the way suggested by you.  If there is no objection from
> anyone else then I will update the patch.

Pretty sure this is more-or-less exactly what I suggested quite a while
back during the massive ALTER SYSTEM SET thread.  There are certain
options which we shouldn't be allowing in .auto.conf.  I doubt this is
going to be the only one...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Amit Kapila
On Thu, May 8, 2014 at 6:51 PM, Fujii Masao  wrote:
> On Wed, May 7, 2014 at 4:57 PM, Amit Kapila  wrote:
>> This problem occurs because we don't have the value of data_directory
>> set in postgresql.conf by the time we want to parse .auto.conf file
>> during server start.  The value of data_directory is only available after
>> processing of config files.  To fix it, we need to store the value of
>> data_directory during parse of postgresql.conf file so that we can use it
>> till data_directory is actually set.  Attached patch fixes the problem.
>> Could you please once confirm if it fixes the problem in your
>> env./scenario.
>
> Maybe this is nitpicking, but what happens when postgresql.auto.conf also
> includes the setting of data_directory? This is possible because we can
> set data_directory via ALTER SYSTEM now.

   Yes, that will also be an issue, similar to the above.

> Should we just ignore such
> problematic setting in postgresql.auto.conf with warning message?

Another way could be to detect this during server start
(while parsing postgresql.auto.conf) and then allow a reparse of the
auto conf file, but I think that will be a bit more complicated.  So let's
try to solve it in the way suggested by you.  If there is no objection from
anyone else then I will update the patch.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Bruce Momjian
On Thu, May  8, 2014 at 04:37:05PM +0200, Andres Freund wrote:
> On 2014-05-08 10:34:04 -0400, Bruce Momjian wrote:
> > On Thu, May  8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > > > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > > > combo in jsonb_hash_ops.  That would give us key-only lookups without a
> > > > recheck.
> > > 
> > > No, because there's nothing in JSON limiting the length of keys, any more
> > > than values.
> > > 
> > > I think the idea of hashing only keys/values that are "too long" is a
> > > reasonable compromise.  I've not finished coding it (because I keep
> > > getting distracted by other problems in the code :-() but it does not
> > > look to be very difficult.  I'm envisioning the cutoff as being something
> > > like 128 bytes; in practice that would mean that few if any keys get
> > > hashed, I think.
> > 
> > Yes, that would be nice.  Ideally we would not be doing this so close to
> > beta, but it is what it is, and if we need to break binary compatibility
> > after beta1, at least we have pg_upgrade.
> 
> If we break binary compatibility pg_upgrade won't be able to help. Since
> the data files won't be, err, binary compatible.

Oops, yeah.  pg_upgrade only helps with system table changes. We would
have to require users to dump/reload any changed tables or recreate any
indexes.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Andres Freund
On 2014-05-08 10:34:04 -0400, Bruce Momjian wrote:
> On Thu, May  8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > > combo in jsonb_hash_ops.  That would give us key-only lookups without a
> > > recheck.
> > 
> > No, because there's nothing in JSON limiting the length of keys, any more
> > than values.
> > 
> > I think the idea of hashing only keys/values that are "too long" is a
> > reasonable compromise.  I've not finished coding it (because I keep
> > getting distracted by other problems in the code :-() but it does not
> > look to be very difficult.  I'm envisioning the cutoff as being something
> > like 128 bytes; in practice that would mean that few if any keys get
> > hashed, I think.
> 
> Yes, that would be nice.  Ideally we would not be doing this so close to
> beta, but it is what it is, and if we need to break binary compatibility
> after beta1, at least we have pg_upgrade.

If we break binary compatibility pg_upgrade won't be able to help. Since
the data files won't be, err, binary compatible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Bruce Momjian
On Thu, May  8, 2014 at 10:16:54AM -0400, Tom Lane wrote:
> > Can we hash just the values, not the keys, in jsonb_ops, and hash the
> > combo in jsonb_hash_ops.  That would give us key-only lookups without a
> > recheck.
> 
> No, because there's nothing in JSON limiting the length of keys, any more
> than values.
> 
> I think the idea of hashing only keys/values that are "too long" is a
> reasonable compromise.  I've not finished coding it (because I keep
> getting distracted by other problems in the code :-() but it does not
> look to be very difficult.  I'm envisioning the cutoff as being something
> like 128 bytes; in practice that would mean that few if any keys get
> hashed, I think.

Yes, that would be nice.  Ideally we would not be doing this so close to
beta, but it is what it is, and if we need to break binary compatibility
after beta1, at least we have pg_upgrade.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 14:40, Stephen Frost  wrote:
> > Allow me to re-state your suggestion here:
> >
> > An FDW is loaded which provides hook for join push-down (whatever those
> > end up being).
> >
> > A query is run which joins *local* table A to *local* table B.  Standard
> > heaps, standard indexes, all local to this PG instance.
> >
> > The FDW which supports join push-down is then passed this join for
> > planning, with local sub-plans for the local tables.
> 
> Yes that is correct; thank you for confirming your understanding with me.

I guess for my part, that doesn't look like an FDW any more.

> That also supports custom join of local to non-local table, or custom
> join of two non-local tables.

Well, we already support these, technically, but the FDW
doesn't actually implement the join, it's done in core.

> If we can use interfaces that already exist with efficiency, why
> invent a new one?

Perhaps once we have a proposal for FDW join push-down this will make
sense, but I'm not seeing it right now.

> >> Have we considered having an Optimizer and Executor plugin that does
> >> this without touching core at all?
> >
> > Uh, isn't that what we're talking about?
> 
> No. I meant writing this as an extension rather than a patch on core.

KaiGai's patches have consisted of some changes to core plus an extension
which uses those changes.  The changes to core include exposing internal
functions for extensions to use, which will undoubtedly end up being a
sore spot and fragile.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Tom Lane
Bruce Momjian  writes:
> On Tue, May  6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
>> I wonder why there's not an option to store keys and values separately,
>> but as hashes not as the original strings, so that indexability of
>> everything could be guaranteed.  Or a variant of that might be to hash
>> only strings that are too large to fit in an index entry, and force
>> recheck only when searching for a string that needed hashing.
>> 
>> I wonder whether the most effective use of time at this point
>> wouldn't be to fix jsonb_ops to do that, rather than arguing about
>> what to rename it to.  If it didn't have the failure-for-long-strings
>> problem I doubt anybody would be unhappy about making it the default.

> Can we hash just the values, not the keys, in jsonb_ops, and hash the
> combo in jsonb_hash_ops.  That would give us key-only lookups without a
> recheck.

No, because there's nothing in JSON limiting the length of keys, any more
than values.

I think the idea of hashing only keys/values that are "too long" is a
reasonable compromise.  I've not finished coding it (because I keep
getting distracted by other problems in the code :-() but it does not
look to be very difficult.  I'm envisioning the cutoff as being something
like 128 bytes; in practice that would mean that few if any keys get
hashed, I think.
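
To illustrate the idea (all names, the flag bit, and the 128-byte cutoff
below are invented for the example; this is not the actual jsonb_gin.c
code):

    #include "postgres.h"
    #include "access/hash.h"

    #define JGIN_MAXLENGTH  128     /* illustrative cutoff */

    /*
     * Build one GIN key from a jsonb key or value string: short strings are
     * stored verbatim (lossless), anything longer is replaced by a fixed-size
     * hash, which makes the entry lossy and forces a recheck of the heap tuple.
     */
    static text *
    make_gin_key(char flag, const char *str, int len, bool *lossy)
    {
        char        hashbuf[9];
        text       *result;

        if (len > JGIN_MAXLENGTH)
        {
            uint32      h = DatumGetUInt32(hash_any((const unsigned char *) str,
                                                    len));

            snprintf(hashbuf, sizeof(hashbuf), "%08X", (unsigned int) h);
            str = hashbuf;
            len = 8;
            flag |= 0x80;           /* mark the entry as hashed */
            *lossy = true;
        }
        else
            *lossy = false;

        /* prepend a one-byte flag distinguishing keys, values, etc. */
        result = (text *) palloc(VARHDRSZ + 1 + len);
        SET_VARSIZE(result, VARHDRSZ + 1 + len);
        VARDATA(result)[0] = flag;
        memcpy(VARDATA(result) + 1, str, len);
        return result;
    }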

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 14:49, Stephen Frost  wrote:

> Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
> vein, though there I agree that we need it per-relation as there are
> other trade-offs to consider (storage costs of the matview, cost to
> maintain the matview, etc, similar to indexes).
>
> The PGStrom proposal, aiui, is to add a new join type which supports
> using a GPU to answer a query where all the data is in regular PG
> tables.  I'd like that to "just work" when a GPU is available (perhaps
> modulo having to install some extension), for any join which is costed
> to be cheaper/faster when done that way.

All correct and agreed. As I explained earlier, let's cover the join
requirement here and we can discuss lookasides to data structures at
PGCon.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 14:40, Stephen Frost  wrote:

> Allow me to re-state your suggestion here:
>
> An FDW is loaded which provides hook for join push-down (whatever those
> end up being).
>
> A query is run which joins *local* table A to *local* table B.  Standard
> heaps, standard indexes, all local to this PG instance.
>
> The FDW which supports join push-down is then passed this join for
> planning, with local sub-plans for the local tables.

Yes that is correct; thank you for confirming your understanding with me.

That also supports custom join of local to non-local table, or custom
join of two non-local tables.

If we can use interfaces that already exist with efficiency, why
invent a new one?


>> Have we considered having an Optimizer and Executor plugin that does
>> this without touching core at all?
>
> Uh, isn't that what we're talking about?

No. I meant writing this as an extension rather than a patch on core.

> The issue is that there's a
> bunch of internal functions that such a plugin would need to either have
> access to or re-implement, but we'd rather not expose those internal
> functions to the whole world because they're, uh, internal helper
> routines, essentially, which could disappear in another release.
>
> The point is that there isn't a good API for this today and what's being
> proposed isn't a good API, it's just bolted-on to the existing system by
> exposing what are rightfully internal routines.

I think the main point is that people don't want to ask for our
permission before they do what they want to do.

We either help people use Postgres, or they go elsewhere.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
Simon,

Perhaps you've changed your proposal wrt LOOKASIDEs and I've missed it
somewhere in the thread, but this is what I was referring to with my
concerns regarding per-relation definition of 'LOOKASIDES':

* Simon Riggs (si...@2ndquadrant.com) wrote:
> Roughly, I'm thinking of this...
> 
> CREATE LOOKASIDE ON foo
>TO foo_mat_view;
> 
> and also this...
> 
> CREATE LOOKASIDE ON foo
>TO foo_as_a_foreign_table   /* e.g. PGStrom */

where I took 'foo' to mean 'a relation'.

Your downthread comments on 'CREATE MATERIALIZED VIEW' are in the same
vein, though there I agree that we need it per-relation as there are
other trade-offs to consider (storage costs of the matview, cost to
maintain the matview, etc, similar to indexes).

The PGStrom proposal, aiui, is to add a new join type which supports
using a GPU to answer a query where all the data is in regular PG
tables.  I'd like that to "just work" when a GPU is available (perhaps
modulo having to install some extension), for any join which is costed
to be cheaper/faster when done that way.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] New pg_lsn type doesn't have hash/btree opclasses

2014-05-08 Thread Andres Freund
Hi,

On 2014-05-06 23:55:04 -0300, Fabrízio de Royes Mello wrote:
> If helps, I added some regression tests to the lastest patch.

I think we should apply this patch now. It's much more sensible with the
opclasses present and we don't win anything by waiting for 9.5.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] default opclass for jsonb (was Re: Call for GIST/GIN/SP-GIST opclass documentation)

2014-05-08 Thread Bruce Momjian
On Tue, May  6, 2014 at 06:20:53PM -0400, Tom Lane wrote:
> "David E. Wheeler"  writes:
> > On May 6, 2014, at 2:20 PM, Bruce Momjian  wrote:
> >> Well, then, we only have a few days to come up with a name.
> 
> > What are the options?
> 
> We have no proposals as yet.
> 
> I've been looking at the source code to try to understand the difference
> between the two opclasses (and BTW I concur with the opinions expressed
> recently about the poor state of the internal documentation for jsonb).
> If I've got it straight:
> 
> jsonb_ops indexes keys and values separately, so for instance "{xyz: 2}"
> would give rise to GIN entries that are effectively the strings "Kxyz"
> and "V2".  If you're looking for tuples containing "{xyz: 2}" then you
> would be looking for the AND of those independent index entries, which
> fortunately GIN is pretty good at computing.  But you could also look
> for just keys or just values.
> 
> jsonb_hash_ops creates an index entry only for values, but what it
> stores is a hash of both the value and the key it's stored under.
> So in this example you'd get a hash combining "xyz" and "2".  This
> means the only type of query you can perform is like "find JSON tuples
> containing {xyz: 2}".

Good summary, thanks.  This is the information I was hoping we had in
our docs.  How does hstore deal with these issues?

> Because jsonb_ops stores the *whole* value, you can do lossless index
> searches (no recheck needed on the heap tuple), but you also run the
> risk of long strings failing to fit into an index entry.  Since jsonb_hash_ops
> reduces everything to a hash, there's no possibility of index failure,
> but all queries are lossy and require recheck.
> 
> TBH, at this point I'm sort of agreeing with the thought expressed
> upthread that maybe neither of these should be the default as-is.
> They seem like rather arbitrary combinations of choices.  In particular
> I wonder why there's not an option to store keys and values separately,
> but as hashes not as the original strings, so that indexability of
> everything could be guaranteed.  Or a variant of that might be to hash
> only strings that are too large to fit in an index entry, and force
> recheck only when searching for a string that needed hashing.
> 
> I wonder whether the most effective use of time at this point
> wouldn't be to fix jsonb_ops to do that, rather than arguing about
> what to rename it to.  If it didn't have the failure-for-long-strings
> problem I doubt anybody would be unhappy about making it the default.

Can we hash just the values, not the keys, in jsonb_ops, and hash the
combo in jsonb_hash_ops.  That would give us key-only lookups without a
recheck.

How do we index long strings now?  Is it the combination of GIN and long
strings that is the problem?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 14:32, Stephen Frost  wrote:

> The API you've outlined requires users to specify on a per-relation
> basis what join types are valid.

No, it doesn't. I've not said or implied that at any point.

If you keep telling me what I mean, rather than asking, we won't get anywhere.

I think that's as far as we'll get on email.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> From here, it looks exactly like pushing a join into an FDW. If we had
> that, we wouldn't need Custom Scan at all.
> 
> I may be mistaken and there is a critical difference. Local sub-plans
> doesn't sound like a big difference.

Erm.  I'm not sure that you're really thinking through what you're
suggesting.

Allow me to re-state your suggestion here:

An FDW is loaded which provides hook for join push-down (whatever those
end up being).

A query is run which joins *local* table A to *local* table B.  Standard
heaps, standard indexes, all local to this PG instance.

The FDW which supports join push-down is then passed this join for
planning, with local sub-plans for the local tables.

> Have we considered having an Optimizer and Executor plugin that does
> this without touching core at all?

Uh, isn't that what we're talking about?  The issue is that there's a
bunch of internal functions that such a plugin would need to either have
access to or re-implement, but we'd rather not expose those internal
functions to the whole world because they're, uh, internal helper
routines, essentially, which could disappear in another release.

The point is that there isn't a good API for this today and what's being
proposed isn't a good API, it's just bolted-on to the existing system by
exposing what are rightfully internal routines.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Kouhei Kaigai
> On Wed, May 7, 2014 at 4:01 AM, Simon Riggs  wrote:
> > Agreed. My proposal is that if the planner allows the lookaside to an
> > FDW then we pass the query for full execution on the FDW. That means
> > that the scan, aggregate and join could take place via the FDW. i.e.
> > "Custom Plan" == lookaside + FDW
> >
> > Or put another way, if we add Lookaside then we can just plug in the
> > pgstrom FDW directly and we're done. And everybody else's FDW will
> > work as well, so Citus etc. will not need to recode.
> 
> As Stephen notes downthread, Tom has already expressed opposition to this
> idea on other threads, and I tend to agree with him, at least to some degree.
> I think the drive to use foreign data wrappers for PGStrom, CitusDB, and
> other things that aren't really foreign data wrappers as originally
> conceived is a result of the fact that we've got only one interface in this
> area that looks remotely like something pluggable; and so everyone's trying
> to fit things into the constraints of that interface whether it's actually
> a good fit or not.
> Unfortunately, I think what CitusDB really wants is pluggable storage, and
> what PGStrom really wants is custom paths, and I don't think either of those
> things is the same as what FDWs provide.
> 
Yes, what PGStrom really needs is custom paths; that would allow an extension
to replace a part of the built-in nodes according to the extension's
characteristics.  The discussion upthread clarified that FDWs would need to be
enhanced to support the functionality that PGStrom wants to provide; however,
some of those enhancements would also require redefining what an FDW is.

Umm... I'm now missing the direction towards my goal.
What approach is the best way to glue PostgreSQL and PGStrom?

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 13:48, Stephen Frost  wrote:
> > I don't view on-GPU memory as being an alternate *permanent* data store.
> 
> As I've said, others have expressed an interest in placing specific
> data on specific external resources that we would like to use to speed
> up queries. That might be termed a "cache" of various kinds or it
> might simply be an allocation of that resource to a specific
> purpose.

I don't think some generalized structure that addresses the goals of
FDWs, CustomPaths, MatViews and query caching is going to be workable,
and I'm definitely against having to specify at a per-relation level
when I want certain join types to be considered.

> > Perhaps that's the disconnect that we have here, as it was my
> > understanding that we're talking about using GPUs to make queries run
> > faster where the data comes from regular tables.
> 
> I'm trying to consider a group of use cases, so we get a generic API
> that is useful to many people, not just to one use case. I had
> understood the argument to be there must be multiple potential users
> of an API before we allow it.

The API you've outlined requires users to specify on a per-relation
basis what join types are valid.  As for CustomPlans, there's
certainly potential for many use-cases there beyond just GPUs.  What I'm
unsure about is if any others would actually need to be implemented
externally as the GPU-related work seems to need or if we would just
implement those other join types in core.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] pg_shmem_allocations view

2014-05-08 Thread Andres Freund
On 2014-05-08 07:58:34 -0400, Robert Haas wrote:
> On Wed, May 7, 2014 at 5:54 PM, Andres Freund  wrote:
> > Hm. Not sure what you're ACKing here ;).
> 
> The idea of giving the unallocated memory a NULL key.

OK. A new version of the patches implementing that is attached,
including a couple of small fixups and docs. The latter aren't
extensive, but that doesn't seem to be warranted anyway.

> > There's lots of allocations from shmem that cannot be associated with
> > any index entry though. Not just ShmemIndex's own entry. Most
> > prominently most of the memory used for SharedBufHash isn't actually
> > associated with the "Shared Buffer Lookup Table" entry - imo a
> dynahash.c deficiency.

> Hmm, I don't know what to do about that.

Well, we have to live with it for now :)

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
>From c219c03a173fef962c1caba9f016d5d87448fd8f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Tue, 6 May 2014 19:42:36 +0200
Subject: [PATCH 1/4] Associate names to created dynamic shared memory
 segments.

At some later point we want to add a view showing all allocated dynamic
shared memory segments so admins can understand resource usage. To
avoid breaking the API in 9.5 add the necessary name now.
---
 contrib/test_shm_mq/setup.c   |  2 +-
 src/backend/storage/ipc/dsm.c | 60 ++-
 src/include/storage/dsm.h |  2 +-
 3 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/contrib/test_shm_mq/setup.c b/contrib/test_shm_mq/setup.c
index 572cf88..897c47b 100644
--- a/contrib/test_shm_mq/setup.c
+++ b/contrib/test_shm_mq/setup.c
@@ -125,7 +125,7 @@ setup_dynamic_shared_memory(int64 queue_size, int nworkers,
 	segsize = shm_toc_estimate(&e);
 
 	/* Create the shared memory segment and establish a table of contents. */
-	seg = dsm_create(shm_toc_estimate(&e));
+	seg = dsm_create("test_shm_mq", shm_toc_estimate(&e));
 	toc = shm_toc_create(PG_TEST_SHM_MQ_MAGIC, dsm_segment_address(seg),
 		 segsize);
 
diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index a5c0084..c8fdf6e 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -80,8 +80,10 @@ struct dsm_segment
 /* Shared-memory state for a dynamic shared memory segment. */
 typedef struct dsm_control_item
 {
-	dsm_handle	handle;
+	dsm_handle	handle;			/* segment identifier */
 	uint32		refcnt;			/* 2+ = active, 1 = moribund, 0 = gone */
+	Size		size;			/* current size */
+	char		name[SHMEM_INDEX_KEYSIZE]; /* informational name */
 } dsm_control_item;
 
 /* Layout of the dynamic shared memory control segment. */
@@ -454,14 +456,16 @@ dsm_set_control_handle(dsm_handle h)
  * Create a new dynamic shared memory segment.
  */
 dsm_segment *
-dsm_create(Size size)
+dsm_create(const char *name, Size size)
 {
 	dsm_segment *seg = dsm_create_descriptor();
-	uint32		i;
-	uint32		nitems;
+	dsm_control_item *item;
+	uint32		slot;
 
 	/* Unsafe in postmaster (and pointless in a stand-alone backend). */
 	Assert(IsUnderPostmaster);
+	Assert(name != NULL && strlen(name) > 0 &&
+		   strlen(name) < SHMEM_INDEX_KEYSIZE - 1);
 
 	if (!dsm_init_done)
 		dsm_backend_startup();
@@ -479,33 +483,39 @@ dsm_create(Size size)
 	/* Lock the control segment so we can register the new segment. */
 	LWLockAcquire(DynamicSharedMemoryControlLock, LW_EXCLUSIVE);
 
-	/* Search the control segment for an unused slot. */
-	nitems = dsm_control->nitems;
-	for (i = 0; i < nitems; ++i)
+	/*
+	 * Search the control segment for an unused slot that's previously been
+	 * used. If we don't find one initialize a new one if there's still space.
+	 */
+	for (slot = 0; slot < dsm_control->nitems; ++slot)
 	{
-		if (dsm_control->item[i].refcnt == 0)
-		{
-			dsm_control->item[i].handle = seg->handle;
-			/* refcnt of 1 triggers destruction, so start at 2 */
-			dsm_control->item[i].refcnt = 2;
-			seg->control_slot = i;
-			LWLockRelease(DynamicSharedMemoryControlLock);
-			return seg;
-		}
+		if (dsm_control->item[slot].refcnt == 0)
+			break;
 	}
 
-	/* Verify that we can support an additional mapping. */
-	if (nitems >= dsm_control->maxitems)
+	/* Verify that we can support the mapping. */
+	if (slot >= dsm_control->maxitems)
 		ereport(ERROR,
 (errcode(ERRCODE_INSUFFICIENT_RESOURCES),
  errmsg("too many dynamic shared memory segments")));
 
-	/* Enter the handle into a new array slot. */
-	dsm_control->item[nitems].handle = seg->handle;
+	item = &dsm_control->item[slot];
+	item->handle = seg->handle;
 	/* refcnt of 1 triggers destruction, so start at 2 */
-	dsm_control->item[nitems].refcnt = 2;
-	seg->control_slot = nitems;
-	dsm_control->nitems++;
+	item->refcnt = 2;
+	item->size = size;
+	strncpy(item->name, name, SHMEM_INDEX_KEYSIZE);
+	item->name[SHMEM_INDEX_KEYSIZE - 1] = 0;
+
+	seg->control_slot = slot;
+
+	/*
+	 * Increase number of initilized slots if we

Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 04:33, Kouhei Kaigai  wrote:

> In the case where it replaces join relations with a ForeignScan, it will be
> almost the same as the expected ForeignScan with join push-down. Unlike a
> usual table scan, it does not have an actual relation definition in the
> catalog, and its result tuple-slot is determined on the fly.
> One thing different from a remote join is that this ForeignScan node may have
> local sub-plans, if the FDW driver (e.g. GPU execution) only has a capability
> for the Join itself, but not for the relation scan portion.
> So, unlike its naming, I want ForeignScan to be able to have sub-plans if the
> FDW driver supports that capability.

From here, it looks exactly like pushing a join into an FDW. If we had
that, we wouldn't need Custom Scan at all.

I may be mistaken and there is a critical difference. Local sub-plans
doesn't sound like a big difference.


Have we considered having an Optimizer and Executor plugin that does
this without touching core at all?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.auto.conf read from wrong directory

2014-05-08 Thread Fujii Masao
On Wed, May 7, 2014 at 4:57 PM, Amit Kapila  wrote:
> On Tue, May 6, 2014 at 7:04 PM, Christoph Berg  wrote:
>> Hi,
>>
>> if you split configuration and data by setting data_directory,
>> postgresql.auto.conf is writen to the data directory
>> (/var/lib/postgresql/9.4/main in Debian), but tried to be read from
>> the etc directory (/etc/postgresql/9.4/main).
>>
>> One place to fix it would be in ProcessConfigFile in
>> src/backend/utils/misc/guc-file.l:162 by always setting
>> CallingFileName = NULL in src/backend/utils/misc/guc-file.l:162, but
>> that breaks later in AbsoluteConfigLocation() when data_directory is
>> NULL. (As the comment in ProcessConfigFile says.)
>
> This problem occurs because we don't have the value of data_directory
> set in postgresql.conf by the time we want to parse .auto.conf file
> during server start.  The value of data_directory is only available after
> processing of config files.  To fix it, we need to store the value of
> data_directory during parse of postgresql.conf file so that we can use it
> till data_directory is actually set.  Attached patch fixes the problem.
> Could you please once confirm if it fixes the problem in your
> env./scenario.

Maybe this is nitpicking, but what happens when postgresql.auto.conf also
includes the setting of data_directory? This is possible because we can
set data_directory via ALTER SYSTEM now. Should we just ignore such
problematic setting in postgresql.auto.conf with warning message?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Simon Riggs
On 8 May 2014 13:48, Stephen Frost  wrote:

>> We have multiple use cases where an alternate data structure could be
>> used to speed up queries.
>
> I don't view on-GPU memory as being an alternate *permanent* data store.

As I've said, others have expressed an interest in placing specific
data on specific external resources that we would like to use to speed
up queries. That might be termed a "cache" of various kinds or it
might simply be an allocation of that resource to a specific
purpose.

If we forget GPUs, that leaves multiple use cases that do fit the description.

> Perhaps that's the disconnect that we have here, as it was my
> understanding that we're talking about using GPUs to make queries run
> faster where the data comes from regular tables.

I'm trying to consider a group of use cases, so we get a generic API
that is useful to many people, not just to one use case. I had
understood the argument to be there must be multiple potential users
of an API before we allow it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Robert Haas (robertmh...@gmail.com) wrote:
> As Stephen notes downthread, Tom has already expressed opposition to
> this idea on other threads, and I tend to agree with him, at least to
> some degree.  I think the drive to use foreign data wrappers for
> PGStrom, CitusDB, and other things that aren't really foreign data
> wrappers as originally conceived is a result of the fact that we've
> got only one interface in this area that looks remotely like something
> pluggable; and so everyone's trying to fit things into the constraints
> of that interface whether it's actually a good fit or not.

Agreed.

> Unfortunately, I think what CitusDB really wants is pluggable storage,
> and what PGStrom really wants is custom paths, and I don't think
> either of those things is the same as what FDWs provide.

I'm not entirely sure that PGStrom even really "wants" custom paths.  I
believe the goal there is to be able to use GPUs to do work for us and
custom paths/pluggable plan/execution are seen as the way to do that and
not depend on libraries which are under GPL, LGPL or other licenses which
we'd object to depending on from core.

Personally, I'd love to just see CUDA or whatever support in core as a
configure option and be able to detect at start-up when the right
libraries and hardware are available and enable the join types which
could make use of that gear.

I don't like that we're doing all of this because of licenses or
whatever and would still hope to figure out a way to address those
issues but I haven't had time to go research it myself and evidently
KaiGai and others see the issues there as insurmountable, so they're
trying to work around it by creating a pluggable interface where an
extension could provide those join types.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote:
> On 8 May 2014 03:36, Stephen Frost  wrote:
> > I'm all for making use of MatViews and GPUs, but there's more than one
> > way to get there and look-asides feels like pushing the decision,
> > unnecessarily, on to the user.
> 
> I'm not sure I understand where most of your comments come from, so
> its clear we're not talking about the same things yet.
> 
> We have multiple use cases where an alternate data structure could be
> used to speed up queries.

I don't view on-GPU memory as being an alternate *permanent* data store.
Perhaps that's the disconnect that we have here, as it was my
understanding that we're talking about using GPUs to make queries run
faster where the data comes from regular tables.

> My goal is to use the alternate data structure(s)

Pluggable storage is certainly interesting, but I view that as
independent of the CustomPlan-related work.

> which I now understand is different from the main thrust of Kaigai's
> proposal, so I will restate this later on another thread.

Sounds good.

Thanks,

Stephen




Re: [HACKERS] [v9.5] Custom Plan API

2014-05-08 Thread Robert Haas
On Wed, May 7, 2014 at 4:01 AM, Simon Riggs  wrote:
> Agreed. My proposal is that if the planner allows the lookaside to an
> FDW then we pass the query for full execution on the FDW. That means
> that the scan, aggregate and join could take place via the FDW. i.e.
> "Custom Plan" == lookaside + FDW
>
> Or put another way, if we add Lookaside then we can just plug in the
> pgstrom FDW directly and we're done. And everybody else's FDW will
work as well, so Citus etc. will not need to recode.

As Stephen notes downthread, Tom has already expressed opposition to
this idea on other threads, and I tend to agree with him, at least to
some degree.  I think the drive to use foreign data wrappers for
PGStrom, CitusDB, and other things that aren't really foreign data
wrappers as originally conceived is a result of the fact that we've
got only one interface in this area that looks remotely like something
pluggable; and so everyone's trying to fit things into the constraints
of that interface whether it's actually a good fit or not.
Unfortunately, I think what CitusDB really wants is pluggable storage,
and what PGStrom really wants is custom paths, and I don't think
either of those things is the same as what FDWs provide.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
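
To ground the "custom paths" term: what eventually shipped as the custom
scan provider API (PostgreSQL 9.5) lets a loaded module hook the planner
at set_rel_pathlist time and offer additional paths for a relation. A
minimal sketch of just the hook plumbing -- module and function names are
invented for illustration, and no path is actually added here:

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

void _PG_init(void);

static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;

/*
 * Called for every base relation the planner considers.  A provider such
 * as a GPU scan would inspect root/rel/rte here and, when its strategy
 * applies, add_path() a path node of its own -- the piece a CustomPlan
 * API supplies without abusing the FDW interface.
 */
static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                    Index rti, RangeTblEntry *rte)
{
    if (prev_set_rel_pathlist_hook)
        prev_set_rel_pathlist_hook(root, rel, rti, rte);

    /* decision logic and add_path() calls would go here */
}

void
_PG_init(void)
{
    prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
    set_rel_pathlist_hook = my_set_rel_pathlist;
}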




Re: [HACKERS] PGDLLEXPORTing all GUCs?

2014-05-08 Thread Andres Freund
On 2014-05-08 07:56:46 -0400, Robert Haas wrote:
> On Thu, May 8, 2014 at 12:19 AM, Craig Ringer  wrote:
> > In terms of ugliness, would you be happier using a pg_extern instead of
> > extern, where we:
> >
> > #ifdef WIN32
> > #define pg_extern extern PGDLLIMPORT
> > #else
> > #define pg_extern extern
> > #endif
> >
> > ?
> 
> I personally would not be happier with that.  extern can be applied to
> function prototypes, not just variables, and even to function
> definitions.  When I see PGDLLIMPORT, I think, oh look, this is some
> kind of magic pixie dust that Windows requires us to sprinkle on our
> variables.  When I see pg_extern, I think, oh, this is the PostgreSQL
> version of extern, and it's not, really.

Well, it's actually "helpful" in some sense for functions too (one less
trampoline on Windows). And given how postgres uses externs, I think it
matches well with just PGDLLIMPORT-ing everything.

> But I can live with it if it makes other people sufficiently happier.
> One way or another, I really do feel very strongly that we should push
> forward with broader PGDLLIMPORT-ification.  My experience mirrors
> Craig's: this is a very useful thing for extension developers.

Yea. I have been yelled at by Jenkins far too many times. Exporting all
variables seems like the only way to significantly reduce how much
!windows developers need to care about Windows.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
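
To make the pain point concrete, here is a minimal sketch of the kind of
extension code that runs into this -- assuming a normal PGXS build; the
choice of maintenance_work_mem is arbitrary, any backend global GUC
variable shows the same behaviour:

/* sketch of an extension .c file */
#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"        /* declares maintenance_work_mem */

PG_MODULE_MAGIC;

void _PG_init(void);

/*
 * Reading a backend global from a loadable module.  On Windows this only
 * links if the backend declaration carries PGDLLIMPORT; on other platforms
 * a plain extern suffices, which is why !windows developers rarely notice
 * the problem until the Windows buildfarm complains.
 */
void
_PG_init(void)
{
    elog(LOG, "maintenance_work_mem = %d kB", maintenance_work_mem);
}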




Re: [HACKERS] pg_shmem_allocations view

2014-05-08 Thread Robert Haas
On Wed, May 7, 2014 at 5:54 PM, Andres Freund  wrote:
> On 2014-05-07 17:48:15 -0400, Robert Haas wrote:
>> On Tue, May 6, 2014 at 6:09 PM, Andres Freund  wrote:
>> >> I guess I'd vote for
>> >> ditching the allocated column completely and outputting the memory
>> >> allocated without ShmemIndex using some fixed tag (like "ShmemIndex"
>> >> or "Bootstrap" or "Overhead" or something).
>> >
>> > My way feels slightly cleaner, but I'd be ok with that as well. There are
>> > no possible conflicts with an actual segment... In your variant the
>> > unallocated/slop memory would continue to have a NULL key?
>>
>> Yeah, that seems all right.
>
> Hm. Not sure what you're ACKing here ;).

The idea of giving the unallocated memory a NULL key.

>> One way to avoid conflict with an actual segment would be to add an
>> after-the-fact entry into ShmemIndex representing the amount of memory
>> that was used to bootstrap it.
>
> There are lots of allocations from shmem that cannot be associated with
> any index entry, though. Not just ShmemIndex's own entry. Most
> prominently, most of the memory used for SharedBufHash isn't actually
> associated with the "Shared Buffer Lookup Table" entry - imo a
> dynahash.c deficiency.

Hmm, I don't know what to do about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
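
For context, the usual pattern that does get a ShmemIndex entry is a named
ShmemInitStruct() call; memory grabbed outside that mechanism (bootstrap
allocations, dynahash's internal growth for things like SharedBufHash) is
what the NULL-key / fixed-tag debate above is about. A minimal sketch of
the named case, with invented names:

#include "postgres.h"
#include "storage/shmem.h"

typedef struct MyModuleState
{
    int         counter;    /* whatever the module needs to share */
} MyModuleState;

/*
 * ShmemInitStruct() creates (or finds) a ShmemIndex entry keyed by the
 * given string -- the kind of entry a view such as the proposed
 * pg_shmem_allocations can report.  Normally this is called from a
 * shmem_startup_hook after requesting the space in _PG_init().
 */
static MyModuleState *
attach_my_module_state(void)
{
    bool        found;
    MyModuleState *state;

    state = (MyModuleState *)
        ShmemInitStruct("My Module State", sizeof(MyModuleState), &found);
    if (!found)
        state->counter = 0;   /* first process to attach initializes it */

    return state;
}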



