Re: [HACKERS] Getting to 9.3 beta

2013-03-30 Thread Alexander Korotkov
On Fri, Mar 29, 2013 at 7:22 PM, Andres Freund and...@2ndquadrant.com wrote:

 - Index based regexp search for pg_trgm:
Seems like the patch is undergoing restructuring of the regex access API
   => move to next fest


Tom's last proposal for the regex API seems simple enough for me to implement.
I'm going to post a reworked version today or tomorrow. I hope we can do one
more review cycle in this CF.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Cube extension improvement, GSoC

2013-03-30 Thread Alexander Korotkov
Hi, Jey!

I remember I've started couple of threads related to cube extension:
http://www.postgresql.org/message-id/4f30616d.3030...@gmail.com
http://www.postgresql.org/message-id/4f3c16e9.90...@gmail.com
Could you provide some feedback on Stas's GSoC proposal?

On Sat, Mar 23, 2013 at 3:10 AM, Stas Kelvich stanc...@gmail.com wrote:

 Hello,

 some time ago I started working on a data search system (about 100-200M
 records) with queries consisting of several range and equality conditions,
 e.g.:

   WHERE dim1 BETWEEN 128 AND 137
     AND dim2 BETWEEN 4815 AND 162342
     AND dim3 = 42
   ORDER BY dim1 ASC

 There are 6 or 7 search criteria in my task. In order to avoid a full table
 scan I started using an R-Tree over the cube data type:

   CREATE INDEX ON my_table USING GiST(cube(array[dim1, dim2, dim3]))

 For fast sorting I used Alexander Korotkov's patch for knngist (
 http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg153676.html),
 which changes the metric for nearest-neighbor search and allows obtaining
 cubes ordered by a specific coordinate. Having changed some data types and
 function/operator definitions I ported this to 9.2 (
 https://github.com/kelvich/postgres/commit/bb372). After that I could
 select ordered records right from the index:

   SELECT * FROM my_table
   WHERE cube(array[dim1, dim2, dim3]) <@ cube '(128,4815,42),(137,162342,42)'
   ORDER BY cube(array[dim1, dim2, dim3]) * 5;

 The main issue with this approach is the index size. In my case it was about
 100GB for 100M records, which makes index building a very expensive,
 disk-bound operation. For the same reason, reading a large number of records
 from the index is slow too.

 I turned to Oleg Bartunov, Teodor Sigaev, and after a while to Alexander
 Korotkov for help. (I'm very thankful to them and glad that they agreed
 to meet, listen to me and give useful advice.) Having discussed it, we
 decided that there were several possible improvements for the cube extension:

   * Adding point data type support to the cube extension in order to avoid
 storing coincident upper-left and lower-right vertices, which may reduce
 the space that leaf nodes occupy by almost half.
   * Checking the split algorithm with big datasets and, if possible,
 improving it.
   * Teaching the cube extension to store dimensions with different data types.
 Such an index would be a good alternative to a compound-key B-Tree index
 (suitable for range queries and data ordering).
   * Providing support for KNN with metrics induced by different norms:
 Euclidean, taxicab, p-norm. This can be useful in fields where we can
 extract a signature: similar-image search, similar-audio search.

 I'd like to participate in GSoC with these improvements, and I'm very
 interested in any comments or suggestions about this feature list.


--
With best regards,
Alexander Korotkov.


[HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread 赖文豫
As we know, SSDs are widely used in various kinds of applications, but the
SMGR in PostgreSQL still only supports magnetic disk. How do we make full
use of SSDs to improve the performance of PostgreSQL?

-- 
Just do it!


Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Bruce Momjian
On Sat, Mar 30, 2013 at 10:08:44PM +0800, 赖文豫 wrote:
 As we know, SSDs are widely used in various kinds of applications, but the
 SMGR in PostgreSQL still only supports magnetic disk. How do we make full
 use of SSDs to improve the performance of PostgreSQL?

When the storage manager (SMGR) says "magnetic disk", it is talking about
read/write media with random access capability, vs. something like
write-once media, which was originally supported in the code.  Postgres
works just fine with SSDs; the only adjustment you might want to make
is to reduce random_page_cost.
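
(For illustration, a typical adjustment; the default random_page_cost is 4.0,
which models spinning media, and the right value for an SSD is
workload-dependent, so treat this as a starting point rather than a
recommendation:)

    -- per-session, for experimentation:
    SET random_page_cost = 1.5;

    -- or set it cluster-wide in postgresql.conf and reload:
    -- random_page_cost = 1.5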

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] Cube extension improvement, GSoC

2013-03-30 Thread Alexander Korotkov
On Sat, Mar 30, 2013 at 3:55 PM, Alexander Korotkov aekorot...@gmail.com wrote:

 Hi, Jey!

 I remember I've started couple of threads related to cube extension:


 Oh, it's a typo. I remember you've started those threads.


 http://www.postgresql.org/message-id/4f30616d.3030...@gmail.com
 http://www.postgresql.org/message-id/4f3c16e9.90...@gmail.com
 Could you provide some feedback on Stas's GSoC proposal?


--
With best regards,
Alexander Korotkov.


Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Sat, Mar 30, 2013 at 10:08:44PM +0800, 赖文豫 wrote:
 As we know, SSDs are widely used in various kinds of applications, but the
 SMGR in PostgreSQL still only supports magnetic disk. How do we make full
 use of SSDs to improve the performance of PostgreSQL?

 When the storage manager (SMGR) says "magnetic disk", it is talking about
 read/write media with random access capability, vs. something like
 write-once media, which was originally supported in the code.  Postgres
 works just fine with SSDs; the only adjustment you might want to make
 is to reduce random_page_cost.

To enlarge on that point: the current smgr layer is basically vestigial,
because the sorts of device dependencies the Berkeley guys envisioned
switching between are nowadays always handled at the filesystem and
kernel device driver layers.  md.c is really an interface to the Unix
block device APIs; it has nothing whatsoever to do with whether the bits
are stored on spinning rust or something else.

regards, tom lane




Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Andrew Dunstan


On 03/30/2013 12:28 PM, Tom Lane wrote:

Bruce Momjian br...@momjian.us writes:

On Sat, Mar 30, 2013 at 10:08:44PM +0800, 赖文豫 wrote:

As we know, SSDs are widely used in various kinds of applications, but the
SMGR in PostgreSQL still only supports magnetic disk. How do we make full
use of SSDs to improve the performance of PostgreSQL?

When the storage manager (SMGR) says "magnetic disk", it is talking about
read/write media with random access capability, vs. something like
write-once media, which was originally supported in the code.  Postgres
works just fine with SSDs; the only adjustment you might want to make
is to reduce random_page_cost.

To enlarge on that point: the current smgr layer is basically vestigial,
because the sorts of device dependencies the Berkeley guys envisioned
switching between are nowadays always handled at the filesystem and
kernel device driver layers.  md.c is really an interface to the Unix
block device APIs; it has nothing whatsoever to do with whether the bits
are stored on spinning rust or something else.





This isn't the first time I've seen this sort of comment. Do we need to 
add some wording like the above to the top of md.c and the README in 
that directory?


cheers

andrew




Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 This isn't the first time I've seen this sort of comment. Do we need to 
 add some wording like the above to the top of md.c and the README in 
 that directory?

Yeah, probably.  I'll go write something ...

regards, tom lane




Re: [HACKERS] Fix for pg_upgrade and invalid indexes

2013-03-30 Thread Andres Freund
On 2013-03-29 19:03:05 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Those columns cannot be NULL, so using IS DISTINCT FROM seems a bit
  clumsy.
 
 That was what I started to write, too, but actually I think the IS
 DISTINCT is correct and the RIGHT JOIN should be a LEFT JOIN.  Note
 that the query appears to be intended to collect regular tables as
 well as indexes.  (As patched, that's totally broken, so I infer
 Bruce hasn't tested it yet.)

Ah yes. Then I'd actually find it much more readable to formulate it as a NOT
EXISTS(), but that might be just me.
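
(A sketch of that NOT EXISTS shape, for illustration only; the column names
are from pg_class/pg_index, but this is not the actual pg_upgrade query:)

    SELECT n.nspname, c.relname
    FROM pg_catalog.pg_class c
         JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE NOT EXISTS (SELECT 1
                      FROM pg_catalog.pg_index i
                      WHERE i.indexrelid = c.oid
                        AND (NOT i.indisvalid OR NOT i.indisready));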

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Satoshi Nagayasu

2013/03/30 23:31, Bruce Momjian wrote:

On Sat, Mar 30, 2013 at 10:08:44PM +0800, 赖文豫 wrote:

As we know, SSDs are widely used in various kinds of applications, but the
SMGR in PostgreSQL still only supports magnetic disk. How do we make full
use of SSDs to improve the performance of PostgreSQL?

When the storage manager (SMGR) says "magnetic disk", it is talking about
read/write media with random access capability, vs. something like
write-once media, which was originally supported in the code.  Postgres
works just fine with SSDs; the only adjustment you might want to make
is to reduce random_page_cost.


BTW, would using a larger block size (64kB) improve performance when
using an SSD drive?

I found that the configure script supports a --with-blocksize option to
change the block size, up to 32kB (the configure script does not support
a 64kB block size so far).

But I have heard that a larger block size, like 256kB, would take
advantage of SSD performance because of the block management within
the SSD.

So I'm just curious about that.
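
(As an aside, the block size a server was actually built with can be
checked from SQL, which is handy when experimenting with non-default builds:)

    SHOW block_size;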

Regards,
--
Satoshi Nagayasu sn...@uptime.jp
Uptime Technologies, LLC. http://www.uptime.jp




[HACKERS] HS and clog

2013-03-30 Thread Andres Freund
Hi,

During the investigation of
http://archives.postgresql.org/message-id/CAL_0b1t%3DWuM6roO8dki%3Dw8DhH8P8whhohbPjReymmQUrOcNT2A%40mail.gmail.com
I noticed that during HS we do the following in 
RecordKnownAssignedTransactionIds:
    if (TransactionIdFollows(xid, latestObservedXid))
    {
        TransactionId next_expected_xid;

        /*
         * Extend clog and subtrans like we do in GetNewTransactionId()
         * during normal operation using individual extend steps. Typical
         * case requires almost no activity.
         */
        next_expected_xid = latestObservedXid;
        TransactionIdAdvance(next_expected_xid);
        while (TransactionIdPrecedesOrEquals(next_expected_xid, xid))
        {
            ExtendCLOG(next_expected_xid);
            ExtendSUBTRANS(next_expected_xid);

            TransactionIdAdvance(next_expected_xid);
        }

Extending subtrans is fine; that's required since it's truncated after
restart and because it's not really WAL logged. But extending CLOG? That's
strange, isn't it, since clog is actually WAL logged, so all required
pages will be zeroed from their WAL records anyway.
The commit introducing HS changed ExtendCLOG to do
/* Zero the page and make an XLOG entry about it */
ZeroCLOGPage(pageno, !InRecovery);
to make that even work during recovery.

Imo this shouldn't be needed.

Simon, do you remember why you added that?  It makes me uneasy doing
something like that only during HS but not during normal crash
recovery/disabled HS.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Hash Join cost estimates

2013-03-30 Thread Stephen Frost
Jeff,

* Jeff Davis (pg...@j-davis.com) wrote:
 On Thu, 2013-03-28 at 19:56 -0400, Stephen Frost wrote:
41K hashed, seqscan 4M: 115030.10 + 1229.46 = 116259.56
4M hashed, seqscan 41K: 1229.46 + 211156.20 = 212385.66
 
 I think those are backwards -- typo?

Yes, sorry, those are backwards.  The '4M hashed, seqscan 41K' entry
comes out with the lower cost and that's what we end up using.

Thanks,

Stephen




Re: [HACKERS] Hash Join cost estimates

2013-03-30 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
 I think the point is that there may *not* be few hash collisions ...

Right, but that's actually all entirely based on concerns over there
being duplicates (hence why we consider the MCVs and ndistinct), which
makes *some* sense given that we currently have a single linked-list in
each bucket into which any dups are placed.

It occurs to me that we could get away from this by having a 2-level
system.  We hash to a bucket which contains a linked list of linked
lists.  The bucket list would be for actual collisions (which we don't
consider at all in the current bucket estimating) and each of those
entries would be a linked list of duplicates for that entry in the
bucket.  This would add some small cost for building the hash, since we
would have to step through each entry in the bucket to determine if the
entry being added is a new entry or not, but not very much unless we're
worried about lots of collisions happening (which I certainly hope isn't
the case).  Hash tables are generally expected to take more effort to
build initially anyway, so I don't see a problem putting more logic
there.  Also, we could skip checking each entry in the bucket when the
input is known to be unique and instead just skip to the end of the
bucket since the new entry can't match any existing.

We could still work through the bucketing logic and add some costing to
that case for those situations where we are hashing on only one key of a
multi-key join and we expect a lot of duplicates to exist.  I'm not sure
how much that happens, though; I would hope that we would use a composite
hash key most of the time that we have multi-key joins that use hash
tables.

Thoughts?

Thanks,

Stephen




Re: [HACKERS] Hash Join cost estimates

2013-03-30 Thread Stephen Frost
* Jeff Davis (pg...@j-davis.com) wrote:
 In Stephen's case the table was only 41KB, so something still seems off.
 Maybe we should model the likelihood of a collision based on the
 cardinalities (assuming a reasonably good hash function)?

It's not really 'hash collisions' that we're trying to be wary of, per
se, it's the risk of duplicates.  To push this very far in the other
direction: if you have 41k of the *same value* in the small table, then
it's currently faster to build the hash table on the large table and
then seq scan the small table (10s vs. 14s on my laptop running w/o
being plugged in, so it's a bit slower).
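
(A self-contained way to set up that degenerate case, if anyone wants to
poke at it; the table and column names here are made up for the example:)

    CREATE TABLE small AS SELECT 42 AS k FROM generate_series(1, 41000);
    CREATE TABLE big   AS SELECT g  AS k FROM generate_series(1, 4000000) g;
    ANALYZE small;
    ANALYZE big;
    EXPLAIN ANALYZE SELECT * FROM big JOIN small USING (k);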

Now, that's a pretty ridiculous case, but it seems like we're being
pretty dumb here: for every input row from the outer table, we're
looking through all 41K *duplicate* keys in that one hash bucket.  This
goes back to the suggestion I just made: if the hash bucket list
contained only unique values (which are a result of actual hash
collisions), then we'd only be doing *one* comparison for each input row
of the outer table that doesn't match, and when we found one which
*did*, we would only need to step through the dup list for that one
entry and blast back all of those rows, forgetting about the rest of the
bucket which we know can't match.

 Also, I think I found an important assumption that seems dubious (in
 comment for estimate_hash_bucketsize()):

I've wondered about that also.  It certainly seems quite bogus that we
can end up with an 'estimated # of entries in a bucket' that's larger
than the # of entries we've found for the MCV in the table, especially
*double* that.

 Stephen, do you think this could explain your problem?

As I tried to articulate in my initial email: even if we had a *perfect*
answer to how many comparisons we will need to do, the current costing
would cause us to pick the plan that, intuitively and empirically, takes
longer (hash the 41M row table) because that cost is multiplied by the
number of outer table rows, and the cpu_tuple_cost (charged to build the
hash table) isn't high enough relative to the cpu_operator_cost (charged
to do the comparisons in the buckets).

Thanks,

Stephen




Re: [HACKERS] Changing recovery.conf parameters into GUCs

2013-03-30 Thread Josh Berkus
Simon, All,

The new approach seems fine to me; I haven't looked at the code.  If Tom
doesn't feel like it's overly complicated, then this seems like a good
compromise.

Moving recovery.conf and the trigger file to a different directory is
definitely wanted by our Debian contingent.  Right now, the fact that
Debian keeps all .conf files in /etc/, but that it doesn't work to relocate
recovery.conf, is a constant source of irritation.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] HS and clog

2013-03-30 Thread Simon Riggs
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:

 Simon, do you remember why you added that?

That doesn't look like code I added, but I'll look some more.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Changing recovery.conf parameters into GUCs

2013-03-30 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 Moving recovery.conf and the trigger file to a different directory is
 definitely wanted by our Debian contingent.  Right now, the fact that
 Debian keeps all .conf files in /etc/, but that it doesn't work to relocate
 recovery.conf, is a constant source of irritation.

It seems like this is confusing two different problems.

If we get rid of recovery.conf per se in favor of folding the settings
into GUCs in the regular config file, then the first aspect of the issue
goes away, no?  The second aspect is where to put the trigger file, and
I'm not at all convinced that anybody would want the trigger file to be
in the same place they put external config files, mainly because the
trigger has to be in a server-writable directory.

regards, tom lane




Re: [HACKERS] HS and clog

2013-03-30 Thread Simon Riggs
On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:

 Imo this shouldn't be needed.

In principle, I think your premise looks correct. ExtendCLOG() is not needed.

If the xid truly is known assigned at a point in time, then the clog
should already have been extended to allow that.

I can't recall any special case there, but there may be one.

Given that it seems to work, changing it requires more care...

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HS and clog

2013-03-30 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:
 Imo this shouldn't be needed.

 In principle, I think your premise looks correct. ExtendCLOG() is not needed.

 If the xid truly is known assigned at a point in time, then the clog
 should already have been extended to allow that.

IIRC, the slru/clog code assumes, or at least did at one time assume,
that requests to initialize new pages are consecutive after startup.
Which is OK in normal operation since we don't skip over assignment
of any XIDs.

What I thought Andres was worried about here was that there might be
a gap in the extension requests when doing HS replay, and that maybe
that led to failure to create files or portions of files that later
the code would try to reference.

But maybe I misunderstood.

regards, tom lane




Re: [HACKERS] HS and clog

2013-03-30 Thread Andres Freund
On 2013-03-30 18:24:44 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On 30 March 2013 18:20, Andres Freund and...@2ndquadrant.com wrote:
  Imo this shouldn't be needed.
 
  In principle, I think your premise looks correct. ExtendCLOG() is not 
  needed.
 
  If the xid truly is known assigned at a point in time, then the clog
  should already have been extended to allow that.

I think that's pretty much guaranteed, since xids should only be generated by
GetNewTransactionId(), which calls ExtendCLOG(), which emits CLOG_ZEROPAGE. So
any failure here would lead to broken crash recovery.

What I am worried about is mostly that it could lead to

a) errors from overwriting existing clog entries if the known-assigned-xids
   machinery lost track somehow; see for example the bug, referenced
   upthread, that got me looking at this code.
b) future coding errors that only show up with HS either enabled or disabled,
   since the behaviour is now different between the two.

 IIRC, the slru/clog code assumes, or at least did at one time assume,
 that requests to initialize new pages are consecutive after startup.
 Which is OK in normal operation since we don't skip over assignment
 of any XIDs.

It at least expects that ExtendCLOG() gets called for every xid, since it
only zeroes the page for the first xid on that page.

 What I thought Andres was worried about here was that there might be
 a gap in the extension requests when doing HS replay, and that maybe
 that led to failure to create files or portions of files that later
 the code would try to reference.

No, I am not particularly worried about that, since I don't think the
HS-specific call to ExtendCLOG() can happen before a CLOG_ZEROPAGE record
has been replayed, so I can't see how it can ever do anything necessary.
It's more that I am worried that it could overwrite stuff in edge cases.

I am pretty sure that the scenario you describe happens for SUBTRANS,
though, leading to the reported bug.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] HS and clog

2013-03-30 Thread Andres Freund
On 2013-03-30 23:58:26 +0100, Andres Freund wrote:
 I am pretty sure that the scenario you describe happens for SUBTRANS,
 though, leading to the reported bug.

For a slightly too long explanation see:
http://archives.postgresql.org/message-id/20130330172144.GI28736%40alap2.anarazel.de

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] By now, why PostgreSQL 9.2 don't support SSDs?

2013-03-30 Thread Ants Aasma
On Mar 30, 2013 7:13 PM, Satoshi Nagayasu sn...@uptime.jp wrote:
 But I have heard that a larger block size, like 256kB, would take
 advantage of SSD performance because of the block management within
 the SSD.

This is only true for very bad SSDs. Any SSD that you would want to trust
with your data does block remapping internally, eliminating the issue. (See
for example the Intel DC S3700 sustaining 34,000 random 4k writes/s.)

Larger block sizes would just lift the write amplification of random-access
workloads into PostgreSQL, where the drive can't fix it. For sequential
or mostly sequential workloads the OS can take care of it by merging
writes. Additionally, contention for page-level locks will increase with
page size, and cache efficiency goes down. I would expect cases where a
larger block size is a significant benefit to be very rare.

Regards,
Ants Aasma


Re: [HACKERS] Fix for pg_upgrade and invalid indexes

2013-03-30 Thread Bruce Momjian

OK, patch applied and backpatched as far back as pg_upgrade exists in
git.

---

On Fri, Mar 29, 2013 at 11:35:13PM -0400, Bruce Momjian wrote:
 On Fri, Mar 29, 2013 at 07:03:05PM -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   Those columns cannot be NULL, so using IS DISTINCT FROM seems a bit
   clumsy.
  
  That was what I started to write, too, but actually I think the IS
  DISTINCT is correct and the RIGHT JOIN should be a LEFT JOIN.  Note
  that the query appears to be intended to collect regular tables as
  well as indexes.  (As patched, that's totally broken, so I infer
  Bruce hasn't tested it yet.)
 
 Yes, I only ran my simple tests so far --- I wanted to at least get some
 eyes on it.  I was wondering if we ever need to use parentheses for
 queries that mix normal and outer joins?  I am unclear on that.
 
 Attached is a fixed patch that uses LEFT JOIN.  I went back and looked
 at the patch that added this test and I think the patch is now complete.
 I would like to apply it tomorrow/Saturday so it will be ready for
 Monday's packaging, and get some buildfarm time on it.
 
 -- 
   Bruce Momjian  br...@momjian.us        http://momjian.us
   EnterpriseDB http://enterprisedb.com
 
   + It's impossible for everything to be true. +

 diff --git a/contrib/pg_upgrade/check.c b/contrib/pg_upgrade/check.c
 new file mode 100644
 index 65fb548..35783d0
 *** a/contrib/pg_upgrade/check.c
 --- b/contrib/pg_upgrade/check.c
 *** static void check_is_super_user(ClusterI
 *** 20,26 
   static void check_for_prepared_transactions(ClusterInfo *cluster);
   static void check_for_isn_and_int8_passing_mismatch(ClusterInfo *cluster);
   static void check_for_reg_data_type_usage(ClusterInfo *cluster);
 - static void check_for_invalid_indexes(ClusterInfo *cluster);
   static void get_bin_version(ClusterInfo *cluster);
   static char *get_canonical_locale_name(int category, const char *locale);
   
 --- 20,25 
 *** check_and_dump_old_cluster(bool live_che
 *** 97,103 
   check_is_super_user(old_cluster);
   check_for_prepared_transactions(old_cluster);
   check_for_reg_data_type_usage(old_cluster);
 - check_for_invalid_indexes(old_cluster);
   check_for_isn_and_int8_passing_mismatch(old_cluster);
   
   /* old = PG 8.3 checks? */
 --- 96,101 
 *** check_for_reg_data_type_usage(ClusterInf
 *** 952,1046 
  %s\n\n, output_path);
   }
   else
 - check_ok();
 - }
 - 
 - 
 - /*
 -  * check_for_invalid_indexes()
 -  *
 -  *  CREATE INDEX CONCURRENTLY can create invalid indexes if the index build
 -  *  fails.  These are dumped as valid indexes by pg_dump, but the
 -  *  underlying files are still invalid indexes.  This checks to make sure
 -  *  no invalid indexes exist, either failed index builds or concurrent
 -  *  indexes in the process of being created.
 -  */
 - static void
 - check_for_invalid_indexes(ClusterInfo *cluster)
 - {
 - int dbnum;
 - FILE   *script = NULL;
 - boolfound = false;
 - charoutput_path[MAXPGPATH];
 - 
 - prep_status("Checking for invalid indexes from concurrent index builds");
 - 
 - snprintf(output_path, sizeof(output_path), "invalid_indexes.txt");
 - 
 - for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
 - {
 - PGresult   *res;
 - booldb_used = false;
 - int ntups;
 - int rowno;
 - int i_nspname,
 - i_relname;
 - DbInfo *active_db = &cluster->dbarr.dbs[dbnum];
 - PGconn *conn = connectToServer(cluster, active_db->db_name);
 - 
 - res = executeQueryOrDie(conn,
 - "SELECT n.nspname, c.relname "
 - "FROM   pg_catalog.pg_class c, "
 - "       pg_catalog.pg_namespace n, "
 - "       pg_catalog.pg_index i "
 - "WHERE  (i.indisvalid = false OR "
 - "        i.indisready = false) AND "
 - "       i.indexrelid = c.oid AND "
 - "       c.relnamespace = n.oid AND "
 - /* we do not migrate these, so skip them */
 - "       n.nspname != 

Re: [HACKERS] pkg-config files for libpq and ecpg

2013-03-30 Thread Peter Eisentraut
On Wed, 2013-03-27 at 17:06 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On 3/24/13 1:55 PM, Tom Lane wrote:
  I experimented a bit with this version of the patch.  The hunk that
  removes -I$(libpq_srcdir) and $(libpq) from the ecpg/compatlib build
  breaks the build for me, so I took it out.
 
  What was the error message?  Probably not important, but curious.
 
 ecpg's #include of libpq-fe.h failed.  I speculate that you didn't
 notice because you tested on a machine where libpq-fe.h exists in
 /usr/include.

Right, we need to keep libpq in CPPFLAGS, but we can remove it from
SHLIB_LINK.

  At least for the libraries we are currently proposing to pkgconfig-ify,
  it seems to me that we only want a -I for where we are installing our
  own headers; there is no need for anything else.  That is,
  echo 'Cflags: -I$(includedir)'
  seems like plenty.  We aren't exposing any other packages' headers
  in the public header files for these libraries, so there's no need
  to tell client packages about them.
 
  libpq exposes at least openssl and gssapi, so we need those at least.
 
 No, it does not.  A client might choose to #include those of its own
 accord, but then it's the client's problem.  Our exported headers do
 not #include anything more exotic than stdio.h, and it's not the
 business of the pkg-config switches to provide for anything beyond
 allowing inclusions of our headers to succeed.

I was actually thinking of PQgetssl(), which is documented to require
OpenSSL, but that was actually changed a long time ago and the
documentation not updated.

So actually you are right, we don't need to provide any extra -I flags
(if we ignore libpq-int.h).  We do need that whole logic for
Libs.private, however.

So here is my updated patch, with the ecpg business changed as explained
above, and the extra magic removed from the Cflags lines.
diff --git a/.gitignore b/.gitignore
index 4df314c..8e227a2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -22,6 +22,7 @@ lcov.info
 win32ver.rc
 *.exe
 lib*dll.def
+lib*.pc
 
 # Local excludes in root directory
 /GNUmakefile
diff --git a/doc/src/sgml/ecpg.sgml b/doc/src/sgml/ecpg.sgml
index 4d904cd..68bcb13 100644
--- a/doc/src/sgml/ecpg.sgml
+++ b/doc/src/sgml/ecpg.sgml
@@ -5715,6 +5715,15 @@ <title>Processing Embedded SQL Programs</title>
  </para>
 
  <para>
+   You can
+   use <command>pg_config</command><indexterm><primary>pg_config</primary><secondary sortas="ecpg">with
+   ecpg</secondary></indexterm>
+   or <command>pkg-config</command><indexterm><primary>pkg-config</primary><secondary sortas="ecpg">with
+   ecpg</secondary></indexterm> with package name <literal>libecpg</literal> to
+   get the paths for your installation.
+  </para>
+
+  <para>
    If you manage the build process of a larger project using
    <application>make</application>, it might be convenient to include
    the following implicit rule to your makefiles:
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 1a09c1c..3b6ada0 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7641,6 +7641,18 @@ <title>Building <application>libpq</application> Programs</title>
  </para>
 
  <para>
+  If you
+  have <command>pkg-config</command><indexterm><primary>pkg-config</primary><secondary sortas="libpq">with
+  libpq</secondary></indexterm> installed, you can run instead:
+<screen>
+<prompt>$</prompt> pkg-config --cflags libpq
+<computeroutput>-I/usr/local/include</computeroutput>
+</screen>
+  Note that this will already include the <option>-I</option> in front of
+  the path.
+ </para>
+
+ <para>
   Failure to specify the correct option to the compiler will
   result in an error message such as:
 <screen>
@@ -7675,6 +7687,15 @@ <title>Building <application>libpq</application> Programs</title>
  </para>
 
  <para>
+  Or again use <command>pkg-config</command>:
+<screen>
+<prompt>$</prompt> pkg-config --libs libpq
+<computeroutput>-L/usr/local/pgsql/lib -lpq</computeroutput>
+</screen>
+  Note again that this prints the full options, not only the path.
+ </para>
+
+ <para>
   Error messages that point to problems in this area could look like
   the following:
 <screen>
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 4da2f10..2a0c7a9 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -87,6 +87,7 @@ shlib_bare	= lib$(NAME)$(DLSUFFIX)
 # Testing the soname variable is a reliable way to determine whether a
 # linkable library is being built.
 soname		= $(shlib_major)
+pkgconfigdir = $(libdir)/pkgconfig
 else
 # Naming convention for dynamically loadable modules
 shlib		= $(NAME)$(DLSUFFIX)
@@ -305,6 +306,7 @@ all-lib: all-shared-lib
 ifdef soname
 # no static library when building a dynamically loadable module
 all-lib: all-static-lib
+all-lib: lib$(NAME).pc
 endif
 
 all-static-lib: $(stlib)
@@ -388,6 +390,23 @@ $(stlib): $(shlib) $(DLL_DEFFILE) | $(SHLIB_PREREQS)
 endif # PORTNAME == cygwin || PORTNAME == win32
 
 
+%.pc: $(MAKEFILE_LIST)
+	echo 'Name: lib$(NAME)' >$@
+	echo 'Description: PostgreSQL 

Re: [HACKERS] citext like searches using index

2013-03-30 Thread David E. Wheeler
On Mar 20, 2013, at 1:45 AM, David E. Wheeler da...@kineticode.com wrote:

 Is there currently any way to create an index that can be used to speed up 
 searches like the one above?
 If not, do you have any idea how it might be implemented? Perhaps I could 
 give it a try myself.
 
 Thank you in advance for any suggestions you might have.
 
 I would think that text_pattern_ops would work, no?

Thorbjørn replied to me privately in the negative.

Hackers, what would be required to get an index on a CITEXT column to support 
LIKE?

Thanks,

David





Re: [HACKERS] citext like searches using index

2013-03-30 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 Hackers, what would be required to get an index on a CITEXT column to support 
 LIKE?

The LIKE index optimization is hard-wired into
match_special_index_operator(), which never heard of citext's ~~
operators.

I've wanted for years to replace that mechanism with something that
would support plug-in extensions, but have no very good idea how to
do it.

A bigger problem though is that the LIKE optimization is generally
pretty ineffective for ILIKE (which is what you're really asking for
here) because we can't assume that both case versions are consecutive
in the index.  I think the optimization just punts upon seeing any
letter anyway, if the operator is ILIKE.

Or in short: fixing this is a research problem.
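
(Meanwhile, the usual workaround for indexable case-insensitive prefix
matching is an expression index on a plain text column rather than citext;
a sketch, where the table and column names are placeholders, and this only
helps left-anchored patterns:)

    CREATE INDEX my_table_col_lower_idx
        ON my_table (lower(col) text_pattern_ops);

    SELECT * FROM my_table
    WHERE lower(col) LIKE lower('Foo%');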

regards, tom lane

