Re: [HACKERS] [rfc] overhauling pgstat.stat

2013-09-05 Thread Atri Sharma
On Thu, Sep 5, 2013 at 10:59 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Satoshi Nagayasu wrote:

 But, for now, I think we should have a real index for the
 statistics data because we already have several index storages,
 and it will allow us to minimize read/write operations.

 BTW, what kind of index would be preferred for this purpose?
 btree or hash?

 I find it hard to get excited about using the AM interface for this
 purpose.  To me it makes a lot more sense to have separate, much
 simpler code.  We don't need any transactionality, user defined types,
 user defined operators, or anything like that.

+1.

But wouldn't rewriting a lot of existing functionality potentially
lead to points of contention and/or much more effort?

Regards,

Atri
-- 
Regards,

Atri
l'apprenant




Re: [HACKERS] 9.4 regression

2013-09-05 Thread KONDO Mitsumasa

(2013/09/05 0:04), Andres Freund wrote:
 I'd vote for adding zeroing *after* the fallocate() first.
+1, with FALLOC_FL_KEEP_SIZE flag.

At least, fallocate() with the FALLOC_FL_KEEP_SIZE flag is faster than not using
it at all in my in-development sorted-checkpoint work. I applied it to relation
files, so I don't know about WAL files.


With the FALLOC_FL_KEEP_SIZE flag, fallocate() does not change the file size and
does not zero-fill; it seems to only reserve contiguous physical disk space in
the file system.
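For illustration only, a minimal sketch of that kind of call (this is not the
attached patch; reserve_wal_segment is a hypothetical helper, and the draft
patch below simply ignores the return value):

#define _GNU_SOURCE
#include <fcntl.h>				/* fallocate(), FALLOC_FL_KEEP_SIZE (Linux) */
#include <errno.h>

/*
 * Sketch: reserve segsize bytes of contiguous space for fd without changing
 * the apparent file size, then let the existing zero-fill loop run.  Falling
 * back silently when the filesystem lacks support (EOPNOTSUPP) keeps the old
 * behavior on such filesystems.
 */
static int
reserve_wal_segment(int fd, off_t segsize)
{
	if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, segsize) != 0 &&
		errno != EOPNOTSUPP)
		return -1;				/* real error; caller reports it with %m */
	return 0;
}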


Here is draft patch.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dc47c47..85b582c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3409,6 +3409,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock)
 (errcode_for_file_access(),
 			 errmsg("could not create file \"%s\": %m", tmppath)));
 
+#if defined(HAVE_FALLOCATE)
+	fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, XLogSegSize);
+#endif
+
 	/*
 	 * Zero-fill the file.	We have to do this the hard way to ensure that all
 	 * the file space has really been allocated --- on platforms that allow



[HACKERS] Extending pg_stat_statements to expose queryid

2013-09-05 Thread Sameer Thakur
Hello All,

I am trying to revive the discussion about exposing queryid in
pg_stat_statements.

I did find the same request posted on hackers @
http://www.postgresql.org/message-id/CABUevExLnb6xJwS=8rTaLOfDOS-tFm09==z2m_vz5hhfkgw...@mail.gmail.com

and

http://www.postgresql.org/message-id/cacn56+nlmtwhg8eqqqnyzqe2q0negjokmgfiusk_aohw627...@mail.gmail.com

From the discussions, I concluded:

1. The main use case for exposing queryid is as a better substitute for
hashing the query text of a pg_stat_statements snapshot to form a candidate
key. Problems occur when the hash value should differ even though the query
text is the same, for example when a table referred to in a query is dropped
and recreated, or when the same query text is run against different schemas
and the schema name is not included in the query text.

2. Exposing queryid was proposed earlier but was not accepted. The main
reason was that queryid could be unstable as well: since queryid is derived
from hashing the query tree, and the query tree can change between minor
PostgreSQL releases, the queryid for the same query could differ between
releases, resulting in incorrect statement statistics collection.

3. Another problem is distinguishing between queries whose statistics are
continuously maintained and intermittent queries, whose statistics might be
silently reset without the reporting tool being any the wiser.

4. A solution to avoid misrepresenting intermittent queries as consistent
queries would be to assign a unique number to each new row; once that row is
discarded, the unique number cannot be reused. The drawback here is possible
collision of the generated unique values.

5. A patch implementing a solution for identifying intermittent queries is at
https://github.com/fdr/postgres/branches/error-prop-pg_stat_statements-v2.

The solution avoids using a counter, and achieves the same result through the
property that intermittent queries accumulate errors due to eviction from the
hashtable while consistent queries do not. Error accumulation would be the
parameter by which a reporting tool can figure out whether queries were
evicted between snapshots.

6. To address the problem of the unstable queryid generated from the query
tree, it was proposed to eliminate any possible misunderstanding that queryid
will remain the same between releases by XORing the hash from the query tree
with a statistics session id. This also helps in all cases where the
statistics file is reset (crash recovery, entering recovery mode, and so on),
ensuring a new hash value for reset statistics.

To avoid increasing the chance of collision, a longer session key can be used
and the queryid padded to complete the XOR (a rough illustrative sketch
follows after point 7 below). An implementation of this is at
https://github.com/fdr/postgres/branches/pg_stat_statements-identification-v3

7. The patch pg_stat_statements-identification-v3 was returned with
feedback for more documentation in commitfest 2013-01.
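For illustration of point 6 only (a rough sketch, not the code in the linked
branches; query_tree_hash and stats_session_key are hypothetical names):

#include <stdint.h>

/*
 * Sketch: pad the 32-bit query-tree hash to 64 bits and XOR it with a
 * per-statistics-session key, so that anything that resets the statistics
 * (crash recovery, a manual reset) yields fresh queryids and they cannot be
 * mistaken for stable cross-release identifiers.
 */
static uint64_t
compute_exposed_queryid(uint32_t query_tree_hash, uint64_t stats_session_key)
{
	return stats_session_key ^ (uint64_t) query_tree_hash;
}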

Questions:

1. Is there a plan to re-introduce this patch? The code seems to be
documented.

2. There was mention of further testing of error propagation using hooks.
Could this be elaborated?

3. There was a use case that exposing queryid could be used to aggregate
statistics across WAL based replication clusters. But now that queryid is
derived from statistics session id, which is randomly generated, this use
case is still not addressed. Is this correct?

 Regards

Sameer


Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Heikki Linnakangas

On 04.09.2013 23:56, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

One fairly simple thing we could do is to teach catcache.c to resize the
caches. Then we could make the initial size of all the syscaches much
smaller.


I think this is attractive for the *other* reason you mention, namely
preserving reasonable performance when a catcache grows larger than
expected; but I'm pretty skeptical of nickel-and-diming caches that are
already really small.  Is it really worth cutting the TSPARSER caches
from 4 pointers to 2 for instance?


Yeah, that may be overdoing it. Then again, enlarging a hash table from 
2 to 4 entries when needed is also very cheap.



What concerns me about initially-undersized caches is that we'll waste
space and time in the enlargement process.


Enlarging a small hash tables is very cheap because, well, it's small. 
Enlarging a large hash table is more expensive, but if you have a lot of 
entries in the hash, then you also get the benefit of a larger hash when 
doing lookups. It does require some memory to hold the old and the new 
hash table while rehashing, but again, with a small hash table that's 
not significant, and with a large one the actual cached tuples take a 
lot more space than the buckets array anyway.


Per Wikipedia [1]:


If the table size increases or decreases by a fixed percentage at each 
expansion, the total cost of these resizings, amortized over all insert and 
delete operations, is still a constant, independent of the number of entries n 
and of the number m of operations performed.

For example, consider a table that was created with the minimum possible size 
and is doubled each time the load ratio exceeds some threshold. If m elements 
are inserted into that table, the total number of extra re-insertions that 
occur in all dynamic resizings of the table is at most m − 1. In other words, 
dynamic resizing roughly doubles the cost of each insert or delete operation.


Considering the amount of work involved in adding an entry to a catalog 
cache, I wouldn't worry about adding a tiny constant to the insertion time.
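As a toy illustration of the doubling strategy (plain C, not the catcache
code; a fill factor of 2 and power-of-two bucket counts are assumed):

#include <stdlib.h>

/* Toy chained hash table; nbuckets is always a power of two. */
typedef struct Entry { unsigned key; struct Entry *next; } Entry;
typedef struct { Entry **bucket; int nbuckets; int nentries; } Table;

static void
rehash(Table *t)
{
	int		newn = t->nbuckets * 2;
	Entry **newb = calloc(newn, sizeof(Entry *));
	int		i;

	/* re-insert every entry under the wider mask, then swap the arrays */
	for (i = 0; i < t->nbuckets; i++)
		while (t->bucket[i])
		{
			Entry  *e = t->bucket[i];

			t->bucket[i] = e->next;
			e->next = newb[e->key & (newn - 1)];
			newb[e->key & (newn - 1)] = e;
		}
	free(t->bucket);
	t->bucket = newb;
	t->nbuckets = newn;
}

static void
insert(Table *t, unsigned key)
{
	Entry  *e;

	if (t->nentries >= t->nbuckets * 2)	/* fill factor of 2 exceeded */
		rehash(t);						/* amortized O(1) per insertion */
	e = malloc(sizeof(Entry));
	e->key = key;
	e->next = t->bucket[key & (t->nbuckets - 1)];
	t->bucket[key & (t->nbuckets - 1)] = e;
	t->nentries++;
}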


I did some quick testing by creating 10 tables, and running a 
pgbench script that selects randomly from them:


\setrandom tableno 1 10
select * from foo:tableno where i = 1;

I timed the rehash operations with gettimeofday calls before and after 
the rehash:



LOG:  rehashed catalog cache id 44 for pg_class from 256 to 512 buckets: 27 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 128 to 256 buckets: 14 us
LOG:  rehashed catalog cache id 44 for pg_class from 512 to 1024 buckets: 54 us
LOG:  rehashed catalog cache id 45 for pg_class from 256 to 512 buckets: 29 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 256 to 512 buckets: 30 us
LOG:  rehashed catalog cache id 44 for pg_class from 1024 to 2048 buckets: 147 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 512 to 1024 buckets: 87 us
LOG:  rehashed catalog cache id 45 for pg_class from 512 to 1024 buckets: 80 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 512 to 1024 buckets: 88 us
LOG:  rehashed catalog cache id 44 for pg_class from 2048 to 4096 buckets: 342 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 1024 to 2048 buckets: 197 us
LOG:  rehashed catalog cache id 45 for pg_class from 1024 to 2048 buckets: 183 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 1024 to 2048 buckets: 194 us
LOG:  rehashed catalog cache id 44 for pg_class from 4096 to 8192 buckets: 764 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 2048 to 4096 buckets: 401 us
LOG:  rehashed catalog cache id 45 for pg_class from 2048 to 4096 buckets: 383 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 2048 to 4096 buckets: 406 us
LOG:  rehashed catalog cache id 44 for pg_class from 8192 to 16384 buckets: 1758 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 4096 to 8192 buckets: 833 us
LOG:  rehashed catalog cache id 45 for pg_class from 4096 to 8192 buckets: 842 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 4096 to 8192 buckets: 859 us
LOG:  rehashed catalog cache id 44 for pg_class from 16384 to 32768 buckets: 3564 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 8192 to 16384 buckets: 1769 us
LOG:  rehashed catalog cache id 45 for pg_class from 8192 to 16384 buckets: 1752 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 8192 to 16384 buckets: 1719 us
LOG:  rehashed catalog cache id 44 for pg_class from 32768 to 65536 buckets: 7538 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 16384 to 32768 buckets: 3644 us
LOG:  rehashed catalog cache id 45 for pg_class from 16384 to 32768 buckets: 3609 us
LOG:  rehashed catalog cache id 47 for pg_statistic from 16384 to 32768 buckets: 3508 us
LOG:  rehashed catalog cache id 44 for pg_class from 65536 to 131072 buckets: 16457 us
LOG:  rehashed catalog cache id 7 for pg_attribute from 

[HACKERS] Re: Is it necessary to rewrite table while increasing the scale of datatype numeric???

2013-09-05 Thread Noah Misch
On Thu, Sep 05, 2013 at 11:07:43AM +0800, wangs...@highgo.com.cn wrote:
 On 2013-09-04 19:30, Noah Misch wrote:
 On Wed, Sep 04, 2013 at 12:08:48PM +0800, wangs...@highgo.com.cn wrote:
 I find that it takes a long time when I increase the scale of a numeric
 datatype.
 By checking the code, I found that's because it needs to rewrite that
 table's file.
 After checking that table's data file, I found that only the parameter
 n_header changed.
 And I found the data in that numeric field never changed.
 So I think it's not necessary to rewrite the table's file in this case.


 Noah Misch n...@leadboat.com wrote:
 n_header is part of the numeric field's data.  That's not just pedantry:
 the display scale stored in n_header affects how numeric_out() formats the
 value.

 Thanks for your reply.

 Just because of what you said, I think increasing the scale only leads to a
 different display. There's no difference between 5.25 and 5.2500 in use.
 So there's no need to rewrite the table.

Right or wrong, our numeric type caters to applications that do care about the
difference between those outputs.  I grant that others do not care.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




[HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Peter Eisentraut
Can we consider getting rid of the SQL_ASCII server-side encoding?  I
don't see any good use for it, and it's often a support annoyance, and
it leaves warts all over the code.  This would presumably be a
multi-release effort.

As a first step in accommodating users who have existing SQL_ASCII
databases, we could change SQL_ASCII into a real encoding with
conversion routines to all other encodings that only convert 7-bit ASCII
characters.  That way, users who use SQL_ASCII as real ASCII or don't
care could continue to use it.  Others would be forced to either set
SQL_ASCII as the client encoding or adjust the encoding on the server.
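As a sketch of what the core of such an ASCII-only conversion routine might
look like (hypothetical code, assuming the usual backend headers; the real
conversion procs live under src/backend/utils/mb/conversion_procs/):

#include "postgres.h"

/*
 * Sketch only: a "conversion" from strict ASCII to any other encoding would
 * simply verify that every byte is 7-bit and pass the data through
 * unchanged, erroring out otherwise.
 */
static void
ascii_passthrough(const unsigned char *src, unsigned char *dest, int len)
{
	int			i;

	for (i = 0; i < len; i++)
	{
		if (src[i] > 127)
			elog(ERROR, "invalid byte for 7-bit ASCII: 0x%02x", src[i]);
		dest[i] = src[i];
	}
}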

On the client side, the default libpq client encoding SQL_ASCII would
be renamed to something like SAME or whatever, so the behavior would
stay the same.

Other ideas?  Are there legitimate uses for SQL_ASCII?




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Merlin Moncure
On Thu, Sep 5, 2013 at 7:47 AM, Peter Eisentraut pete...@gmx.net wrote:
 Can we consider getting rid of the SQL_ASCII server-side encoding?  I
 don't see any good use for it, and it's often a support annoyance, and
 it leaves warts all over the code.  This would presumably be a
 multi-release effort.

 As a first step in accommodating users who have existing SQL_ASCII
 databases, we could change SQL_ASCII into a real encoding with
 conversion routines to all other encodings that only convert 7-bit ASCII
 characters.  That way, users who use SQL_ASCII as real ASCII or don't
 care could continue to use it.  Others would be forced to either set
 SQL_ASCII as the client encoding or adjust the encoding on the server.

 On the client side, the default libpq client encoding SQL_ASCII would
 be renamed to something like SAME or whatever, so the behavior would
 stay the same.

 Other ideas?  Are there legitimate uses for SQL_ASCII?

performance?

merlin




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 15:47, Peter Eisentraut wrote:

Can we consider getting rid of the SQL_ASCII server-side encoding?  I
don't see any good use for it, and it's often a support annoyance, and
it leaves warts all over the code.  This would presumably be a
multi-release effort.


I think "warts all over the code" is an overstatement. There aren't that
many places in the code that care about SQL_ASCII, and they're all
related to encoding conversions.



As a first step in accommodating users who have existing SQL_ASCII
databases, we could change SQL_ASCII into a real encoding with
conversion routines to all other encodings that only convert 7-bit ASCII
characters.  That way, users who use SQL_ASCII as real ASCII or don't
care could continue to use it.  Others would be forced to either set
SQL_ASCII as the client encoding or adjust the encoding on the server.

On the client side, the default libpq client encoding SQL_ASCII would
be renamed to something like SAME or whatever, so the behavior would
stay the same.

Other ideas?  Are there legitimate uses for SQL_ASCII?


One use is if you want to use some special encoding that's not supported 
by PostgreSQL, and you want PostgreSQL to just regurgitate any strings 
as is. It's not common, but would be strange to remove that capability 
altogether, IMHO.


I agree it would be nice to have a real ASCII encoding, which only 
accepts 7-bit ASCII characters. And it would be nice if SQL_ASCII was 
called something else, like UNDEFINED or BYTE_PER_CHAR, to make the 
meaning more clear. But I'm not in favor of deprecating it altogether.


Also, during backend initialization there is a phase where 
client_encoding has not been set yet, and we don't do any conversions 
yet. That's exactly what SQL_ASCII means, so even if we get rid of 
SQL_ASCII, we'd still need to have some encoding value in the backend to 
mean that intermediate state.


- Heikki




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote:
 Can we consider getting rid of the SQL_ASCII server-side encoding?  I
 don't see any good use for it, and it's often a support annoyance, and
 it leaves warts all over the code.  This would presumably be a
 multi-release effort.
 
 As a first step in accommodating users who have existing SQL_ASCII
 databases, we could change SQL_ASCII into a real encoding with
 conversion routines to all other encodings that only convert 7-bit ASCII
 characters.  That way, users who use SQL_ASCII as real ASCII or don't
 care could continue to use it.  Others would be forced to either set
 SQL_ASCII as the client encoding or adjust the encoding on the server.
 
 On the client side, the default libpq client encoding SQL_ASCII would
 be renamed to something like SAME or whatever, so the behavior would
 stay the same.
 
 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
Hi Peter,

Yes, we have processes that insert data from a large number of locales
into the same database, and we need to process the information in a
locale-agnostic way, just as a range of bytes. Not to mention how much
faster it can be.

Regards,
Ken




Re: [HACKERS] Proposal for XML Schema Validation

2013-09-05 Thread Kodamasimham Pridhvi (MT2012066)
On  2013-08-09 14:14:34, Craig Ringer wrote:
Well, if you're planning on relying on libxml in core (and it'll have to
be in core if you're adding new syntax) then you'll need a solid, well
researched answer to that one or an alternative XML library that's
portable and doesn't have those issues.

After doing some research on this issue and going through the mailing list,
we were unable to find a fix for the memory management issue with libxml when
used with PostgreSQL. Can you please suggest an alternative to libxml, or
point us to some relevant resources?



Thanks and Regards,
Pridhvi and Vikranthsingh,
IIITB



Re: [HACKERS] UNNEST with multiple args, and TABLE with multiple funcs

2013-09-05 Thread Noah Misch
On Wed, Aug 28, 2013 at 12:09:05AM -0400, Peter Eisentraut wrote:
 On Tue, 2013-08-27 at 09:44 -0400, Tom Lane wrote:
  Boszormenyi Zoltan z...@cybertec.at writes:
   When adding regression tests, can you please add intentional
   syntax error cases to exercise all the new ereport()s?
  
  Please do not add test cases merely to prove that.  Yeah, you should
  probably have exercised each error case in devel testing, but that does
  not mean that every future run of the regression tests needs to do it too.
 
 I disagree.  The next person who wants to hack on this feature should be
 given the confidence that he's not breaking behavior that the last guy
 put in.

+1.  I wouldn't make full error-outcome test coverage a condition of patch
acceptance.  However, when an author chooses to submit high-quality tests with
that level of detail, our source tree is the place to archive them.  I share
Tom's desire for a Makefile target that completes quickly and checks only
those behaviors most likely to break, but not at the cost of letting deep test
coverage dissipate in a mailing list attachment or in the feature author's
home directory.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com




Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 I ran pgbench for ten seconds, and printed the number of tuples in each 
 catcache after that:
 [ very tiny numbers ]

I find these numbers a bit suspicious.  For example, we must have hit at
least 13 different system catalogs, and more than that many indexes, in
the course of populating the syscaches you show as initialized.  How is
it there are only 4 entries in the RELOID cache?  I wonder if there were
cache resets going on.

A larger issue is that pgbench might not be too representative.  In
a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
and contains 1 entry after select 2=2, which is expected since 
the operator-lookup code will start by looking for int4 = int4 and
will get an exact match.  But after select 2=2::numeric there are
61 entries, as a byproduct of having thumbed through every binary
operator named = to resolve the ambiguous match.  We went so far
as to install another level of caching in front of OPERNAMENSP because
it was getting too expensive to deal with heavily-overloaded operators
like that one.  In general, we've had to spend enough sweat on optimizing
catcache searches to make me highly dubious of any claim that the caches
are usually almost empty.

I understand your argument that resizing is so cheap that it might not
matter, but nonetheless reducing these caches as far as you're suggesting
sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
them on the small side rather than on the large side as they are now, but
not with choosing sizes that are guaranteed to result in resizing cycles
during startup of any real app.

 PS. Once the hashes are resized on demand, perhaps we should get rid of 
 the move-to-the-head behavior in SearchCatCache. If all the buckets 

-1.  If the bucket in fact has just one member, dlist_move_head reduces to
just one comparison.  And again I argue that you're optimizing for the
wrong case.  Pure luck will result in some hash chains being (much) longer
than the average, and if we don't do move-to-front we'll get hurt there.
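For reference, a simplified sketch of the step in question (not the full
SearchCatCache logic; the real lookup details, negative entries and
refcounting are omitted):

#include "postgres.h"
#include "lib/ilist.h"
#include "utils/catcache.h"

static CatCTup *
bucket_lookup_with_move_to_front(dlist_head *bucket, uint32 hashValue)
{
	dlist_iter	iter;

	dlist_foreach(iter, bucket)
	{
		CatCTup    *ct = dlist_container(CatCTup, cache_elem, iter.cur);

		if (ct->hash_value != hashValue)
			continue;			/* cheap filter; full key test omitted here */

		/* keep hot entries near the head of unlucky, longer-than-average chains */
		dlist_move_head(bucket, &ct->cache_elem);
		return ct;
	}
	return NULL;
}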

regards, tom lane




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Alvaro Herrera
Joshua D. Drake wrote:
 
 On 09/05/2013 09:42 AM, Josh Berkus wrote:

 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
 iconv?

Command Prompt helped a customer normalize encodings in their data,
which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
two stages; the first run in UTF8 saved the rejected data to a file
which was loaded in the second run as Latin1.  This worked like a charm.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread Alvaro Herrera
arthernan wrote:
 OidFunctionCall7 is called from costsize.c, in function void cost_index. From
 the comments, I understand that there is an indirection mechanism here to
 allow for different index access methods. Looking at fmgr.h where it is
 declared is not clear to me where I would ultimately find the code.
 
 Any pointers will be greatly appreciated

The code you're really looking for is the amcostestimate function of the
AM that the index uses.  So if it's a btree you want btcostestimate, for
instance.  You can see the AM definitions in the pg_am system catalog
and src/include/catalog/pg_am.h.

	/*
	 * Call index-access-method-specific code to estimate the processing cost
	 * for scanning the index, as well as the selectivity of the index (ie,
	 * the fraction of main-table tuples we will have to retrieve) and its
	 * correlation to the main-table tuple order.
	 */
	OidFunctionCall7(index->amcostestimate,
					 PointerGetDatum(root),
					 PointerGetDatum(path),
					 Float8GetDatum(loop_count),
					 PointerGetDatum(&indexStartupCost),
					 PointerGetDatum(&indexTotalCost),
					 PointerGetDatum(&indexSelectivity),
					 PointerGetDatum(&indexCorrelation));


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
 I have developed the attached patch which implements an auto-tuned
 effective_cache_size which is 4x the size of shared buffers.  I had to
 set effective_cache_size to its old 128MB default so the EXPLAIN
 regression tests would pass unchanged.

 That's not really autotuning though. ISTM that making the *default* 4
 x shared_buffers might make perfect sense, but do we really need to
 hijack the value of -1 for that? That might be useful for some time
 when we have actual autotuning, that somehow inspects the system and
 tunes it from there.

Well, the real problem with this patch is that it documents what the
auto-tuning algorithm is; without that commitment, just saying -1 means
autotune might be fine.

Did you consider the alternative of just tweaking initdb to insert a
default for effective_cache_size that's 4x whatever it picks for
shared_buffers?  That would probably be about 3 lines of code, and it
wouldn't nail down any particular server-side behavior.
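For concreteness, the roughly-three-lines version might look something like
this (a hypothetical sketch only, loosely modeled on how setup_config() in
initdb.c already substitutes the shared_buffers line; n_buffers is assumed to
be expressed in BLCKSZ-sized buffers):

	/* hypothetical addition to setup_config() in initdb.c */
	snprintf(repltok, sizeof(repltok), "effective_cache_size = %dMB",
			 (n_buffers * (BLCKSZ / 1024) / 1024) * 4);
	conflines = replace_token(conflines, "#effective_cache_size = 128MB",
							  repltok);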

regards, tom lane




Re: [HACKERS] INSERT...ON DUPLICATE KEY IGNORE

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 8:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 You've proposed a framework and algorithm for something I'd really, really
 like to see. I don't think it can work exactly as you proposed, so I roughly
 sketched out a solution I can see.
 I don't want my idea to win, I want an idea to win. I haven't fleshed out my
 idea to the point where I would consider it ready to implement or anything
 like that.
 You're the patch author here whose plans are laid open to be scrutinized ;).
 If you think my idea has merit, use it and adapt it to reality. If not, find
 another, better solution.

 Even if our path to that goal is confrontational at times, the goal is to 
 find a good solution, not the argument itself.

Totally agreed.

 I haven't argued about INSERT ... DUPLICATE LOCK because the page locking 
 scheme doesn't seem to work out for plain DUPLICATE. No need to think/argue 
 about the fancier version in that case.

Check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] dynamic shared memory

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 6:38 PM, Jim Nasby j...@nasby.net wrote:
 No, I meant something that would live as long as the postmaster and
 die when it dies.

 ISTM that at some point we'll want to look at putting top-level shared
 memory into this system (ie: allowing dynamic resizing of GUCs that affect
 shared memory size).

A lot of people want that, but being able to resize the shared memory
chunk itself is only the beginning of the problem.  So I wouldn't hold
my breath.

 Wouldn't it protect against a crash while writing the file? I realize the
 odds of that are pretty remote, but AFAIK it wouldn't cost that much to
 write a new file and do an atomic mv...

If there's an OS-level crash, we don't need the state file; the shared
memory will be gone anyway.  And if it's a PostgreSQL-level failure,
this game neither helps nor hurts.

 Sure.  A messed-up backend can clobber the control segment just as it
 can clobber anything else in shared memory.  There's really no way
 around that problem.  If the control segment has been overwritten by a
 memory stomp, we can't use it to clean up.  There's no way around that
 problem except to not have a control segment, which wouldn't be better.

 Are we trying to protect against memory stomps when we restart after a
 backend dies? I thought we were just trying to ensure that all shared data
 structures were correct and consistent. If that's the case, then I was
 thinking that by using a pointer that can be updated in a CPU-atomic fashion
 we know we'd never end up with a corrupted entry that was in use; the
 partial write would be to a slot with nothing pointing at it so it could be
 safely reused.

When we restart after a backend dies, shared memory contents are
completely reset, from scratch.  This is true of both the fixed size
shared memory segment and of the dynamic shared memory control
segment.  The only difference is that, with the dynamic shared memory
control segment, we need to use the segment for cleanup before
throwing it out and starting over.  Extra caution is required because
we're examining memory that could hypothetically have been stomped on;
we must not let the postmaster do anything suicidal.

 Should dsm_impl_op sanity check the arguments after op? I didn't notice
 checks in the type-specific code but I also didn't read all of it... are
 we
 just depending on the OS to sanity-check?

 Sanity-check for what?

 Presumably there's limits to what the arguments can be rationally set to.
 IIRC there's nothing down-stream that's checking them in our code, so I'm
 guessing we're just depending on the kernel to sanity-check.

Pretty much.  It's possible more thought is needed there, but the
shape of those additional thoughts is not clear to me at this time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Sep  3, 2013 at 09:32:44PM -0400, Bruce Momjian wrote:
 In this test, SELECT NULL (which internally would produce SELECT
 ROW(NULL)), returns TRUE, while SELECT ROW(NULL) and further nesting
 returns false.

 This has made me adjust my goal and change it so SELECT ROW(NULL) IS
 NULL returns true, and any further nesting returns false.

 Attached is a patch which accomplishes this, and a documentation update.

 I have not heard any feedback on this patch, so I would like to apply it
 to give us a nested ROW/IS NULL API we can document.  It would have to
 be marked in the release notes as a backward incompatibility.

I don't have time to look at this in detail right now, but I think
that's considerably premature.  I'm not convinced that we understand
all of the problems in this area yet, let alone the solutions.
And I notice that you haven't substantively responded to some of Tom's
concerns.

So -1 from me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] improve Chinese locale performance

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 11:02 PM, Quan Zongliang quanzongli...@gmail.com wrote:
 I have thought of a new idea:
 add a compare-method column to pg_collation.
 Every collation has its own compare function, or null.
 When varstr_cmp() is called, if the specified collation
 has a compare function, call it instead of strcoll().

I think we're going to need to have two kinds of collations:
OS-derived collations (which get all of their smarts from the OS), and
PG-internal collations (which use PG-aware code for everything).
Which I suspect is a bit more involved than what you're imagining, but
mixing and matching doesn't seem likely to end well.

However, what you're proposing might serve as a useful demonstration
of how much performance there is to be gained here.
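For illustration, a minimal sketch of the dispatch being proposed (entirely
hypothetical names; this is not an existing PostgreSQL API and not Quan's
patch):

#include <string.h>				/* strcoll() */

typedef int (*collate_cmp_fn) (const char *a, const char *b);

/*
 * Sketch: if the collation carries its own PG-internal comparator, use it;
 * otherwise fall back to the OS locale path, as varstr_cmp() does today.
 * Inputs are assumed to be NUL-terminated copies.
 */
static int
collate_aware_cmp(const char *a, const char *b, collate_cmp_fn collation_cmp)
{
	if (collation_cmp != NULL)
		return collation_cmp(a, b);		/* PG-internal comparator */
	return strcoll(a, b);				/* OS collation, as today */
}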

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Magnus Hagander
On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
 On Tue, Jan  8, 2013 at 08:40:44PM -0500, Andrew Dunstan wrote:

 On 01/08/2013 08:08 PM, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 8, 2013 at 7:17 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 ...  And I don't especially like the idea of trying to
 make it depend directly on the box's physical RAM, for the same
 practical reasons Robert mentioned.
 For the record, I don't believe those problems would be particularly
 hard to solve.
 Well, the problem of finding out the box's physical RAM is doubtless
 solvable if we're willing to put enough sweat and tears into it, but
 I'm dubious that it's worth the trouble.  The harder part is how to know
 if the box is supposed to be dedicated to the database.  Bear in mind
 that the starting point of this debate was the idea that we're talking
 about an inexperienced DBA who doesn't know about any configuration knob
 we might provide for the purpose.
 
 I'd prefer to go with a default that's predictable and not totally
 foolish --- and some multiple of shared_buffers seems like it'd fit the
 bill.

 +1. That seems to be by far the biggest bang for the buck. Anything
 else will surely involve a lot more code for not much more benefit.

 I have developed the attached patch which implements an auto-tuned
 effective_cache_size which is 4x the size of shared buffers.  I had to
 set effective_cache_size to its old 128MB default so the EXPLAIN
 regression tests would pass unchanged.

That's not really autotuning though. ISTM that making the *default* 4
x shared_buffers might make perfect sense, but do we really need to
hijack the value of -1 for that? That might be useful for some time
when we have actual autotuning, that somehow inspects the system and
tunes it from there.

I also don't think it should be called autotuning, when it's just a
smarter default value.

I like the feature, though, just not the packaging.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 12:44:18 -0400, Robert Haas wrote:
 On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
   0002 wal_decoding: Introduce InvalidCommandId and declare that to be the 
   new maximum for CommandCounterIncrement
 
  I'm still unconvinced we want this.
 
  Ok, so the reason for the existence of this patch is that currently
  there is no way to represent an unset CommandId. This is a problem for
  the following patches because we need to log the cmin and cmax of catalog
  rows, and obviously there can be rows where cmax is unset.
 
 For heap tuples, we solve this problem by using flag bits.  Why not
 adopt the same approach?

We can, although it makes the amount of data stored/logged slightly larger
and seems to lead to less idiomatic code to me; so if there's another
-1 I'll go that way.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Robert Haas
On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
  0002 wal_decoding: Introduce InvalidCommandId and declare that to be the 
  new maximum for CommandCounterIncrement

 I'm still unconvinced we want this.

 Ok, so the reason for the existence of this patch is that currently
 there is no way to represent an unset CommandId. This is a problem for
 the following patches because we need to log the cmin and cmax of catalog
 rows, and obviously there can be rows where cmax is unset.

For heap tuples, we solve this problem by using flag bits.  Why not
adopt the same approach?
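For context, the existing heap-tuple convention being referred to (shown only
as an illustration; HEAP_XMAX_INVALID is the flag bit from htup_details.h):

#include "postgres.h"
#include "access/htup_details.h"

/*
 * Whether xmax is meaningful is tracked by an infomask flag bit rather than
 * by reserving a special "invalid" value of the field itself.
 */
static bool
tuple_has_valid_xmax(HeapTupleHeader tup)
{
	return (tup->t_infomask & HEAP_XMAX_INVALID) == 0;
}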

 The reason I chose to change the definition of CommandIds is that the
 other ondisk types we use like TransactionIds, XLogRecPtrs and such have
 an invalid type, CommandIds don't. Changing their definition to have 0
 - analogous to the previous examples - as their invalid value is not a
 problem because CommandIds from pg_upgraded clusters may never be used
 for anything. Going from 2^32 to 2^32-1 possible CommandIds doesn't seem
 like a problem to me. Imo the CommandIds should have been defined that
 way from the start.

 Makes some sense?

I don't have a problem with this if other people think it's a good
idea.  But I think it needs a few +1s and not too many -1s first, and
so far (AFAIK) no one else has weighed in with an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Josh Berkus
Peter,

 Other ideas?  Are there legitimate uses for SQL_ASCII?

Migrating from MySQL.  We've had some projects where we couldn't fix
MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
receiving side.  If it hadn't been available, the user would have given
up on Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] logical changeset generation v5

2013-09-05 Thread Andres Freund
Hi,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
 On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  0005 wal_decoding: Log xl_running_xact's at a higher frequency than 
  checkpoints are done
  * benefits hot standby startup

I tried to update the patch to address the comments you made.

  0003 wal_decoding: Allow walsender's to connect to a specific database
  * biggest problem is how to specify the connection we connect
to. Currently with the patch walsender connects to a database if it's
not named replication (via dbname). Perhaps it's better to invent a
replication_dbname parameter?

I've updated the patch so that it extends the replication startup parameter
to accept not only a boolean but also the value 'database'. In the latter
case it will connect to the database specified in dbname.
As discussed downthread, this patch doesn't have an immediate advantage
for users until the changeset extraction patch itself is
applied. Whether or not it should be applied separately is unclear.
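As a usage illustration (hypothetical client code; "replication=database" in
the connection string is exactly what the attached patch adds, and the dbname
shown is arbitrary):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* a walsender connection bound to one specific database */
	PGconn *conn = PQconnectdb("replication=database dbname=postgres");

	if (PQstatus(conn) != CONNECTION_OK)
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
	PQfinish(conn);
	return 0;
}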

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 2aa39548f5990e9663e95f011f25a89a0dc8d8a1 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 2/9] wal_decoding: Allow walsender's to connect to a specific
 database

Extend the existing 'replication' parameter to not only allow a boolean value
but also database. If the latter is specified we connect to the database
specified in 'dbname'.

This is useful for future walsender commands which need database interaction,
e.g. changeset extraction.
---
 doc/src/sgml/protocol.sgml | 24 +---
 src/backend/postmaster/postmaster.c| 23 ++--
 .../libpqwalreceiver/libpqwalreceiver.c|  4 +-
 src/backend/replication/walsender.c| 43 +++---
 src/backend/utils/init/postinit.c  |  5 +++
 src/bin/pg_basebackup/pg_basebackup.c  |  4 +-
 src/bin/pg_basebackup/pg_receivexlog.c |  4 +-
 src/bin/pg_basebackup/receivelog.c |  4 +-
 src/include/replication/walsender.h|  1 +
 9 files changed, 89 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 0b2e60e..2ea14e5 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1301,10 +1301,13 @@
 
 <para>
 To initiate streaming replication, the frontend sends the
-<literal>replication</> parameter in the startup message. This tells the
-backend to go into walsender mode, wherein a small set of replication commands
-can be issued instead of SQL statements. Only the simple query protocol can be
-used in walsender mode.
+<literal>replication</> parameter in the startup message. A boolean value
+of <literal>true</> tells the backend to go into walsender mode, wherein a
+small set of replication commands can be issued instead of SQL statements. Only
+the simple query protocol can be used in walsender mode.
+Passing a <literal>database</> as the value instructs walsender to connect to
+the database specified in the <literal>dbname</> parameter which will in future
+allow some additional commands to the ones specified below to be run.
 
 The commands accepted in walsender mode are:
 
@@ -1314,7 +1317,7 @@ The commands accepted in walsender mode are:
 <listitem>
  <para>
   Requests the server to identify itself. Server replies with a result
-  set of a single row, containing three fields:
+  set of a single row, containing four fields:
  </para>

  <para>
@@ -1356,6 +1359,17 @@ The commands accepted in walsender mode are:
   </listitem>
   </varlistentry>
 
+  <varlistentry>
+  <term>
+   dbname
+  </term>
+  <listitem>
+  <para>
+   Database connected to or NULL.
+  </para>
+  </listitem>
+  </varlistentry>
+
   </variablelist>
  </para>
 </listitem>
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 01d2618..a31b01d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1894,10 +1894,21 @@ retry1:
 				port->cmdline_options = pstrdup(valptr);
 			else if (strcmp(nameptr, "replication") == 0)
 			{
-				if (!parse_bool(valptr, &am_walsender))
+				/*
+				 * Due to backward compatibility concerns replication is a
+				 * hybrid beast which allows the value to be either a boolean
+				 * or the string 'database'. The latter connects to a specific
+				 * database which is e.g. required for changeset extraction.
+				 */
+				if (strcmp(valptr, "database") == 0)
+				{
+					am_walsender = true;
+					am_db_walsender = true;
+				}
+				else if (!parse_bool(valptr, &am_walsender))
 					ereport(FATAL,
 							(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-							 errmsg("invalid value for boolean option \"replication\"")));
+		

Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote:
 Peter,
 
  Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
+++1  :)

Ken




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Josh Berkus
On 09/05/2013 10:02 AM, Alvaro Herrera wrote:
 Joshua D. Drake wrote:
 iconv?
 
 Command Prompt helped a customer normalize encodings in their data,
 which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
 two stages; the first run in UTF8 saved the rejected data to a file
 which was loaded in the second run as Latin1.  This worked like a charm.

There are certainly alternatives.  But all of the alternatives increase
the cost of the migration (either in staff time or in downtime), which
increases the likelihood that the organization will abandon the migration.

Anyway, I think we've established that there are enough legitimate
uses for SQL_ASCII that we can't casually discard it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread k...@rice.edu
On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote:
 
 On 09/05/2013 09:42 AM, Josh Berkus wrote:
 
 Peter,
 
 Other ideas?  Are there legitimate uses for SQL_ASCII?
 
 Migrating from MySQL.  We've had some projects where we couldn't fix
 MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
 receiving side.  If it hadn't been available, the user would have given
 up on Postgres.
 
 iconv?
 

Yes, you can use iconv but then you have to check that it generated
values that do not break your system including the application logic.
That can prove a major stumbling block to changing DBs.

Regards,
Ken




[HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
OidFunctionCall7 is called from costsize.c, in the function cost_index. From
the comments, I understand that there is an indirection mechanism here to
allow for different index access methods. Looking at fmgr.h, where it is
declared, it is not clear to me where I would ultimately find the code.

Any pointers will be greatly appreciated

	/*
	 * Call index-access-method-specific code to estimate the processing cost
	 * for scanning the index, as well as the selectivity of the index (ie,
	 * the fraction of main-table tuples we will have to retrieve) and its
	 * correlation to the main-table tuple order.
	 */
	OidFunctionCall7(index->amcostestimate,
					 PointerGetDatum(root),
					 PointerGetDatum(path),
					 Float8GetDatum(loop_count),
					 PointerGetDatum(&indexStartupCost),
					 PointerGetDatum(&indexTotalCost),
					 PointerGetDatum(&indexSelectivity),
					 PointerGetDatum(&indexCorrelation));



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
Thank you guys!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769753.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.




Re: [HACKERS] Analysis on backend-private memory usage (and a patch)

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 17:22, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com  writes:

I ran pgbench for ten seconds, and printed the number of tuples in each
catcache after that:
[ very tiny numbers ]


I find these numbers a bit suspicious.  For example, we must have hit at
least 13 different system catalogs, and more than that many indexes, in
the course of populating the syscaches you show as initialized.  How is
it there are only 4 entries in the RELOID cache?  I wonder if there were
cache resets going on.


Relcache is loaded from the init file. The lookups of those system 
catalogs and indexes never hit the syscache, because the entries are 
found in relcache. When I delete the init file and launch psql, without 
running any queries, I get this (caches with 0 tups left out):


LOG:  cache id 45 on pg_class: 7 tups
LOG:  cache id 32 on pg_index: 63 tups
LOG:  cache id 21 on pg_database: 1 tups
LOG:  cache id 11 on pg_authid: 1 tups
LOG:  cache id 10 on pg_authid: 1 tups
LOG:  cache id 2 on pg_am: 1 tups


A larger issue is that pgbench might not be too representative.  In
a quick check, I find that cache 37 (OPERNAMENSP) starts out empty,
and contains 1 entry after select 2=2, which is expected since
the operator-lookup code will start by looking for int4 = int4 and
will get an exact match.  But after select 2=2::numeric there are
61 entries, as a byproduct of having thumbed through every binary
operator named = to resolve the ambiguous match.  We went so far
as to install another level of caching in front of OPERNAMENSP because
it was getting too expensive to deal with heavily-overloaded operators
like that one.  In general, we've had to spend enough sweat on optimizing
catcache searches to make me highly dubious of any claim that the caches
are usually almost empty.

I understand your argument that resizing is so cheap that it might not
matter, but nonetheless reducing these caches as far as you're suggesting
sounds to me to be penny-wise and pound-foolish.  I'm okay with setting
them on the small side rather than on the large side as they are now, but
not with choosing sizes that are guaranteed to result in resizing cycles
during startup of any real app.


Ok, committed the attached.

To choose the initial sizes, I put a WARNING into the rehash function, 
ran the regression suite, and adjusted the sizes so that most regression 
tests run without rehashing. With the attached patch, 18 regression 
tests cause rehashing (see regression.diffs). The ones that do are 
because they are exercising some parts of the system more than a typical 
application would do: the enum regression test for example causes 
rehashes of the pg_enum catalog cache, and the aggregate regression test 
causes rehashing of pg_aggregate, and so on. A few regression tests do a 
database-wide VACUUM or ANALYZE; those touch all relations, and cause a 
rehash of pg_class and pg_index.


- Heikki
diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index cca0572..c467f11 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -728,21 +728,20 @@ InitCatCache(int id,
 			 int nkeys,
 			 const int *key,
 			 int nbuckets)
 {
 	CatCache   *cp;
 	MemoryContext oldcxt;
 	int			i;
 
 	/*
-	 * nbuckets is the number of hash buckets to use in this catcache.
-	 * Currently we just use a hard-wired estimate of an appropriate size for
-	 * each cache; maybe later make them dynamically resizable?
+	 * nbuckets is the initial number of hash buckets to use in this catcache.
+	 * It will be enlarged later if it becomes too full.
 	 *
 	 * nbuckets must be a power of two.  We check this via Assert rather than
 	 * a full runtime check because the values will be coming from constant
 	 * tables.
 	 *
 	 * If you're confused by the power-of-two check, see comments in
 	 * bitmapset.c for an explanation.
 	 */
 	Assert(nbuckets > 0 && (nbuckets & -nbuckets) == nbuckets);
@@ -769,19 +768,20 @@ InitCatCache(int id,
 		on_proc_exit(CatCachePrintStats, 0);
 #endif
 	}
 
 	/*
 	 * allocate a new cache structure
 	 *
 	 * Note: we rely on zeroing to initialize all the dlist headers correctly
 	 */
-	cp = (CatCache *) palloc0(sizeof(CatCache) + nbuckets * sizeof(dlist_head));
+	cp = (CatCache *) palloc0(sizeof(CatCache));
+	cp->cc_bucket = palloc0(nbuckets * sizeof(dlist_head));
 
 	/*
 	 * initialize the cache's relation information for the relation
 	 * corresponding to this cache, and initialize some of the new cache's
 	 * other internal fields.  But don't open the relation yet.
 	 */
 	cp->id = id;
 	cp->cc_relname = "(not known yet)";
 	cp->cc_reloid = reloid;
@@ -808,18 +808,55 @@ InitCatCache(int id,
 	/*
 	 * back to the old context before we return...
 	 */
 	MemoryContextSwitchTo(oldcxt);
 
 	return cp;
 }
 
 /*
+ * Enlarge a catcache, doubling the number of buckets.
+ */
+static void
+RehashCatCache(CatCache *cp)
+{
+	dlist_head *newbucket;
+	int			newnbuckets;
+	int			i;
+
+	

Re: [HACKERS] get rid of SQL_ASCII?

2013-09-05 Thread Joshua D. Drake


On 09/05/2013 09:42 AM, Josh Berkus wrote:


Peter,


Other ideas?  Are there legitimate uses for SQL_ASCII?


Migrating from MySQL.  We've had some projects where we couldn't fix
MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
receiving side.  If it hadn't been available, the user would have given
up on Postgres.


iconv?






--
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
   a rose in the deeps of my heart. - W.B. Yeats




[HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Alvaro Herrera
I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
noticed that the former calls CheckForSerializableConflictOut() for
tuples read, and the latter doesn't.  Is there a reason for this
discrepancy?  If so, I think it warrants a comment about the missing
check.

Thanks,

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] lcr v5 - primary/candidate key in relcache

2013-09-05 Thread Andres Freund
Hi Kevin,

On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
 On Fri, Aug 30, 2013 at 11:19 AM, Andres Freund and...@2ndquadrant.com 
 wrote:
  0007 wal_decoding: Add information about a tables primary key to struct 
  RelationData
  * Could be used in the matview refresh code

 I think you and Kevin should discuss whether this is actually the
 right way to do this.  ISTM that if logical replication and
 materialized views end up selecting different approaches to this
 problem, everybody loses.

The patch we're discussing here adds a new struct RelationData field
called 'rd_primary' (should possibly be renamed) which contains
information about the best candidate key available for a table.

From the header comments:
/*
 * The 'best' primary or candidate key that has been found, only set
 * correctly if RelationGetIndexList has been called/rd_indexvalid > 0.
 *
 * Indexes are chosen in the following order:
 * * Primary Key
 * * oid index
 * * the first (OID order) unique, immediate, non-partial and
 *   non-expression index over one or more NOT NULL'ed columns
 */
Oid rd_primary;

I thought we could use that in matview.c:refresh_by_match_merge() to
select a more efficient diff if rd_primary points to a valid index. In that
case you'd only need to compare that index's columns, which should result
in a more efficient plan.

Maybe it's also useful in other cases for you?

If it's relevant at all, would you like to have a different priority
list than the one above?

Regards,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ee85b3bd8d8cc25fa547c004f6f6ea6bccef7c66 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH 4/9] wal_decoding: Add information about a tables primary key
 to struct RelationData

'rd_primary' now contains the Oid of an index over uniquely identifying
columns. Several types of indexes are interesting and are collected in that
order:
* Primary Key
* oid index
* the first (OID order) unique, immediate, non-partial and
  non-expression index over one or more NOT NULL'ed columns

To gather the rd_primary value, RelationGetIndexList() needs to have been called.

This is helpful because for logical replication we frequently - on the sending
and receiving side - need to look up that index, and RelationGetIndexList already
gathers all the necessary information.

This could be used to replace tablecmd.c's transformFkeyGetPrimaryKey, but
would change the meaning of that, so it seems to require additional discussion.
---
 src/backend/utils/cache/relcache.c | 52 +++---
 src/include/utils/rel.h| 12 +
 2 files changed, 61 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66fb63b..c588c29 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3465,7 +3465,9 @@ RelationGetIndexList(Relation relation)
 	ScanKeyData skey;
 	HeapTuple	htup;
 	List	   *result;
-	Oid			oidIndex;
+	Oid			oidIndex = InvalidOid;
+	Oid			pkeyIndex = InvalidOid;
+	Oid			candidateIndex = InvalidOid;
 	MemoryContext oldcxt;
 
 	/* Quick exit if we already computed the list. */
@@ -3522,17 +3524,61 @@ RelationGetIndexList(Relation relation)
 		Assert(!isnull);
 		indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+		if (!IndexIsValid(index))
+			continue;
+
 		/* Check to see if it is a unique, non-partial btree index on OID */
-		if (IndexIsValid(index) &&
-			index->indnatts == 1 &&
+		if (index->indnatts == 1 &&
 			index->indisunique && index->indimmediate &&
 			index->indkey.values[0] == ObjectIdAttributeNumber &&
 			indclass->values[0] == OID_BTREE_OPS_OID &&
 			heap_attisnull(htup, Anum_pg_index_indpred))
 			oidIndex = index->indexrelid;
+
+		if (index->indisunique &&
+			index->indimmediate &&
+			heap_attisnull(htup, Anum_pg_index_indpred))
+		{
+			/* always prefer primary keys */
+			if (index->indisprimary)
+				pkeyIndex = index->indexrelid;
+			else if (!OidIsValid(pkeyIndex) &&
+					 !OidIsValid(oidIndex) &&
+					 !OidIsValid(candidateIndex))
+			{
+				int			key;
+				bool		found = true;
+				for (key = 0; key < index->indnatts; key++)
+				{
+					int16		attno = index->indkey.values[key];
+					Form_pg_attribute attr;
+					/* internal column, like oid */
+					if (attno <= 0)
+						continue;
+
+					attr = relation->rd_att->attrs[attno - 1];
+					if (!attr->attnotnull)
+					{
+						found = false;
+						break;
+					}
+				}
+				if (found)
+					candidateIndex = index->indexrelid;
+			}
+		}
 	}
 
 	systable_endscan(indscan);
+
+	if (OidIsValid(pkeyIndex))
+		relation->rd_primary = pkeyIndex;
+	/* prefer oid indexes over normal candidate ones */
+	else if (OidIsValid(oidIndex))
+		relation->rd_primary = oidIndex;
+	else if (OidIsValid(candidateIndex))
+		

Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 12:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-05 12:44:18 -0400, Robert Haas wrote:
 On Wed, Sep 4, 2013 at 12:07 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2013-09-03 11:40:57 -0400, Robert Haas wrote:
   0002 wal_decoding: Introduce InvalidCommandId and declare that to be 
   the new maximum for CommandCounterIncrement
 
  I'm still unconvinced we want this.
 
  Ok, so the reason for the existence of this patch is that currently
  there is no way to represent an unset CommandId. This is a problem for
  the following patches because we need to log the cmin, cmax of catalog
  rows and obviously there can be rows where cmax is unset.

 For heap tuples, we solve this problem by using flag bits.  Why not
 adopt the same approach?

 We can, while it makes the amount of data stored/logged slightly larger
 and it seems to lead to less idiomatic code to me, so if there's another
 -1 I'll go that way.

OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
want to block it; I just don't feel comfortable committing it unless a
few other people say something like "I don't see a problem with that."
 Or maybe point me to relevant changeset extraction code that's going
to get messier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Jeff Janes
On Wed, Sep 4, 2013 at 10:06 PM,  wangs...@highgo.com.cn wrote:
 于 2013-09-04 23:41, Jeff Janes 回复:

 On Tue, Sep 3, 2013 at 9:08 PM,  wangs...@highgo.com.cn wrote:

 Hi, Hackers!

 I find that it takes a long time when I increase the scale of a numeric
 datatype.
 By checking the code, I found that's because it needs to rewrite that
 table's file.
 After checking that table's data file, I found only parameter n_header
 changed.
 And, I found the data in that numeric field never changed.
 So I think it's not necessary to rewrite the table's file in this case.

 Anyone has more idea about this, please come to talk about this!



 Jeff Janes jeff.ja...@gmail.com wrote:

 This was fixed in version 9.2.  You must be using an older version.

 Cheers,

 Jeff


 Thanks for your reply.

 To declare a column of type numeric use the syntax:
 NUMERIC(precision, scale).
 What I said is this scale,not yours.

You're right, I had tested a change in precision, not in scale.  Sorry.

In order to avoid the rewrite, the code would have to be changed to
look up the column definition and if it specifies the scale, then
ignore the per-row n_header, and look at the n_header only if the
column is NUMERIC with no precision or scale.  That should
conceptually be possible, but I don't know how hard it would be to
implement--it sounds pretty invasive to me.  Then if the column was
altered from NUMERIC with scale to be a plain NUMERIC, it would have
to rewrite the table to enforce the row-wise scale to match the old
column-wise scale.  Whereas now that alter doesn't need a rewrite.
I don't know if this would be an overall gain or not.

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 12:38 PM, arthernan arther...@hotmail.com wrote:
 OidFunctionCall7 is called from costsize.c, in function void cost_index. From
 the comments, I understand that there is an indirection mechanism here to
 allow for different index access methods. Looking at fmgr.h, where it is
 declared, it is not clear to me where I would ultimately find the code.

 Any pointers will be greatly appreciated

Use 'git grep'.  It's a macro defined in fmgr.h.
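
To spell that out a little: the macro in fmgr.h just supplies a default
collation and forwards to the Coll variant in fmgr.c, roughly like this
(paraphrased, not quoted verbatim):

	#define OidFunctionCall7(functionId, arg1, arg2, arg3, arg4, arg5, arg6, arg7) \
		OidFunctionCall7Coll(functionId, InvalidOid, \
							 arg1, arg2, arg3, arg4, arg5, arg6, arg7)

	Datum
	OidFunctionCall7Coll(Oid functionId, Oid collation, Datum arg1, Datum arg2,
						 Datum arg3, Datum arg4, Datum arg5, Datum arg6,
						 Datum arg7)
	{
		FmgrInfo	flinfo;

		fmgr_info(functionId, &flinfo);		/* look up the function by OID */
		return FunctionCall7Coll(&flinfo, collation, arg1, arg2, arg3, arg4,
								 arg5, arg6, arg7);
	}

So the code that ultimately runs is whatever function the OID passed by
cost_index points at (pg_am.amcostestimate); for a btree index that is
btcostestimate() in src/backend/utils/adt/selfuncs.c.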

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Heikki Linnakangas

On 05.09.2013 20:30, Alvaro Herrera wrote:

I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
noticed that the former calls CheckForSerializableConflictOut() for
tuples read, and the latter doesn't.  Is there a reason for this
discrepancy?  If so, I think it warrants a comment about the missing
check.


In page-at-a-time mode, the CheckForSerializableConflictOut() is done in 
heapgetpage(). Same with the visibility checks.
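
For reference, the per-tuple check in heapgetpage() looks roughly like this
(abridged from heapam.c, not quoted verbatim), which is why
heapgettup_pagemode() can skip it:

	loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
	loctup.t_len = ItemIdGetLength(lpp);
	ItemPointerSet(&(loctup.t_self), page, lineoff);

	if (all_visible)
		valid = true;
	else
		valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);

	CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
									buffer, snapshot);

	if (valid)
		scan->rs_vistuples[ntup++] = lineoff;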


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
 I have not heard any feedback on this patch, so I would like to apply it
 to give us a nested ROW/IS NULL API we can document.  It would have to
 be marked in the release notes as a backward incompatibility.

 I don't have time to look at this in detail right now, but I think
 that's considerably premature.  I'm not convinced that we understand
 all of the problems in this area are yet, let alone the solutions.
 And I notice that you haven't substantively responded to some of Tom's
 concerns.

In particular, I don't think it's a good idea to change
eval_const_expression's behavior in an incompatible way that simply
makes it differently inconsistent with other IS NULL code paths.
We should leave things alone until we have a full fix, otherwise we'll
just be breaking people's apps repeatedly.

I would also say that allowing eval_const_expression to drive what we
think the right behavior is is completely backwards, because it's
about the least performance-critical case.  You should be looking at
execQual.c first.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
 want to block it; I just don't feel comfortable committing it unless a
 few other people say something like I don't see a problem with that.

FWIW, I've always thought it was a wart that there wasn't a recognized
InvalidCommandId value.  It was never pressing to fix it before, but
if LCR needs it, let's do so.  I definitely *don't* find it cleaner to
eat up another flag bit to avoid that.  We don't have many to spare.

Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
but I suppose we can't have that without an on-disk compatibility break.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing SSI check in heapgettup_pagemode() ?

2013-09-05 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 I was scrolling heapgettup() and heapgettup_pagemode() in parallel, and
 noticed that the former calls CheckForSerializableConflictOut() for
 tuples read, and the latter doesn't.  Is there a reason for this
 discrepancy?  If so, I think it warrants a comment about the missing
 check.

That is covered in heapgetpage().

Do you want to offer a patch to add a comment to that effect, or
would you prefer that I do?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
Hi,

Thanks for weighin in.

On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  OK.  Consider me more of a -0 than a -1.  Like I say, I don't really
  want to block it; I just don't feel comfortable committing it unless a
  few other people say something like I don't see a problem with that.
 
 FWIW, I've always thought it was a wart that there wasn't a recognized
 InvalidCommandId value.  It was never pressing to fix it before, but
 if LCR needs it, let's do so.

Yes, it's a bit anomalous compared to the other types.

  I definitely *don't* find it cleaner to
 eat up another flag bit to avoid that.  We don't have many to spare.

It wouldn't need to be a flag bit in any existing struct, so that's not
a problem.

 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

The patch actually does change it exactly that way. My argument for that
being valid is that CommandIds don't play any role outside of their own
transaction. Now, somebody could argue that SELECT cmin, cmax can be
done outside the transaction, but: Those values are already pretty much
meaningless today since cmin/cmax have been merged. They also don't
check whether the field is initialized at all.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

 The patch actually does change it exactly that way.

Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
while leaving FirstCommandId alone.  That would make more sense to me as
(1) it doesn't change the interpretation of anything that's (likely to be)
on disk; (2) it allows the check for overflow in CommandCounterIncrement
to not involve recovering from an *actual* overflow.  With the horsing
around we've been seeing from the gcc boys lately, I don't have a warm
feeling about whether they won't break that test someday on the grounds
that overflow is undefined behavior.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small catcache optimization

2013-09-05 Thread Andres Freund
Hi,

Heikki's catcache rehashing stuff reminded me that I'd posted an
optimization to catcache (20121220153555.gh4...@awork2.anarazel.de) some
time back which I didn't have energy to pursue at that point.

I've brushed the patch up a bit and verified it still gives me a
performance improvement. It's still about 2% in a readonly pgbench on my
elderly laptop.

There are basically two tricks in the patch:
1) Don't always copy the cache's ScanKey to the stack. Instead pass
   an array of arguments around. That gets rid of a good amount of
   memcpy()ing in the common, cached case.
2) If we have to memcpy() because we need to pass a ScanKey to
   systable_*, copy only cache->cc_nkey * sizeof(ScanKeyData) instead of
   always copying the maximum size.

It'd be nicer to get rid of the mostly copied HeapKeyTestArg, but I don't
immediately see how.
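
A minimal sketch of trick 2, with made-up local names rather than the
patch's own, in case it helps review:

	/*
	 * Only materialize a ScanKey when we actually have to scan the
	 * underlying relation; copy just the slots this cache uses, then
	 * drop the caller-supplied Datums into place.
	 */
	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
	int			i;

	memcpy(cur_skey, cache->cc_skey, sizeof(ScanKeyData) * cache->cc_nkeys);
	for (i = 0; i < cache->cc_nkeys; i++)
		cur_skey[i].sk_argument = arguments[i];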

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From df40dab17e7582ab597214d3c7fcefd953dcc9f7 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 5 Sep 2013 20:30:44 +0200
Subject: [PATCH] Improve catalog cache lookup performance.

Do so by not copying the entire ScanKey upon entering SearchCatCache,
SearchCatCacheList. Instead perform operations like hashing on a smaller array
containing only the passed in Datums.

When a lookup into the underlying relations needs to happen, copy the Datums
into a local copy of the ScanKey as before, but use a smaller memcpy() if
possible.
---
 src/backend/utils/cache/catcache.c | 120 -
 src/include/access/valid.h |  56 +
 2 files changed, 122 insertions(+), 54 deletions(-)

diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c
index c467f11..fb7eaa2 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -74,7 +74,7 @@ static CatCacheHeader *CacheHdr = NULL;
 
 
 static uint32 CatalogCacheComputeHashValue(CatCache *cache, int nkeys,
-			 ScanKey cur_skey);
+		   Datum *argument);
 static uint32 CatalogCacheComputeTupleHashValue(CatCache *cache,
   HeapTuple tuple);
 
@@ -174,7 +174,7 @@ GetCCHashEqFuncs(Oid keytype, PGFunction *hashfunc, RegProcedure *eqfunc)
  * Compute the hash value associated with a given set of lookup keys
  */
 static uint32
-CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
+CatalogCacheComputeHashValue(CatCache *cache, int nkeys, Datum *argument)
 {
 	uint32		hashValue = 0;
 	uint32		oneHash;
@@ -189,28 +189,28 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
 		case 4:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[3],
-												   cur_skey[3].sk_argument));
+												   argument[3]));
 			hashValue ^= oneHash << 24;
 			hashValue ^= oneHash >> 8;
 			/* FALLTHROUGH */
 		case 3:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[2],
-												   cur_skey[2].sk_argument));
+												   argument[2]));
 			hashValue ^= oneHash << 16;
 			hashValue ^= oneHash >> 16;
 			/* FALLTHROUGH */
 		case 2:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[1],
-												   cur_skey[1].sk_argument));
+												   argument[1]));
 			hashValue ^= oneHash << 8;
 			hashValue ^= oneHash >> 24;
 			/* FALLTHROUGH */
 		case 1:
 			oneHash =
 				DatumGetUInt32(DirectFunctionCall1(cache->cc_hashfunc[0],
-												   cur_skey[0].sk_argument));
+												   argument[0]));
 			hashValue ^= oneHash;
 			break;
 		default:
@@ -229,17 +229,14 @@ CatalogCacheComputeHashValue(CatCache *cache, int nkeys, ScanKey cur_skey)
 static uint32
 CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 {
-	ScanKeyData cur_skey[CATCACHE_MAXKEYS];
+	Datum		arguments[CATCACHE_MAXKEYS];
 	bool		isNull = false;
 
-	/* Copy pre-initialized overhead data for scankey */
-	memcpy(cur_skey, cache->cc_skey, sizeof(cur_skey));
-
 	/* Now extract key fields from tuple, insert into scankey */
 	switch (cache->cc_nkeys)
 	{
 		case 4:
-			cur_skey[3].sk_argument =
+			arguments[3] =
 (cache->cc_key[3] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -249,7 +246,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 			Assert(!isNull);
 			/* FALLTHROUGH */
 		case 3:
-			cur_skey[2].sk_argument =
+			arguments[2] =
 (cache->cc_key[2] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -259,7 +256,7 @@ CatalogCacheComputeTupleHashValue(CatCache *cache, HeapTuple tuple)
 			Assert(!isNull);
 			/* FALLTHROUGH */
 		case 2:
-			cur_skey[1].sk_argument =
+			arguments[1] =
 (cache->cc_key[1] == ObjectIdAttributeNumber)
 ? ObjectIdGetDatum(HeapTupleGetOid(tuple))
 : fastgetattr(tuple,
@@ -269,7 +266,7 @@ 

Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Peter Geoghegan
On Thu, Sep 5, 2013 at 11:30 AM, Andres Freund and...@2ndquadrant.com wrote:
 Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
 but I suppose we can't have that without an on-disk compatibility break.

 The patch actually does change it exactly that way. My argument for that
 being valid is that CommandIds don't play any role outside of their own
 transaction.

Right. It seems like this should probably be noted in the
documentation under 5.4. System Columns -- I just realized that it
isn't.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hstore: Query speedups with Gin index

2013-09-05 Thread Blake Smith
Hi Peter,

Thanks for checking the tests. I wasn't able to duplicate your test
results. Did you run the hstore regression tests with the revised patch I
attached in the thread?  Attached is the output I got with the latest patch
applied.

Thanks!

Blake



On Tue, Sep 3, 2013 at 8:04 PM, Peter Eisentraut pete...@gmx.net wrote:

 On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote:
  Thanks for the feedback everyone. I've attached the patch that we are
  now running in production to service our hstore include queries. We
  rebuilt the index to account for the on-disk incompatibility. I've
  submitted the patch to commitfest
  here: https://commitfest.postgresql.org/action/patch_view?id=1203

 I'm getting the attached failure from the hstore regression test.






-- 
Blake Smith
http://blakesmith.me
@blakesmith


regressions.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 21:02:44 +0200, Andres Freund wrote:
 On 2013-09-05 14:37:01 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
   Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
   but I suppose we can't have that without an on-disk compatibility break.
  
   The patch actually does change it exactly that way.
  
  Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
  to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
  while leaving FirstCommandId alone.  That would make more sense to me as
  (1) it doesn't change the interpretation of anything that's (likely to be)
  on disk; (2) it allows the check for overflow in CommandCounterIncrement
  to not involve recovering from an *actual* overflow.  With the horsing
  around we've been seeing from the gcc boys lately
 
 Ok, I can do it that way. LCR obviously shouldn't care.

It doesn't care to the point that the patch already does exactly what
you propose. It's just my memory that remembered things differently.

So, a very slightly updated patch attached.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 0592af4ae2e5a2bb1e4919560ab2768a18d88dd4 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Mon, 19 Aug 2013 13:24:30 +0200
Subject: [PATCH] Introduce InvalidCommandId

Do so by reducing the limit of allowed CommandIds by one and declare ~0 as
InvalidCommandId.

This decreases the possible number of commands in a transaction by one, which seems
unproblematic. It's also not a problem for pg_upgrade because cmin/cmax are
never looked at outside the context of their own transaction.
---
 src/backend/access/transam/xact.c | 4 ++--
 src/include/c.h   | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 31e868d..0591f3f 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -766,12 +766,12 @@ CommandCounterIncrement(void)
 	if (currentCommandIdUsed)
 	{
 		currentCommandId += 1;
-		if (currentCommandId == FirstCommandId) /* check for overflow */
+		if (currentCommandId == InvalidCommandId)
 		{
 			currentCommandId -= 1;
 			ereport(ERROR,
 	(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-	 errmsg(cannot have more than 2^32-1 commands in a transaction)));
+	 errmsg(cannot have more than 2^32-2 commands in a transaction)));
 		}
 		currentCommandIdUsed = false;
 
diff --git a/src/include/c.h b/src/include/c.h
index 5961183..14bfdcd 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -368,6 +368,7 @@ typedef uint32 MultiXactOffset;
 typedef uint32 CommandId;
 
 #define FirstCommandId	((CommandId) 0)
+#define InvalidCommandId	(~(CommandId)0)
 
 /*
  * Array indexing support
-- 
1.8.3.251.g1462b67


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lcr v5 - introduction of InvalidCommandId

2013-09-05 Thread Andres Freund
On 2013-09-05 14:37:01 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-09-05 14:21:33 -0400, Tom Lane wrote:
  Ideally I'd have made InvalidCommandId = 0 and FirstCommandId = 1,
  but I suppose we can't have that without an on-disk compatibility break.
 
  The patch actually does change it exactly that way.
 
 Oh.  I hadn't looked at the patch, but I had (mis)read what Robert said
 to think that you were proposing introducing InvalidCommandId = 0xFFFFFFFF
 while leaving FirstCommandId alone.  That would make more sense to me as
 (1) it doesn't change the interpretation of anything that's (likely to be)
 on disk; (2) it allows the check for overflow in CommandCounterIncrement
 to not involve recovering from an *actual* overflow.  With the horsing
 around we've been seeing from the gcc boys lately

Ok, I can do it that way. LCR obviously shouldn't care.

 I don't have a warm
 feeling about whether they won't break that test someday on the grounds
 that overflow is undefined behavior.

Unsigned overflow is pretty strictly defined, so I don't see much danger
there. Also, we'd feel the pain pretty definitely with xids...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Greg Stark
On Thu, Sep 5, 2013 at 6:41 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Then if the column was
 altered from NUMERIC with scale to be a plain NUMERIC, it would have
 to rewrite the table to enforce the row-wise scale to match the old
 column-wise scale.  Where as now that alter doesn't need a re-write.
 I don't know if this would be an overall gain or not.


We've talked about cases like this in the past. It's mostly a SOP and I
think it may already be on the TODO.

The main difficulty is that Postgres is very extensible. So to implement
this you need to think bigger than NUMERIC. It should also be possible to
alter a column from varchar(5) to varchar(10) for example (but not the
other way around).

One way to do it would be to extend pg_type to have another column that
specifies a function. That function would take the old and new typmod
(which is what stores the 5 in varchar(5)) and tell the server whether it's
a safe change to make without rechecking.
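
Something along these lines, purely as a sketch -- the callback name, the
return values, and the idea of a new pg_type column are all hypothetical,
not existing API:

	typedef enum TypmodChangeClass
	{
		TYPMOD_CHANGE_NOOP,		/* binary compatible, no recheck needed */
		TYPMOD_CHANGE_RECHECK,	/* re-validate rows, but no rewrite */
		TYPMOD_CHANGE_REWRITE	/* fall back to a full table rewrite */
	} TypmodChangeClass;

	/*
	 * Hypothetical per-type callback: given the old and new typmod, say
	 * how expensive the ALTER has to be.  For varchar, growing the limit
	 * is free and shrinking it only needs a recheck.  (This ignores the
	 * VARHDRSZ offset that real varchar typmods carry, for clarity.)
	 */
	static TypmodChangeClass
	varchar_typmod_change(int32 oldtypmod, int32 newtypmod)
	{
		if (newtypmod < 0 || (oldtypmod >= 0 && newtypmod >= oldtypmod))
			return TYPMOD_CHANGE_NOOP;
		return TYPMOD_CHANGE_RECHECK;
	}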

Another way might be to overload the cast functions, though they currently
receive no information about the typmod. They might have the benefit of
being able to handle things like varchar(5) - text though.

But it has to be that general. Any data type should be able to specify
whether an old and new typmod are compatible.

-- 
greg


Re: [HACKERS] Eliminating pg_catalog.pg_rewrite.ev_attr

2013-09-05 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:

 New patch attached.



Pushed (to master only).


277607d600fb71e25082b94302ca1716403cd0bc


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Alvaro Herrera
Greg Stark escribió:

 The main difficulty is that Postgres is very extensible. So to implement
 this you need to think bigger than NUMERIC. It should also be possible to
 alter a column from varchar(5) to varchar(10) for example (but not the
 other way around).

We already allow that.  See commits
8f9fe6edce358f7904e0db119416b4d1080a83aa and
3cc0800829a6dda5347497337b0cf43848da4acf

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 06:14:33PM +0200, Magnus Hagander wrote:
  I have developed the attached patch which implements an auto-tuned
  effective_cache_size which is 4x the size of shared buffers.  I had to
  set effective_cache_size to its old 128MB default so the EXPLAIN
  regression tests would pass unchanged.
 
 That's not really autotuning though. ISTM that making the *default* 4
 x shared_buffers might make perfect sense, but do we really need to
 hijack the value of -1 for that? That might be useful for some time
 when we have actual autotuning, that somehow inspects the system and
 tunes it from there.

 I also don't think it should be called autotuning, when it's just a
 smarter default value.
 
 I like the feature, though, just not the packaging.

That auto-tuning text came from the wal_buffer documentation, which
does exactly this based on shared_buffers:

The contents of the WAL buffers are written out to disk at every
transaction commit, so extremely large values are unlikely to
provide a significant benefit.  However, setting this value to at
least a few megabytes can improve write performance on a busy
-- server where many clients are committing at once.  The auto-tuning
   ---
selected by the default setting of -1 should give reasonable
results in most cases.

I am fine with rewording and not using -1, but we should change the
wal_buffer default and documentation too then.  I am not sure what other
value than -1 to use?  0?  I figure if we ever get better auto-tuning,
we would just remove this functionality and make it better.
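
For concreteness, the behavior being discussed boils down to something like
this at the point where the GUC is resolved (a sketch, not the attached
patch itself; NBuffers and effective_cache_size are both measured in 8kB
pages):

	/* -1 means "auto": derive effective_cache_size from shared_buffers */
	if (effective_cache_size == -1)
		effective_cache_size = NBuffers * 4;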

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strange IS NULL behaviour

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 02:14:39PM -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Wed, Sep 4, 2013 at 9:26 PM, Bruce Momjian br...@momjian.us wrote:
  I have not heard any feedback on this patch, so I would like to apply it
  to give us a nested ROW/IS NULL API we can document.  It would have to
  be marked in the release notes as a backward incompatibility.
 
  I don't have time to look at this in detail right now, but I think
  that's considerably premature.  I'm not convinced that we understand
  all of the problems in this area are yet, let alone the solutions.
  And I notice that you haven't substantively responded to some of Tom's
  concerns.
 
 In particular, I don't think it's a good idea to change
 eval_const_expression's behavior in an incompatible way that simply
 makes it differently inconsistent with other IS NULL code paths.

Let me summarize what eval_const_expressions_mutator() does:  it takes a
ROW() IS NULL construct, and converts it to individual _value_ IS NULL
clauses so they can be better optimized.  It changes:

ROW(x, ROW(y)) IS NULL

to

x IS NULL AND ROW(y) IS NULL

By removing this one level of ROW construct, it causes ROW(ROW(NULL)) IS
NULL to return true, while more nested cases do not.  It also tries to
short-circuit the IS NULL clause if it finds a literal NULL constant
inside the row.

My patch uses this short-circuit logic to return false for IS NULL if it
finds an embedded ROW construct.  This makes the code match the code in
ExecEvalNullTest(), which already treats embedded ROW values as
non-null, e.g. it already treats ROW(ROW(x)) IS NULL as false. 
Specifically, any code path that doesn't pass through
eval_const_expressions_mutator() and gets to execQual.c will return
false for this.

 We should leave things alone until we have a full fix, otherwise we'll
 just be breaking people's apps repeatedly.

Yes, we don't want that.

 I would also say that allowing eval_const_expression to drive what we
 think the right behavior is is completely backwards, because it's
 about the least performance-critical case.  You should be looking at
 execQual.c first.

I am making eval_const_expression match execQual.c, specifically
ExecEvalNullTest(), which calls heap_attisnull(), and looks at the NULL
indicator, which can't be true for an attribute containing a ROW value.

I am confused why others don't see what I am seeing.

Another possible fix would be to avoid the IS NULL value optimizer
expansion if a ROW construct is inside a ROW().  I have attached a patch
that does this for review.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
new file mode 100644
index 76c032c..54e59dc
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*** eval_const_expressions_mutator(Node *nod
*** 3100,3105 
--- 3100,3122 
  return makeBoolConst(false, false);
  			continue;
  		}
+ 
+ 		/* Do we have a ROW() expression in the row? */
+ 		if (type_is_rowtype(exprType(relem)))
+ 		{
+ 			/*
+ 			 * If this ROW() has an embedded ROW() value,
+ 			 * our optimization  of removing a level of ROW()
+ 			 * nesting causes ROW(ROW(NULL)) IS NULL to return
+ 			 * true, so just return the original clause.
+ 			 */
+ 			newntest = makeNode(NullTest);
+ 			newntest->arg = (Expr *) arg;
+ 			newntest->nulltesttype = ntest->nulltesttype;
+ 			newntest->argisrow = ntest->argisrow;
+ 			return (Node *) newntest;
+ 		}
+ 
  		newntest = makeNode(NullTest);
  		newntest->arg = (Expr *) relem;
  		newntest->nulltesttype = ntest->nulltesttype;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] Extend namespace of valid guc names

2013-09-05 Thread Andres Freund
On 2013-02-26 13:08:54 +0100, Andres Freund wrote:
 Just for reference, that's the grammar for SET/SHOW:

 var_name:	ColId								{ $$ = $1; }
 			| var_name '.' ColId

 Any arguments whether we should try to validate postgres -c,
 set_config and SET/SHOW the same?


Any input here?

Currently set_config() and postgres -c basically don't have any
restrictions on the passed guc name whereas SHOW, SET and
postgresql.conf do.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 12:48:54PM -0400, Tom Lane wrote:
 Magnus Hagander mag...@hagander.net writes:
  On Thu, Sep 5, 2013 at 3:01 AM, Bruce Momjian br...@momjian.us wrote:
  I have developed the attached patch which implements an auto-tuned
  effective_cache_size which is 4x the size of shared buffers.  I had to
  set effective_cache_size to its old 128MB default so the EXPLAIN
  regression tests would pass unchanged.
 
  That's not really autotuning though. ISTM that making the *default* 4
  x shared_buffers might make perfect sense, but do we really need to
  hijack the value of -1 for that? That might be useful for some time
  when we have actual autotuning, that somehow inspects the system and
  tunes it from there.
 
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.

OK, but I did this based on wal_buffers, which has a -1 default, calls
it auto-tuning, and explains how the default is computed.

 Did you consider the alternative of just tweaking initdb to insert a
 default for effective_cache_size that's 4x whatever it picks for
 shared_buffers?  That would probably be about 3 lines of code, and it
 wouldn't nail down any particular server-side behavior.

The problem there is that many users are told to tune shared_buffers,
but don't touch effective cache size.  Having initdb set the
effective_cache_size value would not help there.  Again, this is all
based on the auto-tuning of wal_buffers.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 regression

2013-09-05 Thread Bruce Momjian
On Mon, Aug 19, 2013 at 09:27:57PM -0400, Stephen Frost wrote:
 * Andres Freund (and...@2ndquadrant.com) wrote:
  I vote for adapting the patch to additionally zero out the file via
  write(). In your tests that seemed to perform at least as good as the
  old method... It also has the advantage that we can use it a littlebit
  more as a testbed for possibly using it for heap extensions one day.
  We're pretty early in the cycle, so I am not worried about this too much...
 
 I dunno, I'm pretty disappointed that this doesn't actually improve
 things.  Just following this casually, it looks like it might be some
 kind of locking issue in the kernel that's causing it to be slower; or
 at least some code path that isn't exercise terribly much and therefore
 hasn't been given the love that it should.
 
 Definitely interested in what Ts'o says, but if we can't figure out why
 it's slower *without* writing out the zeros, I'd say we punt on this
 until Linux and the other OS folks improve the situation.

FYI, the patch has been reverted.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Josh Berkus
On 09/05/2013 02:16 PM, Bruce Momjian wrote:
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.
 
 OK, but I did this based on wal_buffers, which has a -1 default, calls
 it auto-tuning, and explains how the default is computed.

I don't see a real problem with this.  For users who have set their
shared_buffers correctly, effective_cache_size should also be correct.

 The problem there is that many users are told to tune shared_buffers,
 but don't touch effective cache size.  Having initdb set the
 effective_cache_size value would not help there.  Again, this is all
 based on the auto-tuning of wal_buffers.

Standard advice we've given in the past is 25% shared buffers, 75%
effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
 Maybe we're changing the conventional calculation, but I thought I'd
point that out.
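
To put numbers on it: on a 16GB machine the old advice gives shared_buffers
= 4GB and effective_cache_size = 12GB; 3X shared_buffers reproduces that
12GB exactly, while 4X would claim the entire 16GB.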

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
The comments on this sourcebase are really good, plus it is very well
structured!

I read somewhere that you can actually store an execution plan by using
stored procedures. If that is the case. I would imagine that variables would
prevent the histograms from being used. I don't mind finding the code
myself, but... Where is the code that does that switch? Clues are welcome
too!!



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769800.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Where can I find the code for extern OidFunctionCall7?

2013-09-05 Thread arthernan
Never mind I think I found it.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Where-can-I-find-the-code-for-extern-OidFunctionCall7-tp5769737p5769801.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Merlin Moncure
On Thu, Sep 5, 2013 at 5:11 PM, Josh Berkus j...@agliodbs.com wrote:
 On 09/05/2013 02:16 PM, Bruce Momjian wrote:
 Well, the real problem with this patch is that it documents what the
 auto-tuning algorithm is; without that commitment, just saying -1 means
 autotune might be fine.

 OK, but I did this based on wal_buffers, which has a -1 default, calls
 it auto-tuning, and explains how the default is computed.

 I don't see a real problem with this.  For users who have set their
 shared_buffers correctly, effective_cache_size should also be correct.

Agreed.  I think -1 is the right setting for autotune as things stand
today. If we want something else, then we should change other settings
as well (like wal_buffers) and that is not in the scope of this patch.

 The problem there is that many users are told to tune shared_buffers,
 but don't touch effective cache size.  Having initdb set the
 effective_cache_size value would not help there.  Again, this is all
 based on the auto-tuning of wal_buffers.

 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.

This was debated upthread.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 03:11:53PM -0700, Josh Berkus wrote:
 On 09/05/2013 02:16 PM, Bruce Momjian wrote:
  Well, the real problem with this patch is that it documents what the
  auto-tuning algorithm is; without that commitment, just saying -1 means
  autotune might be fine.
  
  OK, but I did this based on wal_buffers, which has a -1 default, calls
  it auto-tuning, and explains how the default is computed.
 
 I don't see a real problem with this.  For users who have set their
 shared_buffers correctly, effective_cache_size should also be correct.
 
  The problem there is that many users are told to tune shared_buffers,
  but don't touch effective cache size.  Having initdb set the
  effective_cache_size value would not help there.  Again, this is all
  based on the auto-tuning of wal_buffers.
 
 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.

Yes, I had wondered that myself, and 3x and 4x were thrown out as
options.  There were more people who liked 4x, but one of the reasons
was that 3x sounded odd --- not sure what to make of that, but I went
with the most popular.  I am fine with 3x, and I do think it logically
makes more sense, and is less likely to over-estimate than 4x.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Wed, Sep  4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
  Add a column pg_class.relallvisible to remember the number of pages
  that were all-visible according to the visibility map as of the last
  VACUUM
  (or ANALYZE, or some other operations that update pg_class.relpages).
  Use relallvisible/relpages, instead of an arbitrary constant, to
  estimate how many heap page fetches can be avoided during an
  index-only scan.
  
  Have I missed some nuance?
 
 I am looking back at this issue now and I think you are correct.  The
 commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
 and the code matches that:
 
   if (!inh)
   vac_update_relstats(onerel,
   RelationGetNumberOfBlocks(onerel),
   totalrows,
 --   visibilitymap_count(onerel),
   hasindex,
   InvalidTransactionId);
 
 so if an index scan was not being used after an ANALYZE, it isn't a bad
 allvisible estimate but something else.  This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

1)  When are VM bits set:
vacuum (non-full)
analyze (only some random pages)
2)  When are massive rows added but VM bits not set:
copy
3) When are VM bits cleared:
insert/update/delete
vacuum (non-full)
4)  When are VM map files cleared:
vacuum full
cluster
5) When is pg_class.relallvisible updated via a VM map file scan:
vacuum (non-full)
analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which is only
incremented by updates and deletes.  Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set.  The only bits that
will be set will be by pages visited randomly by analyze.

The following table activities will not set proper VM bits:

vacuum full
cluster
copy
insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
 Actually, I now realize it is more complex than that, and worse.  There
 are several questions to study to understand when pg_class.relallvisible
 is updated (which is used to determine if index-only scans are a good
 optimization choice), and when VM all-visible bits are set so heap pages
 can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.

 The calculus we should use to determine when we need to run vacuum has
 changed with index-only scans, and I am not sure we ever fully addressed
 this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_restore multiple --function options

2013-09-05 Thread Josh Kupershmidt
On Tue, Aug 27, 2013 at 1:14 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Assuming no objections, I'll apply the attached patch to 9.3 and master
 later tonight.

Just a little stylistic nitpick: could we pluralize the --help outputs
for the modified options so that they make clear that multiple
specifications are supported. E.g. "restore named index(es)" instead
of just "restore named index". The last round of such changes tried to
make such --help tweaks, it would be nice to keep 'em consistent.

Josh


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Gavin Flower

On 06/09/13 13:10, Robert Haas wrote:

On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.


The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction 
syntax?) that would:


1. only be valid in a transaction
2. initiate a vacuum after the current transaction completed
3. defer any vacuum triggered due to other criteria

If the transaction was rolled back: then any vacuum pending due to other
reasons would be actioned.


On normal transaction completion, then if there was a pending vacuum it 
would be combined with the one in the transaction.


Still would need some method of ensuring any pending vacuum was done if 
the transaction hung, or took too long.



Cheers,
Gavin


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
 On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
  Actually, I now realize it is more complex than that, and worse.  There
  are several questions to study to understand when pg_class.relallvisible
  is updated (which is used to determine if index-only scans are a good
  optimization choice), and when VM all-visible bits are set so heap pages
  can be skipped during index-only scans:
 
  1)  When are VM bits set:
  vacuum (non-full)
  analyze (only some random pages)
 
 Analyze doesn't set visibility-map bits.  It only updates statistics
 about how many are set.

Sorry, yes you are correct.

  The calculus we should use to determine when we need to run vacuum has
  changed with index-only scans, and I am not sure we ever fully addressed
  this.
 
 Yeah, we didn't.  I think the hard part is figuring out what behavior
 would be best.  Counting inserts as well as updates and deletes would
 be a simple approach, but I don't have much confidence in it.  My
 experience is that having vacuum or analyze kick in during a bulk-load
 operation is a disaster.  We'd kinda like to come up with a way to
 make vacuum run after the bulk load is complete, maybe, but how would
 we identify that time, and there are probably cases where that's not
 right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually.  I only found this out going back over January
emails.  Did other people know this?  Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows. 
I think we either need to improve that, or somehow make sequential scans
update the VM map, and then find a way to trigger update of
relallvisible even without inserts.

Ideas
-

I think we need to detect tables that do not have VM bits set and try to
determine if they should be vacuumed.  If a table has most of its VM
bits set, there is no need to vacuum it for VM bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible.  If the VM map has many pages that are _not_ marked as
all-visible, and the change count since the last vacuum is low, those pages
might now be all-visible and vacuum might find them.  One problem is
that a long-running transaction is not going to update relallvisible
until commit, so you might be vacuuming a table that is being modified,
e.g. bulk loads.  Do we have any way of detecting if a backend is
modifying a table?
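
Roughly the kind of test described above, as a sketch only -- the thresholds
and where it would hook into autovacuum are invented for illustration
(tabentry would be the table's PgStat_StatTabEntry):

	BlockNumber relpages = RelationGetNumberOfBlocks(onerel);
	BlockNumber allvisible = visibilitymap_count(onerel);
	bool		needs_vm_vacuum;

	/* Many pages not yet all-visible, but the table has been quiet since
	 * the last vacuum, so a VM-setting vacuum is likely to pay off. */
	needs_vm_vacuum = relpages > 0 &&
		allvisible < relpages * 0.9 &&				/* invented threshold */
		tabentry->changes_since_analyze < relpages;	/* invented "quiet" test */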

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Is it necessary to rewrite table while increasing the scale of datatype numeric?

2013-09-05 Thread Noah Misch
On Thu, Sep 05, 2013 at 10:41:25AM -0700, Jeff Janes wrote:
 In order to avoid the rewrite, the code would have to be changed to
 look up the column definition and if it specifies the scale, then
 ignore the per-row n_header, and look at the n_header only if the
 column is NUMERIC with no precision or scale.  That should
 conceptually be possible, but I don't know how hard it would be to
 implement--it sounds pretty invasive to me.  Then if the column was
 altered from NUMERIC with scale to be a plain NUMERIC, it would have
 to rewrite the table to enforce the row-wise scale to match the old
 column-wise scale.  Where as now that alter doesn't need a re-write.
 I don't know if this would be an overall gain or not.

Invasive indeed.  The type-supplementary data would need to reach essentially
everywhere we now convey a type OID.  Compare the invasiveness of adding
collation support.  However, this is not the first time it would have been
useful.  We currently store a type OID in every array and composite datum.
That's wasteful and would be unnecessary if we reliably marshalled similar
information to all the code needing it.  Given a few more use cases, the
effort would perhaps start to look credible relative to the benefits.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: Set effective_cache_size to greater of .conf value, shared_buffers

2013-09-05 Thread Josh Berkus
On 09/05/2013 03:30 PM, Merlin Moncure wrote:

 Standard advice we've given in the past is 25% shared buffers, 75%
 effective_cache_size.  Which would make EFS *3X* shared_buffers, not 4X.
  Maybe we're changing the conventional calculation, but I thought I'd
 point that out.
 
 This was debated upthread.

Actually, no, it wasn't.  Tom threw out a suggestion that we use 4X for
historical reasons.  That's all, there was no discussion.

So, my point stands: our historical advice has been to set EFS to 75% of
RAM.  Maybe we're changing that advice, but if so, let's change it.
Otherwise 3X makes more sense.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Alvaro Herrera
Bruce Momjian escribió:

 Ideas
 -
 
 I think we need to detect tables that do not have VM bits set and try to
 determine if they should be vacuumed.  If a table has most of its VM
 bits set, there in need to vacuum it for VM bit setting.

I think it's shortsighted to keep thinking of autovacuum as just a way
to run VACUUM and ANALYZE.  We have already discussed work items that
need to be done separately, such as truncating the last few empty pages
on a relation that was vacuumed recently.  We also need to process a GIN
index' pending insertion list; and with minmax indexes I will want to
run summarization of heap page ranges.

So maybe instead of trying to think of VM bit setting as part of vacuum,
we could just keep stats about how many pages we might need to scan
because of possibly needing to set the bit, and then doing that in
autovacuum, independently from actually vacuuming the relation.

I'm not sure if we need to expose all these new maintenance actions as
SQL commands.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers