Re: [HACKERS] 9.2.3 crashes during archive recovery

2013-03-06 Thread Heikki Linnakangas

On 05.03.2013 14:09, KONDO Mitsumasa wrote:

Hi,

Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
the attached patch records minRecoveryPoint.
[crash recovery -> record minRecoveryPoint in control file -> archive
recovery]
I think that this is the original intention of Heikki's patch.


Yeah. That fix isn't right, though; XLogPageRead() is supposed to return 
true on success, and false on error, and the patch makes it return 
'true' on error, if archive recovery was requested but we're still in 
crash recovery. The real issue here is that I missed the two return 
NULL;s in ReadRecord(), so the code that I put in the 
next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find 
the file at all. Attached patch is the proper fix for this.



I also found a bug in the latest 9.2_stable. It does not get the latest
timeline and recovery history file in archive recovery when the master and
standby timelines are different.


Works for me.. Can you create a test script for that? Remember to set 
recovery_target_timeline='latest'.


- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 92adc4e..74a54f6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -4010,7 +4010,7 @@ ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
 retry:
 	/* Read the page containing the record */
 	if (!XLogPageRead(RecPtr, emode, fetching_ckpt, randAccess))
-		return NULL;
+		goto next_record_is_invalid;
 
 	pageHeaderSize = XLogPageHeaderSize((XLogPageHeader) readBuf);
 	targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
@@ -4168,7 +4168,7 @@ retry:
 			}
 			/* Wait for the next page to become available */
 			if (!XLogPageRead(pagelsn, emode, false, false))
-				return NULL;
+				goto next_record_is_invalid;
 
 			/* Check that the continuation record looks valid */
 			if (!(((XLogPageHeader) readBuf)->xlp_info & XLP_FIRST_IS_CONTRECORD))



Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Simon Riggs
On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.  Index matching is already pretty
 expensive, and that has the saving grace that we only do it once per
 base relation.  Your sketch above implies trying to match to MVs once
 per considered join relation, which will be combinatorially worse.
 Even with a lot of sweat over reducing the cost of the matching, it
 will hurt.

As we already said: no MVs = zero overhead = no problem. It costs in
the cases where time savings are possible and not otherwise. The type
of queries and their typical run times are different to the OLTP case,
so any additional planning time is likely to be acceptable. I'm sure
we can have a deep discussion about how to optimise the planner for
this; I'm pretty sure there are reasonable answers to the not-small
difficulties along that road.

Most importantly, I want to make sure we don't swing the door shut on
improvements here, especially if you (Tom) are not personally
convinced enough to do the work yourself.

Making transactional MVs work would be in part justified by the
possible existence of automatic lookaside planning, so yes, the two
things are not linked but certainly closely related.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Simon Riggs
On 5 March 2013 09:35, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 Are there objectors?


 In addition to my hostility towards this patch in general, there are some
 specifics in the patch I'd like to raise (read out in a grumpy voice):

;-)  We all want to make the right choice here, so all viewpoints
gratefully received so we can decide.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Simon Riggs
On 5 March 2013 18:02, Jeff Davis pg...@j-davis.com wrote:

 Fletcher is probably significantly faster than CRC-16, because I'm just
 doing int32 addition in a tight loop.

 Simon originally chose Fletcher, so perhaps he has more to say.

IIRC the research showed Fletcher was significantly faster for only a
small loss in error detection rate.

It was sufficient to make our error detection > 1 million times
better, possibly more. That seems sufficient to enable early detection
of problems, since if we missed the first error, a second is very
likely to be caught (etc). So I am assuming that we're trying to catch
a pattern of errors early, rather than guarantee we can catch the very
first error.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
 Please find attached updated patch realigned with your comments. You can
 find my answers inline...
 The only thing that needs clarification is the comment about
 UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
 corrected or adapted to what you wanted. I am also including now tests for
 matviews.
 
 On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
 
+ for (count = 0; count < num_indexes; count++)
+     index_insert(toastidxs[count], t_values, t_isnull,
+                  &(toasttup->t_self),
+                  toastrel,
+                  toastidxs[count]->rd_index->indisunique ?
+                  UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
   
The indisunique check looks like a copy & pasto to me, albeit not
yours...
   
   Yes it is the same for all the indexes normally, but it looks more solid
  to
   me to do that as it is. So unchanged.
 
  Hm, if the toast indexes aren't unique anymore loads of stuff would be
  broken. Anyway, not your fault.
 
 I definitely cannot understand where you are going here. Could you be more
 explicit? Why could this be a problem? Without my patch a similar check is
 used for toast indexes.

There's no problem. I just dislike the pointless check which caters for
a situation that doesn't exist...
Forget it, sorry.

+ if (count == 0)
+     snprintf(NewToastName, NAMEDATALEN,
+              "pg_toast_%u_index", OIDOldHeap);
+ else
+     snprintf(NewToastName, NAMEDATALEN,
+              "pg_toast_%u_index_cct%d", OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc), NewToastName);
+ count++;
+ }
   
Hm. It seems wrong that this layer needs to know about _cct.
   
Any other idea? For the time being I removed cct and added only a suffix
   based on the index number...
 
  Hm. It seems like throwing an error would be sufficient, that path is
  only entered for shared catalogs, right? Having multiple toast indexes
  would be a bug.
 
 Don't think so. Even if now those APIs are used only for catalog tables, I
 do not believe that this function has been designed to be used only with
 shared catalogs. Removing the cct suffix makes sense though...

Forget what I said.

+ /*
+  * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+  */
+ isconstraint = indexRelation->rd_index->indisprimary ||
+                indexRelation->rd_index->indisexclusion;
   
unique constraints aren't mattering here?
   
   No they are not. Unique indexes are not counted as constraints in the
  case
   of index_create. Previous versions of the patch did that but there are
   issues with unique indexes using expressions.
 
  Hm. index_create's comment says:
   * isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION
  constraint
 
  There are unique indexes that are constraints and some that are
  not. Looking at ->indisunique is not sufficient to determine whether it's
  one or not.
 
 Hum... OK. I changed that using a method based on get_index_constraint for
 a given index. So if the constraint Oid is invalid, it means that this
 index has no constraints and its concurrent entry won't create an index in
 consequence. It is more stable this way.

Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
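
As a side note for readers, a minimal sketch of the get_index_constraint()-based
check Michael describes above; get_index_constraint() is an existing backend
function (catalog/dependency.h), while the wrapper name is invented for
illustration and is not part of the patch:

#include "postgres.h"
#include "catalog/dependency.h"     /* get_index_constraint() */

/*
 * Illustrative helper: an index backs a constraint (PRIMARY KEY, UNIQUE or
 * EXCLUSION) iff get_index_constraint() returns a valid OID; a plain
 * CREATE UNIQUE INDEX yields InvalidOid here.
 */
static bool
index_is_constraint_backed(Oid indexOid)
{
    return OidIsValid(get_index_constraint(indexOid));
}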

 + /*
 +  * Phase 3 of REINDEX CONCURRENTLY
 +  *
 +  * During this phase the concurrent indexes catch up with the
INSERT that
 +  * might have occurred in the parent table and are marked as
  valid
once done.
 +  *
 +  * We once again wait until no transaction can have the table
  open
with
 +  * the index marked as read-only for updates. Each index
validation is done
 +  * with a separate transaction to avoid opening transaction
  for an
 +  * unnecessary too long time.
 +  */
   
Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.
   
   Yes you said the contrary in a previous review. The purpose of this
   function is to first gather the locks and then 

Re: [HACKERS] WIP: index support for regexp search

2013-03-06 Thread Alexander Korotkov
On Wed, Jan 23, 2013 at 7:29 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Heikki Linnakangas hlinnakan...@vmware.com writes:
  On 23.01.2013 09:36, Alexander Korotkov wrote:
  On Wed, Jan 23, 2013 at 6:08 AM, Tom Lanet...@sss.pgh.pa.us  wrote:
  The biggest problem is that I really don't care for the idea of
  contrib/pg_trgm being this cozy with the innards of regex_t.

  The only option I see now is to provide a method like export_cnfa
 which
  would export corresponding CNFA in fixed format.

  Yeah, I think that makes sense. The transformation code in trgm_regexp.c
  would probably be more readable too, if it didn't have to deal with the
  regex guts representation of the CNFA. Also, once you have intermediate
  representation of the original CNFA, you could do some of the
  transformation work on that representation, before building the
  transformed graph containing trigrams. You could eliminate any
  non-alphanumeric characters, joining states connected by arcs with
  non-alphanumeric characters, for example.

 It's not just the CNFA though; the other big API problem is with mapping
 colors back to characters.  Right now, that not only knows way too much
 about a part of the regex internals we have ambitions to change soon,
 but it also requires pg_wchar2mb_with_len() and lowerstr(), neither of
 which should be known to the regex library IMO.  So I'm not sure how we
 divvy that up sanely.  To be clear: I'm not going to insist that we have
 to have a clean API factorization before we commit this at all.  But it
 worries me if we don't even know how we could get to that, because we
 are going to need it eventually.


Now, we probably don't have enough time before 9.3 to solve an API
problem :(. It's likely we have to choose either to commit to 9.3 without
clean API factorization or to postpone it to 9.4.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Andres Freund
On 2013-03-05 19:30:53 -0500, Tom Lane wrote:
 One of the core problems for a writable-foreign-tables feature is how
 to identify a previously-fetched row for UPDATE or DELETE actions.
 In an ordinary Postgres table, we use the ctid system column for that,
 but a remote table doesn't necessarily have such a thing.
 ...
 For postgres_fdw, that would really be enough, since it could just
 cause a ctid column to be created with the usual definition.  Then
 it could put the remote ctid into the usual t_self field in returned
 tuples.
 
 Supporting magic identifiers that aren't of the TID data type is
 considerably harder, mainly because it's not clear how heap_getsysattr()
 could know how to fetch the column value.  I have some rough ideas
 about that, but I suggest that we might want to punt on that extension
 for the time being.

What about just making it a bytea in FDWs? Now it's not nice to waste
space for a (probably 1-byte) bytea header, but it's not too bad either.
The FDW author then needs to ensure only the correct data ends up in
that system column.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GENERAL] Floating point error

2013-03-06 Thread Albe Laurenz
Maciek Sakrejda wrote:
 On Tue, Mar 5, 2013 at 12:03 AM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 I don't think that it is about looking nice.
 C doesn't promise you more than FLT_DIG or DBL_DIG digits of
 precision, so PostgreSQL cannot either.

 If you allow more, that would mean that if you store the same
 number on different platforms and query it, it might come out
 differently.  Among other things, that would be a problem for
 the regression tests.
 
 Thank you: I think this is what I was missing, and what wasn't clear
 from the proposed doc patch. But then how can pg_dump assume that it's
 always safe to set extra_float_digits = 3? Why the discrepancy between
 default behavior and what pg_dump gets? It can't know whether the dump
 is to be restored into the same system or a different one (and AFAICT,
 there's not even an option to tweak extra_float_digits there).

How about this elaboration?

Yours,
Laurenz Albe


float-2.patch
Description: float-2.patch



Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Shigeru Hanada
On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 In the context of postgres_fdw, I am not sure if we need an additional
 system column like a node_id. Would there be a possibility where
 tuples to-be-modified are coming from different foreign tables and at
 runtime we need to decide where to execute the UPDATE/DELETE operation
 ? If we start supporting inheritance involving foreign tables as child
 tables, this will become a reality.

Foreign tables have a tableoid system column which holds the pg_class.oid of
the foreign table.  It seems sufficient to determine the source server.

-- 
Shigeru HANADA




[HACKERS] Materialized View patch broke pg_dump

2013-03-06 Thread Bernd Helmle
It looks like the recent matview patch broke pg_dump in a way which makes
it impossible to dump 9.1 and 9.2 databases.


it fails with

pg_dump: [Archivierer (DB)] query failed: ERROR:  function 
pg_relation_is_scannable(oid) does not exist


Looking into this issue, it seems the version check in getTables() of 
pg_dump.c is wrong. Shouldn't the check be


if (fout->remoteVersion >= 90300)
{

}

since this is where pg_relation_is_scannable() is introduced?

--
Thanks

Bernd




Re: [HACKERS] Performance Improvement by reducing WAL for Update Operation

2013-03-06 Thread Amit Kapila
On Wednesday, March 06, 2013 2:57 AM Heikki Linnakangas wrote:
 On 04.03.2013 06:39, Amit Kapila wrote:
  On Sunday, March 03, 2013 8:19 PM Craig Ringer wrote:
  On 02/05/2013 11:53 PM, Amit Kapila wrote:
  Performance data for the patch is attached with this mail.
  Conclusions from the readings (these are same as my previous
 patch):
 
 
 I've been doing investigating the pglz option further, and doing
 performance comparisons of the pglz approach and this patch. I'll begin
 with some numbers:
 
 unpatched (63d283ecd0bc5078594a64dfbae29276072cdf45):
 
          testname                        | wal_generated |     duration
 -----------------------------------------+---------------+------------------
  two short fields, no change             |    1245525360 | 9.94613695144653
  two short fields, one changed           |    1245536528 | 10.146910905838
  two short fields, both changed          |    1245523160 | 11.2332470417023
  one short and one long field, no change |    1054926504 | 5.90477800369263
  ten tiny fields, all changed            |    1411774608 | 13.4536008834839
  hundred tiny fields, all changed        |     635739680 | 7.57448387145996
  hundred tiny fields, half changed       |     636930560 | 7.56888699531555
  hundred tiny fields, half nulled        |     573751120 | 6.68991994857788
 
 Amit's wal_update_changes_v10.patch:
 
          testname                        | wal_generated |     duration
 -----------------------------------------+---------------+------------------
  two short fields, no change             |    1249722112 | 13.0558869838715
  two short fields, one changed           |    1246145408 | 12.9947438240051
  two short fields, both changed          |    1245951056 | 13.0262880325317
  one short and one long field, no change |     678480664 | 5.70031690597534
  ten tiny fields, all changed            |    1328873920 | 20.0167419910431
  hundred tiny fields, all changed        |     638149416 | 14.4236788749695
  hundred tiny fields, half changed       |     635560504 | 14.8770561218262
  hundred tiny fields, half nulled        |     558468352 | 16.2437210083008
 
 pglz-with-micro-optimizations-1.patch:
 
          testname                        | wal_generated |     duration
 -----------------------------------------+---------------+------------------
  two short fields, no change             |    1245519008 | 11.6702048778534
  two short fields, one changed           |    1245756904 | 11.3233819007874
  two short fields, both changed          |    1249711088 | 11.6836447715759
  one short and one long field, no change |     664741392 | 6.44810795783997
  ten tiny fields, all changed            |    1328085568 | 13.9679481983185
  hundred tiny fields, all changed        |     635974088 | 9.15514206886292
  hundred tiny fields, half changed       |     636309040 | 9.13769292831421
  hundred tiny fields, half nulled        |     496396448 | 8.77351498603821

For some of the tests, it doesn't even execute the main part of
compression/encoding.
The reason is that the length of the tuple is less than the strategy's
minimum length, so it returns from the check below in pglz_delta_encode():
if (strategy->match_size_good <= 0 ||
    slen < strategy->min_input_size ||
    slen > strategy->max_input_size)
    return false;

The tests for which it doesn't execute encoding are below:
two short fields, no change
two short fields, one changed 
two short fields, both changed
ten tiny fields, all changed 


For the above cases, the difference in timings between both approaches and
the original is probably because this check is done only after some
processing. So I think if we check the length in log_heap_update, then
there should not be a timing difference for the above test scenarios. I can
check that once.

This optimization helps only when the tuple length is > 128~200 bytes and
up to 1800 bytes (till it turns to TOAST); otherwise it could result in
overhead without any major WAL reduction.
In fact, I think one of my initial patches had a check: if the length of the
tuple is greater than 128 bytes, then perform the optimization.

I shall try to run both patches for cases when the length of the tuple is
> 128~200 bytes, as this optimization has benefits in those cases.
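
For illustration only, a strategy along those lines could look like the
following, assuming the standard PGLZ_Strategy struct from
utils/pg_lzcompress.h; the 128/2000-byte bounds echo the numbers above, the
remaining members are guesses in the spirit of PGLZ_strategy_default, and
pglz_delta_encode() refers to the patch under discussion:

#include "postgres.h"
#include "utils/pg_lzcompress.h"

/*
 * Hypothetical strategy for delta encoding: skip tuples shorter than ~128
 * bytes (little to gain) and longer than ~2000 bytes (headed to TOAST
 * anyway).  All values are illustrative, not measured.
 */
static const PGLZ_Strategy delta_strategy_data = {
    128,        /* min_input_size */
    2000,       /* max_input_size */
    25,         /* min_comp_rate */
    1024,       /* first_success_by */
    128,        /* match_size_good */
    10          /* match_size_drop */
};

const PGLZ_Strategy *const wal_delta_strategy = &delta_strategy_data;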

 
 In each test, a table is created with a large number of identical rows,
 and fillfactor=50. Then a full-table UPDATE is performed, and the
 UPDATE is timed. Duration is the time spent in the UPDATE (lower is
 better), and wal_generated is the amount of WAL generated by the
 updates (lower is better).
 
 The summary is that Amit's patch is a small win in terms of CPU usage,
 in the best case where the table has few columns, with one 

Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Pavan Deolasee


On 06-Mar-2013, at 4:12 PM, Shigeru Hanada shigeru.han...@gmail.com wrote:

 On Wed, Mar 6, 2013 at 12:35 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 In the context of postgres_fdw, I am not sure if we need an additional
 system column like a node_id. Would there be a possibility where
 tuples to-be-modified are coming from different foreign tables and at
 runtime we need to decide where to execute the UPDATE/DELETE operation
 ? If we start supporting inheritance involving foreign tables as child
 tables, this will become a reality.
 
 Foreign tables have tableoid system column which holds pg_class.oid of
 the foreign table.  It seems sufficient to determine source server.
 

I think you are right. In the context of foreign tables, tableoid might be 
enough to get the source. Unfortunately, Postgres-XC had not yet leveraged 
foreign tables fully and hence special handling was required.

Thanks,
Pavan


 -- 
 Shigeru HANADA




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Shigeru Hanada
On Wed, Mar 6, 2013 at 9:30 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 For postgres_fdw, that would really be enough, since it could just
 cause a ctid column to be created with the usual definition.  Then
 it could put the remote ctid into the usual t_self field in returned
 tuples.

I'm not sure whether postgres_fdw should support it, but updatable views
have no system columns, including ctid.  So we need a magic identifier,
perhaps a set of primary key value(s), to support updating
remote updatable views via foreign tables.

Just a heads up.

-- 
Shigeru HANADA




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
 OK. Patches updated... Please see attached.
 With all the work done on those patches, I suppose this is close to being
 something clean...

Yes, it's looking good. There are loads of improvements possible but
those can very well be made incrementally.
  I have the feeling we are talking past each other. Unless I miss
  something *there is no* WaitForMultipleVirtualLocks between phase 2 and
  3. But one WaitForMultipleVirtualLocks for all would be totally
  sufficient.
 
 OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
 also before phase 3.
 Honestly, I am still not very comfortable with the fact that the ShareLock
 wait on parent relation is done outside each index transaction for build
 and validation... Changed as requested though...

Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Michael Paquier
On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
  OK. Patches updated... Please see attached.
  With all the work done on those patches, I suppose this is close to being
  something clean...

 Yes, its looking good. There are loads of improvements possible but
 those can very well be made incrementally.
   I have the feeling we are talking past each other. Unless I miss
   something *there is no* WaitForMultipleVirtualLocks between phase 2 and
   3. But one WaitForMultipleVirtualLocks for all would be totally
   sufficient.
  
  OK, sorry for the confusion. I added a call to
 WaitForMultipleVirtualLocks
  also before phase 3.
  Honestly, I am still not very comfortable with the fact that the
 ShareLock
  wait on parent relation is done outside each index transaction for build
  and validation... Changed as requested though...

 Could you detail your concerns a bit? I tried to think it through
 multiple times now and I still can't see a problem. The lock only
 ensures that nobody has the relation open with the old index definition
 in mind...

I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
wait is made inside the build and validation transactions. Was there any
particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
transaction block?
That's my only concern.
-- 
Michael


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-06 21:19:57 +0900, Michael Paquier wrote:
 On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
   OK. Patches updated... Please see attached.
   With all the work done on those patches, I suppose this is close to being
   something clean...
 
  Yes, its looking good. There are loads of improvements possible but
  those can very well be made incrementally.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
   
   OK, sorry for the confusion. I added a call to
  WaitForMultipleVirtualLocks
   also before phase 3.
   Honestly, I am still not very comfortable with the fact that the
  ShareLock
   wait on parent relation is done outside each index transaction for build
   and validation... Changed as requested though...
 
  Could you detail your concerns a bit? I tried to think it through
  multiple times now and I still can't see a problem. The lock only
  ensures that nobody has the relation open with the old index definition
  in mind...
 
 I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
 wait is made inside the build and validation transactions. Was there any
 particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
 transaction block?
 That's my only concern.

Well, it needs to be executed in a transaction because it needs a valid
resource owner and a previous CommitTransactionCommand() will leave that
at NULL. And there is no reason in the single-index case of CREATE INDEX
CONCURRENTLY to do it in a separate transaction.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WIP: index support for regexp search

2013-03-06 Thread Stephen Frost
* Alexander Korotkov (aekorot...@gmail.com) wrote:
 Now, we probably don't have enough of time before 9.3 to solve an API
 problem :(. It's likely we have to choose either commit to 9.3 without
 clean API factorization or postpone it to 9.4.

As much as I'd like this to get in, I don't think there's a question
here -- it should be punted to 9.4.  These don't look like small issues to
me.

Thanks,

Stephen




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Heikki Linnakangas

On 06.03.2013 10:41, Simon Riggs wrote:

On 5 March 2013 18:02, Jeff Davispg...@j-davis.com  wrote:


Fletcher is probably significantly faster than CRC-16, because I'm just
doing int32 addition in a tight loop.

Simon originally chose Fletcher, so perhaps he has more to say.


IIRC the research showed Fletcher was significantly faster for only a
small loss in error detection rate.

It was sufficient to make our error detection > 1 million times
better, possibly more. That seems sufficient to enable early detection
of problems, since if we missed the first error, a second is very
likely to be caught (etc). So I am assuming that we're trying to catch
a pattern of errors early, rather than guarantee we can catch the very
first error.


Fletcher's checksum is good in general, I was mainly worried about 
truncating the Fletcher-64 into two 8-bit values. I can't spot any 
obvious weakness in it, but if it's indeed faster and as good as a 
straightforward Fletcher-16, I wonder why that method is not more widely 
used.


Another thought is that perhaps something like CRC32C would be faster to 
calculate on modern hardware, and could be safely truncated to 16-bits 
using the same technique you're using to truncate the Fletcher's 
Checksum. Greg's tests showed that the overhead of CRC calculation is 
significant in some workloads, so it would be good to spend some time to 
optimize that. It'd be difficult to change the algorithm in a future 
release without breaking on-disk compatibility, so let's make sure we 
pick the best one.
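
As a rough illustration of that idea (not proposed PostgreSQL code), a
CRC32C over a buffer using the SSE4.2 intrinsic, folded down to 16 bits;
alignment, trailing bytes and runtime CPU detection are ignored, and the
particular 16-bit fold is just one possible choice:

#include <nmmintrin.h>      /* SSE4.2 intrinsics; build with -msse4.2 */
#include <stddef.h>
#include <stdint.h>

static uint16_t
crc32c_16(const void *data, size_t len)
{
    const uint64_t *p = (const uint64_t *) data;
    uint64_t    crc = 0xFFFFFFFF;

    for (size_t i = 0; i < len / 8; i++)
        crc = _mm_crc32_u64(crc, p[i]);     /* 8 input bytes per instruction */

    crc ^= 0xFFFFFFFF;
    return (uint16_t) (crc ^ (crc >> 16));  /* fold 32 bits down to 16 */
}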


- Heikki




Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On 5 March 2013 22:02, Tom Lane t...@sss.pgh.pa.us wrote:
 FWIW, my opinion is that doing anything like this in the planner is
 going to be enormously expensive.

 As we already said: no MVs = zero overhead = no problem.

Well, in the first place that statement is false on its face: we'll
still spend cycles looking for relevant MVs, or at least maintaining a
complexly-indexed cache that helps us find out that there are none in
a reasonable amount of time.  In the second place, even if it were
approximately true it wouldn't help the people who were using MVs.

 It costs in
 the cases where time savings are possible and not otherwise.

And that is just complete nonsense: matching costs whether you find a
match or not.  Could we have a little less Pollyanna-ish optimism and
a bit more realism about the likely cost of such a feature?

regards, tom lane




Re: [HACKERS] [GENERAL] Floating point error

2013-03-06 Thread Florian Weimer

On 03/05/2013 07:23 PM, Tom Lane wrote:

Maciek Sakrejda m.sakre...@gmail.com writes:

Thank you: I think this is what I was missing, and what wasn't clear
from the proposed doc patch. But then how can pg_dump assume that it's
always safe to set extra_float_digits = 3?


It's been proven (don't have a link handy, but the paper is at least
a dozen years old) that 3 extra digits are sufficient to accurately
reconstruct any IEEE single or double float value, given properly
written conversion functions in libc.  So that's where that number comes
from.  Now, if either end is not using IEEE floats, you may or may not
get equivalent results --- but it's pretty hard to make any guarantees
at all in such a case.


There's also gdtoa, which returns the shortest decimal representation 
which rounds back to the same binary value.  It would print 0.1 as 0.1, but 
0.1 + 0.2 as 0.30000000000000004.
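
To make the round-trip point concrete, a small standalone C illustration
(not PostgreSQL code) of why DBL_DIG digits can lose the value while
DBL_DIG + 3 digits, i.e. the extra_float_digits = 3 that pg_dump uses,
restores it exactly on IEEE-754 platforms:

#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    double  x = 0.1 + 0.2;      /* 0.30000000000000004... */
    char    buf[64];

    snprintf(buf, sizeof(buf), "%.*g", DBL_DIG, x);     /* 15 digits: "0.3" */
    printf("%-22s round-trips: %s\n", buf,
           strtod(buf, NULL) == x ? "yes" : "no");      /* prints "no" */

    snprintf(buf, sizeof(buf), "%.*g", DBL_DIG + 3, x); /* 18 digits */
    printf("%-22s round-trips: %s\n", buf,
           strtod(buf, NULL) == x ? "yes" : "no");      /* prints "yes" */
    return 0;
}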


--
Florian Weimer / Red Hat Product Security Team




Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Joachim Wieland
On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 With these tweaks, I was able to make pglz-based delta encoding perform
 roughly as well as Amit's patch.

Out of curiosity, do we know how pglz compares with other algorithms, e.g. lz4 ?




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Andres Freund
On 2013-03-06 13:34:21 +0200, Heikki Linnakangas wrote:
 On 06.03.2013 10:41, Simon Riggs wrote:
 On 5 March 2013 18:02, Jeff Davispg...@j-davis.com  wrote:
 
 Fletcher is probably significantly faster than CRC-16, because I'm just
 doing int32 addition in a tight loop.
 
 Simon originally chose Fletcher, so perhaps he has more to say.
 
 IIRC the research showed Fletcher was significantly faster for only a
 small loss in error detection rate.
 
 It was sufficient to make our error detection > 1 million times
 better, possibly more. That seems sufficient to enable early detection
 of problems, since if we missed the first error, a second is very
 likely to be caught (etc). So I am assuming that we're trying to catch
 a pattern of errors early, rather than guarantee we can catch the very
 first error.
 
 Fletcher's checksum is good in general, I was mainly worried about
 truncating the Fletcher-64 into two 8-bit values. I can't spot any obvious
 weakness in it, but if it's indeed faster and as good as a straightforward
 Fletcher-16, I wonder why that method is not more widely used.

I personally am not that convinced that Fletcher is such a good choice given
that it afaik doesn't distinguish between all-zero and all-one runs that are
long enough.
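
To illustrate that concern with a textbook mod-255 Fletcher-16 (an
assumption here; the patch folds 32-bit sums differently): 0x00 and 0xFF
both reduce to 0 modulo 255, so an all-zero block and an all-0xFF block of
the same length get the same checksum:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint16_t
fletcher16(const uint8_t *data, size_t len)
{
    uint16_t    sum1 = 0, sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % 255;  /* 0x00 and 0xFF both contribute 0 */
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

int
main(void)
{
    uint8_t     zeros[8192], ones[8192];

    memset(zeros, 0x00, sizeof(zeros));
    memset(ones, 0xFF, sizeof(ones));
    /* both print 0: the two blocks are indistinguishable */
    printf("%u %u\n", fletcher16(zeros, sizeof(zeros)),
           fletcher16(ones, sizeof(ones)));
    return 0;
}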

 Another thought is that perhaps something like CRC32C would be faster to
 calculate on modern hardware, and could be safely truncated to 16-bits using
 the same technique you're using to truncate the Fletcher's Checksum. Greg's
 tests showed that the overhead of CRC calculation is significant in some
 workloads, so it would be good to spend some time to optimize that. It'd be
 difficult to change the algorithm in a future release without breaking
 on-disk compatibility, so let's make sure we pick the best one.

I had implemented a noticeably faster CRC32 implementation somewhere
around 201005202227.49990.and...@anarazel.de . I have since repeatedly
seen pg's CRC32 implementation being a major limitation, so I think
brushing up that patch would be a good idea.
We might think about switching the polynomial for WAL at the same time,
given, as you say, CRC32c is available in hardware. The bigger problem
is probably stuff like the control file et al.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Kevin Grittner
Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

I think they are done.  If you notice anything missing or in need
of clarification please let me know.  At this point missing docs
would be a bug in need of a fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Materialized View patch broke pg_dump

2013-03-06 Thread Kevin Grittner
Bernd Helmle maili...@oopsware.de wrote:

 It looks like the recent matview patch broke pg_dump in a way which makes it
 impossible to dump 9.1 and 9.2 databases.

 it fails with

 pg_dump: [Archivierer (DB)] query failed: ERROR:  function
 pg_relation_is_scannable(oid) does not exist

 Looking into this issue, it seems the version check in getTables() of 
 pg_dump.c
 is wrong. Shouldn't the check be

 if (fout->remoteVersion >= 90300)
 {

 }

 since this is where pg_relation_is_scannable() is introduced?

Right.  Will fix.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Merlin Moncure
On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
 On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 With these tweaks, I was able to make pglz-based delta encoding perform
 roughly as well as Amit's patch.

 Out of curiosity, do we know how pglz compares with other algorithms, e.g. 
 lz4 ?

This has been a subject of much recent discussion. It compares very
poorly, but installing a new compressor tends to be problematic due to
patent concerns (something which I disagree with but it's there).  All
that said, Heikki's proposed changes seem to be low risk and quite
fast.

merlin




Re: [HACKERS] Materialized View patch broke pg_dump

2013-03-06 Thread Kevin Grittner
Bernd Helmle maili...@oopsware.de wrote:

 Looking into this issue, it seems the version check in getTables() of 
 pg_dump.c
 is wrong. Shouldn't the check be

 if (fout->remoteVersion >= 90300)
 {

 }

 since this is where pg_relation_is_scannable() is introduced?

Fixed.

Thanks for the report!

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Peter Eisentraut
Kevin,

I haven't seen a reply to this.  Were you able to give my notes below
any consideration?


On 2/15/13 12:44 PM, Peter Eisentraut wrote:
 On 1/25/13 1:00 AM, Kevin Grittner wrote:
 New patch rebased, fixes issues raised by Thom Brown, and addresses
 some of your points.
 
 This patch doesn't apply anymore, so I just took a superficial look.  I
 think the intended functionality and the interfaces look pretty good.
 Documentation looks complete, tests are there.
 
 I have a couple of notes:
 
 * What you call WITH [NO] DATA, Oracle calls BUILD IMMEDIATE/DEFERRED.
 It might be better to use that as well then.
 
 * You use fields named relkind in the parse nodes, but they don't
 actually contain relkind values, which is confusing.  I'd just name the
 field is_matview or something.
 
 * More generally, I wouldn't be so fond of combining the parse handling
 of CREATE TABLE AS and CREATE MATERIALIZED VIEW.  They are similar, but
 then again so are a lot of other things.
 
 * Some of the terminology is inconsistent.  A materialized view is
 sometimes called valid, populated, or built, with approximately the same
 meaning.  Personally, I would settle on built, as per above, but it
 should be one term only.
 
 * I find the name of the relisvalid column a bit confusing.  Especially
 because it only applies to materialized views, and there is already a
 meaning of valid for indexes.  (Recall that indexes are also stored in
 pg_class, but they are concerned about indisvalid.)  I would name it
 something like relmvbuilt.
 
 
 Btw., half of the patch seems to consist of updating places referring to
 relkind.  Is something wrong with the meaning of relkind that this sort
 of thing is required?  Maybe these places should be operating in terms
 of features, not accessing relkind directly.
 
 
 





Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Greg Stark
On Tue, Mar 5, 2013 at 9:08 PM, Robert Haas robertmh...@gmail.com wrote:
 All that having been said, it's hard for me to imagine that anyone
 really cares about any of this until we have an incremental update
 feature, which right now we don't.  Actually, I'm betting that's going
 to be significantly harder than automatic-query-rewrite, when all is
 said and done.  Automatic-query-rewrite, if and when we get it, will
 not be easy and will require a bunch of work from someone with a good
 understanding of the planner, but it strikes me as the sort of thing
 that might work out to one large project and then it's done.  Whereas,
 incremental update sounds to me like a series of projects over a
 series of releases targeting various special cases, where we can
 always point to some improvements vs. release N-1 but we're never
 actually done and able to move on to the next thing.  As a roadmap
 goes, I think that's OK.  Even a reasonably simplistic and partial
 implementation of incremental update will benefit a lot of users.  But
 in terms of relative difficulty, it's not at all obvious to me that
 that's the easier part of the project.

While true, that's true for a lot of Postgres features. The only ones
that are one-shot projects are buried deep in the internals. Anything
with UI implications inevitably has limitations, and then other people
come along and work on removing or extending those features.

I do agree with Tom though -- the most frequently asked for
materialized view in the past has always been "select count(*) from
tab". People assume we already do this and are surprised when we
don't. The cookie-cutter solution for it is basically exactly what an
incrementally updated materialized view solution would look like
(with a queue of updates with transaction information that are
periodically flattened into the aggregate). Rewriting this might be a
bit tricky and require heuristics to determine just how much work to
expend trying to match materialized views, but this type of view would
be where most of the win would be.

I also can't see implementing query rewriting for
non-transactionally-accurate materialized views. If people want a
snapshot of the data that may be out of date, that's great. I can see
tons of use cases for that. But then surely they won't be surprised to
have to query the snapshot explicitly. I can't see going to all this
trouble to implement transactions and snapshots and WAL logging and so
on and then silently rewriting queries to produce data that is not up
to date. I think users would be surprised to find bog-standard SQL
occasionally producing incorrect results.

That said, there are cases where snapshots might be up to date even
though we don't implement any incremental updates. If the underlying
data is read-only or hasn't received any update commits since the
snapshot was taken then it might still be useful. There are tons of
ETL applications where you load the data once and then build MVs for
it and  never touch the underlying data again.

-- 
greg




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Garick Hamlin
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote:
 On 06.03.2013 10:41, Simon Riggs wrote:
 On 5 March 2013 18:02, Jeff Davispg...@j-davis.com  wrote:

 Fletcher is probably significantly faster than CRC-16, because I'm just
 doing int32 addition in a tight loop.

 Simon originally chose Fletcher, so perhaps he has more to say.

 IIRC the research showed Fletcher was significantly faster for only a
 small loss in error detection rate.

 It was sufficient to make our error detection > 1 million times
 better, possibly more. That seems sufficient to enable early detection
 of problems, since if we missed the first error, a second is very
 likely to be caught (etc). So I am assuming that we're trying to catch
 a pattern of errors early, rather than guarantee we can catch the very
 first error.

 Fletcher's checksum is good in general, I was mainly worried about  
 truncating the Fletcher-64 into two 8-bit values. I can't spot any  
 obvious weakness in it, but if it's indeed faster and as good as a  
 straightforward Fletcher-16, I wonder why that method is not more widely  
 used.

I was wondering about the effectiveness of this resulting truncated hash
function as well.

 Another thought is that perhaps something like CRC32C would be faster to  
 calculate on modern hardware, and could be safely truncated to 16-bits  
 using the same technique you're using to truncate the Fletcher's  
 Checksum. Greg's tests showed that the overhead of CRC calculation is  
 significant in some workloads, so it would be good to spend some time to  
 optimize that. It'd be difficult to change the algorithm in a future  
 release without breaking on-disk compatibility, so let's make sure we  
 pick the best one.

If picking a CRC, why not a short optimal one rather than truncating CRC32C?

I've been reading about optimal checksums for small messages for other
reasons and found this paper quite good.

http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf

I was interested in small messages and small checksums so this paper may not be
as much help here.

Other than CRCs and Fletcher sums, Pearson hashing with a 16-bit block might
be worth considering.  Either a Pearson hash or a 16-bit CRC is small enough
to implement with a lookup table rather than a formula.
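
As an illustration of the table-driven approach (the table below is only
partially filled in and must really be a fixed permutation of 0..255, and
widening to 16 bits by running the 8-bit hash twice with different seeds is
just one common option, not a proposal):

#include <stddef.h>
#include <stdint.h>

static const uint8_t T[256] = { 98, 6, 85, 150, 36, 23, 112, 164 /* ... */ };

static uint8_t
pearson8(const uint8_t *data, size_t len, uint8_t seed)
{
    uint8_t     h = seed;

    for (size_t i = 0; i < len; i++)
        h = T[h ^ data[i]];     /* one table lookup per input byte */
    return h;
}

static uint16_t
pearson16(const uint8_t *data, size_t len)
{
    return (uint16_t) ((pearson8(data, len, 0) << 8) | pearson8(data, len, 1));
}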

I've been wondering what kind of errors we expect.  Single bit flips?  Large
swaths of bytes corrupted?  Are we more worried about collisions (the odds
that total garbage has the same checksum) or the odds that we detect a flip
of n bits?  I would think that since the message is large and a write to the
wrong location seems about as likely as a bit flip, a Pearson hash would be good.

Any choice seems like it would be a nice improvement for noticing a storage
stack problem.  The difference would be subtle: can I accurately estimate the
odds of undetected corruption that occurred since the condition was first
detected, or does the checksum/hash perform poorly?

Garick







Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Andres Freund
On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
 On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
  On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
  With these tweaks, I was able to make pglz-based delta encoding perform
  roughly as well as Amit's patch.
 
  Out of curiosity, do we know how pglz compares with other algorithms, e.g. 
  lz4 ?
 
 This has been a subject of much recent discussion. It compares very
 poorly, but installing a new compressor tends to be problematic due to
 patent concerns (something which I disagree with but it's there).  All
 that said, Heikki's proposed changes seem to be low risk and quite
 fast.

Imo the licensing part is by far the smaller one. The interesting part
is making a compatible change to the way toast compression works that
supports multiple compression schemes. Afaics nobody has done that work.
After that the choice of to-be-integrated compression schemes needs to
be discussed, sure.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Andres Freund
On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
 If picking a CRC why not a short optimal one rather than truncate CRC32C?

CRC32C is available in hardware since SSE4.2.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 I'm not sure whether postgres_fdw should support, but updatable views
 have no system column including ctid.  So, we need magic identifier,
 perhaps it would be set of primary key value(s), to support updating
 remote updatable views via foreign tables.

Yeah, I considered that.  I thought seriously about proposing that we
forget magic row identifiers altogether, and instead make postgres_fdw
require a remote primary key for a foreign table to be updatable.  That
would solve the immediate problem since we'd no longer need any system
columns at all.  On balance though I think it's better for postgres_fdw
to use ctid for this purpose, at least for now: ctid is
better-performing and more reliable than a PK (since someone might
mis-describe the PK, or change a row's PK value underneath us).
Perhaps in a future release we could add a table option to use PK row
identification, but I don't want to try to implement both methods in
postgres_fdw right now.

On the other hand, I don't have a problem with decreeing that
non-Postgres FDWs need to use PK row identification in the first
release; which would be the consequence if we don't do anything about
allowing new system columns in 9.3.  We will certainly need that style
of row identification to be written and tested anyway.  It won't stop
us from extending things later.

regards, tom lane




Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Jeff Janes
On Wed, Mar 6, 2013 at 8:53 AM, Andres Freund and...@2ndquadrant.comwrote:

 On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
  On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
   On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
   hlinnakan...@vmware.com wrote:
   With these tweaks, I was able to make pglz-based delta encoding
 perform
   roughly as well as Amit's patch.
  
   Out of curiosity, do we know how pglz compares with other algorithms,
 e.g. lz4 ?
 
  This has been a subject of much recent discussion. It compares very
  poorly, but installing a new compressor tends to be problematic due to
  patent concerns (something which I disagree with but it's there).  All
  that said, Heikki's proposed changes seem to be low risk and quite
  fast.

 Imo the licensing part is by far the smaller one. The interesting part
 is making a compatible change to the way toast compression works that
 supports multiple compression schemes. Afaics nobody has done that work.
 After that the choice of to-be-integrated compression schemes needs to
 be discussed, sure.



Another thing to consider would be some way of recording an exemplar value
for each column which is used to seed whatever compression algorithm is
used.  I think there is often a lot of redundancy that does not appear within
any given value, but does appear when viewing all the values of a given
column.  Finding some way to take advantage of that could give a big
improvement in compression ratio.
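
One way to picture that idea, using zlib's preset-dictionary API purely as
an illustration (pglz has no equivalent today, and the function below is
invented for the example):

#include <stddef.h>
#include <string.h>
#include <zlib.h>

/* Compress "value" after seeding the compressor with a per-column exemplar. */
static int
compress_with_exemplar(const char *exemplar, const char *value,
                       unsigned char *out, unsigned outsize,
                       unsigned long *outlen)
{
    z_stream    zs;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* shared context: repeats of exemplar material in "value" compress well */
    deflateSetDictionary(&zs, (const Bytef *) exemplar, (uInt) strlen(exemplar));

    zs.next_in = (Bytef *) value;
    zs.avail_in = (uInt) strlen(value);
    zs.next_out = out;
    zs.avail_out = outsize;

    if (deflate(&zs, Z_FINISH) != Z_STREAM_END)
    {
        deflateEnd(&zs);
        return -1;
    }
    *outlen = zs.total_out;
    deflateEnd(&zs);
    return 0;
}

The decompressor would of course have to be handed the exact same exemplar
(via inflateSetDictionary), which is also the crux of the caveat Andres
raises in his reply: the exemplar can never change once stored values
depend on it.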

Cheers,

Jeff


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Fujii Masao
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 OK. Patches updated... Please see attached.

I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
I found that the index which was not marked as INVALID remained unexpectedly.

=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR:  canceling statement due to statement timeout
=# \d hoge
     Table "public.hoge"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | not null
Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID

=# REINDEX TABLE CONCURRENTLY hoge;
ERROR:  canceling statement due to statement timeout
=# \d hoge
     Table "public.hoge"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer | not null
Indexes:
    "hoge_pkey" PRIMARY KEY, btree (i)
    "hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
    "hoge_pkey_cct_cct" PRIMARY KEY, btree (i)


+The recommended recovery method in such cases is to drop the concurrent
+index and try again to perform <command>REINDEX CONCURRENTLY</>.

If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Andres Freund
On 2013-03-06 09:08:10 -0800, Jeff Janes wrote:
 On Wed, Mar 6, 2013 at 8:53 AM, Andres Freund and...@2ndquadrant.comwrote:
 
  On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
   On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
With these tweaks, I was able to make pglz-based delta encoding
  perform
roughly as well as Amit's patch.
   
Out of curiosity, do we know how pglz compares with other algorithms,
  e.g. lz4 ?
  
   This has been a subject of much recent discussion. It compares very
   poorly, but installing a new compressor tends to be problematic due to
   patent concerns (something which I disagree with but it's there).  All
   that said, Heikki's proposed changes seem to be low risk and quite
   fast.
 
  Imo the licensing part is by far the smaller one. The interesting part
  is making a compatible change to the way toast compression works that
  supports multiple compression schemes. Afaics nobody has done that work.
  After that the choice of to-be-integrated compression schemes needs to
  be discussed, sure.
 
 
 
 Another thing to consider would be some way of recording an exemplar value
 for each column which is used to seed whatever compression algorithm is
  used.  I think there is often a lot of redundancy that does not appear within
 any given value, but does appear when viewing all the values of a given
 column.  Finding some way to take advantage of that could give a big
 improvement in compression ratio.

But then that value cannot be changed/removed because existing values
depend on it. So either it's a one-off thing - which means you can only
compute it after the table is filled to some extent - or you need to
have a growing list of such values somewhere (refcounting it would be
hard).
I think the more reasonable route for such a thing would be to try and
get page-level compression working. Which is a pretty hard project, I
admit.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-07 02:09:49 +0900, Fujii Masao wrote:
 On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  OK. Patches updated... Please see attached.
 
 I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
 I found that the index which was not marked as INVALID remained unexpectedly.

That's to be expected. Indexes need to be valid *before* we can drop the
old one. So if you abort at the right moment you will see those, and that's
imo fine.
 
 =# CREATE TABLE hoge (i int primary key);
 CREATE TABLE
 =# INSERT INTO hoge VALUES (generate_series(1,10));
 INSERT 0 10
 =# SET statement_timeout TO '1s';
 SET
 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 
 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct_cct PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?
 
 +The recommended recovery method in such cases is to drop the concurrent
 +index and try again to perform commandREINDEX CONCURRENTLY/.
 
 If an invalid index depends on the constraint like primary key, drop
 the concurrent
 index cannot actually drop the index. In this case, you need to issue
 alter table
 ... drop constraint ... to recover the situation. I think this
 information should be
 documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] transforms

2013-03-06 Thread Peter Eisentraut
On 3/5/13 5:52 PM, Josh Berkus wrote:
 More on this: the problem appears to be that the symbols for hstore are
 loaded only if I've just created the extension in that database:

I see.  This is a problem with any kind of dynamically loadable module
that uses another module.  The other module is only loaded either at
creation time or when a function from it is first used (or if a preload
mechanism is used).

At run time, this will sort itself out, because all the required modules
will be loaded.  The problem is when you create the glue extension and
haven't invoked any code from any of the dependent extensions in the
session.  Abstractly, the possible solutions are either not to check the
functions when the extension is created (possibly settable by a flag) or
to somehow force a load of all dependent extensions when the new
extension is created.  (I say extension here even though dynamically
loadable modules are attached to functions, which makes this even more
confusing.)

In normal programming languages, this is normally addressed by placing
explicit load/require/import statements before you do anything else.
What we are doing here is more like an autoload functionality that some
environments have.  Those have the same problem.
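
To make that concrete, here is a minimal sketch of the kind of session being
described, using the hstore/PL/Perl glue extension from this thread as the
example; the LOAD commands at the end are only a guess at the "explicit
load/require" approach mentioned above, not something verified here:

-- hstore is already installed in this database, but this backend has not
-- loaded hstore.so yet, so creating the glue extension can fail while its
-- functions are being checked:
CREATE EXTENSION hstore_plperl;

-- Possible workaround to test: force the dependent modules into the
-- current session first, then create the glue extension.
LOAD 'hstore';
LOAD 'plperl';
CREATE EXTENSION hstore_plperl;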



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Merlin Moncure
On Wed, Mar 6, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
 On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
  On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
  With these tweaks, I was able to make pglz-based delta encoding perform
  roughly as well as Amit's patch.
 
  Out of curiosity, do we know how pglz compares with other algorithms, e.g. 
  lz4 ?

 This has been a subject of much recent discussion. It compares very
 poorly, but installing a new compressor tends to be problematic due to
 patent concerns (something which I disagree with but it's there).  All
 that said, Heikki's proposed changes seem to be low risk and quite
 fast.

 Imo the licensing part is by far the smaller one. The interesting part
 is making a compatible change to the way toast compression works that
 supports multiple compression schemes. Afaics nobody has done that work.
 After that the choice of to-be-integrated compression schemes needs to
 be discussed, sure.

That wasn't the conversation I remember.  lz4 completely smokes pglz
(Heikki's changes notwithstanding) and is bsd licensed so AFAICS there
is no reason to support multiple compression technologies except for
migration purposes (and even if you did want to, a plausible way to do
that was identified).

...but that's a separate topic for another day.  Heikki's proposal
seems like a win either way.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Fujii Masao
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote:
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct_cct PRIMARY KEY, btree (i)

 Huh, why did that go through? It should have errored out?

I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.

 +The recommended recovery method in such cases is to drop the concurrent
 +index and try again to perform commandREINDEX CONCURRENTLY/.

 If an invalid index depends on the constraint like primary key, drop
 the concurrent
 index cannot actually drop the index. In this case, you need to issue
 alter table
 ... drop constraint ... to recover the situation. I think this
 information should be
 documented.

 I think we just shouldn't set ->isprimary on the temporary indexes. Now
 we switch only the relfilenodes and not the whole index, that should be
 perfectly fine.

Sounds good. But, what about other constraint case like unique constraint?
Those other cases also can be resolved by not setting ->isprimary?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Writable foreign tables: how to identify rows

2013-03-06 Thread Tom Lane
Kohei KaiGai kai...@kaigai.gr.jp writes:
 2013/3/6 Tom Lane t...@sss.pgh.pa.us:
 I think if we're going to support magic row identifiers, they need to
 be actual system columns, complete with negative attnums and entries
 in pg_attribute.

 Sorry, -1 for me.

 The proposed design tried to kill two birds with one stone.
 The reason why the new GetForeignRelWidth() can reserve multiple slot
 for pseudo-columns is, that intends to push-down complex calculation in
 target-list to the remote computing resource.

[ shrug... ]  That's just pie in the sky: there is no such capability
in the submitted patch, nor is it close to being implementable.  When
weighing that against the very real probability that this patch won't
get into 9.3 at all, I have no problem tossing that overboard to try to
get to something committable.

The larger issue here is that the patch is confusing the set of
possibly-computed columns that could be returned by a scan node with
the catalog-specified set of columns for a table.  I don't think that
changing the (representation of the) latter on the fly within the
planner is a tenable approach.  You've had to put in an unreasonable
number of crude hacks already to make that sort-of work, but I have
exactly zero confidence that you've hacked everyplace that would have
to change.  I also have no confidence that there aren't unfixable
problems where the same TupleDesc would need to be in both states
to satisfy the expectations of different bits of code.  (An example
of the risks here is that, IIRC, parts of the planner use the relation's
TupleDesc as a guide to what relname.* means.  So I'm pretty
suspicious that the existing patch breaks behavior for whole-row Vars
in some cases.)  Really the idea is a bit broken in this form anyway,
because if we did have a plan that involved calculating (x-y)^2 at
the scan level, we'd want that expression to be represented using Vars
for x and y, not some made-up Var whose meaning is not apparent from
the system catalogs.

Also, the right way to deal with this desire is to teach the planner in
general, not just FDWs, about pushing expensive calculations down the
plan tree --- basically, resurrecting Joe Hellerstein's thesis work,
which we ripped out more than ten years ago.  I don't think there's that
much that would need to change about the planner's data structures, but
deciding where is cheapest to evaluate an expression is a pretty hard
problem.  Trying to handle that locally within FDWs is doomed to failure
IMO.

So my intention is to get rid of GetForeignRelWidth() and make use of
the existing ctid system column for returning remote TIDs in
postgres_fdw.  (On review I notice that we're creating ctid columns
for foreign tables already, so we don't even need the proposed hook
to let FDWs control that; though we will certainly want one in future
if we are to support non-TID magic row identifiers.)
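
A rough SQL-level sketch of what the ctid-based approach amounts to, for
readers following along (illustrative only; these are not the exact
statements postgres_fdw generates, and the column names are made up):

-- the foreign scan fetches the remote TID alongside the user columns:
SELECT a, b, ctid FROM remote_table;
-- the later update or delete targets each row by that TID ($1 and $2 are
-- parameters bound from the local modify node):
UPDATE remote_table SET b = $1 WHERE ctid = $2;
DELETE FROM remote_table WHERE ctid = $1;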

If you find that unacceptable, I'm quite willing to mark this patch
Returned With Feedback and get on with dealing with some of the other
forty-odd patches in the CF queue.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] transforms

2013-03-06 Thread Josh Berkus
Peter,

 At run time, this will sort itself out, because all the required modules
 will be loaded.  The problem is when you create the glue extension and
 haven't invoked any code from any of the dependent extension in the
 session.  

Just invoking code doesn't seem to be enough.  I tried just using the
Hstore data type, and then loading hstore_plperl, but that still didn't
work.  It seems like only CREATE EXTENSION loads *all* the symbols.

 Abstractly, the possible solutions are either not to check the
 functions when the extension is created (possibly settable by a flag) or
 to somehow force a load of all dependent extensions when the new
 extension is created. 

The latter would be ideal, but I don't know that we currently have any
mechanism for it.

--Josh

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizing pglz compressor

2013-03-06 Thread Andres Freund
On 2013-03-06 11:31:06 -0600, Merlin Moncure wrote:
 On Wed, Mar 6, 2013 at 10:53 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2013-03-06 09:36:19 -0600, Merlin Moncure wrote:
  On Wed, Mar 6, 2013 at 8:32 AM, Joachim Wieland j...@mcknight.de wrote:
   On Tue, Mar 5, 2013 at 8:32 AM, Heikki Linnakangas
   hlinnakan...@vmware.com wrote:
   With these tweaks, I was able to make pglz-based delta encoding perform
   roughly as well as Amit's patch.
  
   Out of curiosity, do we know how pglz compares with other algorithms, 
   e.g. lz4 ?
 
  This has been a subject of much recent discussion. It compares very
  poorly, but installing a new compressor tends to be problematic due to
  patent concerns (something which I disagree with but it's there).  All
  that said, Heikki's proposed changes seem to be low risk and quite
  fast.
 
  Imo the licensing part is by far the smaller one. The interesting part
  is making a compatible change to the way toast compression works that
  supports multiple compression schemes. Afaics nobody has done that work.
  After that the choice of to-be-integrated compression schemes needs to
  be discussed, sure.
 
 That wasn't the conversation I remember.  lz4 completely smokes pglz
 (Heikki's changes notwithstanding) and is bsd licensed so AFAICS there
 is no reason to support multiple compression technologies except for
 migration purposes (and even if you did want to, a plausible way to do
 that was identified).

Well, we need to permanently support at least two technologies -
otherwise we will break pg_upgrade.
And there is absolutely no reason to believe nothing better than lz4
will come along in the future so just supporting two seems like a bad
idea.  And sure, there are rough ideas on how to support different
compression schemes in toast, but nobody has implemented anything
afaics.
It would be possible to reuse what I proposed for indirect toast tuples
for that purpose although I am not sure whether I like using
varatt_external's in multiple types as indicated by
varatt1_be.va_len_1be (renamed to va_type or such) apointing to
different types. Which means that an uncompressible datum would
potentially have multiple representations.

 ...but that's a separate topic for another day.  Heikki's proposal
 seems like a win either way.

Yes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] transforms

2013-03-06 Thread Andres Freund
On 2013-03-06 09:53:29 -0800, Josh Berkus wrote:
 Peter,
 
  At run time, this will sort itself out, because all the required modules
  will be loaded.  The problem is when you create the glue extension and
  haven't invoked any code from any of the dependent extension in the
  session.  
 
 Just invoking code doesn't seem to be enough.  I tried just using the
 Hstore data type, and then loading hstore_plperl, but that still didn't
 work.  It seems like only CREATE EXTENSION loads *all* the symbols.

Your error looks like you're erroring out in plperl, not in hstore?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Robert Haas
On Mon, Mar 4, 2013 at 3:13 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 04.03.2013 20:58, Greg Smith wrote:

 There
 is no such thing as a stable release of btrfs, and no timetable for when
 there will be one. I could do some benchmarks of that but I didn't think
 they were very relevant. Who cares how fast something might run when it
 may not work correctly? btrfs might as well be /dev/null to me right
 now--sure it's fast, but maybe the data won't be there at all.

 This PostgreSQL patch hasn't seen any production use, either. In fact, I'd
 consider btrfs to be more mature than this patch. Unless you think that
 there will be some major changes to the worse in performance in btrfs, it's
 perfectly valid and useful to compare the two.

 A comparison with ZFS would be nice too. That's mature, and has checksums.

We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS.  Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not have that effect.
So I think telling people who want checksums go use ZFS is a lot
like telling them oh, I see you have a hangnail, we recommend that
you solve that by cutting your arm off with a rusty saw.

There may be good reasons to reject this patch.  Or there may not.
But I completely disagree with the idea that asking them to solve the
problem at the filesystem level is sensible.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-03-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote:
 It's still entirely possible to get 99% done and then hit that last
 tuple that you need a lock on and just tip over the lock_timeout_stmt
 limit due to prior waiting and ending up wasting a bunch of work, hence
 why I'm not entirely sure that this is that much better than
 statement_timeout.

 I tend to agree that this should be based on the length of any
 individual lock wait, not the cumulative duration of lock waits.
 Otherwise, it seems like it'll be very hard to set this to a
 meaningful value.  For example, if you set this to 1 minute, and that
 means the length of any single wait, then you basically know that
 it'll only kick in if there is some other, long-running transaction
 that's holding the lock.  But if it means the cumulative length of all
 waits, it's not so clear, because now you might also have this kick in
 if you wait for 100ms on 600 different occasions.  In other words,
 complex queries that lock or update many tuples may get killed even if
 they never wait very long at all for any single lock.  That seems like
 it will be almost indistinguishable from random, unprincipled query
 cancellations.

Yeah.  I'm also unconvinced that there's really much use-case territory
here that statement_timeout doesn't cover well enough.  To have a case
that statement-level lock timeout covers and statement_timeout doesn't,
you need to suppose that you know how long the query can realistically
wait for all locks together, but *not* how long it's going to run in the
absence of lock delays.  That seems a bit far-fetched, particularly when
thinking of row-level locks, whose cumulative timeout would presumably
need to scale with the number of rows the query will visit.

If statement-level lock timeouts were cheap to add, that would be one
thing; but given that they're complicating the code materially, I think
we need a more convincing argument for them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-03-06 Thread Boszormenyi Zoltan

On 2013-03-06 19:53, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote:

It's still entirely possible to get 99% done and then hit that last
tuple that you need a lock on and just tip over the lock_timeout_stmt
limit due to prior waiting and ending up wasting a bunch of work, hence
why I'm not entirely sure that this is that much better than
statement_timeout.

I tend to agree that this should be based on the length of any
individual lock wait, not the cumulative duration of lock waits.
Otherwise, it seems like it'll be very hard to set this to a
meaningful value.  For example, if you set this to 1 minute, and that
means the length of any single wait, then you basically know that
it'll only kick in if there is some other, long-running transaction
that's holding the lock.  But if it means the cumulative length of all
waits, it's not so clear, because now you might also have this kick in
if you wait for 100ms on 600 different occasions.  In other words,
complex queries that lock or update many tuples may get killed even if
they never wait very long at all for any single lock.  That seems like
it will be almost indistinguishable from random, unprincipled query
cancellations.

Yeah.  I'm also unconvinced that there's really much use-case territory
here that statement_timeout doesn't cover well enough.  To have a case
that statement-level lock timeout covers and statement_timeout doesn't,
you need to suppose that you know how long the query can realistically
wait for all locks together, but *not* how long it's going to run in the
absence of lock delays.  That seems a bit far-fetched, particularly when
thinking of row-level locks, whose cumulative timeout would presumably
need to scale with the number of rows the query will visit.

If statement-level lock timeouts were cheap to add, that would be one
thing; but given that they're complicating the code materially, I think
we need a more convincing argument for them.


OK, so it's not wanted. Surprise, surprise, it was already dropped
from the patch. Can you _please_ review the last patch and comment
on it instead of its past state?

Thanks and best regards,
Zoltán Böszörményi



regards, tom lane





--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Josh Berkus

 There may be good reasons to reject this patch.  Or there may not.
 But I completely disagree with the idea that asking them to solve the
 problem at the filesystem level is sensible.

Yes, can we get back to the main issues with the patch?

1) argument over whether the checksum is sufficient to detect most
errors, or if it will give users false confidence.

2) performance overhead.

Based on Smith's report, I consider (2) to be a deal-killer right now.
The level of overhead reported by him would prevent the users I work
with from ever employing checksums on production systems.

Specifically, writing checksums for a read-only query is a defect I
think is prohibitively bad.  When we first talked about this feature for
9.2, we were going to exclude hint bits from checksums, in order to
avoid this issue; what happened to that?

(FWIW, I still support the idea of moving hint bits to a separate
filehandle, as we do with the FSM, but clearly that's not happening for
9.3 ...)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Michael Paquier
On Thu, Mar 7, 2013 at 2:09 AM, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  OK. Patches updated... Please see attached.

 I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
 I found that the index which was not marked as INVALID remained
 unexpectedly.

 =# CREATE TABLE hoge (i int primary key);
 CREATE TABLE
 =# INSERT INTO hoge VALUES (generate_series(1,10));
 INSERT 0 10
 =# SET statement_timeout TO '1s';
 SET
 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID

 =# REINDEX TABLE CONCURRENTLY hoge;
 ERROR:  canceling statement due to statement timeout
 =# \d hoge
  Table public.hoge
  Column |  Type   | Modifiers
 +-+---
  i  | integer | not null
 Indexes:
 hoge_pkey PRIMARY KEY, btree (i)
 hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
 hoge_pkey_cct_cct PRIMARY KEY, btree (i)

Invalid indexes cannot be reindexed concurrently and are simply bypassed
during the process, so _cct_cct has no reason to exist. For example, here is
what I get with a relation having an invalid index:
ioltas=# \d aa
  Table public.aa
 Column |  Type   | Modifiers
+-+---
 a  | integer |
Indexes:
aap btree (a)
aap_cct btree (a) INVALID

ioltas=# reindex table concurrently aa;
WARNING:  cannot reindex concurrently invalid index public.aap_cct,
skipping
REINDEX


 +The recommended recovery method in such cases is to drop the
 concurrent
 +index and try again to perform commandREINDEX CONCURRENTLY/.

 If an invalid index depends on the constraint like primary key, drop
 the concurrent
 index cannot actually drop the index. In this case, you need to issue
 alter table
 ... drop constraint ... to recover the situation. I think this
 information should be
 documented.

You are right. I'll add a note in the documentation about that. Personally
I find it more instinctive to use DROP CONSTRAINT for a primary key as the
image I have of a concurrent index is a twin of the index it rebuilds.
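
To spell out the recovery being discussed, using the names from Fujii's
example and assuming (as reported) that the leftover invalid index is backed
by a constraint of the same name, so that plain DROP INDEX is refused:

-- DROP INDEX hoge_pkey_cct;                      -- refused: index backs a constraint
ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;   -- drops the constraint and its index
REINDEX TABLE CONCURRENTLY hoge;                  -- then retry the rebuild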
-- 
Michael


Re: [HACKERS] Bug in tm2timestamp

2013-03-06 Thread Michael Meskes
On Mon, Mar 04, 2013 at 05:08:26PM -0300, Alvaro Herrera wrote:
 Another point worth considering is that most of this is duplicated by
 ecpg's libpgtypes.  Do we want to fix that one too, or do we just let it
 continue to be broken?  I note that other bugs are already unfixed in
 ecpg's copy.  One other idea to consider is moving these things to

Meaning that a fix wasn't put there, too?

 But in light of this bug and other already fixed date/time bugs, perhaps
 it's warranted?  Opinions please.

I'd love to go to a single source. Most of libpgtypes was taken from the
backend back when it was developed.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in tm2timestamp

2013-03-06 Thread Michael Meskes
On Mon, Mar 04, 2013 at 05:55:26PM -0300, Alvaro Herrera wrote:
 error codes for the caller to figure out. Maybe we could create a layer
 on top of ereport, that gets both the error message, sqlstate etc, and
 ...

Couldn't we just create ecpg's own version of ereport, that does the right
thing for ecpg?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Michael at BorussiaFan dot De, Meskes at (Debian|Postgresql) dot Org
Jabber: michael.meskes at gmail dot com
VfL Borussia! Força Barça! Go SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in tm2timestamp

2013-03-06 Thread Alvaro Herrera
Michael Meskes wrote:
 On Mon, Mar 04, 2013 at 05:08:26PM -0300, Alvaro Herrera wrote:
  Another point worth considering is that most of this is duplicated by
  ecpg's libpgtypes.  Do we want to fix that one too, or do we just let it
  continue to be broken?  I note that other bugs are already unfixed in
  ecpg's copy.  One other idea to consider is moving these things to
 
 Meaning that a fix wasn't put there, too?

Yes, a fix was put there by Tom (which is why I retracted my comment
initially).  I did note that the ecpg code has diverged from the backend
code; it's not unlikely that other bug fixes have not gone to the ecpg
copy.  But I didn't investigate each difference in detail.

  But in light of this bug and other already fixed date/time bugs, perhaps
  it's warranted?  Opinions please.
 
 I'd love to go to a single source. Most of libpgtypes was taken from the
 backend back when it was developed.

I will keep that in mind, if I get back to moving the timestamp/datetime
code to src/common.  It's not a high priority item right now.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-07 02:34:54 +0900, Fujii Masao wrote:
 On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com wrote:
  Indexes:
  hoge_pkey PRIMARY KEY, btree (i)
  hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
  hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
  hoge_pkey_cct_cct PRIMARY KEY, btree (i)
 
  Huh, why did that go through? It should have errored out?
 
 I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
 be marked as invalid, I think.

Hm. Yea.

I am still not sure why hoge_pkey_cct_cct sprang into existence, but
the fact that hoge_pkey_cct1 springs into existence makes sense.

I see a problem here: there is a moment between phase 3 and 4 where
both the old and the new indexes are valid and ready. That's not good
because if we abort at that moment we essentially have doubled the
number of indexes.

Options:
a) we live with it
b) we only mark the new index as valid within phase 4. That should be
   fine I think?
c) we invent some other state to mark indexes that are in-progress to
   replace another one.

I guess b) seems fine?

  +The recommended recovery method in such cases is to drop the 
  concurrent
  +index and try again to perform commandREINDEX CONCURRENTLY/.
 
  If an invalid index depends on the constraint like primary key, drop
  the concurrent
  index cannot actually drop the index. In this case, you need to issue
  alter table
  ... drop constraint ... to recover the situation. I think this
  information should be
  documented.
 
  I think we just shouldn't set ->isprimary on the temporary indexes. Now
  we switch only the relfilenodes and not the whole index, that should be
  perfectly fine.
 
 Sounds good. But, what about other constraint case like unique constraint?
 Those other cases also can be resolved by not setting ->isprimary?

Unique indexes can exist without a constraint attached, so that's fine. I
need to read a bit more code to see whether it's safe to unset it, although
indisexclusion and indimmediate might be more important.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

 I think they are done.  If you notice anything missing or in need
 of clarification please let me know.  At this point missing docs
 would be a bug in need of a fix.

I decided to take another pass through to try to spot anyplace I
might have missed.  I found that I had missed documenting the new
pg_matviews system view, so I have just pushed that.

I also think that something should be done about the documentation
for indexes.  Right now that always refers to a table.  It would
clearly be awkward to change that to table or materialized view
everywhere.  I wonder if most of those should be changed to
relation with a few mentions that the relation could be a table
or a materialized view, or whether some less intrusive change would
be better.  Opinions welcome.
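
As a small illustration of why the wording matters: with this patch the same
CREATE INDEX syntax applies to both kinds of relation (made-up names, a
sketch only):

CREATE TABLE t (id int, payload text);
CREATE MATERIALIZED VIEW mv AS SELECT id, payload FROM t;

CREATE INDEX t_id_idx ON t (id);     -- a "table" in the current wording
CREATE INDEX mv_id_idx ON mv (id);   -- also allowed, but not a table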

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Andres Freund
On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
 On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com wrote:
 
  On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com
  wrote:
   Indexes:
   hoge_pkey PRIMARY KEY, btree (i)
   hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
   hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
   hoge_pkey_cct_cct PRIMARY KEY, btree (i)
  
   Huh, why did that go through? It should have errored out?
 
  I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
  be marked as invalid, I think.
 
 CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
 case the process is interrupted by the user. This has been mentioned in a past
 review but it was missing, so it might have slipped out during a
 refactoring or smth. Btw, I am surprised to see that this *_cct_cct index
 has been created knowing that hoge_pkey_cct is invalid. I tried with the
 latest version of the patch and even the patch attached but couldn't
 reproduce it.

The strange thing about hoge_pkey_cct_cct is that it seems to imply
that an invalid index was reindexed concurrently?

But I don't see how it could happen either. Fujii, can you reproduce it?
 
  +The recommended recovery method in such cases is to drop the
  concurrent
   +index and try again to perform commandREINDEX CONCURRENTLY/.
  
   If an invalid index depends on the constraint like primary key, drop
   the concurrent
   index cannot actually drop the index. In this case, you need to issue
   alter table
   ... drop constraint ... to recover the situation. I think this
   information should be
   documented.
  
   I think we just shouldn't set ->isprimary on the temporary indexes. Now
   we switch only the relfilenodes and not the whole index, that should be
   perfectly fine.
 
  Sounds good. But, what about other constraint case like unique constraint?
  Those other cases also can be resolved by not setting ->isprimary?
 
 We should stick with the concurrent index being a twin of the index it
 rebuilds for consistency.

I don't think it's legal. We cannot simply have two indexes with
'indisprimary' set. Especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.

 Also, I think that it is important from the session viewpoint to perform a
 swap with 2 valid indexes. If the process fails just before swapping
 indexes, the user might want to do that himself and drop the old index, then use
 the concurrent one.

The most likely outcome will be to rerun REINDEX CONCURRENTLY. Which
will then reindex one more index since it now has the old valid index
and the new valid index. Also, I don't think it's fair game to expose
indexes that used to belong to a constraint without a constraint
supporting it as valid indexes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread David E. Wheeler
On Mar 6, 2013, at 1:51 PM, Kevin Grittner kgri...@ymail.com wrote:

 I also think that something should be done about the documentation
 for indexes.  Right now that always refers to a table.  It would
 clearly be awkward to change that to table or materialized view
 everywhere.  I wonder if most of thosse should be changed to
 relation with a few mentions that the relation could be a table
 or a materialized view, or whether some less intrusive change would
 be better.  Opinions welcome.

Isn’t a materialized view really just a table that gets updated periodically? 
And isn’t a non-materialized view also thought of as a “relation”?

If the answer to both those questions is “yes,” I think the term should remain 
“table,” with a few mentions that the term includes materialized views (and 
excludes foreign tables).

Best,

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Materialized views WIP patch

2013-03-06 Thread Tatsuo Ishii
 Kevin Grittner kgri...@ymail.com wrote:
 Tatsuo Ishii is...@postgresql.org wrote:

 Was the remaining work on docs done? I would like to test MVs and
 am waiting for the docs completed.

 I think they are done.  If you notice anything missing or in need
 of clarification please let me know.  At this point missing docs
 would be a bug in need of a fix.

Ok.

 I decided to take another pass through to try to spot anyplace I
 might have missed.  I found that I had missed documenting the new
 pg_matviews system view, so I have just pushed that.
 
 I also think that something should be done about the documentation
 for indexes.  Right now that always refers to a table.  It would
 clearly be awkward to change that to table or materialized view
 everywhere.  I wonder if most of thosse should be changed to
 relation with a few mentions that the relation could be a table
 or a materialized view, or whether some less intrusive change would
 be better.  Opinions welcome.

You might want to add a small description about MVs to the Tutorial
documentation, 3.2 Views. Here is the current description of views in
the doc.

-
3.2. Views

Refer back to the queries in Section 2.6. Suppose the combined listing
of weather records and city location is of particular interest to your
application, but you do not want to type the query each time you need
it. You can create a view over the query, which gives a name to the
query that you can refer to like an ordinary table:

CREATE VIEW myview AS
SELECT city, temp_lo, temp_hi, prcp, date, location
FROM weather, cities
WHERE city = name;

SELECT * FROM myview;

Making liberal use of views is a key aspect of good SQL database
design. Views allow you to encapsulate the details of the structure of
your tables, which might change as your application evolves, behind
consistent interfaces.

Views can be used in almost any place a real table can be
used. Building views upon other views is not uncommon.
-
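
If such a paragraph is added, the materialized-view counterpart of that
example might look something like this (a sketch only, not proposed doc
text):

CREATE MATERIALIZED VIEW mymatview AS
    SELECT city, temp_lo, temp_hi, prcp, date, location
        FROM weather, cities
        WHERE city = name;

SELECT * FROM mymatview;

-- Unlike a regular view, the result is stored; it has to be refreshed
-- to pick up later changes in the underlying tables:
REFRESH MATERIALIZED VIEW mymatview;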
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Josh Berkus
Robert,

 We've had a few EnterpriseDB customers who have had fantastically
 painful experiences with PostgreSQL + ZFS.  Supposedly, aligning the
 ZFS block size to the PostgreSQL block size is supposed to make these
 problems go away, but in my experience it does not have that effect.
 So I think telling people who want checksums go use ZFS is a lot
 like telling them oh, I see you have a hangnail, we recommend that
 you solve that by cutting your arm off with a rusty saw.

Wow, what platform are you using ZFS on?

(we have a half-dozen clients on ZFS ...)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Robert Haas
On Wed, Mar 6, 2013 at 2:14 PM, Josh Berkus j...@agliodbs.com wrote:
 Based on Smith's report, I consider (2) to be a deal-killer right now.

I was pretty depressed by those numbers, too.

 The level of overhead reported by him would prevent the users I work
 with from ever employing checksums on production systems.

Agreed.

 Specifically, writing checksums for a read-only query is a defect I
 think is prohibitively bad.

That particular part doesn't bother me so much as some of the others -
but let's step back and look at the larger issue.  I suspect we can
all agree that the performance of this feature is terrible.  The
questions I think we should be asking are:

1. Are the problems fundamental, or things where we can reasonably
foresee future improvement?  The latter situation wouldn't bother me
very much even if the current situation is pretty bad, but if there's
no real hope of improvement, that's more of a problem.

2. Are the performance results sufficiently bad that we think this
would be more of a liability than an asset?

 When we first talked about this feature for
 9.2, we were going to exclude hint bits from checksums, in order to
 avoid this issue; what happened to that?

I don't think anyone ever thought that was a particularly practical
design.  I certainly don't.

 (FWIW, I still support the idea of moving hint bits to a separate
 filehandle, as we do with the FSM, but clearly that's not happening for
 9.3 ...)

Or, most likely, ever.  The whole benefit of hint bits is that the
information you need is available in the same bytes you have to read
anyway.  Moving the information to another fork (not filehandle) would
probably give up most of the benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Robert Haas
On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote:
 We've had a few EnterpriseDB customers who have had fantastically
 painful experiences with PostgreSQL + ZFS.  Supposedly, aligning the
 ZFS block size to the PostgreSQL block size is supposed to make these
 problems go away, but in my experience it does not have that effect.
 So I think telling people who want checksums go use ZFS is a lot
 like telling them oh, I see you have a hangnail, we recommend that
 you solve that by cutting your arm off with a rusty saw.

 Wow, what platform are you using ZFS on?

 (we have a half-dozen clients on ZFS ...)

Not us, customers.  But as to platform, I have yet to run across
anyone running ZFS on anything but Solaris.  I'd be interested to hear
your experiences.   Mine rhyme with sun a play dreaming.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Joshua D. Drake


On 03/06/2013 03:06 PM, Robert Haas wrote:


On Wed, Mar 6, 2013 at 6:00 PM, Josh Berkus j...@agliodbs.com wrote:

We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS.  Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not have that effect.
So I think telling people who want checksums go use ZFS is a lot
like telling them oh, I see you have a hangnail, we recommend that
you solve that by cutting your arm off with a rusty saw.


Wow, what platform are you using ZFS on?

(we have a half-dozen clients on ZFS ...)


Not us, customers.  But as to platform, I have yet to run across
anyone running ZFS on anything but Solaris.  I'd be interested to hear
your experiences.   Mine rhyme with sun a play dreaming.


I would guess he meant on X86_64 or Sparc.

JD







--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:
 If picking a CRC why not a short optimal one rather than truncate CRC32C?

 CRC32C is available in hardware since SSE4.2.

I think that should be at most a fourth-order consideration, since we
are not interested solely in Intel hardware, nor do we have any portable
way of getting at such a feature even if the hardware has it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] odd behavior in materialized view

2013-03-06 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Mar 5, 2013 at 7:36 AM, Kevin Grittner kgri...@ymail.com wrote:
 Fujii Masao masao.fu...@gmail.com wrote:

 When I accessed the materialized view in the standby server,

 I got the following ERROR message. Looks odd to me. Is this a bug?

 ERROR:  materialized view hogeview has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.

 The procedure to reproduce this error message is:

 In the master server:
 CREATE TABLE hoge (i int);
 INSERT INTO hoge VALUES (generate_series(1,100));
 CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
 DELETE FROM hoge;
 REFRESH MATERIALIZED VIEW hogeview;
 SELECT count(*) FROM hogeview;

 In the standby server
 SELECT count(*) FROM hogeview;

 SELECT count(*) goes well in the master, and expectedly returns 0.
 OTOH, in the standby, it emits the error message.

 Will investigate.

 Thanks!

 And I found another problem. When I ran the following SQLs in the master,
 PANIC error occurred in the standby.

 CREATE TABLE hoge (i int);
 INSERT INTO hoge VALUES (generate_series(1,100));
 CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
 VACUUM ANALYZE;

 The PANIC error messages that I got in the standby are

 WARNING:  page 0 of relation base/12297/16387 is uninitialized
 CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0
 PANIC:  WAL contains references to invalid pages
 CONTEXT:  xlog redo visible: rel 1663/12297/16387; blk 0

 base/12297/16387 is the file of the materialized view 'hogeview'.

I was able to replicate both bugs, and they both appear to be fixed
by the attached, which I have just pushed.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

*** a/src/backend/commands/matview.c
--- b/src/backend/commands/matview.c
***************
*** 14,19 ****
--- 14,20 ----
   */
  #include "postgres.h"
  
+ #include "access/heapam_xlog.h"
  #include "access/multixact.h"
  #include "access/relscan.h"
  #include "access/xact.h"
*************** SetRelationIsScannable(Relation relation)
*** 68,77 ****
  	Assert(relation->rd_rel->relkind == RELKIND_MATVIEW);
  	Assert(relation->rd_isscannable == false);
  
- 	RelationOpenSmgr(relation);
  	page = (Page) palloc(BLCKSZ);
  	PageInit(page, BLCKSZ, 0);
  	smgrextend(relation->rd_smgr, MAIN_FORKNUM, 0, (char *) page, true);
  	pfree(page);
  
  	smgrimmedsync(relation->rd_smgr, MAIN_FORKNUM);
--- 69,83 ----
  	Assert(relation->rd_rel->relkind == RELKIND_MATVIEW);
  	Assert(relation->rd_isscannable == false);
  
  	page = (Page) palloc(BLCKSZ);
  	PageInit(page, BLCKSZ, 0);
+ 
+ 	if (RelationNeedsWAL(relation))
+ 		log_newpage(&(relation->rd_node), MAIN_FORKNUM, 0, page);
+ 
+ 	RelationOpenSmgr(relation);
  	smgrextend(relation->rd_smgr, MAIN_FORKNUM, 0, (char *) page, true);
+ 
  	pfree(page);
  
  	smgrimmedsync(relation->rd_smgr, MAIN_FORKNUM);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Small patch for CREATE TRIGGER documentation

2013-03-06 Thread Ian Lawrence Barwick
I found this sentence somewhat confusing:

It is possible for a column's value to change even when the trigger
is not fired,

http://www.postgresql.org/docs/devel/static/sql-createtrigger.html#SQL-CREATETRIGGER-NOTES

More precise would be something along the lines It is possible that
changes to the
column's value will not cause the trigger to be fired.

The attached patch hopefully rewords the entire paragraph for clarity.

Regards

Ian Barwick


create-trigger-doc-notes-2013-03-07.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Craig Ringer
On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
 It'd be difficult to change the algorithm in a future release without
 breaking on-disk compatibility,
On-disk compatibility is broken with major releases anyway, so I don't
see this as a huge barrier.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Andres Freund
On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
 On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
  It'd be difficult to change the algorithm in a future release without
  breaking on-disk compatibility,
 On-disk compatibility is broken with major releases anyway, so I don't
 see this as a huge barrier.

Uh, pg_upgrade?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Smith

On 3/6/13 1:34 PM, Robert Haas wrote:

We've had a few EnterpriseDB customers who have had fantastically
painful experiences with PostgreSQL + ZFS.  Supposedly, aligning the
ZFS block size to the PostgreSQL block size is supposed to make these
problems go away, but in my experience it does not have that effect.


There are a couple of major tuning issues you have to get right for good 
ZFS performance, like its tendency to gobble more RAM than is 
necessarily appropriate for a PostgreSQL host.  If you nail down all 
those and carefully set up everything, it can work OK.  When Sun had a 
bunch of good engineers working on the problem they certainly pulled it 
off.  I managed a 3TB database on a ZFS volume for a while myself. 
Being able to make filesystem snapshots cleanly and easily was very nice.


As for the write performance implications of COW, though, at a couple of 
points I was only able to keep that system ingesting data fast enough if 
I turned fsync off :(  It's not as if even ZFS makes all the filesystem 
issues the database worries about go away either.  Take a look at 
http://www.c0t0d0s0.org/archives/6071-No,-ZFS-really-doesnt-need-a-fsck.html 
as an example.  That should leave you with a healthy concern over ZFS 
handling of power interruption and lying drives.  [NTFS and ext3] have 
the same problem, but it has different effects, that aren't as visible 
as in ZFS.  ext4 actually fixed this for most hardware though, and I 
believe ZFS still has the same uberblock concern.  ZFS reliability and 
its page checksums are good, but they're not magic for eliminating torn 
page issues.


Normally I would agree with Heikki's "let's wait a few years and see if 
the filesystem will take care of it" idea.  But for me, the "when do we 
get checksums?" clock started ticking in 2006 when ZFS popularized its 
implementation, and now it's gone off and it keeps ringing at new places.  
I would love it if FreeBSD had caught a massive popularity wave in the 
last few years, so ZFS was running in a lot more places.  Instead what I 
keep seeing is Linux deployments with filesystem choices skewed toward 
the conservative.  Forget about the leading edge--I'd be happy if I could 
get one large customer to migrate off of ext3...


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Michael Paquier
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
  On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com
 wrote:
 
   On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com
   wrote:
Indexes:
hoge_pkey PRIMARY KEY, btree (i)
hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct_cct PRIMARY KEY, btree (i)
   
Huh, why did that go through? It should have errored out?
  
   I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
   be marked as invalid, I think.
  
  CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
  case the process is interrupted by the user. This has been mentioned in a past
  review but it was missing, so it might have slipped out during a
  refactoring or smth. Btw, I am surprised to see that this *_cct_cct index
  has been created knowing that hoge_pkey_cct is invalid. I tried with the
  latest version of the patch and even the patch attached but couldn't
  reproduce it.

 The strange thing about hoge_pkey_cct_cct is that it seems to imply
 that an invalid index was reindexed concurrently?

 But I don't see how it could happen either. Fujii, can you reproduce it?

Curious about that also.


   +The recommended recovery method in such cases is to drop the
   concurrent
+index and try again to perform commandREINDEX
 CONCURRENTLY/.
   
If an invalid index depends on the constraint like primary key,
 drop
the concurrent
index cannot actually drop the index. In this case, you need to
 issue
alter table
... drop constraint ... to recover the situation. I think this
information should be
documented.
   
I think we just shouldn't set ->isprimary on the temporary indexes.
 Now
we switch only the relfilenodes and not the whole index, that should
 be
perfectly fine.
  
   Sounds good. But, what about other constraint case like unique
 constraint?
   Those other cases also can be resolved by not setting ->isprimary?
  
  We should stick with the concurrent index being a twin of the index it
  rebuilds for consistency.

 I don't think it's legal. We cannot simply have two indexes with
 'indisprimary' set. Especially not if both are valid.
 Also, there will be no pg_constraint row that refers to it which
 violates very valid expectations that both users and pg may have.

So what to do with that?
Mark the concurrent index as valid, then validate it and finally mark it as
invalid inside the same transaction at phase 4?
That's moving 2 lines of code...
-- 
Michael


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Craig Ringer
On 03/07/2013 08:41 AM, Andres Freund wrote:
 On 2013-03-07 08:37:40 +0800, Craig Ringer wrote:
 On 03/06/2013 07:34 PM, Heikki Linnakangas wrote:
 It'd be difficult to change the algorithm in a future release without
 breaking on-disk compatibility,
 On-disk compatibility is broken with major releases anyway, so I don't
 see this as a huge barrier.
 Uh, pg_upgrade?
Yeah. I was thinking that pg_upgrade copes with a lot of
incompatibilities already, but this is lower-level. Darn.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for REINDEX CONCURRENTLY

2013-03-06 Thread Michael Paquier
On Thu, Mar 7, 2013 at 9:48 AM, Michael Paquier
michael.paqu...@gmail.com wrote:



 On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
  On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao masao.fu...@gmail.com
 wrote:
 
   On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund and...@2ndquadrant.com
 
   wrote:
Indexes:
hoge_pkey PRIMARY KEY, btree (i)
hoge_pkey_cct PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct1 PRIMARY KEY, btree (i) INVALID
hoge_pkey_cct_cct PRIMARY KEY, btree (i)
   
Huh, why did that go through? It should have errored out?
  
   I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
   be marked as invalid, I think.
  
  CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
  case the process is interrupted by the user. This has been mentioned in a past
  review but it was missing, so it might have slipped out during a
  refactoring or smth. Btw, I am surprised to see that this *_cct_cct
 index
  has been created knowing that hoge_pkey_cct is invalid. I tried with the
  latest version of the patch and even the patch attached but couldn't
  reproduce it.

 The strange thing about hoge_pkey_cct_cct is that it seems to imply
 that an invalid index was reindexed concurrently?

 But I don't see how it could happen either. Fujii, can you reproduce it?

 Curious about that also.


   +The recommended recovery method in such cases is to drop the
   concurrent
+index and try again to perform <command>REINDEX
 CONCURRENTLY</command>.
   
If an invalid index depends on a constraint like a primary key,
"drop the concurrent index" cannot actually drop the index. In this
case, you need to issue ALTER TABLE ... DROP CONSTRAINT ... to recover
the situation. I think this information should be documented.
   
I think we just shouldn't set ->isprimary on the temporary indexes.
Now we switch only the relfilenodes and not the whole index, that
should be perfectly fine.
  
   Sounds good. But what about other constraint cases like unique
   constraints?
   Can those other cases also be resolved by not setting ->isprimary?
  
  We should stick with the concurrent index being a twin of the index it
  rebuilds for consistency.

 I don't think it's legal. We cannot simply have two indexes with
 'indisprimary'. Especially not if both are valid.
 Also, there will be no pg_constraint row that refers to it, which
 violates very valid expectations that both users and pg may have.

 So what to do with that?
 Mark the concurrent index as valid, then validate it and finally mark it
 as invalid inside the same transaction at phase 4?
 That's moving 2 lines of code...

Sorry phase 4 is the swap phase. Validation happens at phase 3.
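
To make the scenario above concrete, here is a sketch using the names from
the quoted psql output, assuming the REINDEX syntax proposed by this patch;
whether the constraint-based recovery really applies to the temporary index
is exactly the open question in this thread:

REINDEX INDEX CONCURRENTLY hoge_pkey;  -- if interrupted, can leave an
                                       -- INVALID hoge_pkey_cct behind

-- Spot leftover invalid indexes:
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;

-- The recovery path Fujii asks to have documented, for the case where a
-- plain DROP INDEX is refused because the invalid index is still tied to
-- the primary key constraint:
-- ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;
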
-- 
Michael


Re: [HACKERS] Strange Windows problem, lock_timeout test request

2013-03-06 Thread Robert Haas
On Wed, Feb 27, 2013 at 8:58 AM, Stephen Frost sfr...@snowman.net wrote:
 * Boszormenyi Zoltan (z...@cybertec.at) wrote:
 But unlike statement_timeout,
 with lock_timeout_stmt the statement can still finish after this limit
 as it does useful work besides waiting for locks.

 It's still entirely possible to get 99% done and then hit that last
 tuple that you need a lock on and just tip over the lock_timeout_stmt
 limit due to prior waiting and ending up wasting a bunch of work, hence
 why I'm not entirely sure that this is that much better than
 statement_timeout.

I tend to agree that this should be based on the length of any
individual lock wait, not the cumulative duration of lock waits.
Otherwise, it seems like it'll be very hard to set this to a
meaningful value.  For example, if you set this to 1 minute, and that
means the length of any single wait, then you basically know that
it'll only kick in if there is some other, long-running transaction
that's holding the lock.  But if it means the cumulative length of all
waits, it's not so clear, because now you might also have this kick in
if you wait for 100ms on 600 different occasions.  In other words,
complex queries that lock or update many tuples may get killed even if
they never wait very long at all for any single lock.  That seems like
it will be almost indistinguishable from random, unprincipled query
cancellations.

Now, you could try to work around that by varying the setting based on
the complexity of the query you're about to run, but that seems like a
pain in the neck, to put it mildly.  And it might still not give you
the behavior that you want.  Personally, I'd think a big part of the
appeal of this is to make sure that you don't hang waiting for a
RELATION level lock for too long before giving up.  And for that,
scaling with the complexity of the query would be exactly the wrong
thing to do, even if you could figure out some system for it.
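
To illustrate the per-wait reading with a concrete, made-up example (both
the GUC name and these semantics are exactly what is under discussion here,
so treat this as a sketch only, and "accounts" as a hypothetical table):

SET lock_timeout = '1min';
BEGIN;
-- Under per-wait semantics this errors out only if a single lock
-- acquisition blocks for more than a minute (say, behind a long-running
-- ACCESS EXCLUSIVE lock); 600 separate 100ms waits would never trigger it.
LOCK TABLE accounts IN SHARE ROW EXCLUSIVE MODE;
COMMIT;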

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks

2013-03-06 Thread Robert Haas
On Sun, Jan 27, 2013 at 1:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The part-1 patch adds catalog/objectaccess.c to have entrypoints of
 object_access_hook, instead of simple macro definition, to simplify
 invocations with arguments. It is just a replacement of existing
 OAT_POST_CREATE and OAT_DROP invocation with new style,

Looks good.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Jim Nasby

On 3/4/13 7:04 PM, Daniel Farina wrote:

Corruption has easily occupied more than one person-month of time last
year for us.


Just FYI for anyone that's experienced corruption... we've looked into doing 
row-level checksums at work. The only challenge we ran into was how to check 
them when reading data back. I don't remember the details but there was an 
issue with doing this via SELECT rules. It would be possible if you were 
willing to put writable views on all your tables (which isn't actually as 
horrible as it sounds; it wouldn't be hard to write a function to automagically 
do that for you).
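
A rough sketch of what that could look like, with hashtext() standing in
for a real checksum function and all object names made up:

CREATE TABLE widgets_raw (
    id      int PRIMARY KEY,
    payload text NOT NULL,
    crc     int NOT NULL
);

CREATE FUNCTION verify_crc(expected int, payload text) RETURNS text
LANGUAGE plpgsql IMMUTABLE AS $$
BEGIN
    IF expected IS DISTINCT FROM hashtext(payload) THEN
        RAISE EXCEPTION 'row checksum mismatch';
    END IF;
    RETURN payload;
END;
$$;

-- Reads go through the view, so a corrupted row raises an error instead of
-- being returned silently; writes would need INSTEAD OF triggers (the
-- "writable view" part) to populate crc on INSERT and UPDATE.
CREATE VIEW widgets AS
SELECT id, verify_crc(crc, payload) AS payload
FROM widgets_raw;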


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Jim Nasby

On 3/6/13 1:14 PM, Josh Berkus wrote:



There may be good reasons to reject this patch.  Or there may not.
But I completely disagree with the idea that asking them to solve the
problem at the filesystem level is sensible.


Yes, can we get back to the main issues with the patch?

1) argument over whether the checksum is sufficient to detect most
errors, or if it will give users false confidence.

2) performance overhead.

Based on Smith's report, I consider (2) to be a deal-killer right now.
The level of overhead reported by him would prevent the users I work
with from ever employing checksums on production systems.


FWIW, the write workload most likely wouldn't be a problem for us. I am 
concerned about the reported 24-32% hit when reading back in from FS cache... 
that might kill this for us.

I'm working on doing a test to see how bad it actually is for us... but getting 
stuff like that done at work is like pulling teeth, so we'll see...


Specifically, writing checksums for a read-only query is a defect I
think is prohibitively bad.  When we first talked about this feature for
9.2, we were going to exclude hint bits from checksums, in order to
avoid this issue; what happened to that?

(FWIW, I still support the idea of moving hint bits to a separate
filehandle, as we do with the FSM, but clearly that's not happening for
9.3 ...)


+1


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using indexes for partial index builds

2013-03-06 Thread Jim Nasby

On 2/2/13 4:05 AM, Paul Norman wrote:

Hello,
After a discussion on IRC in #postgresql, I had a feature suggestion and it
was suggested I write it up here.

I have a large (200GB, 1.7b rows) table with a number of columns, but the
two of interest here are an hstore column, tags, and a PostGIS geometry
column, geom.

There is a GIN index on tags and a gist index on geom. These took about
36-48 hours to build in total. Obviously index building on a table this size
is not trivial.

Periodically I want to do a number of specialized queries on objects with a
particular tag or in a particular area. To do this I often want to create a
partial index. For example, I created the index btree ((tags ->
'name_1'::text) text_pattern_ops) WHERE tags ? 'name_1'::text. My
understanding is to create this index PostgreSQL does a scan of the entire
table, even though the GIN index on tags could be used to identify which
rows could belong in the index. Where the WHERE condition selects only a
small portion of the table this is scanning a lot more data than is
necessary.

Another case where it would be useful is when I am conducting a detailed
analysis of some aspect of the rows in a particular city. This leads to all
the queries being of the form SELECT ... FROM ... WHERE
is_in_my_area(geom)[1].

My current project is doing analysis involving addresses. The ability to
create an index like btree((tags -> 'addr:housenumber'), (tags ->
'addr:street'), (tags -> 'addr:city')) WHERE is_in_my_area(geom) in a
reasonable time would allow me to use a view instead of copying the local
area to a temporary table and indexing that table. The local area is about
350k rows, or about 0.02% of the database.

[1] The actual function for determining if it's in my area is long and not
really essential to the point here.


Something worth considering on this... I suspect it's possible to use an 
index-only scan to do this, regardless of whether the heap page is all visible. 
The reason is that the newly created index would just use the same access 
methodology as the original index, so any dead rows would be ignored.

We'd almost certainly need to block vacuums during the build however. Obviously 
not an issue for regular index builds, but it would be for concurrent ones.
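
For reference, the kind of partial index being described, written out as
full DDL (the table name is hypothetical):

CREATE INDEX planet_name_1_idx
    ON planet ((tags -> 'name_1') text_pattern_ops)
    WHERE tags ? 'name_1';

Building that today scans all 1.7b rows even though the existing GIN index
on tags already knows which rows satisfy the WHERE clause, which is the
optimization being requested.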


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement

2013-03-06 Thread Jim Nasby

On 2/2/13 3:23 AM, Pavel Stehule wrote:

Hello

I propose enhancing the GET DIAGNOSTICS statement with a new field,
PG_CONTEXT. It is similar to GET STACKED DIAGNOSTICS'
PG_EXCEPTION_CONTEXT.

The motivation for this proposal is the possibility of getting the call
stack for debugging without raising an exception.

This code is based on cleaned-up code from Orafce, where it has been used
for four years without any error reports.

CREATE OR REPLACE FUNCTION public.inner(integer)
  RETURNS integer
  LANGUAGE plpgsql
AS $function$
declare _context text;
begin
   get diagnostics _context = pg_context;
   raise notice '***%***', _context;
   return 2 * $1;
end;
$function$

postgres=# select outer_outer(10);
NOTICE:  ***PL/pgSQL function inner(integer) line 4 at GET DIAGNOSTICS
PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN***
CONTEXT:  PL/pgSQL function outer(integer) line 3 at RETURN
PL/pgSQL function outer_outer(integer) line 3 at RETURN


That could *really* stand some indentation in the output instead of the ***
business. But other than that, this definitely seems useful. I had no idea
_context was even an option... :/
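
For anyone wanting to reproduce the quoted output, the two outer wrappers
were not shown in the message; hypothetical definitions consistent with the
NOTICE text would be something like:

CREATE OR REPLACE FUNCTION public.outer(integer) RETURNS integer
LANGUAGE plpgsql AS $function$
begin
   return public.inner($1);
end;
$function$;

CREATE OR REPLACE FUNCTION public.outer_outer(integer) RETURNS integer
LANGUAGE plpgsql AS $function$
begin
   return public.outer($1);
end;
$function$;

SELECT outer_outer(10);  -- emits the NOTICE with the full call stack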


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.3] OAT_POST_ALTER object access hooks

2013-03-06 Thread Robert Haas
On Sun, Jan 27, 2013 at 1:55 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 The part-2 patch adds new OAT_POST_ALTER event type, and
 its relevant permission checks on contrib/sepgsql.

This documentation hunk is unclear:

+On <xref linkend="sql-createfunction">, <literal>install</literal> permission
+will be checked if <literal>leakproof</literal> attribute was given, not only
+<literal>create</literal> on the new function.

Based on previous experience reading your patches, I'm guessing that
what you actually mean is that both things are checked, but the
wording doesn't make that clear.  Also, if permissions are now checked
on functions, doesn't the previous sentence need an update?

+In addition, <literal>add_name</literal> and <literal>remove_name</literal> permission
+will be checked towards relevant schema when we try to rename or set
+new schema on the altered object.

Suggest: In addition, <literal>remove_name</literal> and <literal>add_name</literal>
will be checked on the old and new schemas, respectively, when an
object is moved to a new schema.

+A few additional checks are applied depending on object types.

For certain object types, additional checks are performed.

+On <xref linkend="sql-alterfunction">, <literal>install</literal> permission
+will be checked if <literal>leakproof</literal> attribute was turned on, not
+only <literal>setattr</literal> on the new function.

This is a near-duplicate of the previous hunk and suffers from the
same awkwardness.

+ * is_internal: TRUE if constraint is constructed unless user's intention

I dunno what this means.  What's the difference between an internal
constraint and a non-internal constraint, and why do we need that
distinction?  This seems to percolate to a lot of places; I'd rather
not do that without a real good reason.

+   /* XXX - should be checked at caller side */

XXX should be used only for things that really ought to be revisited
and changed.  See the wording I used in the just-committed part 1
patch.

+#include "catalog/objectaccess.h"

This is the only hunk in collationcmds.c, hence presumably not needed.

+   /* Post create hook of this access method operator */
+   InvokeObjectPostCreateHook(AccessMethodOperatorRelationId,
+  entryoid, 0);

I suggest uniformly adding a blank line before each of these hunks,
rather than adding it for some and not others.  I think, though, that
we could probably ditch the comments throughout.  They don't add
anything, really.

@@ -3330,7 +3342,6 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 */
break;
case AT_SetTableSpace:  /* SET TABLESPACE */
-
/*
 * Nothing to do here; Phase 3 does the work
 */

Spurious whitespace hunk.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Smith

On 3/6/13 6:34 AM, Heikki Linnakangas wrote:

Another thought is that perhaps something like CRC32C would be faster to
calculate on modern hardware, and could be safely truncated to 16-bits
using the same technique you're using to truncate the Fletcher's
Checksum. Greg's tests showed that the overhead of CRC calculation is
significant in some workloads, so it would be good to spend some time to
optimize that. It'd be difficult to change the algorithm in a future
release without breaking on-disk compatibility, so let's make sure we
pick the best one.


Simon sent over his first rev of this using a quick-to-compute 16-bit 
checksum as a reasonable trade-off, one that it's possible to do right 
now.  It's not optimal in a few ways, but it catches single-bit errors 
that are missed right now, and Fletcher-16 computes quickly and without 
a large amount of code.  It's worth double-checking that the code is 
using the best Fletcher-16 approach available.  I've started on that, 
but I'm working on your general performance concerns first, with the 
implementation that's already there.


From what I've read so far, I think picking Fletcher-16 instead of the 
main alternative, CRC-16-IBM AKA CRC-16-ANSI, is a reasonable choice. 
There's a good table showing the main possibilities here at 
https://en.wikipedia.org/wiki/Cyclic_redundancy_check
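
For readers who have not bumped into it before, the Fletcher-16 recurrence
itself is tiny; here is a toy illustration in plpgsql, purely for
readability (the patch implements this in C over 8K pages and may differ in
blocking and modulus details):

CREATE FUNCTION fletcher16(data bytea) RETURNS integer
LANGUAGE plpgsql IMMUTABLE AS $$
DECLARE
    sum1 integer := 0;
    sum2 integer := 0;
    i    integer;
BEGIN
    FOR i IN 0 .. length(data) - 1 LOOP
        sum1 := (sum1 + get_byte(data, i)) % 255;
        sum2 := (sum2 + sum1) % 255;
    END LOOP;
    RETURN sum2 * 256 + sum1;  -- 16 bits: sum2 in the high byte, sum1 low
END;
$$;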


One day I hope that in-place upgrade learns how to do page format 
upgrades, with the sort of background conversion tools and necessary 
tracking metadata we've discussed for that work.  When that day comes, I 
would expect it to be straightforward to upgrade pages from 16 bit 
Fletcher checksums to 32 bit CRC-32C ones.  Ideally we would be able to 
jump on the CRC-32C train today, but there's nowhere to put all 32 bits. 
 Using a Fletcher 16 bit checksum for 9.3 doesn't prevent the project 
from going that way later though, once page header expansion is a solved 
problem.


The problem with running CRC32C in software is that the standard fast 
approach uses a slicing technique that requires a chunk of 
pre-computed data be around, a moderately large lookup table.  I don't 
see that there's any advantage to having all that baggage around if 
you're just going to throw away half of the result anyway.  More on 
CRC32Cs in my next message.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Smith

On 3/6/13 1:24 PM, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2013-03-06 11:21:21 -0500, Garick Hamlin wrote:

If picking a CRC why not a short optimal one rather than truncate CRC32C?



CRC32C is available in hardware since SSE4.2.


I think that should be at most a fourth-order consideration, since we
are not interested solely in Intel hardware, nor do we have any portable
way of getting at such a feature even if the hardware has it.


True, but that situation might actually improve.

The Castagnoli CRC-32C that's accelerated on the better Intel CPUs is 
also used to protect iSCSI and SCTP (a streaming protocol).  And there 
is an active project to use a CRC32C to checksum ext4 metadata blocks on 
Linux:  https://ext4.wiki.kernel.org/index.php/Ext4_Metadata_Checksums 
https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/APKfoMzjgdY


Now, that project doesn't make the Postgres feature obsolete, because 
there's nowhere to put checksum data for every block on ext4 without 
whacking block alignment.  The filesystem can't make an extra 32 bits 
appear on every block any more than we can.  It's using a similar trick 
to the PG checksum feature, grabbing some empty space just for the 
metadata then shoving the CRC32C into there.  But the fact that this is 
going on means that there are already Linux kernel modules built with 
both software/hardware accelerated versions of the CRC32C function.  And 
the iSCSI/SCTP use cases means it's not out of the question this will 
show up in other useful forms one day.  Maybe two years from now, there 
will be a common Linux library that autoconf can find to compute the CRC 
for us--with hardware acceleration when available, in software if not.


The first of those ext4 links above even discusses the exact sort of 
issue we're facing.  The author wonders if the easiest way to proceed 
for 16 bit checksums is to compute the CRC32C, then truncate it, simply 
because CRC32C creation is so likely to get hardware help one day.  I 
think that logic doesn't really apply to the PostgreSQL case as strongly 
though, as the time before we can expect a hardware-accelerated 
version to be available is much further off than a Linux kernel 
developer's future.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Stark
On Wed, Mar 6, 2013 at 11:04 PM, Robert Haas robertmh...@gmail.com wrote:
 When we first talked about this feature for
 9.2, we were going to exclude hint bits from checksums, in order to
 avoid this issue; what happened to that?

 I don't think anyone ever thought that was a particularly practical
 design.  I certainly don't.

Really? I thought it was pretty much the consensus for a good while.

The main problem it ran into was that we kept turning up hint bits
that we didn't realize we had. Index line pointers turned out to have
hint bits, page headers have one, and so on. As long as it was just
the heap page per-tuple transaction hint bits it seemed plausible to
just skip them or move them all to a contiguous block. Once it
started to look like the checksumming code had to know about every
data structure on every page it seemed a bit daunting. But that wasn't
something we realized for quite a long time.



-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Smith
TL;DR summary:  on a system I thought was a fair middle of the road 
server, pgbench tests are averaging about a 2% increase in WAL writes 
and a 2% slowdown when I turn on checksums.  There are a small number of 
troublesome cases where that overhead rises to closer to 20%, an upper 
limit that's shown up in a few tests aiming to stress this feature now.


On 3/4/13 10:09 PM, Jeff Davis wrote:

= Test 2 - worst-case overhead for calculating checksum while reading data =

Jeff saw an 18% slowdown, I get 24 to 32%.  This one bothers me because
the hit is going to happen during the very common situation where data
is shuffling a lot between a larger OS cache and shared_buffers taking a
relatively small fraction.


I believe that test 1 and test 2 can be improved a little, if there is a
need. Right now we copy the page and then calculate the checksum on the
copy. If we instead calculate as we're copying, I believe it will make
it significantly faster.


It's good to know there's at least some ideas for optimizing this one 
further.  I think the situation where someone has:


shared_buffers < database < total RAM

is fairly common for web applications.  For people on Amazon EC2 
instances for example, giving out the performance tuning advice of get 
a bigger instance until the database fits in RAM works amazingly well. 
 If the hotspot of that data set fits in shared_buffers, those people 
will still be in good shape even with checksums enabled.  If the hot 
working set is spread out more randomly, though, it's not impossible to 
see how they could suffer regularly from this ~20% OS cache->shared 
buffers movement penalty.


Regardless, Jeff's three cases are good synthetic exercises to see 
worst-case behavior, but they are magnifying small differences.  To see 
a more general case, I ran through a series of pgbench tests in its 
standard write mode.  In order to be useful, I ended up using a system 
with a battery-backed write cache, but with only a single drive 
attached.  I needed fsync to be fast to keep that from being the 
bottleneck.  But I wanted physical I/O to be slow.  I ran three test 
sets at various size/client loads:  one without the BBWC (which I kept 
here because it gives some useful scale to the graphs), one with the 
baseline 9.3 code, and one with checksums enabled on the cluster.  I did 
only basic postgresql.conf tuning:


 checkpoint_segments | 64
 shared_buffers      | 2GB

There's two graphs comparing sets attached, you can see that the 
slowdown of checksums for this test is pretty minor.  There is a clear 
gap between the two plots, but it's not a very big one, especially if 
you note how much difference a BBWC makes.


I put the numeric results into a spreadsheet, also attached.  There's so 
much noise in pgbench results that I found it hard to get a single 
number for the difference; they bounce around about +/-5% here. 
Averaging across everything gives a solid 2% drop when checksums are on 
that looked detectable above the noise.


Things are worse on the bigger data sets.  At the highest size I tested, 
the drop was more like 7%.  The two larger size / low client count 
results I got were really bad, 25% and 16% drops.  I think this is 
closing in on the range of things:  perhaps only 2% when most of your 
data fits in shared_buffers, more like 10% if your database is bigger, 
and in the worst case 20% is possible.  I don't completely trust those 
25/16% numbers though, I'm going to revisit that configuration.


The other thing I track now in pgbench-tools is how many bytes of WAL 
are written.  Since the total needs to be measured relative to work 
accomplished, the derived number that looks useful there is average 
bytes of WAL per transaction.  On smaller databases this is around 6K, 
while larger databases topped out for me at around 22K WAL 
bytes/transaction.  Remember that the pgbench transaction is several 
statements.  Updates touch different blocks in pgbench_accounts, index 
blocks, and the small tables.


The WAL increase from checksumming is a bit more consistent than the TPS 
rates.  Many cases were 3 to 5%.  There was one ugly case where it hit 
30%, and I want to dig into where that came from more.  On average, 
again it was a 2% increase over the baseline.


Cases where you spew hint bit WAL data where before none were written 
(Jeff's test #3) remain a far worse performer than any of these.  Since 
pgbench does a VACUUM before starting, none of those cases were 
encountered here though.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
attachment: clients-sets.png
attachment: scaling-sets.png

Checksum-pgbench.xls
Description: MS-Excel spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Daniel Farina
On Wed, Mar 6, 2013 at 8:17 PM, Greg Smith g...@2ndquadrant.com wrote:
 TL;DR summary:  on a system I thought was a fair middle of the road server,
 pgbench tests are averaging about a 2% increase in WAL writes and a 2%
 slowdown when I turn on checksums.  There are a small number of troublesome
 cases where that overhead rises to closer to 20%, an upper limit that's
 shown up in a few tests aiming to stress this feature now.

I have only done some cursory research, but a cpu-time hit of 20% seems to
be expected for InnoDB's CRC computation[0].  Although a galling number,
this comparison with other systems may be a way to see how much of
that overhead is avoidable or just the price of entry.  It's unclear
how this 20% cpu-time compares to your above whole-system results, but
it's enough to suggest that nothing comes for (nearly) free.

[0]: http://mysqlha.blogspot.com/2009/05/innodb-checksum-performance.html

--
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-06 Thread Greg Smith

On 3/5/13 9:07 AM, Amit Kapila wrote:

In v11 patch, I have changed name of directory to config.
For file name, currently I have not changed, but if you feel it needs to be
changed, kindly suggest any one of above or if any other better you have in
mind.


This seems fine for now.  Hashing out exactly which name is best is not 
going to be the thing that determines whether this can be committed or not.


Your v11 applies cleanly for me too, and the code does look a lot nicer. 
 The code diff itself is down to 1300 lines and not nearly as 
disruptive, the rest is docs or the many regression tests you've added. 
 That's still not the sort of small change Tom was hoping this would be, 
but it's closer to the right range.  It may not really be 
possible to make this much smaller.


I'm out of time for today, but I'll try to do some functional review 
work on this tomorrow if no one beats me to it.  I would find it very 
useful to me personally if this feature were committed, and we know 
there's plenty of user demand for it too.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enabling Checksums

2013-03-06 Thread Greg Smith

On 3/7/13 12:15 AM, Daniel Farina wrote:

I have only done some cursory research, but a cpu-time hit of 20% seems to
be expected for InnoDB's CRC computation[0].  Although a galling number,
this comparison with other systems may be a way to see how much of
that overhead is avoidable or just the price of entry.  It's unclear
how this 20% cpu-time compares to your above whole-system results, but
it's enough to suggest that nothing comes for (nearly) free.


That does provide a useful measuring point:  how long does the 
computation take compared to the memcpy that moves the buffer around. 
It looks like they started out with 3.2 memcpy worth of work, and with 
enough optimization ended up at 1.27 worth.


The important thing to keep in mind is that shared_buffers works pretty 
well at holding on to the most frequently accessed information.  A 
typical server I see will show pg_statio information suggesting 90%+ of 
block requests are coming from hits there, the rest misses suggesting a 
mix of OS cache and real disk reads.  Let's say 90% are hits, 5% are 
fetches at this 20% penalty, and 5% are real reads where the checksum 
time is trivial compared to physical disk I/O.  That works out to be a 
real average slowdown of 6%.  I think way more deployments are going to 
be like that case, which matches most of my pgbench runs, than the worse 
case workloads.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-03-06 Thread Amit Kapila
On Thursday, March 07, 2013 10:54 AM Greg Smith wrote:
 On 3/5/13 9:07 AM, Amit Kapila wrote:
  In v11 patch, I have changed name of directory to config.
  For file name, currently I have not changed, but if you feel it needs
 to be
  changed, kindly suggest any one of above or if any other better you
 have in
  mind.
 
 This seems fine for now.  Hashing out exactly which name is best is not
 going to be the thing that determines whether this can be committed or
 not.
 
 Your v11 applies cleanly for me too, and the code does look a lot
 nicer.
   The code diff itself is down to 1300 lines and not nearly as
 disruptive, the rest is docs or the many regression tests you've added.

I also think the regression tests added may be more than required.
Currently they are organized as below:

1. Three types of parameters to use with SET PERSISTENT:
   a. parameters for which a new connection is required (log_connections)
   b. parameters for which a server restart is required
   c. parameters for which a reload is required

   I think in the third category (c) there are many parameters; maybe
3-4 parameters should be sufficient.
   The initial intention in keeping more parameters was to test various
types (Boolean, integer, float, string).

2. Set some of the parameters to default, then try reload and reconnect.
3. Again set those parameters to some value and re-verify by reload and
reconnect.
4. Test the SET PERSISTENT command inside functions and transactions.
5. Set all parameter values back to default.

I think the tests for step 3 can be removed and the tests for step 5 can be
merged into step 2.
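
For illustration, a step 2/3 style check could look like this, using the
syntax from this patch version (the exact command form was still under
review at this point):

SET PERSISTENT log_connections = on;
SELECT pg_reload_conf();
-- reconnect, since log_connections only takes effect for new sessions
SHOW log_connections;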

If you think the above optimizations to reduce the number of tests are okay,
then I will update the patch.

   That's still not the sort of small change Tom was hoping this would be,
  but it's closer to the right range.  It may not really be
 possible to make this much smaller.
 
 I'm out of time for today, but I'll try to do some functional review
 work on this tomorrow if no one beats me to it.  I would find it very
 useful to me personally if this feature were committed, and we know
 there's plenty of user demand for it too.

With Regards,
Amit Kapila.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers