Re: [HACKERS] [BUGS] BUG #8542: Materialized View with another column_name does not work?

2013-11-01 Thread Kevin Grittner
Michael Paquier michael.paqu...@gmail.com wrote:

 I am not sure that adding a boolean flag introducing a concept
 related to matview inside checkRuleResultList is the best
 approach to solve that. checkRuleResultList is something related
 only to rules, and has nothing related to matviews in it yet.

Well, I was tempted to keep that concept in DefineQueryRewrite()
where the call is made, by calling  the new bool something like
requireColumnNameMatch and not having checkRuleResultList()
continue to use isSelect for that purpose at all. 
DefineQueryRewrite() already references RELKIND_RELATION once,
RELKIND_MATVIEW twice, and RELKIND_VIEW three times, so it would
hardly be introducing a new concept there.  I did it the way I did
in the posted patch because it seemed to more nearly match what Tom
was suggesting.

 I have been pondering about this issue myself, and wouldn't it be
 better to directly enforce the targetList in the checkRuleResultList
 call at an upper level, using the column name list given by the user
 if there is one, instead of the list of columns of the SELECT query?

 After looking closely at the code this looks to be a better
 approach from a general viewpoint. I didn't get the time to write
 a patch but this is the conclusion I reached after some analysis
 at least...

That doesn't sound like something we could back-patch.

Anyway, I'm not sure why that would be better.  We have rewrite
rules on three kinds of relations -- tables, views, and
materialized views.  In general, a materialized view tends to
support the properties of both a view and a table, with a few
points in the rewrite code that need to pay attention to which kind
of relation the rule applies to.  Among the various validations for
the rewrite, this one validation (matching column names) does not
apply to a non-SELECT rule with a RETURNING clause or to a SELECT
rule which populates a materialized view.  If you were to move the
code to exclude this validation for the latter case, wouldn't you
also need to move the validation for the former case?  Where would
you put that?
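
For reference, the kind of statement at issue looks something like this
(table and column names are invented for illustration):

  CREATE TABLE base (val int);

  -- The matview is given its own column name; the column-name match check
  -- is not meant to apply to the SELECT rule that populates it:
  CREATE MATERIALIZED VIEW mv (renamed_val) AS SELECT val FROM base;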

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #8542: Materialized View with another column_name does not work?

2013-11-01 Thread Kevin Grittner
Ashutosh Bapat ashutosh.ba...@enterprisedb.com wrote:

 CREATE MATERIALIZED VIEW statement ends up being CREATE TABLE AS
 statement underneath with table type matview. In that case, why
 don't I see special treatment only for materialized view and not
 CTAS in general, which allows column names to be specified, as in the
 case reported in the bug.

While the initial population of the matview is a lot like CTAS (and
so far shares some code), the critical difference is that CTAS
doesn't store a rule to support repopulating the table.  This issue
comes up in the validation of that rule for the matview.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How should row-security affects ON UPDATE RESTRICT / CASCADE ?

2013-11-01 Thread Craig Ringer
On 10/30/2013 11:25 AM, Kohei KaiGai wrote:

 +
 +   /*
 +* Row-level security should be disabled in case when foreign-key
 +* relation is queried to check existence of tuples that references
 +* the primary-key being modified.
 +*/
 +   temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE;
 +   if (source_is_pk)
 +   temp_sec_context |= SECURITY_ROW_LEVEL_DISABLED;
 +
 SetUserIdAndSecContext(RelationGetForm(query_rel)->relowner,
 -  save_sec_context | SECURITY_LOCAL_USERID_CHANGE);
 +  temp_sec_context);

This isn't sufficient, as it doesn't kick in when the _target_ is a
primary key. For example, if you run this against the rowsecurity
regression test schema you should get a nice one-row insert, but instead
row-security prevents even the superuser from seeing the target row from
within the FK check, leading to counter-intuitive behaviour like:

test=#  insert into document(did,cid,dlevel,dauthor,dtitle) select 1000,
cid, dlevel, dauthor, dtitle from document where did = 8;
ERROR:  insert or update on table "document" violates foreign key
constraint "document_cid_fkey"
DETAIL:  Key (cid)=(44) is not present in table "category".

So: what's the logic behind restricting this to source_is_pk ? Shouldn't
*all* FKs be exempt from rowsecurity checks? I think the code should
just be:


 +   temp_sec_context = save_sec_context | SECURITY_LOCAL_USERID_CHANGE
| SECURITY_ROW_LEVEL_DISABLED;

i.e. ALL foreign key checks are exempt from row security filters.

This isn't a complete solution as it doesn't solve another hole. The
owner of the RI target table can create a trigger that runs in this
security context when an ON DELETE CASCADE, ON UPDATE CASCADE or ON
DELETE SET NULL trigger is invoked. They can abuse this trigger to see
rows in other tables, even ones they don't own.



Rather than completely disabling row security in triggers, we might
instead need to disable row security for *tables owned by the user the
RI trigger is being invoked as*. Or just for the table that's the
subject of the RI check.

Otherwise we get this nasty security hole:


SET SESSION AUTHORIZATION rls_regress_user2;

CREATE TABLE secrets(
  id integer primary key,
  cid integer not null,
  dummy text
);

INSERT INTO secrets (id, cid, dummy)
VALUES (1, 11, 'a'), (2, 22, 'b');

ALTER TABLE secrets SET ROW SECURITY FOR ALL TO (cid = 11);

-- rls_regress_user2 only wants user0 to see the rows the row
-- security policy permits

GRANT SELECT ON secrets TO rls_regress_user0;



-- Now, naughty rls_regress_user0 wants to know what's so secret
-- in rls_regress_user2's table, so it sets up a cascade delete
-- between two of its OWN tables, completely unrelated, so it can
-- run a trigger with SECURITY_ROW_LEVEL_DISABLED rights.

SET SESSION AUTHORIZATION rls_regress_user0;

CREATE TABLE leaked (LIKE secrets);

CREATE OR REPLACE FUNCTION malicious_trigger() RETURNS TRIGGER AS $$
BEGIN
  -- Should only see cid=11, but sees both! oops!
  INSERT INTO leaked SELECT * FROM secrets;
  IF tg_op = 'INSERT' OR tg_op = 'UPDATE' THEN
RETURN new;
  ELSE
RETURN old;
  END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TABLE parent(id integer primary key);
INSERT INTO parent(id) VALUES (1);

CREATE TABLE child(
  parent_id integer not null
  references parent(id) on delete cascade on update cascade
);
INSERT INTO child(parent_id) values (1);

CREATE TRIGGER malicious
BEFORE UPDATE OR DELETE ON child
FOR EACH ROW EXECUTE PROCEDURE malicious_trigger();

-- Now rls_regress_user0 only needs to delete the parent row, triggering
-- a cascade via the RI trigger to the child, which invokes the
-- malicious trigger and copies the contents of the target table.

test=> DELETE FROM parent;
DELETE 1
test=> SELECT * FROM leaked;
 id | cid | dummy
----+-----+-------
  1 |  11 | a
  2 |  22 | b
(2 rows)

oops! rls_regress_user0 should only be able to see id=1, but it bypassed
RLS using the security context of the cascade to see id=2 as well!




-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] pg_receivexlog: fixed to work with logical segno > 0

2013-11-01 Thread Mika Eloranta
pg_receivexlog calculated the xlog segment number incorrectly
when started after the previous instance was interrupted.

Resuming streaming only worked when the physical wal segment
counter was zero, i.e. for the first 256 segments or so.
---
 src/bin/pg_basebackup/pg_receivexlog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/bin/pg_basebackup/pg_receivexlog.c 
b/src/bin/pg_basebackup/pg_receivexlog.c
index 031ec1a..6f9fcf4 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -171,7 +171,7 @@ FindStreamingStart(uint32 *tli)
progname, dirent-d_name);
disconnect_and_exit(1);
}
-   segno = ((uint64) log) << 32 | seg;
+   segno = (((uint64) log) << 8) | seg;
 
/*
 * Check that the segment has the right size, if it's supposed 
to be
-- 
1.8.0.1



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-01 Thread Etsuro Fujita
 From: Fujii Masao [mailto:masao.fu...@gmail.com]

 This is what I'm looking for! This feature is really useful for tuning
work_mem
 when using full text search with pg_trgm.
 
 I'm not sure if it's a good idea to show the number of the fetches because it
 seems difficult to tune work_mem from that number. How can we calculate how
 much to increase work_mem to avoid lossy bitmaps from the number of the fetches
 in EXPLAIN output?

We can calculate that from the following equation in tbm_create():

  nbuckets = maxbytes /
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)),

where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
designated by work_mem, and nbuckets is the estimated number of hashtable
entries we can have within maxbytes.  From this, the size of work_mem within
which we can have every hashtable entry as an exact bitmap is calculated as
follows:

  work_mem = (the number of exact pages + the number of lossy pages) *
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)) /
(1024 * 1024).

I'll show you an example.  The following is the result for work_mem = 1MB:

  postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                                            QUERY PLAN
  ---------------------------------------------------------------------------------------------------------------------------------
   Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual time=24.907..1119.961 rows=100047 loops=1)
     Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
     Rows Removed by Index Recheck: 5484114
     Heap Blocks: exact=11975 lossy=46388
     ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0) (actual time=22.821..22.821 rows=100047 loops=1)
           Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
   Total runtime: 1129.334 ms
  (7 rows)

So, by setting work_mem to

  work_mem = (11975 + 46388) *
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)) /
(1024 * 1024),

which is about 5MB, we have the following (note that there are no lossy heap pages!):

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual time=42.981..120.252 rows=100047 loops=1)
   Recheck Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
   Heap Blocks: exact=58363
   ->  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0) (actual time=26.023..26.023 rows=100047 loops=1)
         Index Cond: ((col2 >= 0.01::double precision) AND (col2 <= 0.02::double precision))
 Total runtime: 129.304 ms
(6 rows)
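
For reference, the arithmetic above can also be done directly in psql; a
minimal sketch, assuming roughly 80 bytes per hashtable entry on a typical
64-bit build (that per-entry figure is an assumption, not a number taken
from the formula above):

  SELECT pg_size_pretty(((11975 + 46388) * 80)::bigint) AS suggested_work_mem;

which comes out just under the ~5MB figure used above.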

BTW, just showing the number of exact/lossy heap pages in the EXPLAIN ANALYZE
output would be fine with me.

 Anyway, could you add the patch into next CF?

Done.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Shave a few instructions from child-process startup sequence

2013-11-01 Thread Gurjeet Singh
On Thu, Oct 31, 2013 at 11:20 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Amit Kapila amit.kapil...@gmail.com writes:
  On Thu, Oct 31, 2013 at 2:41 AM, Gurjeet Singh gurj...@singh.im wrote:
  Just a small patch; hopefully useful.

  This is valid saving as we are filling array ListenSocket[] in
  StreamServerPort() serially, so during ClosePostmasterPorts() once if
  it encountered PGINVALID_SOCKET, it is valid to break the loop.
  Although the savings are small considering this doesn't occur in any
  performance path, I still think this is the right thing to do in the code.

  It is better to register this patch in CF app list, unless someone
  feels this is not right.

 I think this is adding fragility for absolutely no meaningful savings.
 The existing code does not depend on the assumption that the array
 is filled consecutively and no entries are closed early.  Why should
 we add such an assumption here?


IMHO, typical configurations have one, or maybe two, of these array elements
populated; one for TCP 5432 (for localhost or *), and the other for Unix
Domain Sockets. Saving 62 iterations and comparisons in the startup sequence
may not be much, but it does match the logic elsewhere.

In fact, this was inspired by the following code in ServerLoop():

ServerLoop()
{
...
/*
 * New connection pending on any of our sockets? If so, fork a child
 * process to deal with it.
 */
if (selres > 0)
{
int i;

for (i = 0; i < MAXLISTEN; i++)
{
if (ListenSocket[i] == PGINVALID_SOCKET)
break;
if (FD_ISSET(ListenSocket[i], &rmask))
{


And looking for other precedents, I found that the initMasks() function also
relies on the array being filled consecutively, with the first
PGINVALID_SOCKET marking the end of the valid array members.

Best regards,
-- 
Gurjeet Singh

http://gurjeet.singh.im/

EnterpriseDB Inc.


[HACKERS] Save Hash Indexes

2013-11-01 Thread Dimitri Fontaine
Hi,

Here's an idea: when a user asks for a hash index, transparently build a
BTree index over a hash function instead.

Advantages:

  - it works
  - it's crash safe
  - it's (much?) faster than a hash index anyways

Drawbacks:

  - root access concurrency
  - we need a hash_any function stable against pg_upgrade

After talking about it with Heikki, we don't seem to find ways in which
the root access concurrency pattern would be visible enough to matter.

Also, talking with Peter Geoghegan, it's unclear that there's a use case
where a hash index would be faster than a btree index over the hash
function.

Comments?
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread k...@rice.edu
On Fri, Nov 01, 2013 at 01:31:10PM +, Dimitri Fontaine wrote:
 Hi,
 
 Here's an idea: when a user ask for an Hash Index transparently build a
 BTree index over an hash function instead.
 
 Advantages:
 
   - it works
   - it's crash safe
   - it's (much?) faster than a hash index anyways
 
 Drawbacks:
 
   - root access concurrency
   - we need a hash_any function stable against pg_upgrade
 
 After talking about it with Heikki, we don't seem to find ways in which
 the root access concurrency pattern would be visible enough to matter.
 
 Also, talking with Peter Geoghegan, it's unclear that there's a use case
 where a hash index would be faster than a btree index over the hash
 function.
 
 Comments?
 -- 
 Dimitri Fontaine
 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
 

Hi Dimitri,

This use of a function index as a safe hash-index substitute has been
suggested for a while; check the previous threads. It is not a true
substitute, because a hash index is O(1) for lookups while a btree is
O(log n), so hash indexes have an advantage for very large numbers of
entries. In fact, a recent post compared both the btree substitute and the
current hash index, and for large indexes the hash allowed 2X the lookups of
the equivalent btree, which is what you would expect. The use-case is exactly
very large tables/indexes where the index does not fit in memory, to say
nothing of the data itself.

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 Here's an idea: when a user ask for an Hash Index transparently build a
 BTree index over an hash function instead.

-1.  If someone asks for a hash index, they should get a hash index.
If you feel the documentation isn't sufficiently clear about the problems
involved, we can work on that.

The bigger picture here is that such an approach amounts to deciding that
no one will ever be allowed to fix hash indexes.  I'm not for that, even
if I'm not volunteering to be the fixer myself.

I also don't believe your claim that this would always be faster than a
real hash index.  What happened to O(1) vs O(log N)?

Lastly: what real-world problem are we solving by kicking that code
to the curb?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 -1.  If someone asks for a hash index, they should get a hash index.
 If you feel the documentation isn't sufficiently clear about the problems
 involved, we can work on that.

Fair enough.

 Lastly: what real-world problem are we solving by kicking that code
 to the curb?

Well, how long have we waited for the fix already? The lack of crash safety
and replication ability is not just a drawback; it kills user adoption.

Maybe we should implement UNLOGGED indexes and then Hash Indexes would
only exist in the UNLOGGED fashion?

But well, yeah ok, my idea is a non-starter.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Andrew Dunstan


On 11/01/2013 09:49 AM, Tom Lane wrote:

Dimitri Fontaine dimi...@2ndquadrant.fr writes:

Here's an idea: when a user ask for an Hash Index transparently build a
BTree index over an hash function instead.

-1.  If someone asks for a hash index, they should get a hash index.
If you feel the documentation isn't sufficiently clear about the problems
involved, we can work on that.

The bigger picture here is that such an approach amounts to deciding that
no one will ever be allowed to fix hash indexes.  I'm not for that, even
if I'm not volunteering to be the fixer myself.

I also don't believe your claim that this would always be faster than a
real hash index.  What happened to O(1) vs O(log N)?

Lastly: what real-world problem are we solving by kicking that code
to the curb?





Yeah, and there's this: I've had at least one client who switched to 
using hash indexes and got a significant benefit from it precisely 
because they aren't WAL logged. They could afford to rebuild the indexes 
in the unlikely event of a crash, but the IO gain was worth it to them. 
Neither could they have tolerated unlogged tables - they wanted crash 
safety for their data. After talking through the various options with 
them they decided this was the best choice, and it might be sad to 
remove that choice from people.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 Yeah, and there's this: I've had at least one client who switched to 
 using hash indexes and got a significant benefit from it precisely 
 because they aren't WAL logged. They could afford to rebuild the indexes 
 in the unlikely event of a crash, but the IO gain was worth it to them. 
 Neither could they have tolerated unlogged tables - they wanted crash 
 safety for their data. After talking through the various options with 
 them they decided this was the best choice, and it might be sad to 
 remove that choice from people.

That's an interesting story, but it seems like what it points to is the
need for a general unlogged index feature, rather than depending on
what's universally agreed to be an implementation deficiency of hash
indexes.  So it sounds like an independent topic.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Cannot create matview when referencing another not-populated-yet matview in subquery

2013-11-01 Thread Kevin Grittner
Laurent Sartran lsart...@gmail.com wrote:

 CREATE MATERIALIZED VIEW t1 AS SELECT text 'foo' AS col1
   WITH NO DATA;
 CREATE MATERIALIZED VIEW t2b AS SELECT * FROM t1
   WHERE col1 = (SELECT LEAST(col1) FROM t1)
   WITH NO DATA;

 ERROR:  materialized view t1 has not been populated
 HINT:  Use the REFRESH MATERIALIZED VIEW command.

 Is this behavior expected?

No, and git bisect shows that it worked until commit
5194024d72f33fb209e10f9ab0ada7cc67df45b7.

Moving to -hackers list for discussion of how to fix it.

It looks like the above commit missed a trick here:

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 791f336..0b47106 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -865,7 +865,8 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 		 * it is a parameterless subplan (not initplan), we suggest that it be
 		 * prepared to handle REWIND efficiently; otherwise there is no need.
 		 */
-		sp_eflags = eflags & EXEC_FLAG_EXPLAIN_ONLY;
+		sp_eflags = eflags
+			& (EXEC_FLAG_EXPLAIN_ONLY | EXEC_FLAG_WITH_NO_DATA);
 		if (bms_is_member(i, plannedstmt->rewindPlanIDs))
 			sp_eflags |= EXEC_FLAG_REWIND;

The test case provided works with this change.  Does anyone see a
problem with that?  If not, I'll push it with the above test case
added to the regression tests.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Andres Freund
On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
 Lastly: what real-world problem are we solving by kicking that code
 to the curb?

It makes hashed lookups much easier to use. Currently, if you want
indexed access over wide columns and equality is all you need, you need
to write rather awkward queries to make that happen in each query,
something like:
WHERE hash(column) = hash('value') AND column = 'value'.

So some magic that hides having to do that would be nice, even if it
doesn't have O(log(n)) properties. The interesting part is that the
index gets much denser and smaller, and that the individual comparisons
are much cheaper.
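
A sketch of that workaround, using hashtext() as the hash function and an
invented table t with a wide text column wide_col:

  CREATE INDEX t_wide_col_hash_idx ON t (hashtext(wide_col));

  SELECT *
    FROM t
   WHERE hashtext(wide_col) = hashtext('value')  -- matches the small index
     AND wide_col = 'value';                     -- rechecks the real value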

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API bug in DetermineTimeZoneOffset()

2013-11-01 Thread Tom Lane
I wrote:
 The second attached patch, to be applied after the first, removes the
 existing checks of HasCTZSet in the backend.  The only visible effect of
 this, AFAICT, is that to_char's TZ format spec now delivers something
 useful instead of an empty string when a brute-force timezone is in use.
 I could be persuaded either way as to whether to back-patch this part.
 From one standpoint, this to_char behavioral change is clearly a bug fix
 --- but it's barely possible that somebody out there thought that
 returning an empty string for TZ was actually the intended/desirable
 behavior.

Any opinions about whether to back-patch this part or not?  It seems
like a bug fix, but on the other hand, I don't recall any complaints
from the field about to_char's TZ spec not working with brute-force zones.
So maybe the prudent thing is to leave well enough alone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Atri Sharma
On Friday, November 1, 2013, k...@rice.edu wrote:

 On Fri, Nov 01, 2013 at 01:31:10PM +, Dimitri Fontaine wrote:
  Hi,
 
  Here's an idea: when a user ask for an Hash Index transparently build a
  BTree index over an hash function instead.
 
  Advantages:
 
- it works
- it's crash safe
- it's (much?) faster than a hash index anyways
 
  Drawbacks:
 
- root access concurrency
- we need a hash_any function stable against pg_upgrade
 
  After talking about it with Heikki, we don't seem to find ways in which
  the root access concurrency pattern would be visible enough to matter.
 
  Also, talking with Peter Geoghegan, it's unclear that there's a use case
  where a hash index would be faster than a btree index over the hash
  function.
 
  Comments?
  --
  Dimitri Fontaine
  http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
 

 Hi Dimitri,

 This use of a function index as a safe hash index has been the substitute
 for a while. Check the previous threads. It is not a true substitute
 because
 a hash index is O(1) for lookups but a btree is O(log n) so hash indexes
 have
 an advantage for very large numbers on entries. In fact a recent post
 compared
 both the btree substitute and the current hash index and for large indexes
 the
 hash allowed 2X the lookups than the equivalent btree, which is what you
 would
 expect. The use-case is exactly for very large tables/indexes where the
 index
 does not fit in memory, to say nothing of the data itself.

 Regards,
 Ken

 Hi,


I agree too. Theoretically, hash tables are constant-time lookup while btrees
are logarithmic. There may be cases where we get better performance with a
btree, but replacing hash indexes with them completely is not really a good
idea, IMHO.

Is the motivation behind this idea drawn from workloads where btrees perform
better, or are you trying to fix issues with hash indexes?

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Jeff Janes
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.frwrote:

 Hi,

 Here's an idea: when a user ask for an Hash Index transparently build a
 BTree index over an hash function instead.



Could something be added to the planner so that you can just build a btree
index on a hash expression, and have the planner automatically realize it
can be used for an equality query, because if a=b then md5(a)=md5(b), at least
for some data types?  A general ability to recognize things like that seems
like it might have other uses as well.
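
A sketch of the contrast, with invented names:

  CREATE INDEX t_wide_col_md5_idx ON t (md5(wide_col));

  -- Today the query must repeat the hashing by hand for the index to be used:
  SELECT * FROM t WHERE md5(wide_col) = md5('needle') AND wide_col = 'needle';

  -- The idea is that this alone could use the same index, since
  -- wide_col = 'needle' implies md5(wide_col) = md5('needle'):
  SELECT * FROM t WHERE wide_col = 'needle';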

Cheers,

Jeff


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Daniel Farina
On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Hi,

 Here's an idea: when a user ask for an Hash Index transparently build a
 BTree index over an hash function instead.

 Advantages:

   - it works
   - it's crash safe
   - it's (much?) faster than a hash index anyways

 Drawbacks:

   - root access concurrency
   - we need a hash_any function stable against pg_upgrade

 After talking about it with Heikki, we don't seem to find ways in which
 the root access concurrency pattern would be visible enough to matter.

 Also, talking with Peter Geoghegan, it's unclear that there's a use case
 where a hash index would be faster than a btree index over the hash
 function.

 Comments?

I haven't met a single Heroku user that has stumbled into this
landmine.  Normally I am very wary of such landmine-laden features,
but this one is there and doesn't seem to have detonated at any point.
I guess the obscure nature of those indexes and the stern warning in
the documentation have sufficed to discourage their use.

Given that experience, I think foreclosing fixing hash indexes is
premature, and doesn't seem to be hurting inexperienced users of this
stripe.  Maybe there are yet other reasons to argue the subject,
though...it's not like the current user base relies on the behavior
as-is either, having seemingly steered clear just about altogether.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Something fishy happening on frogmouth

2013-11-01 Thread Noah Misch
On Fri, Nov 01, 2013 at 12:27:31AM -0400, Robert Haas wrote:
 On Thu, Oct 31, 2013 at 7:48 PM, Heikki Linnakangas hlinnakan...@vmware.com 
 wrote:
  On 31.10.2013 16:43, Robert Haas wrote:
  There should be no cases where the main shared memory
  segment gets cleaned up and the dynamic shared memory segments do not.
 
  1. initdb -D data1
  2. initdb -D data2
  3. postgres -D data1
  4. killall -9 postgres
  5. postgres -D data2
 
  The system V shmem segment orphaned at step 4 will be cleaned up at step 5.
  The DSM segment will not.

Note that dynamic_shared_memory_type='mmap' will give the desired behavior.

 If we want the behavior, we could mimic what the main shared memory
 code does here: instead of choosing a random value for the control
 segment identifier and saving it in a state file, start with something
 like port * 100 + 100 (the main shared memory segment uses port *
 100, so we'd want something at least slightly different) and search
 forward one value at a time from there until we find an unused ID.

This approach used for the main sysv segment has its own problems.  If the
first postmaster had to search forward but the second postmaster does not, the
second postmaster will not reach the old segment to clean it up.

It might be suitably-cheap insurance to store the DSM control segment handle
in PGShmemHeader.  Then if, by whatever means good or bad, we find a main sysv
segment to clean up, we can always clean up the associated DSM segment(s).

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Daniel Farina
On Fri, Nov 1, 2013 at 8:52 AM, Daniel Farina dan...@heroku.com wrote:
 On Fri, Nov 1, 2013 at 6:31 AM, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Also, talking with Peter Geoghegan, it's unclear that there's a use case
 where a hash index would be faster than a btree index over the hash
 function.

 Comments?

 I haven't met a single Heroku user that has stumbled into this
 landmine.  Normally I am very weary of such landmine laden features,
 but this one is there and doesn't seem to have detonated at any point.
  I guess the obscure nature of those indexes and the stern warning in
 the documentation has sufficed to discourage their use.

 Given that experience, I think foreclosing fixing hash indexes is
 premature, and doesn't seem to be hurting inexperienced users of this
 stripe.  Maybe there are yet other reasons to argue the subject,
 though...it's not like the current user base relies on the behavior
 as-is either, having seemingly steered clear just about altogether.

In addendum, though, some users *have* done the hash-function + btree
trick to make keys smaller.  They tend to resort to full blown
cryptographic hashes and assume/enforce non-collision, but it's
somewhat awkward and finicky.  So while perhaps commandeering the
'hash' index type is a bit over-aggressive, making that use case work
better would probably carry positive impact.

I can say most of my indexes over text are *never* used for range
queries, so hashing them down, one would think, would be great.  The
fiddliness of applying expression indexes over all that forecloses
getting the benefits that might be possible there: only certain heavy
workloads will receive that level of care.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL 9.3 beta breaks some extensions make install

2013-11-01 Thread Marti Raudsepp
Hi Andrew,

On Mon, Sep 23, 2013 at 6:43 PM, Andrew Dunstan and...@dunslane.net wrote:
 I'm working on it. It appears to have a slight problem or two I want to fix
 at the same time, rather than backpatch something broken.

Any progress on this? I notice that the fixes didn't make it into 9.3.1.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Hannu Krosing
On 11/01/2013 03:49 PM, Andres Freund wrote:
 On 2013-11-01 09:49:57 -0400, Tom Lane wrote:
 Lastly: what real-world problem are we solving by kicking that code
 to the curb?
 It makes hashed lookups much easier to use. Currently if you want
 indexed access over wide columns and equality is all you need you need
 to write rather awkward queries to make that happen in each query,
 something like:
 WHERE hash(column) = hash('value') AND column = 'value'.

 So some magic that hides having to do that would be nice, even it
 doesn't have O(log(n)) properties. 
Isn't there a way to do this using a special operator class?
 The interesting part is that the
 index gets much denser and smaller, and that the individual comparisons
 are much cheaper.

 Greetings,

 Andres Freund



-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] buffile.c resource owner breakage on segment extension

2013-11-01 Thread Andres Freund
Hi,

The attached testcase demonstrates that it currently is possible that
buffile.c segments get created belonging to the wrong resource owner,
leading to WARNINGs like "temporary file leak: File %d still referenced",
ERRORs like "write failed", asserts and segfaults.

The problem is that while BufFileCreateTemp() callers like tuplestore
take care to use proper resource owners for it, they don't during
BufFileWrite()->BufFileDumpBuffer()->extendBufFile(). The last in that
chain creates a new tempfile which then gets owned by
CurrentResourceOwner. Which, like in the testcase, might be a
subtransaction's one.

While not particularly nice, given the API, it seems best for buffile.c
to remember the resource owner used for the original segment and
temporarily set that during the extension.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/storage/file/buffile.c b/src/backend/storage/file/buffile.c
index ac8cd1d..feb7011 100644
--- a/src/backend/storage/file/buffile.c
+++ b/src/backend/storage/file/buffile.c
@@ -38,6 +38,7 @@
 #include "storage/fd.h"
 #include "storage/buffile.h"
 #include "storage/buf_internals.h"
+#include "utils/resowner.h"
 
 /*
  * We break BufFiles into gigabyte-sized segments, regardless of RELSEG_SIZE.
@@ -67,6 +68,7 @@ struct BufFile
 	bool		isTemp;			/* can only add files if this is TRUE */
 	bool		isInterXact;	/* keep open over transactions? */
 	bool		dirty;			/* does buffer need to be written? */
+	ResourceOwner resowner;		/* Resowner to use when extending */
 
 	/*
 	 * current pos is position of start of buffer within the logical file.
@@ -118,11 +120,21 @@ static void
 extendBufFile(BufFile *file)
 {
 	File		pfile;
+	ResourceOwner saved = CurrentResourceOwner;
+
+	/*
+	 * Temporarily set resource owner to the one used for
+	 * BufFileCreateTemp so the new segment has the same lifetime as
+	 * the first.
+	 */
+	CurrentResourceOwner = file->resowner;
 
 	Assert(file->isTemp);
 	pfile = OpenTemporaryFile(file->isInterXact);
 	Assert(pfile >= 0);
 
+	CurrentResourceOwner = saved;
+
 	file->files = (File *) repalloc(file->files,
 	(file->numFiles + 1) * sizeof(File));
 	file->offsets = (off_t *) repalloc(file->offsets,
@@ -155,7 +167,7 @@ BufFileCreateTemp(bool interXact)
 	file = makeBufFile(pfile);
 	file->isTemp = true;
 	file->isInterXact = interXact;
-
+	file->resowner = CurrentResourceOwner;
 	return file;
 }
 
SET work_mem = '100MB';
DROP TABLE IF EXISTS data CASCADE;
CREATE TABLE data(data text);
CREATE FUNCTION leak_file() RETURNS SETOF data LANGUAGE plpgsql AS $$
DECLARE
i int;
BEGIN
FOR i IN 1..3 LOOP
DECLARE
r record;
BEGIN
-- make sure we need to extend a tempfile past one segment in a subtransaction
FOR i IN 1..30 LOOP
r := ('('||repeat('quite a bit of stupid text', 100)||')')::data;
RETURN NEXT r;
END LOOP;
RAISE EXCEPTION 'frakkedifrak';
EXCEPTION WHEN others THEN
RAISE NOTICE 'got exception %', sqlerrm;
END;
END LOOP;
END
$$;

COPY (SELECT * FROM leak_file()) TO '/dev/null';

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] missing RelationCloseSmgr in FreeFakeRelcacheEntry?

2013-11-01 Thread Andres Freund
Hi Heikki, All,

On 2013-10-29 02:16:23 +0100, Andres Freund wrote:
 Looking a bit closer it seems to me that the fake relcache
 infrastructure seems to neglect the chance that something used the fake
 entry to read something which will have done a RelationOpenSmgr(). Which
 in turn will have set rel->rd_smgr->smgr_owner to rel.
 When we then pfree() the fake relation in FreeFakeRelcacheEntry we have
 a dangling pointer.
 
 It confuses me a bit that this doesn't cause issues during recovery more
 frequently... The code seems to have been that way since
 a213f1ee6c5a1bbe1f074ca201975e76ad2ed50c which introduced fake relcache
 entries. It looks like XLogCloseRelationCache() indirectly has done a
 RelationCloseSmgr().

This looks like it was caused by
a213f1ee6c5a1bbe1f074ca201975e76ad2ed50c which you committed, any chance
you could commit the trivial fix?

 diff --git a/src/backend/access/transam/xlogutils.c 
 b/src/backend/access/transam/xlogutils.c
 index 5429d5e..ee7c892 100644
 --- a/src/backend/access/transam/xlogutils.c
 +++ b/src/backend/access/transam/xlogutils.c
 @@ -433,6 +433,7 @@ CreateFakeRelcacheEntry(RelFileNode rnode)
  void
  FreeFakeRelcacheEntry(Relation fakerel)
  {
 +   RelationCloseSmgr(fakerel);
 pfree(fakerel);
  }

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Save Hash Indexes

2013-11-01 Thread Robert Haas
On Fri, Nov 1, 2013 at 9:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The bigger picture here is that such an approach amounts to deciding that
 no one will ever be allowed to fix hash indexes.  I'm not for that, even
 if I'm not volunteering to be the fixer myself.

Yeah.

I have thought about doing some work on this, although I don't know
when I'll ever have the time.  As far as I can see, the basic problem
is that we need a better way of splitting buckets.  Right now,
splitting a bucket means locking it, scanning the entire bucket chain
and moving ~1/2 the tuples to the new bucket, and then unlocking it.
Since this is a long operation, we have to use heavyweight locks to
protect the buckets, which is bad for concurrency.  Since it involves
a modification to an arbitrarily large number of pages that has to be
atomic, it's not possible to WAL-log it sensibly -- and in fact, I
think this is really the major obstacle to being able to implement
WAL-logging for hash indexes.

I think we could work around this problem as follows.   Suppose we
have buckets 1..N and the split will create bucket N+1.  We need a
flag that can be set and cleared on the metapage, call it
split-in-progress, and a flag that can optionally be set on particular
index tuples, call it moved-by-split.  We will allow scans of all
buckets and insertions into all buckets while the split is in
progress, but (as now) we will not allow more than one split to be in
progress at the same time.  We start the split by updating the
metapage to increment the number of buckets and set the
split-in-progress flag.  While the split-in-progress flag is set, any
scans of bucket N+1 first scan that bucket, ignoring any tuples
flagged moved-by-split, and then ALSO scan bucket (N+1)/2.  The
moved-by-split flag never has any effect except when scanning the
highest-numbered bucket that existed at the start of that particular
scan, and then only if the split-in-progress flag was also set at that
time.

Once the split operation has set the split-in-progress flag, it will
begin scanning bucket (N+1)/2.  Every time it finds a tuple that
properly belongs in bucket N+1, it will insert the tuple into bucket
N+1 with the moved-by-split flag set.  Tuples inserted by anything
other than a split operation will leave this flag clear, and tuples
inserted while the split is in progress will target the same bucket
that they would hit if the split were already complete.  Thus, bucket
N+1 will end up with a mix of moved-by-split tuples, coming from
bucket (N+1)/2, and unflagged tuples coming from parallel insertion
activity.  When the scan of bucket (N+1)/2 is complete, we know that
bucket N+1 now contains all the tuples that are supposed to be there,
so we clear the split-in-progress flag.  Future scans of both buckets
can proceed normally.

Of course, there's a bit of a problem here: we haven't actually
removed any tuples from bucket (N+1)/2.  That's not a correctness
problem; we'll check for an exact hash-value match before returning
any tuples from this bucket from the scan, so tuples that are still
present in the bucket but no longer belong there will just be ignored
anyway.  But
it's clearly something that needs to be fixed before we can really
declare victory.  We need to wait for the completion of any scans
that began before we finished populating bucket N+1, because otherwise
we might remove tuples that they're still expecting to find in bucket
(N+1)/2.  I'm not sure if there's some clever trick we can use to
accomplish this; e.g. if the scan will always maintain a pin, we could
take a buffer cleanup lock on all the buffers in buckets N+1 and
(N+1)/2 in the same sequence a scan would visit them.  Of course, even
if that works, it might be a while if somebody's left a cursor lying
around.  And even if scans can't be counted on to maintain a pin, a
point about which I'm unsure off-hand, then things get even trickier.
But maybe there's some solution.

Anyway, once we out-wait concurrent scans, we can go back and nuke
everything from (N+1)/2 that no longer maps onto that bucket.  It
might be possible to make this work opportunistically, like HOT
cleanup: if we're scanning a buffer, and RecentGlobalXmin has
progressed enough that we know there can't be any pre-split scans in
progress (or if we can recognize in some other inexpensive way that
old scans must be gone), and we see a tuple there that no longer maps
onto that bucket, we scan the page and remove all such tuples.

This is full of hand-waving, but I'd be curious to know whether the
approach is perceived to have any merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] BUG #8573: int4range memory consumption

2013-11-01 Thread Tom Lane
g.vanluffe...@qipc.com writes:
 int4range ( and any other range function) consumes much memory when used in
 a select statement on a big table.

The problem is that range_out leaks memory, as a consequence of creating a
number of intermediate strings that it doesn't bother to free.  I don't
believe it's the only output function that leaks memory, but it does
so with particular vim: now that we've increased the initial size of
StringInfo buffers, it's probably wasting upwards of 2K per call.

While we could doubtless hack range_out to release those strings again,
it seems to me that that's just sticking a finger in the dike.  I'm
inclined to think that we really ought to solve this class of problems
once and for all by fixing printtup.c to run the output functions in a
temporary memory context, which we could reset once per row to recover all
memory used.  That would also let us get rid of the inadequate kluges that
printtup currently uses to try to minimize leakage, namely forced
detoasting of varlena fields and forced pfree'ing of the strings returned
by output functions.  (There is no other context in which we imagine that
it's safe to pfree a function's result, and there are a number of
datatypes for which it'd make sense to return constant strings, were it
not for this kluge.)

It's possible that this would result in some net slowdown in tuple output;
but it's also possible that eliminating the retail pfree's in favor of a
single context reset per tuple would make for a net savings.  In any case,
we're already using a reset-per-row approach to memory management of
output function calls in COPY OUT, and I know for a fact that we've
squeezed that code path as hard as we could.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] API bug in DetermineTimeZoneOffset()

2013-11-01 Thread Robert Haas
On Fri, Nov 1, 2013 at 10:50 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 The second attached patch, to be applied after the first, removes the
 existing checks of HasCTZSet in the backend.  The only visible effect of
 this, AFAICT, is that to_char's TZ format spec now delivers something
 useful instead of an empty string when a brute-force timezone is in use.
 I could be persuaded either way as to whether to back-patch this part.
 From one standpoint, this to_char behavioral change is clearly a bug fix
 --- but it's barely possible that somebody out there thought that
 returning an empty string for TZ was actually the intended/desirable
 behavior.

 Any opinions about whether to back-patch this part or not?  It seems
 like a bug fix, but on the other hand, I don't recall any complaints
 from the field about to_char's TZ spec not working with brute-force zones.
 So maybe the prudent thing is to leave well enough alone.

I vote for leaving it alone until somebody complains.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] buffile.c resource owner breakage on segment extension

2013-11-01 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 While not particularly nice, given the API, it seems best for buffile.c
 to remember the resource owner used for the original segment and
 temporarily set that during the extension.

Hm, yeah, that seems right.  It's just like repalloc keeping the memory
chunk in its original context.  The comments here are a bit inadequate...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-01 Thread Jim Nasby
On Oct 31, 2013, at 11:04 AM, Joe Love j...@primoweb.com wrote:
 In postgres 9.2 I have a function that is relatively expensive.  When I write 
 a query such as:
 
 select expensive_function(o.id),o.* from offering o where valid='Y' order by 
 name limit 1;
 
 the query runs slow and appears to be running the function on each ID, which 
 in this case should be totally unnecessary as it really only needs to run on 
 1 row.
 
 When I rewrite the query like so:
 
 select expensive_function(o.id), o.*
 from (select * from offering where valid='Y' order by name limit 1) o;
 
 the expensive function only runs once and thus, much faster. I would think 
 that the optimizer could handle this situation, especially when limit or 
 offset is used and the expensive function is not used in a group by, order by 
 or where.

Does anyone know what the SQL standard says about this, if anything? I can't 
see any way that this would change the result set, but of course if the 
function has external effects this would make a difference...

Re: [HACKERS] [BUGS] BUG #8573: int4range memory consumption

2013-11-01 Thread Jim Nasby
On Nov 1, 2013, at 2:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 g.vanluffe...@qipc.com writes:
 int4range ( and any other range function) consumes much memory when used in
 a select statement on a big table.
 
 The problem is that range_out leaks memory, as a consequence of creating a
 number of intermediate strings that it doesn't bother to free.  I don't
 believe it's the only output function that leaks memory, but it does
 so with particular vim: now that we've increased the initial size of
 StringInfo buffers, it's probably wasting upwards of 2K per call.
 
 While we could doubtless hack range_out to release those strings again,
 it seems to me that that's just sticking a finger in the dike.  I'm
 inclined to think that we really ought to solve this class of problems
 once and for all by fixing printtup.c to run the output functions in a
 temporary memory context,
...
 we're already using a reset-per-row approach to memory management of
 output function calls in COPY OUT, and I know for a fact that we've
 squeezed that code path as hard as we could.

+1. COPY is actually the case I was worried about… if you're dealing with large 
amounts of data in other clients ISTM that other things will bottleneck before 
the extra memory context.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature request: Optimizer improvement

2013-11-01 Thread Jim Nasby
On Oct 31, 2013, at 2:57 PM, Kevin Grittner kgri...@ymail.com wrote:
 Joe Love j...@primoweb.com wrote:
 
 In postgres 9.2 I have a function that is relatively expensive.
 
 What did you specify in the COST clause on the CREATE FUNCTION
 statement?

Should that really matter in this case? ISTM we should always handle LIMIT 
before moving on to the SELECT clause…?

Re: Handle LIMIT/OFFSET before select clause (was: [HACKERS] Feature request: optimizer improvement)

2013-11-01 Thread Tom Lane
Jim Nasby j...@nasby.net writes:
 On Oct 31, 2013, at 11:04 AM, Joe Love j...@primoweb.com wrote:
 In postgres 9.2 I have a function that is relatively expensive.  When I 
 write a query such as:
 
 select expensive_function(o.id),o.* from offering o where valid='Y' order by 
 name limit 1;

 Does anyone know what the SQL standard says about this, if anything?

The computational model is that you evaluate the SELECT list before
sorting; this must be so since you can write

 select somefunc(x) as y from tab order by y;

In the general case, therefore, it's impossible to avoid evaluating the
function at all rows.  I'm not sure what the spec says about whether it's
okay to skip evaluation of functions that would be evaluated in a naive
implementation of the computational model, so it's possible that what
the OP is asking for is directly contrary to spec.  But more likely they
permit implementations to skip unnecessary calls, if indeed they address
this at all.

As far as PG goes, I think the excess calls would only occur if the plan
includes an explicit sort step, since the select list would be evaluated
before the sort step.  If you've got an index on name (or maybe you'd
need (valid, name) if there aren't many rows with valid = 'Y') I'd expect
it to pick out the minimal name row with the index, avoiding any sort,
and then the function would only be evaluated on the single fetched row.
But these are implementation details not anything you're going to find
support for in the spec.
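
As a concrete sketch of that last point, using the table from the original
question (the index itself is hypothetical):

  CREATE INDEX offering_valid_name_idx ON offering (valid, name);

With such an index the planner can read the valid = 'Y' rows in name order,
stop after the first one because of the LIMIT, and so evaluate
expensive_function() for that single row only.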

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature request: Optimizer improvement

2013-11-01 Thread Atri Sharma
On Friday, November 1, 2013, Jim Nasby wrote:

 On Oct 31, 2013, at 2:57 PM, Kevin Grittner kgri...@ymail.com wrote:

 Joe Love j...@primoweb.com wrote:

 In postgres 9.2 I have a function that is relatively expensive.


 What did you specify in the COST clause on the CREATE FUNCTION
 statement?


 Should that really matter in this case? ISTM we should always handle LIMIT
 before moving on to the SELECT clause…?


+1

It sounds perfectly logical.


-- 
Regards,

Atri
*l'apprenant*


Re: [HACKERS] [BUGS] BUG #8573: int4range memory consumption

2013-11-01 Thread Tom Lane
I wrote:
 It's possible that this would result in some net slowdown in tuple output;
 but it's also possible that eliminating the retail pfree's in favor of a
 single context reset per tuple would make for a net savings.  In any case,
 we're already using a reset-per-row approach to memory management of
 output function calls in COPY OUT, and I know for a fact that we've
 squeezed that code path as hard as we could.

It appears that indeed, the reset-per-row approach is marginally faster
than the existing code.  It's just barely faster with a couple of columns
of output, for instance I get about 660 vs 665 msec for
   select x,x from generate_series(1,100) x;
but the advantage grows for more columns, which is expected since we're
getting rid of more pfree's.  With ten integer columns I see 1650 vs
1710 msec, for example.

So I see no downside to fixing it like this, and will work on a complete
patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] appendPQExpBufferVA vs appendStringInfoVA

2013-11-01 Thread David Rowley
Tom committed some changes to appendStringInfoVA a few weeks ago which
allow it to return the required buffer size if the current buffer is not
big enough.

On looking at appendPQExpBufferVA I'm thinking it would be nice if it could
make use of the new pvsnprintf function to bring the same potential
performance improvement in to there too. My vision of how
appendPQExpBufferVA would look after the change is pretty much exactly the
same as appendStringInfoVA, which makes me think: why do we even have
appendPQExpBufferVA?  The only reason that I can think of is that it is
used in front-end applications and allocates memory differently... Is this
the only reason or is there some other special reason for this function
that I can't think of?

If someone wants to give me some guidance on how or if all this should
re-factored, I'll happily supply a patch.

Regards

David Rowley


Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA

2013-11-01 Thread Tom Lane
David Rowley dgrowle...@gmail.com writes:
 Tom commited some changes to appendStringInfoVA a few weeks ago which
 allows it to return the required buffer size if the current buffer is not
 big enough.

 On looking at appendPQExpBufferVA I'm thinking it would be nice if it could
 make use of the new pvsnprintf function to bring the same potential
 performance improvement in to there too.

Uh ... it does contain pretty much the same algorithm now.  We can't
simply use pvsnprintf there because exit-on-error is no good for
libpq's purposes, so unless we want to rethink that, a certain
amount of code duplication is unavoidable.  But they both understand
about C99 vsnprintf semantics now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] appendPQExpBufferVA vs appendStringInfoVA

2013-11-01 Thread Robert Haas
On Fri, Nov 1, 2013 at 9:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 David Rowley dgrowle...@gmail.com writes:
 Tom commited some changes to appendStringInfoVA a few weeks ago which
 allows it to return the required buffer size if the current buffer is not
 big enough.

 On looking at appendPQExpBufferVA I'm thinking it would be nice if it could
 make use of the new pvsnprintf function to bring the same potential
 performance improvement in to there too.

 Uh ... it does contain pretty much the same algorithm now.  We can't
 simply use pvsnprintf there because exit-on-error is no good for
 libpq's purposes, so unless we want to rethink that, a certain
 amount of code duplication is unavoidable.  But they both understand
 about C99 vsnprintf semantics now.

I have often found it frustrating that we have appendStringInfo* for
the backend and appendPQExpBuffer* for the frontend.  It'd be nice to
have one API that could be used in both places, somehow.  There seems
to be a lot of interest (including on my part) in writing code that
can be compiled in either environment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature request: Optimizer improvement

2013-11-01 Thread David Johnston
Jim Nasby-2 wrote
 Should that really matter in this case? ISTM we should always handle LIMIT
 before moving on to the SELECT clause…?

SELECT generate_series(1,10) LIMIT 1

Here the SELECT list itself produces the rows (a set-returning function), so
the LIMIT cannot be applied before the SELECT clause is evaluated.

David J.



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Feature-request-Optimizer-improvement-tp5776589p5776707.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dsm use of uint64

2013-11-01 Thread Peter Eisentraut
On Mon, 2013-10-28 at 12:17 -0400, Robert Haas wrote:
 On Sun, Oct 27, 2013 at 11:34 PM, Noah Misch n...@leadboat.com wrote:
  On Fri, Oct 25, 2013 at 10:11:41PM -0400, Robert Haas wrote:
  When I wrote the dynamic shared memory patch, I used uint64 everywhere
  to measure sizes - rather than, as we do for the main shared memory
  segment, Size.  This now seems to me to have been the wrong decision;

This change is now causing compiler warnings on 32-bit platforms.  You
can see them here, for example:
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2013-11-01%2020%3A45%3A01&stg=make



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers