[HACKERS] Question: CREATE EXTENSION and create schema permission?

2011-08-20 Thread Kohei KaiGai
CreateExtension() possibly creates a new schema when the supplied
extension was not relocatable and the target schema was given by
control file of the extension.
However, it allows users to create a new schema with his ownership,
even if current user does not have permission to create a new schema.

Oid extowner = GetUserId();
  :
else if (control->schema != NULL)
{
/*
 * The extension is not relocatable and the author gave us a schema
 * for it.  We create the schema here if it does not already exist.
 */
schemaName = control->schema;
schemaOid = get_namespace_oid(schemaName, true);

if (schemaOid == InvalidOid)
{
schemaOid = NamespaceCreate(schemaName, extowner);
/* Advance cmd counter to make the namespace visible */
CommandCounterIncrement();
}
}

It seems to me that we should inject permission checks here like as
CreateSchemaCommand() doing.

/*
 * To create a schema, must have schema-create privilege on the current
 * database and must be able to become the target role (this does not
 * imply that the target role itself must have create-schema privilege).
 * The latter provision guards against "giveaway" attacks.  Note that a
 * superuser will always have both of these privileges a fortiori.
 */
aclresult = pg_database_aclcheck(MyDatabaseId, saved_uid, ACL_CREATE);
if (aclresult != ACLCHECK_OK)
aclcheck_error(aclresult, ACL_KIND_DATABASE,
   get_database_name(MyDatabaseId));

I didn't follow the discussion about extension so much when it got merged.
Please tell me, if it was a topic already discussed before.

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
> > By your argument, we can say that no-one will create a index with a
> function
> > like random(), time(), date(), broken operators etc. Its hard to imagine
> a
> > index in which a a user will only want to insert and never select on it.
>
> The whole point of this optimization is to make index access cheaper,
> not more expensive.  You seem convinced it's done the opposite, but
> you haven't offered much evidence (or any test results) to support
> that proposition.
>

I hope you are referring to thick indexes/covering indexes/indexes with
snapshot. Why do you think its done the opposite? In fact all the other
databases like Oracle, SQL-Server, Sybase implement the indexes with
snapshot (that's the only one they support). It makes the index tuple larger
by 8 bytes, but avoids the heap-fetch. I think, i ran a couple of
benchmarks, when i submitted the patch and showed the improvement. The
trade-off in that case was simple. Size of the index Vs avoiding a disk I/O.
User still has the choice of creating indexes without snapshot( it was
provided as an optional index).


>
> What we decided NOT to do is put xmin/xmax/cid into the index tuple,
> for precisely the reasons you mention.  That would be catastrophic
> both for write operations to the table, and for the size of the index.
>
>
Why it would be catastrophic for write operations on table?? -please explain
me.
The trade-off in that case was simple. Size of the index Vs avoiding a disk
I/O. There was no catastrophic damage on the size of the index, as far as i
can see.

I made this point, because Heikki pointed out that since no-one is
complaining about some performance problem, and so we can assume that it
doesn't exist. But the thick index proposal was shot down on the context,
some one might create a index on a function like random(), time(). date() or
with broken operators, effectively meaning that you can insert into the
index and cannot select back. We are already doing unique checks and
referential integrity checks on that kind of indexes(which would all be
wrong), but still we should not be working in that area, to help user not
make that mistake of creating such indexes. So we should apply the same
principle for decision making here also.

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram

>  wrote:
> > by your argument, if WALInserLock is held for 't' seconds, you should
> > definitely be holding visibility map lock for more than time frame 't'.
>
> Nope, that's not how it works.  Perhaps you should read the code.
> See, e.g., heap_update().
>
> --
>
OK. I took a look at the patch you have supplied in
http://postgresql.1045698.n5.nabble.com/crash-safe-visibility-map-take-five-td4377235.html
There is a code like this.

 {
 all_visible_cleared = true;
 PageClearAllVisible(BufferGetPage(buffer));
+visibilitymap_clear(relation,
+ItemPointerGetBlockNumber(&(heaptup->t_self)),
+&vmbuffer);
 }

Here, you are not making an entry into the WAL. then there is an assumption
that the two bits will be in sync without any WAL entry. There is a chance
that the visibility map might be affected by partial page writes, where
clearing of a particular bit might not have been changed. And i am thinking
a lot of such issues. Can you just explain the background logic behind
ignoring the principle of WAL logging? What are the implemented principles,
that protect the Visibility map pages??

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
   I think that you have switched gears here and are in this paragraph

> talking about the 8.4-era visibility map changes rather than my recent
> work.  There is zero evidence that those changes caused any
> performance problem.  I've spent a large chunk of the last four months
> looking for scalability problems and I haven't come across any
> evidence that this is an issue.  If you think it is, let's have a test
> case.
>

Consider the TPC-C benchmark. Currently on a four core box, Oracle does
29 transactions per minute. Say we are doing something around half of
this. So a page should get quickly filled. If a vacuum runs and the
transactions are not touched by the MakePayment transaction, then it will be
marked as AllVisible. When the MakePayment transaction updates, it should
clear that bit. If we test it out, with too little warehouses, we may not
see the effect. Of Course the PGPROC infrastructure for generating
transaction ids might be the biggest culprit, but if you ignore that this
should be visible.

Maybe.  Of course, we're only talking about cases where the index-only
> scan optimization is in use (which is not all of them).

But are you saying that, the optimization of looking at visibility map will
happen only for Index-only scans and will not use the visibility map
infrastructure for the normal index scans? That's definitely a good idea and
improvement.

>> d) In addition, currently there is no WAL Logging, while the bit is
cleared,
>> which would not be the case in future and hence the exclusive lock held
on
>> the visibility map is going to be held for a longer time.

> This is false and has been false since the visibility map was first
implemented.

I can't understand this. If you are not doing this, then it would cause
consistency issues. Are you saying, we have a crash safe visibility map, but
you don't follow "log the change before changing the contents"/ WAL
principle. If so, please explain in detail. If you are doing it in the
normal way, then you should be logging the changes before making the changes
to the buffer and during that timeframe, you should be holding the lock on
the buffer. Heikki specifically pointed out, that you have brought in the
WAL Logging of visibility map, within the critical section.

Heikki's comments, i am quoting:
 "I believe Robert's changes to make the visibility map crash-safe covers
that. Clearing the bit in the visibility map now happens within the same
critical section as clearing the flag on the heap page and writing the WAL
record."

I would be able to respond to your other statements, once we are clear here.

Thanks,
Gokul.


Re: [HACKERS] CONGRATULATIONS, David!

2011-08-20 Thread Lou Picciano
Congratulations, David Fetter - on his new arrival! It's a big day! 


Re: [HACKERS] [PL/pgSQL] %TYPE and array declaration - patch

2011-08-20 Thread Wojciech Muła
On Sun, 7 Aug 2011 14:57:36 +0200 Wojciech Muła
 wrote:

> Hi all, does anybody work on this TODO item?
> http://wiki.postgresql.org/wiki/Todo#PL.2FpgSQL
> 
> I didn't find any related posting/bug report.

Hi, I've prepared simple patch, please review.

Since exact array defintion isn't needed anywhere, code
detects only if %TYPE is followed by tokens matching
regexp ('[' ICONST ']')+. This information is used during
type construction.

w.
diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index 92b54dd..efdc833 100644
--- a/src/pl/plpgsql/src/gram.y
+++ b/src/pl/plpgsql/src/gram.y
@@ -76,6 +76,8 @@ static	PLpgSQL_expr	*read_sql_expression2(int until, int until2,
 			  int *endtoken);
 static	PLpgSQL_expr	*read_sql_stmt(const char *sqlstart);
 static	PLpgSQL_type	*read_datatype(int tok);
+static	int read_array_dims(int tok);
+static	int read_single_array_dim(int tok);
 static	PLpgSQL_stmt	*make_execsql_stmt(int firsttoken, int location);
 static	PLpgSQL_stmt_fetch *read_fetch_direction(void);
 static	void			 complete_direction(PLpgSQL_stmt_fetch *fetch,
@@ -2503,7 +2505,13 @@ read_datatype(int tok)
 			if (tok_is_keyword(tok, &yylval,
 			   K_TYPE, "type"))
 			{
-result = plpgsql_parse_wordtype(dtname);
+/*
+ * Now try to determine if there are any array indicators
+ * [] or [ICONST] follow type definition, and then use
+ * this information during type construction.
+ */
+const int arrndim = read_array_dims(tok);
+result = plpgsql_parse_wordtype(dtname, arrndim);
 if (result)
 	return result;
 			}
@@ -2527,7 +2535,8 @@ read_datatype(int tok)
 			if (tok_is_keyword(tok, &yylval,
 			   K_TYPE, "type"))
 			{
-result = plpgsql_parse_cwordtype(dtnames);
+const int arrndim = read_array_dims(tok);
+result = plpgsql_parse_cwordtype(dtnames, arrndim);
 if (result)
 	return result;
 			}
@@ -2582,6 +2591,67 @@ read_datatype(int tok)
 	return result;
 }
 
+/*
+ * Parse all items of array declaration: ('[' ICONST? ']')*
+ */
+static int
+read_array_dims(int tok) {
+	int n = 0;
+	int k;
+
+	while (true) {
+		tok = yylex();
+		k = read_single_array_dim(tok);
+		switch (k) {
+			case 1: /* ok */
+n += 1;
+break;
+
+			case 0: /* probably wrong syntax, but don't panic here */
+plpgsql_push_back_token(tok);
+return 0;
+
+			case -1: /* end of array def */
+plpgsql_push_back_token(tok);
+return n;
+
+			default:
+Assert(false);
+		}
+	}
+}
+
+/* Parse single item of array declaration: '[' ']' or '[' ICONST ']' */
+static int
+read_single_array_dim(int tok) {
+	int tok2;
+	if (tok == '[') {
+		tok = yylex();
+		if (tok == ']') {
+			/* variant [] */
+			return 1;
+		}
+		else {
+			/* variant [ICONST] */
+			if (tok == ICONST) {
+/* valid only if ICONST > 0 */
+if (yylval.ival <= 0)
+	return 0;
+
+tok2 = yylex();
+if (tok2 == ']')
+	return 1;
+else
+	plpgsql_push_back_token(tok2);
+			}
+			return 0;
+		}
+	}
+	else
+		return -1;
+}
+
+
 static PLpgSQL_stmt *
 make_execsql_stmt(int firsttoken, int location)
 {
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index d22fa68..826eabe 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1588,6 +1588,35 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
 
 
 /* --
+ * build_arraydatatype	Try to construct array type from a type
+ *
+ * Function is used internally by plpgsql_parse_wordtype and
+ * plpgsql_parse_cwordtype to handle syntax word%TYPE[].
+ * 
+ * Returns datatype struct, or NULL if no match found for word.
+ * --
+ */
+static PLpgSQL_type *
+build_arraydatatype(Oid typoid, int32 typmod, Oid collation) {
+	Oid elem_typoid;
+
+	if (type_is_array(typoid))
+		/* copy array type */
+		return plpgsql_build_datatype(typoid, typmod, collation);
+	else
+	{
+		/* get array type for given scalar type */
+		elem_typoid = get_array_type(typoid);
+		if (elem_typoid != InvalidOid) 
+			return plpgsql_build_datatype(elem_typoid, typmod, collation);
+		else
+			/* typoid can't be element type of an array */
+			return NULL;
+	}
+}
+
+
+/* --
  * plpgsql_parse_wordtype	The scanner found word%TYPE. word can be
  *a variable name or a basetype.
  *
@@ -1595,7 +1624,7 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
  * --
  */
 PLpgSQL_type *
-plpgsql_parse_wordtype(char *ident)
+plpgsql_parse_wordtype(char *ident, int arrndim)
 {
 	PLpgSQL_type *dtype;
 	PLpgSQL_nsitem *nse;
@@ -1613,7 +1642,12 @@ plpgsql_parse_wordtype(char *ident)
 		switch (nse->itemtype)
 		{
 			case PLPGSQL_NSTYPE_VAR:
-return ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+dtype = ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+if (arrndim <= 0)
+	return dtype;
+else
+	return build_arraydatatype(dtype->typoid,
+-1, plpgsql_curr_compile->fn_input_collation);
 
 /* XXX perhaps al

Re: [HACKERS] two index bitmap scan of a big table & hash_seq_search

2011-08-20 Thread Tom Lane
"Sergey E. Koposov"  writes:
> Yes, it turns out that the problem was in lossify'ing the bitmap to 
> intensely.

Yeah, I had just been coming to the same conclusion.  Your table has
about 134M pages, and if the planner estimate of 62M rows was right
(and there's no reason it shouldn't be pretty close on that) then
we're talking about a bitmap that's going to contain about one bit
set in each of about half of the pages.  The page structures are
50-some bytes apiece so a non-lossy representation would run to
3-plus GB, well beyond your work_mem limit.  So it would fill up
to work_mem and then start lossifying pages ... one at a time.
I had suspected that there might be a performance issue there,
as per the comment at line 954, but we hadn't actually seen it
reported from the field before.

> After that I changed the check in tbm_lossify()
> from:
>  if (tbm->nentries <= tbm->maxentries)
> to:
>  if (tbm->nentries <= (0.8*tbm->maxentries))
> which allowed the query finish in 75 seconds (comparing to 3hours).

I was about to propose using tbm->maxentries/2, which is in the same
spirit but a tad cheaper to calculate.

I think that we also need to consider the possibility that tbm_lossify
finishes its loop without ever getting under maxentries --- that could
only happen with very large tables and very small work_mem, but it could
happen.  If it did, then all subsequent operations would keep on calling
tbm_lossify, and it would keep scanning the entire hashtable and
probably not accomplishing much, and taking forever to do it.  Unless
somebody has a better idea, what I think we should do then is just
artificially inflate maxentries --- that is, accept that we are not
going to fit in the originally requested work_mem, and we might as well
set a more realistic goal.

> Do you think that this should be fixed ?

Yes, definitely.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] two index bitmap scan of a big table & hash_seq_search

2011-08-20 Thread Sergey E. Koposov

On Fri, 19 Aug 2011, Tom Lane wrote:

I might be reading too much into the mention of tbm_lossify, but
I wonder if the problem is repeated invocations of tbm_lossify()
as the bitmap gets larger.  Maybe that function needs to be more
aggressive about how much information it deletes per call.

Thanks for idea, Tom.

Yes, it turns out that the problem was in lossify'ing the bitmap to 
intensely. I've put the elogs around the lossification in tbm_add_tuples()

  if (tbm->nentries > tbm->maxentries)
{
elog(WARNING, "lossifying %d %d", tbm->nentries, 
tbm->maxentries);

tbm_lossify(tbm);
elog(WARNING, "lossified %d", tbm->nentries);
}

And I saw in my log 
koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING:  lossifying 13421773 13421772

koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING:  lossified 13421772
issued with a rate of 2 per second. E.g. it lossifies one page per 
lossify call (and does a lot of hash_seq_search operations too) ...


After that I changed the check in tbm_lossify()
from:
if (tbm->nentries <= tbm->maxentries)
to:
if (tbm->nentries <= (0.8*tbm->maxentries))

which allowed the query finish in 75 seconds (comparing to 3hours).

I'm not entirely sure that my fix of the tbm_lossify function is a proper 
one, but it looks all right.

Do you think that this should be fixed ?

Sergey

***
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?

2011-08-20 Thread Peter Geoghegan
On 20 August 2011 15:34, Bruce Momjian  wrote:
> Actually, using UNION instead of UNION ALL does prevent some infinite
> loops:

While that is worth pointing out, it cannot be recommended as a way of
preventing infinite recursion; after all, all 5 WITH RECURSIVE
examples in the docs use UNION ALL. It's just a different way of
specifying a terminating condition that isn't likely to be applicable
to more complicated rCTEs.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?

2011-08-20 Thread Bruce Momjian
Greg Stark wrote:
> On Mon, Aug 15, 2011 at 9:31 PM, Tom Lane  wrote:
> > ... and that would be a seriously bad API. ?There are not SUSET
> > restrictions on other resources such as work_mem. ?Why do we need
> > one for this?
> 
> I think a better analogy would be imposing a maximum number of rows a
> query can output. That might be a sane thing to have for some
> circumstances but it's not useful in general.
> 
> Consider for instance my favourite recursive query application,
> displaying the lock dependency graph for pg_locks. What arbitrary
> maximum number of locks would you like to impose at which the query
> should error out?
> 
> There is a situation though that I think is motivating this though
> where it would be nice to detect a problem: when the query is such
> that it *can't* produce a record because there's an infinite loop
> before the first record. Ideally you want some way to detect that
> you've recursed and haven't changed anything that could lead to a
> change in the recursion condition. But that seems like a pretty hard
> thing to detect, probably impossible.

Actually, using UNION instead of UNION ALL does prevent some infinite
loops:

WITH RECURSIVE source AS (
SELECT 'Hello'
UNION
SELECT 'Hello' FROM source
)
SELECT * FROM source;

Change that to UNION ALL and you have an infinite loop.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] synchronized snapshots

2011-08-20 Thread Bruce Momjian
Peter Eisentraut wrote:
> On tis, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
> > In fact, now that I think about it, setting the transaction snapshot
> > from a utility statement would be functionally useful because then you
> > could take locks beforehand.
> 
> Another issue is that in some client interfaces, BEGIN and COMMIT are
> hidden behind API calls, which cannot easily be changed or equipped with
> new parameters.  So in order to have this functionality available
> through those interfaces, we'd need a separately callable command.

How do they set a transaction to SERIALIZABLE?  Seem the same syntax
should be used here.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 5:06 AM, Gokulakannan Somasundaram
 wrote:
> By your argument, we can say that no-one will create a index with a function
> like random(), time(), date(), broken operators etc. Its hard to imagine a
> index in which a a user will only want to insert and never select on it.

The whole point of this optimization is to make index access cheaper,
not more expensive.  You seem convinced it's done the opposite, but
you haven't offered much evidence (or any test results) to support
that proposition.

> Even C++ provides std::map infrastructure for objects, where the user owns
> the responsibility of writing the operator< correctly. Other databases do
> the same. Even going one step ahead, we are already going back to such
> indexes, if there is unique constraint/ referential integrity constraints.
> But with all such caveats, it was decided we should not allow covering
> indexes, as they require going back to the index for updates/deletes.

What we decided NOT to do is put xmin/xmax/cid into the index tuple,
for precisely the reasons you mention.  That would be catastrophic
both for write operations to the table, and for the size of the index.
 This approach is appealing precisely because a single visibility map
page can cover a very large chunk of the heap.  Is there a potential
problem buried somewhere in there?  Maybe.  But if there is, I'm
pretty sure it's something far less obvious than what you seem to be
worried about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram
 wrote:
> by your argument, if WALInserLock is held for 't' seconds, you should
> definitely be holding visibility map lock for more than time frame 't'.

Nope, that's not how it works.  Perhaps you should read the code.
See, e.g., heap_update().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:48 AM, Gokulakannan Somasundaram
 wrote:
> a) First of all, it(Visibility Map) should have definitely affected the
> scalability of postgres in scenarios where in updates occur during a time
> batch window. May be the increase in speed of vacuums negate that effect.

I think that you have switched gears here and are in this paragraph
talking about the 8.4-era visibility map changes rather than my recent
work.  There is zero evidence that those changes caused any
performance problem.  I've spent a large chunk of the last four months
looking for scalability problems and I haven't come across any
evidence that this is an issue.  If you think it is, let's have a test
case.

> b) Second, currently the index scans  don't touch the visibility map and in
> future they are going to acquire share lock on that. This should increase
> the contention.

Maybe.  Of course, we're only talking about cases where the index-only
scan optimization is in use (which is not all of them).  And even
then, if bringing in the heap pages would have caused buffer evictions
and the index-only scan avoids that, then you're trading contention
for exclusive locks on the BufFreelistLock and buf mapping locks for
shared-lock contention on the visibility map page.  Furthermore, the
latter will (except in very large relations) only need to be
shared-locked and pinned once per scan, whereas you might easily have
a case where each index probe forces replacement of a heap page.

What I am slightly worried about is that our machinery for taking and
releasing buffer pins is going to become a scalability bottleneck at
some point, and certainly very hot pages like root index pages and
visibility map pages could become hot-spots for such contention.  But
the experiments I've done so far suggest that there are other more
serious bottlenecks that have to be addressed first.  If it does rise
to the list, we'll find a way to fix it, but I think skipping the
index-only scan optimization is going to be a cure worse than the
disease.

> c) Your statement : "The contention on the VM buffer would be just as bad
> whether you hold the heap page lock at the same time or not."
> I am talking about the contention time frame of the heap page. It will be
> increased Consider my statement in conjunction with the scenario 2.

Sure, but here again, what is your evidence that this actually
matters?  It's not increasing by very much.

> d) In addition, currently there is no WAL Logging, while the bit is cleared,
> which would not be the case in future and hence the exclusive lock held on
> the visibility map is going to be held for a longer time.

This is false and has been false since the visibility map was first implemented.

> You might definitely see some performance improvement, if you are testing
> this in anything less than 4 cores. Bump up the core count and processor
> count and check whether this affects the load-throughput curve.

I'm fairly certain I already did that experiment, on a 32-core
machine, but since the patch you're worried about went in two months
ago, I'm a little fuzzy on the details.  Maybe you should test it out
yourself

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:48 PM, Gokulakannan Somasundaram <
gokul...@gmail.com> wrote:

>
> The above could already happen in 8.4, where the visibility map was
>> introduced. The contention on the VM buffer would be just as bad whether you
>> hold the heap page lock at the same time or not. I have not heard any
>> complaints of contention on VM buffers.
>>
>> --
>>  Heikki Linnakangas
>>
>>
>> a) First of all, it(Visibility Map) should have definitely affected the
> scalability of postgres in scenarios where in updates occur during a time
> batch window. May be the increase in speed of vacuums negate that effect.
> b) Second, currently the index scans  don't touch the visibility map and in
> future they are going to acquire share lock on that. This should increase
> the contention.
> c) Your statement : "The contention on the VM buffer would be just as bad
> whether you hold the heap page lock at the same time or not."
> I am talking about the contention time frame of the heap page. It will be
> increased Consider my statement in conjunction with the scenario 2.
> d) In addition, currently there is no WAL Logging, while the bit is
> cleared, which would not be the case in future and hence the exclusive lock
> held on the visibility map is going to be held for a longer time.
>
> You might definitely see some performance improvement, if you are testing
> this in anything less than 4 cores. Bump up the core count and processor
> count and check whether this affects the load-throughput curve.
>
>
By your argument, we can say that no-one will create a index with a function
like random(), time(), date(), broken operators etc. Its hard to imagine a
index in which a a user will only want to insert and never select on it.
Even C++ provides std::map infrastructure for objects, where the user owns
the responsibility of writing the operator< correctly. Other databases do
the same. Even going one step ahead, we are already going back to such
indexes, if there is unique constraint/ referential integrity constraints.
But with all such caveats, it was decided we should not allow covering
indexes, as they require going back to the index for updates/deletes.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
> Hmm, you have a point.  If 100 backends simultaneously write to 100
> different pages, and all of those pages are all-visible, then it's
> possible that they could end up fighting over the buffer content lock
> on the visibility map page.  But why would you expect that to matter?
> In a heavily updated table, the proportion of visibility map bits that
> are set figures to be quite low, since they're only set during VACUUM.
>  To have 100 backends simultaneously pick different pages to write
> each of which is all-visible seems really unlucky.   Even if it does
> happen from time to time, I suspect the effects would be largely
> masked by WALInsertLock contention.  The visibility map content lock
> is only taken very briefly, whereas the operations protected by
> WALInsertLock are much more complex.
>

by your argument, if WALInserLock is held for 't' seconds, you should
definitely be holding visibility map lock for more than time frame 't' . So
the index scans have to wait for acquiring the lock on visibility map to
check visibility. What we are trading off here is Synchronization Vs I/O.
Should we lose the scalability for performance??

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
> The above could already happen in 8.4, where the visibility map was
> introduced. The contention on the VM buffer would be just as bad whether you
> hold the heap page lock at the same time or not. I have not heard any
> complaints of contention on VM buffers.
>
> --
>  Heikki Linnakangas
>
>
> a) First of all, it(Visibility Map) should have definitely affected the
scalability of postgres in scenarios where in updates occur during a time
batch window. May be the increase in speed of vacuums negate that effect.
b) Second, currently the index scans  don't touch the visibility map and in
future they are going to acquire share lock on that. This should increase
the contention.
c) Your statement : "The contention on the VM buffer would be just as bad
whether you hold the heap page lock at the same time or not."
I am talking about the contention time frame of the heap page. It will be
increased Consider my statement in conjunction with the scenario 2.
d) In addition, currently there is no WAL Logging, while the bit is cleared,
which would not be the case in future and hence the exclusive lock held on
the visibility map is going to be held for a longer time.

You might definitely see some performance improvement, if you are testing
this in anything less than 4 cores. Bump up the core count and processor
count and check whether this affects the load-throughput curve.

Thanks,
Gokul.