Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
 The above could already happen in 8.4, where the visibility map was
 introduced. The contention on the VM buffer would be just as bad whether you
 hold the heap page lock at the same time or not. I have not heard any
 complaints of contention on VM buffers.

 --
  Heikki Linnakangas


a) First of all, it (the visibility map) should definitely have affected the
scalability of Postgres in scenarios where updates occur during a batch time
window. Maybe the increase in the speed of vacuums negates that effect.
b) Second, currently the index scans don't touch the visibility map, and in
future they are going to acquire a share lock on it. This should increase
the contention.
c) Your statement: "The contention on the VM buffer would be just as bad
whether you hold the heap page lock at the same time or not."
I am talking about the contention time frame of the heap page. It will be
increased. Consider my statement in conjunction with scenario 2.
d) In addition, currently there is no WAL logging while the bit is cleared,
which would not be the case in future, and hence the exclusive lock on
the visibility map is going to be held for a longer time.

You might definitely see some performance improvement if you are testing
this on anything less than 4 cores. Bump up the core count and processor
count and check whether this affects the load-throughput curve.

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
 Hmm, you have a point.  If 100 backends simultaneously write to 100
 different pages, and all of those pages are all-visible, then it's
 possible that they could end up fighting over the buffer content lock
 on the visibility map page.  But why would you expect that to matter?
 In a heavily updated table, the proportion of visibility map bits that
 are set figures to be quite low, since they're only set during VACUUM.
  To have 100 backends simultaneously pick different pages to write
 each of which is all-visible seems really unlucky.   Even if it does
 happen from time to time, I suspect the effects would be largely
 masked by WALInsertLock contention.  The visibility map content lock
 is only taken very briefly, whereas the operations protected by
 WALInsertLock are much more complex.


By your argument, if WALInsertLock is held for 't' seconds, you should
definitely be holding the visibility map lock for more than that time frame 't'. So
the index scans have to wait to acquire the lock on the visibility map to
check visibility. What we are trading off here is synchronization vs. I/O.
Should we lose scalability for performance?

Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:48 PM, Gokulakannan Somasundaram 
gokul...@gmail.com wrote:


 The above could already happen in 8.4, where the visibility map was
 introduced. The contention on the VM buffer would be just as bad whether you
 hold the heap page lock at the same time or not. I have not heard any
 complaints of contention on VM buffers.

 --
  Heikki Linnakangas


 a) First of all, it (the visibility map) should definitely have affected the
 scalability of Postgres in scenarios where updates occur during a batch time
 window. Maybe the increase in the speed of vacuums negates that effect.
 b) Second, currently the index scans don't touch the visibility map, and in
 future they are going to acquire a share lock on it. This should increase
 the contention.
 c) Your statement: "The contention on the VM buffer would be just as bad
 whether you hold the heap page lock at the same time or not."
 I am talking about the contention time frame of the heap page. It will be
 increased. Consider my statement in conjunction with scenario 2.
 d) In addition, currently there is no WAL logging while the bit is
 cleared, which would not be the case in future, and hence the exclusive lock
 on the visibility map is going to be held for a longer time.

 You might definitely see some performance improvement if you are testing
 this on anything less than 4 cores. Bump up the core count and processor
 count and check whether this affects the load-throughput curve.


By your argument, we can say that no one will create an index with a function
like random(), time(), date(), broken operators, etc. It's hard to imagine an
index into which a user will only want to insert and never select from it.
Even C++ provides the std::map infrastructure for objects, where the user owns
the responsibility of writing the operator correctly. Other databases do
the same. Going one step further, we already go back to such
indexes if there are unique constraints / referential integrity constraints.
But with all such caveats, it was decided we should not allow covering
indexes, as they require going back to the index for updates/deletes.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:48 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 a) First of all, it (the visibility map) should definitely have affected the
 scalability of Postgres in scenarios where updates occur during a batch time
 window. Maybe the increase in the speed of vacuums negates that effect.

I think that you have switched gears here and are in this paragraph
talking about the 8.4-era visibility map changes rather than my recent
work.  There is zero evidence that those changes caused any
performance problem.  I've spent a large chunk of the last four months
looking for scalability problems and I haven't come across any
evidence that this is an issue.  If you think it is, let's have a test
case.

 b) Second, currently the index scans don't touch the visibility map, and in
 future they are going to acquire a share lock on it. This should increase
 the contention.

Maybe.  Of course, we're only talking about cases where the index-only
scan optimization is in use (which is not all of them).  And even
then, if bringing in the heap pages would have caused buffer evictions
and the index-only scan avoids that, then you're trading exclusive-lock
contention on the BufFreelistLock and buffer mapping locks for
shared-lock contention on the visibility map page.  Furthermore, the
latter will (except in very large relations) only need to be
shared-locked and pinned once per scan, whereas you might easily have
a case where each index probe forces replacement of a heap page.
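
(As a rough illustration of the case being discussed: an index-only scan
that stops touching the heap once the table is all-visible. The plan lines
below are a sketch, not captured output.)

CREATE TABLE t (id integer PRIMARY KEY, payload text);
INSERT INTO t SELECT g, 'x' FROM generate_series(1, 1000000) g;
VACUUM t;   -- sets the visibility map bits

EXPLAIN ANALYZE SELECT count(*) FROM t WHERE id < 1000;
--   Index Only Scan using t_pkey on t ...
--     Heap Fetches: 0    -- only the VM page is pinned, no heap pages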

What I am slightly worried about is that our machinery for taking and
releasing buffer pins is going to become a scalability bottleneck at
some point, and certainly very hot pages like root index pages and
visibility map pages could become hot-spots for such contention.  But
the experiments I've done so far suggest that there are other more
serious bottlenecks that have to be addressed first.  If it does rise
to the list, we'll find a way to fix it, but I think skipping the
index-only scan optimization is going to be a cure worse than the
disease.

 c) Your statement: "The contention on the VM buffer would be just as bad
 whether you hold the heap page lock at the same time or not."
 I am talking about the contention time frame of the heap page. It will be
 increased. Consider my statement in conjunction with scenario 2.

Sure, but here again, what is your evidence that this actually
matters?  It's not increasing by very much.

 d) In addition, currently there is no WAL logging while the bit is cleared,
 which would not be the case in future, and hence the exclusive lock on
 the visibility map is going to be held for a longer time.

This is false and has been false since the visibility map was first implemented.

 You might definitely see some performance improvement if you are testing
 this on anything less than 4 cores. Bump up the core count and processor
 count and check whether this affects the load-throughput curve.

I'm fairly certain I already did that experiment, on a 32-core
machine, but since the patch you're worried about went in two months
ago, I'm a little fuzzy on the details.  Maybe you should test it out
yourself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 By your argument, if WALInsertLock is held for 't' seconds, you should
 definitely be holding the visibility map lock for more than that time frame 't'.

Nope, that's not how it works.  Perhaps you should read the code.
See, e.g., heap_update().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Robert Haas
On Sat, Aug 20, 2011 at 5:06 AM, Gokulakannan Somasundaram
gokul...@gmail.com wrote:
 By your argument, we can say that no one will create an index with a function
 like random(), time(), date(), broken operators, etc. It's hard to imagine an
 index into which a user will only want to insert and never select from it.

The whole point of this optimization is to make index access cheaper,
not more expensive.  You seem convinced it's done the opposite, but
you haven't offered much evidence (or any test results) to support
that proposition.

 Even C++ provides the std::map infrastructure for objects, where the user owns
 the responsibility of writing the operator correctly. Other databases do
 the same. Going one step further, we already go back to such
 indexes if there are unique constraints / referential integrity constraints.
 But with all such caveats, it was decided we should not allow covering
 indexes, as they require going back to the index for updates/deletes.

What we decided NOT to do is put xmin/xmax/cid into the index tuple,
for precisely the reasons you mention.  That would be catastrophic
both for write operations to the table, and for the size of the index.
 This approach is appealing precisely because a single visibility map
page can cover a very large chunk of the heap.  Is there a potential
problem buried somewhere in there?  Maybe.  But if there is, I'm
pretty sure it's something far less obvious than what you seem to be
worried about.
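
(For scale, a back-of-the-envelope figure, assuming the default 8 kB block
size and one visibility bit per heap page, and ignoring the VM page header:)

-- bits per VM page, times the heap page size, expressed in MB
SELECT (8192 * 8) * 8192 / (1024 * 1024) AS heap_mb_per_vm_page;
-- => 512, i.e. one visibility map page covers roughly half a gigabyte of heap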

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] synchronized snapshots

2011-08-20 Thread Bruce Momjian
Peter Eisentraut wrote:
 On Tue, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
  In fact, now that I think about it, setting the transaction snapshot
  from a utility statement would be functionally useful because then you
  could take locks beforehand.
 
 Another issue is that in some client interfaces, BEGIN and COMMIT are
 hidden behind API calls, which cannot easily be changed or equipped with
 new parameters.  So in order to have this functionality available
 through those interfaces, we'd need a separately callable command.

How do they set a transaction to SERIALIZABLE?  Seems the same syntax
should be used here.
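
(For reference, the existing separately-callable syntax being referred to,
plus a hypothetical analogue for the snapshot case; the actual command name
and shape were still under discussion at this point:)

BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- hypothetical equivalent for adopting an exported snapshot:
-- SET TRANSACTION SNAPSHOT 'some-exported-snapshot-id';
COMMIT;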

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?

2011-08-20 Thread Bruce Momjian
Greg Stark wrote:
 On Mon, Aug 15, 2011 at 9:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  ... and that would be a seriously bad API.  There are not SUSET
  restrictions on other resources such as work_mem.  Why do we need
  one for this?
 
 I think a better analogy would be imposing a maximum number of rows a
 query can output. That might be a sane thing to have for some
 circumstances but it's not useful in general.
 
 Consider for instance my favourite recursive query application,
 displaying the lock dependency graph for pg_locks. What arbitrary
 maximum number of locks would you like to impose at which the query
 should error out?
 
 There is a situation, though, that I think is motivating this
 where it would be nice to detect a problem: when the query is such
 that it *can't* produce a record because there's an infinite loop
 before the first record. Ideally you want some way to detect that
 you've recursed and haven't changed anything that could lead to a
 change in the recursion condition. But that seems like a pretty hard
 thing to detect, probably impossible.

Actually, using UNION instead of UNION ALL does prevent some infinite
loops:

WITH RECURSIVE source AS (
SELECT 'Hello'
UNION
SELECT 'Hello' FROM source
)
SELECT * FROM source;

Change that to UNION ALL and you have an infinite loop.
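
(That said, even the UNION ALL form can be stopped from the outside, because
the recursive query is evaluated only as far as the parent query fetches
rows. A sketch, with the caveat that this is fragile if the outer query ever
demands more rows than expected:)

WITH RECURSIVE source AS (
    SELECT 'Hello'
    UNION ALL
    SELECT 'Hello' FROM source
)
SELECT * FROM source LIMIT 10;   -- terminates: only 10 rows are ever fetched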

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?

2011-08-20 Thread Peter Geoghegan
On 20 August 2011 15:34, Bruce Momjian br...@momjian.us wrote:
 Actually, using UNION instead of UNION ALL does prevent some infinite
 loops:

While that is worth pointing out, it cannot be recommended as a way of
preventing infinite recursion; after all, all 5 WITH RECURSIVE
examples in the docs use UNION ALL. It's just a different way of
specifying a terminating condition that isn't likely to be applicable
to more complicated rCTEs.
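
(A minimal sketch of the kind of explicit terminating condition that does
generalize: carry a depth counter in the recursive term and bound it.)

WITH RECURSIVE t(n, depth) AS (
    SELECT 1, 1
    UNION ALL
    SELECT n + 1, depth + 1 FROM t WHERE depth < 100
)
SELECT * FROM t;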

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] two index bitmap scan of a big table hash_seq_search

2011-08-20 Thread Sergey E. Koposov

On Fri, 19 Aug 2011, Tom Lane wrote:

I might be reading too much into the mention of tbm_lossify, but
I wonder if the problem is repeated invocations of tbm_lossify()
as the bitmap gets larger.  Maybe that function needs to be more
aggressive about how much information it deletes per call.

Thanks for idea, Tom.

Yes, it turns out that the problem was in lossify'ing the bitmap too
intensely. I've put elogs around the lossification in tbm_add_tuples():

  if (tbm->nentries > tbm->maxentries)
  {
      elog(WARNING, "lossifying %d %d", tbm->nentries,
           tbm->maxentries);

      tbm_lossify(tbm);
      elog(WARNING, "lossified %d", tbm->nentries);
  }

And I saw in my log 
koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING:  lossifying 13421773 13421772

koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING:  lossified 13421772
issued at a rate of 2 per second, i.e. it lossifies one page per
lossify call (and does a lot of hash_seq_search operations too) ...


After that I changed the check in tbm_lossify()
from:
    if (tbm->nentries <= tbm->maxentries)
to:
    if (tbm->nentries <= (0.8*tbm->maxentries))

which allowed the query to finish in 75 seconds (compared to 3 hours).

I'm not entirely sure that my fix of the tbm_lossify function is a proper 
one, but it looks all right.

Do you think that this should be fixed?

Sergey

***
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge/Sternberg Astronomical Institute
Web: http://lnfm1.sai.msu.ru/~math
E-mail: m...@sai.msu.ru



Re: [HACKERS] two index bitmap scan of a big table hash_seq_search

2011-08-20 Thread Tom Lane
Sergey E. Koposov m...@sai.msu.ru writes:
 Yes, it turns out that the problem was in lossify'ing the bitmap too
 intensely.

Yeah, I had just been coming to the same conclusion.  Your table has
about 134M pages, and if the planner estimate of 62M rows was right
(and there's no reason it shouldn't be pretty close on that) then
we're talking about a bitmap that's going to contain about one bit
set in each of about half of the pages.  The page structures are
50-some bytes apiece so a non-lossy representation would run to
3-plus GB, well beyond your work_mem limit.  So it would fill up
to work_mem and then start lossifying pages ... one at a time.
I had suspected that there might be a performance issue there,
as per the comment at line 954, but we hadn't actually seen it
reported from the field before.

 After that I changed the check in tbm_lossify()
 from:
  if (tbm->nentries <= tbm->maxentries)
 to:
  if (tbm->nentries <= (0.8*tbm->maxentries))
 which allowed the query to finish in 75 seconds (compared to 3 hours).

I was about to propose using tbm->maxentries/2, which is in the same
spirit but a tad cheaper to calculate.

I think that we also need to consider the possibility that tbm_lossify
finishes its loop without ever getting under maxentries --- that could
only happen with very large tables and very small work_mem, but it could
happen.  If it did, then all subsequent operations would keep on calling
tbm_lossify, and it would keep scanning the entire hashtable and
probably not accomplishing much, and taking forever to do it.  Unless
somebody has a better idea, what I think we should do then is just
artificially inflate maxentries --- that is, accept that we are not
going to fit in the originally requested work_mem, and we might as well
set a more realistic goal.

 Do you think that this should be fixed ?

Yes, definitely.

regards, tom lane



Re: [HACKERS] [PL/pgSQL] %TYPE and array declaration - patch

2011-08-20 Thread Wojciech Muła
On Sun, 7 Aug 2011 14:57:36 +0200 Wojciech Muła
wojciech_m...@poczta.onet.pl wrote:

 Hi all, does anybody work on this TODO item?
 http://wiki.postgresql.org/wiki/Todo#PL.2FpgSQL
 
 I didn't find any related posting/bug report.

Hi, I've prepared simple patch, please review.

Since the exact array definition isn't needed anywhere, the code only
detects whether %TYPE is followed by tokens matching the
regexp ('[' ICONST? ']')+. This information is used during
type construction.
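
(A hypothetical usage sketch of the declarations the patch is meant to
accept; the table and function names are made up, and the array form is
not valid without the patch:)

CREATE TABLE items (id integer, name text);

CREATE FUNCTION demo() RETURNS void AS $$
DECLARE
    one_id  items.id%TYPE;     -- already works today
    all_ids items.id%TYPE[];   -- array form enabled by the patch
BEGIN
    all_ids := ARRAY[1, 2, 3];
END;
$$ LANGUAGE plpgsql;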

w.
diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index 92b54dd..efdc833 100644
--- a/src/pl/plpgsql/src/gram.y
+++ b/src/pl/plpgsql/src/gram.y
@@ -76,6 +76,8 @@ static	PLpgSQL_expr	*read_sql_expression2(int until, int until2,
 			  int *endtoken);
 static	PLpgSQL_expr	*read_sql_stmt(const char *sqlstart);
 static	PLpgSQL_type	*read_datatype(int tok);
+static	int read_array_dims(int tok);
+static	int read_single_array_dim(int tok);
 static	PLpgSQL_stmt	*make_execsql_stmt(int firsttoken, int location);
 static	PLpgSQL_stmt_fetch *read_fetch_direction(void);
 static	void			 complete_direction(PLpgSQL_stmt_fetch *fetch,
@@ -2503,7 +2505,13 @@ read_datatype(int tok)
 			if (tok_is_keyword(tok, yylval,
 			   K_TYPE, type))
 			{
-result = plpgsql_parse_wordtype(dtname);
+/*
+ * Now try to determine whether any array indicators
+ * [] or [ICONST] follow the type definition, and then use
+ * this information during type construction.
+ */
+const int arrndim = read_array_dims(tok);
+result = plpgsql_parse_wordtype(dtname, arrndim);
 if (result)
 	return result;
 			}
@@ -2527,7 +2535,8 @@ read_datatype(int tok)
 			if (tok_is_keyword(tok, yylval,
 			   K_TYPE, type))
 			{
-result = plpgsql_parse_cwordtype(dtnames);
+const int arrndim = read_array_dims(tok);
+result = plpgsql_parse_cwordtype(dtnames, arrndim);
 if (result)
 	return result;
 			}
@@ -2582,6 +2591,67 @@ read_datatype(int tok)
 	return result;
 }
 
+/*
+ * Parse all items of array declaration: ('[' ICONST? ']')*
+ */
+static int
+read_array_dims(int tok) {
+	int n = 0;
+	int k;
+
+	while (true) {
+		tok = yylex();
+		k = read_single_array_dim(tok);
+		switch (k) {
+			case 1: /* ok */
+n += 1;
+break;
+
+			case 0: /* probably wrong syntax, but don't panic here */
+plpgsql_push_back_token(tok);
+return 0;
+
+			case -1: /* end of array def */
+plpgsql_push_back_token(tok);
+return n;
+
+			default:
+Assert(false);
+		}
+	}
+}
+
+/* Parse single item of array declaration: '[' ']' or '[' ICONST ']' */
+static int
+read_single_array_dim(int tok) {
+	int tok2;
+	if (tok == '[') {
+		tok = yylex();
+		if (tok == ']') {
+			/* variant [] */
+			return 1;
+		}
+		else {
+			/* variant [ICONST] */
+			if (tok == ICONST) {
+/* valid only if ICONST > 0 */
+if (yylval.ival <= 0)
+	return 0;
+
+tok2 = yylex();
+if (tok2 == ']')
+	return 1;
+else
+	plpgsql_push_back_token(tok2);
+			}
+			return 0;
+		}
+	}
+	else
+		return -1;
+}
+
+
 static PLpgSQL_stmt *
 make_execsql_stmt(int firsttoken, int location)
 {
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index d22fa68..826eabe 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1588,6 +1588,35 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
 
 
 /* --
+ * build_arraydatatype	Try to construct array type from a type
+ *
+ * Function is used internally by plpgsql_parse_wordtype and
+ * plpgsql_parse_cwordtype to handle syntax word%TYPE[].
+ * 
+ * Returns datatype struct, or NULL if no match found for word.
+ * --
+ */
+static PLpgSQL_type *
+build_arraydatatype(Oid typoid, int32 typmod, Oid collation) {
+	Oid elem_typoid;
+
+	if (type_is_array(typoid))
+		/* copy array type */
+		return plpgsql_build_datatype(typoid, typmod, collation);
+	else
+	{
+		/* get array type for given scalar type */
+		elem_typoid = get_array_type(typoid);
+		if (elem_typoid != InvalidOid) 
+			return plpgsql_build_datatype(elem_typoid, typmod, collation);
+		else
+			/* typoid can't be element type of an array */
+			return NULL;
+	}
+}
+
+
+/* --
  * plpgsql_parse_wordtype	The scanner found word%TYPE. word can be
  *a variable name or a basetype.
  *
@@ -1595,7 +1624,7 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
  * --
  */
 PLpgSQL_type *
-plpgsql_parse_wordtype(char *ident)
+plpgsql_parse_wordtype(char *ident, int arrndim)
 {
 	PLpgSQL_type *dtype;
 	PLpgSQL_nsitem *nse;
@@ -1613,7 +1642,12 @@ plpgsql_parse_wordtype(char *ident)
 		switch (nse->itemtype)
 		{
 			case PLPGSQL_NSTYPE_VAR:
-return ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+dtype = ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+if (arrndim <= 0)
+	return dtype;
+else
+	return build_arraydatatype(dtype->typoid,
+-1, plpgsql_curr_compile->fn_input_collation);
 
 /* XXX 

Re: [HACKERS] CONGRATULATIONS, David!

2011-08-20 Thread Lou Picciano
Congratulations, David Fetter, on the new arrival! It's a big day!


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
 I think that you have switched gears here and are in this paragraph
 talking about the 8.4-era visibility map changes rather than my recent
 work.  There is zero evidence that those changes caused any
 performance problem.  I've spent a large chunk of the last four months
 looking for scalability problems and I haven't come across any
 evidence that this is an issue.  If you think it is, let's have a test
 case.


Consider the TPC-C benchmark. Currently, on a four-core box, Oracle does
29 transactions per minute. Say we are doing something around half of
that. So a page should get filled quickly. If a vacuum runs and the
tuples have not been touched by the MakePayment transaction, then the page
will be marked all-visible. When the MakePayment transaction updates, it
should clear that bit. If we test it out with too few warehouses, we may not
see the effect. Of course the PGPROC infrastructure for generating
transaction IDs might be the biggest culprit, but if you ignore that, this
should be visible.

Maybe.  Of course, we're only talking about cases where the index-only
 scan optimization is in use (which is not all of them).

But are you saying that the optimization of looking at the visibility map will
happen only for index-only scans, and that the normal index scans will not use
the visibility map infrastructure? That's definitely a good idea and an
improvement.

 d) In addition, currently there is no WAL logging while the bit is cleared,
 which would not be the case in future, and hence the exclusive lock on
 the visibility map is going to be held for a longer time.

 This is false and has been false since the visibility map was first
 implemented.

I can't understand this. If you are not doing this, then it would cause
consistency issues. Are you saying that we have a crash-safe visibility map,
but you don't follow the "log the change before changing the contents" WAL
principle? If so, please explain in detail. If you are doing it in the
normal way, then you should be logging the changes before making the changes
to the buffer, and during that time frame you should be holding the lock on
the buffer. Heikki specifically pointed out that you have brought the
WAL logging of the visibility map within the critical section.

Heikki's comments, I am quoting:
 I believe Robert's changes to make the visibility map crash-safe covers
that. Clearing the bit in the visibility map now happens within the same
critical section as clearing the flag on the heap page and writing the WAL
record.

I would be able to respond to your other statements, once we are clear here.

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram
 gokul...@gmail.com wrote:
  By your argument, if WALInsertLock is held for 't' seconds, you should
  definitely be holding the visibility map lock for more than that time frame 't'.

 Nope, that's not how it works.  Perhaps you should read the code.
 See, e.g., heap_update().

 --

OK. I took a look at the patch you supplied at
http://postgresql.1045698.n5.nabble.com/crash-safe-visibility-map-take-five-td4377235.html
There is code like this:

 {
 all_visible_cleared = true;
 PageClearAllVisible(BufferGetPage(buffer));
+visibilitymap_clear(relation,
+ItemPointerGetBlockNumber(&(heaptup->t_self)),
+vmbuffer);
 }

Here, you are not making an entry into the WAL; there is an assumption
that the two bits will stay in sync without any WAL entry. There is a chance
that the visibility map might be affected by partial page writes, where the
clearing of a particular bit might not have taken effect on disk. And I am
thinking of a lot of such issues. Can you just explain the background logic
behind ignoring the principle of WAL logging? What are the implemented
principles that protect the visibility map pages?

Thanks,
Gokul.


Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
  By your argument, we can say that no one will create an index with a
  function like random(), time(), date(), broken operators, etc. It's hard
  to imagine an index into which a user will only want to insert and never
  select from it.

 The whole point of this optimization is to make index access cheaper,
 not more expensive.  You seem convinced it's done the opposite, but
 you haven't offered much evidence (or any test results) to support
 that proposition.


I hope you are referring to thick indexes/covering indexes/indexes with
snapshot. Why do you think it's done the opposite? In fact, all the other
databases like Oracle, SQL Server, and Sybase implement indexes with
snapshot (that's the only kind they support). It makes the index tuple larger
by 8 bytes, but avoids the heap fetch. I think I ran a couple of
benchmarks when I submitted the patch and showed the improvement. The
trade-off in that case was simple: size of the index vs. avoiding a disk I/O.
The user still had the choice of creating indexes without snapshot (it was
provided as an optional index).



 What we decided NOT to do is put xmin/xmax/cid into the index tuple,
 for precisely the reasons you mention.  That would be catastrophic
 both for write operations to the table, and for the size of the index.


Why would it be catastrophic for write operations on the table? Please
explain it to me.
The trade-off in that case was simple: size of the index vs. avoiding a disk
I/O. There was no catastrophic damage to the size of the index, as far as I
can see.

I made this point because Heikki pointed out that since no one is
complaining about a performance problem, we can assume that it
doesn't exist. But the thick index proposal was shot down on the grounds that
someone might create an index on a function like random(), time(), date(), or
with broken operators, effectively meaning that you can insert into the
index and cannot select back. We are already doing unique checks and
referential integrity checks on that kind of index (which would all be
wrong), but still we should not work in that area, to help the user not
make the mistake of creating such indexes. So we should apply the same
principle for decision making here also.

Gokul.