Re: [HACKERS] the big picture for index-only scans
> The above could already happen in 8.4, where the visibility map was introduced. The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not. I have not heard any complaints of contention on VM buffers. -- Heikki Linnakangas

a) First of all, it (the visibility map) should definitely have affected the scalability of Postgres in scenarios where updates occur during a batch time window. Maybe the increased speed of vacuums negates that effect.

b) Second, index scans currently don't touch the visibility map, and in the future they are going to acquire a share lock on it. That should increase the contention.

c) Your statement: "The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not." I am talking about the contention time frame of the heap page: it will be increased. Consider my statement in conjunction with scenario 2.

d) In addition, there is currently no WAL logging while the bit is cleared, which would not be the case in future, and hence the exclusive lock held on the visibility map is going to be held for a longer time.

You might well see some performance improvement if you are testing this on anything less than 4 cores. Bump up the core count and processor count and check whether this affects the load-throughput curve.

Thanks,
Gokul.
Re: [HACKERS] the big picture for index-only scans
> Hmm, you have a point. If 100 backends simultaneously write to 100 different pages, and all of those pages are all-visible, then it's possible that they could end up fighting over the buffer content lock on the visibility map page. But why would you expect that to matter? In a heavily updated table, the proportion of visibility map bits that are set figures to be quite low, since they're only set during VACUUM. To have 100 backends simultaneously pick different pages to write, each of which is all-visible, seems really unlucky. Even if it does happen from time to time, I suspect the effects would be largely masked by WALInsertLock contention. The visibility map content lock is only taken very briefly, whereas the operations protected by WALInsertLock are much more complex.

By your argument, if WALInsertLock is held for 't' seconds, you should definitely be holding the visibility map lock for more than that time frame 't'. So the index scans have to wait to acquire the lock on the visibility map to check visibility. What we are trading off here is synchronization vs. I/O. Should we lose scalability for performance?

Gokul.
Re: [HACKERS] the big picture for index-only scans
On Sat, Aug 20, 2011 at 4:48 PM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> > The above could already happen in 8.4, where the visibility map was introduced. The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not. I have not heard any complaints of contention on VM buffers. -- Heikki Linnakangas
>
> a) First of all, it (the visibility map) should definitely have affected the scalability of Postgres in scenarios where updates occur during a batch time window. Maybe the increased speed of vacuums negates that effect.
>
> b) Second, index scans currently don't touch the visibility map, and in the future they are going to acquire a share lock on it. That should increase the contention.
>
> c) Your statement: "The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not." I am talking about the contention time frame of the heap page: it will be increased. Consider my statement in conjunction with scenario 2.
>
> d) In addition, there is currently no WAL logging while the bit is cleared, which would not be the case in future, and hence the exclusive lock held on the visibility map is going to be held for a longer time.
>
> You might well see some performance improvement if you are testing this on anything less than 4 cores. Bump up the core count and processor count and check whether this affects the load-throughput curve.

By your argument, we can say that no one will create an index with a function like random(), time(), date(), broken operators, etc. It's hard to imagine an index into which a user will only want to insert and never select. Even C++ provides the std::map infrastructure for objects, where the user owns the responsibility of writing the comparison operator correctly. Other databases do the same. Even going one step further, we already go back to such indexes if there are unique constraints / referential integrity constraints.
But with all such caveats, it was decided we should not allow covering indexes, as they require going back to the index for updates/deletes.
Re: [HACKERS] the big picture for index-only scans
On Sat, Aug 20, 2011 at 4:48 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> a) First of all, it (the visibility map) should definitely have affected the scalability of Postgres in scenarios where updates occur during a batch time window. Maybe the increased speed of vacuums negates that effect.

I think that you have switched gears here and are in this paragraph talking about the 8.4-era visibility map changes rather than my recent work. There is zero evidence that those changes caused any performance problem. I've spent a large chunk of the last four months looking for scalability problems, and I haven't come across any evidence that this is an issue. If you think it is, let's have a test case.

> b) Second, index scans currently don't touch the visibility map, and in the future they are going to acquire a share lock on it. That should increase the contention.

Maybe. Of course, we're only talking about cases where the index-only scan optimization is in use (which is not all of them). And even then, if bringing in the heap pages would have caused buffer evictions and the index-only scan avoids that, then you're trading contention for exclusive locks on the BufFreelistLock and buffer mapping locks for shared-lock contention on the visibility map page. Furthermore, the latter will (except in very large relations) only need to be shared-locked and pinned once per scan, whereas you might easily have a case where each index probe forces replacement of a heap page. What I am slightly worried about is that our machinery for taking and releasing buffer pins is going to become a scalability bottleneck at some point, and certainly very hot pages like root index pages and visibility map pages could become hot spots for such contention. But the experiments I've done so far suggest that there are other, more serious bottlenecks that have to be addressed first.
If it does rise to the top of the list, we'll find a way to fix it, but I think skipping the index-only scan optimization would be a cure worse than the disease.

> c) Your statement: "The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not." I am talking about the contention time frame of the heap page: it will be increased. Consider my statement in conjunction with scenario 2.

Sure, but here again, what is your evidence that this actually matters? It's not increasing by very much.

> d) In addition, there is currently no WAL logging while the bit is cleared, which would not be the case in future, and hence the exclusive lock held on the visibility map is going to be held for a longer time.

This is false and has been false since the visibility map was first implemented.

> You might well see some performance improvement if you are testing this on anything less than 4 cores. Bump up the core count and processor count and check whether this affects the load-throughput curve.

I'm fairly certain I already did that experiment, on a 32-core machine, but since the patch you're worried about went in two months ago, I'm a little fuzzy on the details. Maybe you should test it out yourself.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] the big picture for index-only scans
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> By your argument, if WALInsertLock is held for 't' seconds, you should definitely be holding the visibility map lock for more than that time frame 't'.

Nope, that's not how it works. Perhaps you should read the code. See, e.g., heap_update().

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] the big picture for index-only scans
On Sat, Aug 20, 2011 at 5:06 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> By your argument, we can say that no one will create an index with a function like random(), time(), date(), broken operators, etc. It's hard to imagine an index into which a user will only want to insert and never select.

The whole point of this optimization is to make index access cheaper, not more expensive. You seem convinced it's done the opposite, but you haven't offered much evidence (or any test results) to support that proposition.

> Even C++ provides the std::map infrastructure for objects, where the user owns the responsibility of writing the comparison operator correctly. Other databases do the same. Even going one step further, we already go back to such indexes if there are unique constraints / referential integrity constraints. But with all such caveats, it was decided we should not allow covering indexes, as they require going back to the index for updates/deletes.

What we decided NOT to do is put xmin/xmax/cid into the index tuple, for precisely the reasons you mention. That would be catastrophic both for write operations to the table, and for the size of the index. This approach is appealing precisely because a single visibility map page can cover a very large chunk of the heap. Is there a potential problem buried somewhere in there? Maybe. But if there is, I'm pretty sure it's something far less obvious than what you seem to be worried about.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] synchronized snapshots
Peter Eisentraut wrote:
> On tis, 2011-08-16 at 20:35 -0400, Tom Lane wrote:
> > In fact, now that I think about it, setting the transaction snapshot from a utility statement would be functionally useful because then you could take locks beforehand.
> Another issue is that in some client interfaces, BEGIN and COMMIT are hidden behind API calls, which cannot easily be changed or equipped with new parameters. So in order to have this functionality available through those interfaces, we'd need a separately callable command.

How do they set a transaction to SERIALIZABLE? It seems the same syntax should be used here.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
Greg Stark wrote:
> On Mon, Aug 15, 2011 at 9:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> > ... and that would be a seriously bad API. There are not SUSET restrictions on other resources such as work_mem. Why do we need one for this?
> I think a better analogy would be imposing a maximum number of rows a query can output. That might be a sane thing to have for some circumstances, but it's not useful in general. Consider, for instance, my favourite recursive query application: displaying the lock dependency graph for pg_locks. What arbitrary maximum number of locks would you like to impose, at which the query should error out?
> There is a situation, though, that I think is motivating this, where it would be nice to detect a problem: when the query is such that it *can't* produce a record because there's an infinite loop before the first record. Ideally you want some way to detect that you've recursed and haven't changed anything that could lead to a change in the recursion condition. But that seems like a pretty hard thing to detect, probably impossible.

Actually, using UNION instead of UNION ALL does prevent some infinite loops:

    WITH RECURSIVE source AS (
        SELECT 'Hello'
        UNION
        SELECT 'Hello' FROM source
    )
    SELECT * FROM source;

Change that to UNION ALL and you have an infinite loop.

-- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com
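To make the mechanism behind Bruce's observation concrete, here is a small Python sketch (purely illustrative, not PostgreSQL source) of how a recursive CTE's working table evolves. With UNION, rows already present in the result are discarded, so a step that only reproduces known rows empties the working table and recursion stops; with UNION ALL it never does (the sketch caps iterations so it terminates):

```python
def run_recursive_cte(initial, step, dedup, max_iters=1000):
    """Toy model of WITH RECURSIVE evaluation.

    initial: rows from the non-recursive term.
    step:    function mapping one working-table row to new rows.
    dedup:   True models UNION, False models UNION ALL.
    Returns (all rows produced, still_running) where still_running=True
    means we hit the iteration cap, i.e. an infinite loop in SQL terms.
    """
    result = list(initial)
    seen = set(initial)
    working = list(initial)
    iters = 0
    while working and iters < max_iters:
        produced = [row for w in working for row in step(w)]
        if dedup:
            # UNION: drop rows already in the result before recursing
            produced = [r for r in produced if r not in seen]
        seen.update(produced)
        result.extend(produced)
        working = produced
        iters += 1
    return result, bool(working)

# Bruce's example: the recursive term always yields 'Hello' again.
step = lambda row: ['Hello']
print(run_recursive_cte(['Hello'], step, dedup=True))   # UNION: terminates
print(run_recursive_cte(['Hello'], step, dedup=False))  # UNION ALL: hits cap
```

This also shows why the trick is limited, as Peter notes below in the thread: it only helps when the recursive term eventually produces nothing but already-seen rows.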
Re: [HACKERS] Re: Should we have an optional limit on the recursion depth of recursive CTEs?
On 20 August 2011 15:34, Bruce Momjian br...@momjian.us wrote:
> Actually, using UNION instead of UNION ALL does prevent some infinite loops:

While that is worth pointing out, it cannot be recommended as a way of preventing infinite recursion; after all, all 5 WITH RECURSIVE examples in the docs use UNION ALL. It's just a different way of specifying a terminating condition, and it isn't likely to be applicable to more complicated rCTEs.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Re: [HACKERS] two index bitmap scan of a big table hash_seq_search
On Fri, 19 Aug 2011, Tom Lane wrote:
> I might be reading too much into the mention of tbm_lossify, but I wonder if the problem is repeated invocations of tbm_lossify() as the bitmap gets larger. Maybe that function needs to be more aggressive about how much information it deletes per call.

Thanks for the idea, Tom. Yes, it turns out that the problem was in lossifying the bitmap too intensely. I put elogs around the lossification in tbm_add_tuples():

    if (tbm->nentries > tbm->maxentries)
    {
        elog(WARNING, "lossifying %d %d", tbm->nentries, tbm->maxentries);
        tbm_lossify(tbm);
        elog(WARNING, "lossified %d", tbm->nentries);
    }

And I saw in my log:

    koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING: lossifying 13421773 13421772
    koposov:wsdb:2011-08-20 17:31:46 BST:21524 WARNING: lossified 13421772

issued at a rate of 2 per second. I.e., it lossifies one page per tbm_lossify() call (and does a lot of hash_seq_search operations too) ...

After that I changed the check in tbm_lossify() from:

    if (tbm->nentries <= tbm->maxentries)

to:

    if (tbm->nentries <= (0.8 * tbm->maxentries))

which allowed the query to finish in 75 seconds (compared to 3 hours). I'm not entirely sure that my fix of the tbm_lossify function is a proper one, but it looks all right. Do you think that this should be fixed?

Sergey

*** Sergey E. Koposov, PhD Institute for Astronomy, Cambridge/Sternberg Astronomical Institute Web: http://lnfm1.sai.msu.ru/~math E-mail: m...@sai.msu.ru
Re: [HACKERS] two index bitmap scan of a big table hash_seq_search
Sergey E. Koposov m...@sai.msu.ru writes:
> Yes, it turns out that the problem was in lossifying the bitmap too intensely.

Yeah, I had just been coming to the same conclusion. Your table has about 134M pages, and if the planner estimate of 62M rows was right (and there's no reason it shouldn't be pretty close on that), then we're talking about a bitmap that's going to contain about one bit set in each of about half of the pages. The page structures are 50-some bytes apiece, so a non-lossy representation would run to 3-plus GB, well beyond your work_mem limit. So it would fill up to work_mem and then start lossifying pages ... one at a time. I had suspected that there might be a performance issue there, as per the comment at line 954, but we hadn't actually seen it reported from the field before.

> After that I changed the check in tbm_lossify() from:
>     if (tbm->nentries <= tbm->maxentries)
> to:
>     if (tbm->nentries <= (0.8 * tbm->maxentries))
> which allowed the query to finish in 75 seconds (compared to 3 hours).

I was about to propose using tbm->maxentries/2, which is in the same spirit but a tad cheaper to calculate. I think that we also need to consider the possibility that tbm_lossify finishes its loop without ever getting under maxentries --- that could only happen with very large tables and very small work_mem, but it could happen. If it did, then all subsequent operations would keep on calling tbm_lossify, and it would keep scanning the entire hashtable, probably not accomplishing much, and taking forever to do it. Unless somebody has a better idea, what I think we should do then is just artificially inflate maxentries --- that is, accept that we are not going to fit in the originally requested work_mem, and we might as well set a more realistic goal.

> Do you think that this should be fixed?

Yes, definitely.

regards, tom lane
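The thrashing described above is easy to see in a toy model (Python, purely illustrative -- the real tbm_lossify() walks a hashtable and converts exact page entries to lossy chunks; here a "lossify call" just resets the entry count to a target level). Stopping as soon as the count reaches maxentries means every subsequent insertion triggers another full scan, while lossifying down to a lower-water mark such as maxentries/2 amortizes the scan cost over many insertions:

```python
def simulate(inserts, maxentries, low_water):
    """Count how many (expensive, full-scan) lossify calls happen
    when each call reduces the entry count to low_water."""
    nentries = 0
    lossify_calls = 0
    for _ in range(inserts):
        nentries += 1
        if nentries > maxentries:
            lossify_calls += 1
            nentries = low_water   # one full-hashtable scan per call
    return lossify_calls

MAX = 1000
thrash = simulate(100_000, MAX, MAX)        # stop at exactly maxentries
amortized = simulate(100_000, MAX, MAX // 2)  # Tom's maxentries/2 idea
print(thrash, amortized)  # amortized is orders of magnitude smaller
```

The model also makes Tom's second point visible: if a call can never get nentries below the threshold at all, every insertion from then on pays for a full scan, which is why inflating maxentries is the sane fallback.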
Re: [HACKERS] [PL/pgSQL] %TYPE and array declaration - patch
On Sun, 7 Aug 2011 14:57:36 +0200 Wojciech Muła wojciech_m...@poczta.onet.pl wrote:
> Hi all, does anybody work on this TODO item? http://wiki.postgresql.org/wiki/Todo#PL.2FpgSQL I didn't find any related posting/bug report.

Hi, I've prepared a simple patch, please review. Since an exact array definition isn't needed anywhere, the code only detects whether %TYPE is followed by tokens matching the regexp ('[' ICONST? ']')+. This information is used during type construction.

w.

diff --git a/src/pl/plpgsql/src/gram.y b/src/pl/plpgsql/src/gram.y
index 92b54dd..efdc833 100644
--- a/src/pl/plpgsql/src/gram.y
+++ b/src/pl/plpgsql/src/gram.y
@@ -76,6 +76,8 @@
 static PLpgSQL_expr *read_sql_expression2(int until, int until2, int *endtoken);
 static PLpgSQL_expr *read_sql_stmt(const char *sqlstart);
 static PLpgSQL_type *read_datatype(int tok);
+static int read_array_dims(int tok);
+static int read_single_array_dim(int tok);
 static PLpgSQL_stmt *make_execsql_stmt(int firsttoken, int location);
 static PLpgSQL_stmt_fetch *read_fetch_direction(void);
 static void complete_direction(PLpgSQL_stmt_fetch *fetch,
@@ -2503,7 +2505,13 @@ read_datatype(int tok)
 		if (tok_is_keyword(tok, &yylval, K_TYPE, "type"))
 		{
-			result = plpgsql_parse_wordtype(dtname);
+			/*
+			 * Now try to determine if there are any array indicators
+			 * [] or [ICONST] following the type definition, and then
+			 * use this information during type construction.
+			 */
+			const int arrndim = read_array_dims(tok);
+			result = plpgsql_parse_wordtype(dtname, arrndim);
 			if (result)
 				return result;
 		}
@@ -2527,7 +2535,8 @@ read_datatype(int tok)
 		if (tok_is_keyword(tok, &yylval, K_TYPE, "type"))
 		{
-			result = plpgsql_parse_cwordtype(dtnames);
+			const int arrndim = read_array_dims(tok);
+			result = plpgsql_parse_cwordtype(dtnames, arrndim);
 			if (result)
 				return result;
 		}
@@ -2582,6 +2591,67 @@ read_datatype(int tok)
 	return result;
 }

+/*
+ * Parse all items of array declaration: ('[' ICONST? ']')*
+ */
+static int
+read_array_dims(int tok)
+{
+	int		n = 0;
+	int		k;
+
+	while (true)
+	{
+		tok = yylex();
+		k = read_single_array_dim(tok);
+		switch (k)
+		{
+			case 1:		/* ok */
+				n += 1;
+				break;
+
+			case 0:		/* probably wrong syntax, but don't panic here */
+				plpgsql_push_back_token(tok);
+				return 0;
+
+			case -1:	/* end of array def */
+				plpgsql_push_back_token(tok);
+				return n;
+
+			default:
+				Assert(false);
+		}
+	}
+}
+
+/* Parse single item of array declaration: '[' ']' or '[' ICONST ']' */
+static int
+read_single_array_dim(int tok)
+{
+	int		tok2;
+
+	if (tok == '[')
+	{
+		tok = yylex();
+		if (tok == ']')
+		{
+			/* variant [] */
+			return 1;
+		}
+		else
+		{
+			/* variant [ICONST] */
+			if (tok == ICONST)
+			{
+				/* valid only if ICONST > 0 */
+				if (yylval.ival <= 0)
+					return 0;
+
+				tok2 = yylex();
+				if (tok2 == ']')
+					return 1;
+				else
+					plpgsql_push_back_token(tok2);
+			}
+			return 0;
+		}
+	}
+	else
+		return -1;
+}
+
 static PLpgSQL_stmt *
 make_execsql_stmt(int firsttoken, int location)
 {
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index d22fa68..826eabe 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -1588,6 +1588,35 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
 /* ----------
+ * build_arraydatatype		Try to construct an array type from a type
+ *
+ * Function is used internally by plpgsql_parse_wordtype and
+ * plpgsql_parse_cwordtype to handle the syntax word%TYPE[].
+ *
+ * Returns datatype struct, or NULL if no match found for word.
+ * ----------
+ */
+static PLpgSQL_type *
+build_arraydatatype(Oid typoid, int32 typmod, Oid collation)
+{
+	Oid		elem_typoid;
+
+	if (type_is_array(typoid))
+		/* copy array type */
+		return plpgsql_build_datatype(typoid, typmod, collation);
+	else
+	{
+		/* get array type for given scalar type */
+		elem_typoid = get_array_type(typoid);
+		if (elem_typoid != InvalidOid)
+			return plpgsql_build_datatype(elem_typoid, typmod, collation);
+		else
+			/* typoid can't be element type of an array */
+			return NULL;
+	}
+}
+
 /* ----------
  * plpgsql_parse_wordtype	The scanner found word%TYPE. word can be
  *				a variable name or a basetype.
  *
@@ -1595,7 +1624,7 @@ plpgsql_parse_tripword(char *word1, char *word2, char *word3,
  * ----------
  */
 PLpgSQL_type *
-plpgsql_parse_wordtype(char *ident)
+plpgsql_parse_wordtype(char *ident, int arrndim)
 {
 	PLpgSQL_type *dtype;
 	PLpgSQL_nsitem *nse;
@@ -1613,7 +1642,12 @@ plpgsql_parse_wordtype(char *ident)
 	switch (nse->itemtype)
 	{
 		case PLPGSQL_NSTYPE_VAR:
-			return ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+			dtype = ((PLpgSQL_var *) (plpgsql_Datums[nse->itemno]))->datatype;
+			if (arrndim <= 0)
+				return dtype;
+			else
+				return build_arraydatatype(dtype->typoid, -1,
+							plpgsql_curr_compile->fn_input_collation); /* XXX
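The token loop in the patch's read_array_dims()/read_single_array_dim() accepts the shape ('[' ICONST? ']')*, rejecting a non-positive ICONST. A compact Python sketch of the same acceptance logic (a hypothetical helper, not part of the patch -- it works on a text suffix rather than scanner tokens):

```python
import re

# One array indicator: '[' optionally followed by digits, then ']'.
_DIM = re.compile(r'\[(\d*)\]')

def count_array_dims(suffix: str) -> int:
    """Return the number of valid array dimensions in e.g. '[][10]'.

    Mirrors the patch's behavior: any malformed indicator, or an
    ICONST that is not > 0, makes the whole declaration count as 0
    ("probably wrong syntax, but don't panic").
    """
    s = suffix.strip()
    pos, ndims = 0, 0
    while pos < len(s):
        m = _DIM.match(s, pos)
        if not m:
            return 0                      # wrong syntax
        if m.group(1) and int(m.group(1)) <= 0:
            return 0                      # ICONST must be positive
        ndims += 1
        pos = m.end()
    return ndims

print(count_array_dims('[]'))       # one dimension
print(count_array_dims('[3][]'))    # two dimensions
print(count_array_dims('[0]'))      # rejected: ICONST must be > 0
```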
Re: [HACKERS] CONGRATULATIONS, David!
Congratulations, David Fetter - on his new arrival! It's a big day!
Re: [HACKERS] the big picture for index-only scans
> I think that you have switched gears here and are in this paragraph talking about the 8.4-era visibility map changes rather than my recent work. There is zero evidence that those changes caused any performance problem. I've spent a large chunk of the last four months looking for scalability problems and I haven't come across any evidence that this is an issue. If you think it is, let's have a test case.

Consider the TPC-C benchmark. Currently, on a four-core box, Oracle does 29 transactions per minute. Say we are doing something around half of that. So a page should get filled quickly. If a vacuum runs and the tuples are not then touched by the MakePayment transaction, the page will be marked as all-visible. When the MakePayment transaction updates it, it has to clear that bit. If we test with too few warehouses, we may not see the effect. Of course, the PGPROC infrastructure for generating transaction ids might be the biggest culprit, but if you ignore that, this should be visible.

> Maybe. Of course, we're only talking about cases where the index-only scan optimization is in use (which is not all of them).

But are you saying that the optimization of looking at the visibility map will happen only for index-only scans, and that normal index scans will not use the visibility map infrastructure? That's definitely a good idea and improvement.

> > d) In addition, there is currently no WAL logging while the bit is cleared, which would not be the case in future, and hence the exclusive lock held on the visibility map is going to be held for a longer time.
> This is false and has been false since the visibility map was first implemented.

I can't understand this. If you are not doing this, then it would cause consistency issues. Are you saying we have a crash-safe visibility map, but you don't follow the log-the-change-before-changing-the-contents WAL principle? If so, please explain in detail.
If you are doing it in the normal way, then you should be logging the changes before making the changes to the buffer, and during that time frame you should be holding the lock on the buffer. Heikki specifically pointed out that you have brought the WAL logging of the visibility map within the critical section. Heikki's comments, I am quoting:

> I believe Robert's changes to make the visibility map crash-safe covers that. Clearing the bit in the visibility map now happens within the same critical section as clearing the flag on the heap page and writing the WAL record.

I would be able to respond to your other statements once we are clear here.

Thanks,
Gokul.
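For reference, the ordering Heikki describes can be sketched like this (pseudocode only, not the actual heap_update() source; the function names approximate PostgreSQL's and the details are simplified):

```
LockBuffer(heap_buffer, BUFFER_LOCK_EXCLUSIVE);   /* heap page locked      */
START_CRIT_SECTION();
    /* ... modify the tuple on the heap page ... */
    PageClearAllVisible(heap_page);               /* flag on the heap page */
    visibilitymap_clear(rel, block, vmbuffer);    /* bit in the map        */
    recptr = XLogInsert(...);                     /* the heap-page record
                                                     covers both changes   */
    PageSetLSN(heap_page, recptr);
END_CRIT_SECTION();
LockBuffer(heap_buffer, BUFFER_LOCK_UNLOCK);
```

On this reading of the thread, the point under debate is that the heap operation's own WAL record carries the all-visible-cleared information, so clearing the map bit needs no separate WAL record and no extended hold on the VM page lock.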
Re: [HACKERS] the big picture for index-only scans
> On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote:
> > by your argument, if WALInsertLock is held for 't' seconds, you should definitely be holding visibility map lock for more than time frame 't'.
> Nope, that's not how it works. Perhaps you should read the code. See, e.g., heap_update().

OK. I took a look at the patch you supplied in http://postgresql.1045698.n5.nabble.com/crash-safe-visibility-map-take-five-td4377235.html

There is code like this:

    {
        all_visible_cleared = true;
        PageClearAllVisible(BufferGetPage(buffer));
+       visibilitymap_clear(relation,
+                           ItemPointerGetBlockNumber(&(heaptup->t_self)),
+                           vmbuffer);
    }

Here, you are not making an entry into the WAL; there is an assumption that the two bits will stay in sync without any WAL entry. There is a chance that the visibility map might be affected by partial page writes, where the clearing of a particular bit might not have made it to disk. And I am thinking of a lot of such issues. Can you explain the background logic behind ignoring the principle of WAL logging? What are the implemented principles that protect the visibility map pages?

Thanks,
Gokul.
Re: [HACKERS] the big picture for index-only scans
> > By your argument, we can say that no one will create an index with a function like random(), time(), date(), broken operators, etc. It's hard to imagine an index into which a user will only want to insert and never select.
> The whole point of this optimization is to make index access cheaper, not more expensive. You seem convinced it's done the opposite, but you haven't offered much evidence (or any test results) to support that proposition.

I hope you are referring to thick indexes / covering indexes / indexes with snapshot. Why do you think it's done the opposite? In fact, all the other databases, like Oracle, SQL Server and Sybase, implement indexes with snapshot (that's the only kind they support). It makes the index tuple larger by 8 bytes, but avoids the heap fetch. I think I ran a couple of benchmarks when I submitted the patch and showed the improvement. The trade-off in that case was simple: size of the index vs. avoiding a disk I/O. The user still had the choice of creating indexes without snapshot (it was provided as an optional kind of index).

> What we decided NOT to do is put xmin/xmax/cid into the index tuple, for precisely the reasons you mention. That would be catastrophic both for write operations to the table, and for the size of the index.

Why would it be catastrophic for write operations on the table? Please explain. The trade-off in that case was simple: size of the index vs. avoiding a disk I/O. There was no catastrophic damage to the size of the index, as far as I can see. I made this point because Heikki suggested that since no one is complaining about a performance problem, we can assume it doesn't exist. But the thick index proposal was shot down on the grounds that someone might create an index on a function like random(), time(), date(), or with broken operators, effectively meaning that you can insert into the index but cannot select back.
We are already doing unique checks and referential integrity checks on that kind of index (which would all be wrong), but still we should not be working in that area, so as to help the user avoid the mistake of creating such indexes. So we should apply the same principle for decision making here also.

Gokul.