Re: [HACKERS] Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

2012-04-29 Thread Gokulakannan Somasundaram
There might be a patch available for this already. In the worst case articulated above (less than 64 columns), if all the nulls are trailing nulls, the bitmap need not be saved. Actually it is not 64(actually 72), as postgres heaptupleheader is only 23 bytes and one byte is left for the start of

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram
I feel sad, that i followed this topic very late. But i still want to put forward my views. Have we thought on the lines of how Robert has implemented relation level locks. In short it should go like this a) The locks for enforcing Referential integrity should be taken only when the rarest of the

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram
Insert, Update and Delete don't take locks they simply mark the tuples they change with an xid. Anybody else wanting to wait on the lock just waits on the xid. We do insert a lock row for each xid, but not one per row changed. I mean the foreign key checks here. They take a Select for Share

Re: [HACKERS] foreign key locks, 2nd attempt

2012-03-07 Thread Gokulakannan Somasundaram
Please explain in detail your idea of how it will work. OK. I will try to explain the abstract idea, i have. a) Referential integrity gets violated, when there are referencing key values, not present in the referenced key values. We are maintaining the integrity by taking a Select for Share

Re: [HACKERS] cheaper snapshots redux

2011-08-28 Thread Gokulakannan Somasundaram
No, I don't think it will all be in memory - but that's part of the performance calculation. If you need to check on the status of an XID and find that you need to read a page of data in from disk, that's going to be many orders of magnitude slower than anything we do with s snapshot now.

Re: [HACKERS] cheaper snapshots redux

2011-08-26 Thread Gokulakannan Somasundaram
On Tue, Aug 23, 2011 at 5:25 AM, Robert Haas robertmh...@gmail.com wrote: I've been giving this quite a bit more thought, and have decided to abandon the scheme described above, at least for now. It has the advantage of avoiding virtually all locking, but it's extremely inefficient in its

Re: [HACKERS] the big picture for index-only scans

2011-08-24 Thread Gokulakannan Somasundaram
There are extensive comments on this in visibilitymap.c and, in heapam.c, heap_xlog_visible(). I went through the design again and again. I am convinced that this should not have any functional bugs and should not cause much performance issues. Nice thoughts on bypassing the WAL Logging.

Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Gokulakannan Somasundaram
The all_visible_cleared flag is included in the WAL record of the insert (or update or delete). Partial page writes are not a problem, because we always fetch the VM page and clear the bit, regardless of the LSN on the VM page. Two things a) First, my understanding of checkpoint is that it

Re: [HACKERS] the big picture for index-only scans

2011-08-21 Thread Gokulakannan Somasundaram
a) First, my understanding of checkpoint is that it flushes all the pages, that got changed below a particular LSN. If we are not having a LSN in the visibility map, how we will be sure, that a visibility map page is getting flushed/not? Please ignore this statement. I confused between the

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
The above could already happen in 8.4, where the visibility map was introduced. The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not. I have not heard any complaints of contention on VM buffers. -- Heikki Linnakangas a) First

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
Hmm, you have a point. If 100 backends simultaneously write to 100 different pages, and all of those pages are all-visible, then it's possible that they could end up fighting over the buffer content lock on the visibility map page. But why would you expect that to matter? In a heavily

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:48 PM, Gokulakannan Somasundaram gokul...@gmail.com wrote: The above could already happen in 8.4, where the visibility map was introduced. The contention on the VM buffer would be just as bad whether you hold the heap page lock at the same time or not. I have

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
I think that you have switched gears here and are in this paragraph talking about the 8.4-era visibility map changes rather than my recent work. There is zero evidence that those changes caused any performance problem. I've spent a large chunk of the last four months looking for

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 4:57 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: by your argument, if WALInserLock is held for 't' seconds, you should definitely be holding visibility map lock for more than time frame 't'. Nope, that's not how it works. Perhaps you should read

Re: [HACKERS] the big picture for index-only scans

2011-08-20 Thread Gokulakannan Somasundaram
By your argument, we can say that no-one will create a index with a function like random(), time(), date(), broken operators etc. Its hard to imagine a index in which a a user will only want to insert and never select on it. The whole point of this optimization is to make index access

Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
Note that we already have the visibility map, and the accesses needed to update it are already there. Granted, we'll have to change the logic slightly to make it crash safe, but I don't expect that to add any meaningful overhead - the changes are going to be where the bits are set, ie.

Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
Well, that would certainly be alarming if true, but I don't think it is. As far as I can see, the overhead of making the visibility map crash-safe is just (1) a very small percentage increase in the work being done by VACUUM and (2) a slight possibility of extra work done by a foreground

Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
Well, that would certainly be alarming if true, but I don't think it is. As far as I can see, the overhead of making the visibility map crash-safe is just (1) a very small percentage increase in the work being done by VACUUM and (2) a slight possibility of extra work done by a foreground

Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 19.08.2011 21:06, Gokulakannan Somasundaram wrote: If you are following the same design that Heikki put forward, then there is a problem with it in maintaining the bits in page and the bits

Re: [HACKERS] the big picture for index-only scans

2011-08-19 Thread Gokulakannan Somasundaram
On Sat, Aug 20, 2011 at 2:51 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: On Sat, Aug 20, 2011 at 2:25 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 19.08.2011 21:06, Gokulakannan Somasundaram wrote: If you are following the same design that Heikki put

Re: [HACKERS] Reduce WAL logging of INSERT SELECT

2011-08-05 Thread Gokulakannan Somasundaram
However, for small operations it's a net loss - you avoid writing a WAL record, but you have to fsync() the heap instead. If you only modify a few rows, the extra fsync (or fsyncs if there are indexes too) is more expensive than writing the WAL. We'd need a heuristic to decide whether to

Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Gokulakannan Somasundaram
All operations that clear the bit area are already WAL-logged. Is it the case with visibility map also? Thanks.

Re: [HACKERS] crash-safe visibility map, take four

2011-03-23 Thread Gokulakannan Somasundaram
robertmh...@gmail.com wrote: On Wed, Mar 23, 2011 at 2:29 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: All operations that clear the bit area are already WAL-logged. Is it the case with visibility map also? Thanks. Yes. Look at the comment that the patch removes

Re: [HACKERS] crash-safe visibility map, take four

2011-03-22 Thread Gokulakannan Somasundaram
I took a crack at implementing the first approach described above, which seems to be by far the simplest idea we've come up with to date. Patch attached. It doesn't seem to be that complicated, which could mean either that it's not that complicated or that I'm missing something. Feel free

Re: [HACKERS] GSoC Query

2010-03-29 Thread Gokulakannan Somasundaram
Similarly using the no. of select hits on a table we can check that if maximum no. of times it is on a non-index field we can index on that field to make select faster. It's impractical to figure out where indexes should go at without simulating what the optimizer would then do with

[HACKERS] Proposal for Performance improvement for unique checks

2010-03-27 Thread Gokulakannan Somasundaram
I don't think this should involve much code change. But no-one interested On Sat, Mar 27, 2010 at 2:23 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: Hi, Since we insert a new entry into the index for every update that's being made into the table, we inevitably make a unique

Re: [HACKERS] Performance improvement for unique checks

2010-03-27 Thread Gokulakannan Somasundaram
Please write it, then test the performance and publish your results, with a detailed analysis of whether there is benefit and in which cases there is a loss. Thanks. Will do it. Just wanted to know, whether the idea will get rejected right away / worth trying out. Thanks, Gokul.

[HACKERS] A Bug in outDatum ?? (Not Sure )

2010-03-27 Thread Gokulakannan Somasundaram
Hi, The function outDatum doesn't take care of varlena structures which are stored externally. Is it the intention?? Thanks, Gokul.

[HACKERS] Performance improvement for unique checks

2010-03-26 Thread Gokulakannan Somasundaram
Hi, Since we insert a new entry into the index for every update that's being made into the table, we inevitably make a unique check against the older version of the newly inserted row, even when the values are not updated. Of course i am talking about non-HOT updates. (We will not go to the

[HACKERS] Performance Improvement for Unique Indexes

2010-03-24 Thread Gokulakannan Somasundaram
Hi, While i was studying the unique index checks very closely, i realized that what we need is to find out whether the tuple is deleted / not. So say a tuple is deleted by a transaction, but it is not dead( because of some long running transaction ), still we can mark a hint bit as deleted and

Re: [HACKERS] Performance Improvement for Unique Indexes

2010-03-24 Thread Gokulakannan Somasundaram
How are you going to unmark the hint bit in case of a rollback? Only after you find that the transaction is committed, this hint bit has to be set. It is equivalent to any other hint bit. Gokul.

Re: [HACKERS] Performance Improvement for Unique Indexes

2010-03-24 Thread Gokulakannan Somasundaram
it seems fairly unlikely to me that this would be useful enough to justify using up a precious hint bit. The applicability of the hint is very short-term --- as soon as the tuple is dead to all transactions, it can be marked with the existing LP_DEAD hint bit. And if it's only useful for

[HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
Hi, With the implementation of deferred unique constraints, we need to go back to the index second time to check whether the unique check is valid. Say a situation occurs like this a) the first session doing the unique check finds out that there is a unique check required second time and just

Re: [HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
Can you also explain how are we avoiding duplicates in this scenario? a) Say there are three pages(P,Q, R) full of duplicate tuples, that are deleted but not dead of id x(due to some long running transaction). b) Now Session A gets in and checks the duplicate tuples for their liveliness with the

Re: [HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
Are you talking about exclusion constraints or btree uniqueness constraints? This doesn't seem to be a particularly accurate description of the implementation of either one. The way btree deals with this is explained in _bt_doinsert: Unique constraints * NOTE: obviously,

Re: [HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
No, you don't understand how it works. All would-be inserters will hit the same target page to begin with, ie, the first one that the new key could legally be inserted on. The lock that protects against this problem is the lock on that page, regardless of which page the key actually ends up

Re: [HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
T2 : session 1 releases the lock on p1 (its waiting to acquire a ex lock on p2) That's not what we do. See _bt_findinsertloc. regards, tom lane I am really confused. Please keep the cool and explain me, if i am wrong. I could see this code in

Re: [HACKERS] Deadlock possibility in _bt_check_unique?

2010-03-23 Thread Gokulakannan Somasundaram
Oh! yeah, i got it. Thanks On Wed, Mar 24, 2010 at 12:26 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: T2 : session 1 releases the lock on p1 (its waiting to acquire a ex lock on p2) That's not what we do. See _bt_findinsertloc. regards, tom lane

[HACKERS] Proposal for Byte savings in VarBit structure

2010-03-21 Thread Gokulakannan Somasundaram
Hi, I was looking at the VarBit data structure and found out that instead of storing the number of bits in four bytes, we can save the number of bits that are valid in the last byte. Since we already store the number of bytes in Varlena Header, we can calculate the number of valid bits by doing

Re: [HACKERS] Proposal for Byte savings in VarBit structure

2010-03-21 Thread Gokulakannan Somasundaram
This might be worth considering in a release cycle where we were going to break on-disk data compatibility for some other reason. But I can hardly imagine wanting to do it by itself. Marginal space savings for the bit types just isn't that exciting. Maybe we should start a special section

Re: [HACKERS] An idle thought

2010-03-19 Thread Gokulakannan Somasundaram
Surely the VM is already update-friendly. If you update a tuple in a page with the visibility bit set, the bit must be unset or you will get wrong results. I was referring in the context of index only scans to skip visibility checks. I doubt, whether the visibility map feature to skip

Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
On Thu, Mar 18, 2010 at 2:50 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jeff Davis pg...@j-davis.com writes: There are all kinds of challenges there, but it might be worth thinking about. Visibility information is highly compressible, and requires constant maintenance (updates, deletes,

Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
Secondly there's the whole retail vacuum problem -- any index entries referring to this page would be left dangling unless there's some kind of retail vacuum or perhaps a page version number. The issue, we can divide into two a)volatile functions b)broken datatypes For a) I think volatile

Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
I didn't mean that we'd want to compress it to the absolute minimum size. I had envisioned that it would be a simple scheme designed only to eliminate long runs of identical visibility information (perhaps only the frozen and always visible regions would be compressed). The extra level of

Re: [HACKERS] An idle thought

2010-03-18 Thread Gokulakannan Somasundaram
The visibility map itself is already an example of compression. If visibility information were randomly distributed among tuples, the visibility map would be nearly useless. I believe it is very difficult to make visibility map update friendly without compromising durability. But such a

Re: [HACKERS] Bug in 9.0Alpha4

2010-03-17 Thread Gokulakannan Somasundaram
When we were doing the ordered-aggregates patch, I considered passing all those values as explicit parameters to transformAggregateCall, and having it build the Aggref node from scratch and return it. However having seven or eight parameters to transformAggregateCall (and more in future if

[HACKERS] Bug in 9.0Alpha4

2010-03-16 Thread Gokulakannan Somasundaram
Hi, I noticed a problem with the source code of 9.0Alpha 4. In parse_agg.c, there is a call made to transformSortClause. 00098 torder = transformSortClause http://doxygen.postgresql.org/parse__clause_8c.html#53199c36a198b5acf15a26fbd7311f79(pstate, 00099

Re: [HACKERS] Bug in 9.0Alpha4

2010-03-16 Thread Gokulakannan Somasundaram
Hi, I think, this should be the probable fix. There is agg_order in ParseFuncOrColumn, which should get passed on to transformAggregateCall and that should be placed in this call, instead of agg-aggorder. Thanks, Gokul. On Tue, Mar 16, 2010 at 5:19 PM, Gokulakannan Somasundaram gokul

Re: [HACKERS] Bug in 9.0Alpha4

2010-03-16 Thread Gokulakannan Somasundaram
transformSortClause is passed the untransformed aggorder list, which is in fact a list of SortBy nodes, and it returns the transformed list (SortGroupClause nodes), which is stored back into the aggorder field a bit further down. There are a number of regression tests that would fail in

Re: [HACKERS] A thought on Index Organized Tables

2010-03-02 Thread Gokulakannan Somasundaram
I think you have to take up a simpler project as a first project. This is a major overhaul of transaction information and it depends on understanding how a lot of different areas work -- all of which are very complex tricky areas to understand. Greg, I just feel the fast full

Re: [HACKERS] A thought on Index Organized Tables

2010-03-01 Thread Gokulakannan Somasundaram
a) We are already going from table to index to do unique checks. This is the same thing, which we will do to go and update the snapshot in the indexes. No, it is not the same thing. Updating index snapshots requires being able to *re-find* a previously made index entry for the current

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Gokulakannan Somasundaram
No, it is not the same thing. Updating index snapshots requires being able to *re-find* a previously made index entry for the current row. And it has to be done 100% reliably. The worst that happens if an index entry is not found when it should be during a uniqueness check is that the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-28 Thread Gokulakannan Somasundaram
The transaction information on tuples take 18 bytes plus several info bits. It's possible just storing a subset of that would be useful but it's unclear. And I think it would complicate the code if it had to sometimes fetch the heap tuple to get the rest and sometimes doesn't. Visibility

Re: [HACKERS] A thought on Index Organized Tables

2010-02-27 Thread Gokulakannan Somasundaram
If i have got over excited in the previous update, please ignore that. a) We are already going from table to index to do unique checks. This is the same thing, which we will do to go and update the snapshot in the indexes. b) The way, it should work would be to have a check on whether the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
To be a bit more concrete: the typical sort of failure that you could get from broken btree operators is failure of transitivity, that is the comparators report A B and B C for some A, B, C, but do not say that A C when those two values are compared directly. I don't see any convenient

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
It definitely affects current indexes. We can't completely avoid bad user functions. That is why it is important to put limits on how much damage they can do. That's the motivation for the idea I mentioned before, of double-checking visibility data in an IndexTuple before letting it survive a

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
Much better to take on a simple project like enabling full sequential index scans which you claimed you had a solution for and which is in any case an important sub-problem for IOT. Greg, Well i don't think i am ready to take up a project of this size. But at the same time some

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
Missed the group.. On Sat, Feb 27, 2010 at 12:00 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: I definitely think thick indexes were too ambitious of a target for a first time patch. Sequential index scans is very ambitious itself despite being significantly simpler (if you have

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
IIRC, what was being talked about was shoehorning some hint bits into the line pointers by assuming that size and offset are multiples of 4. I'm not thrilled with having mutable status bits there for reliability reasons, but it could be done without breaking a lot of existing code. What I was

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
It does. The point is that the system is set up to limit the bad consequences. You might (will) get wrong query answers, but the heap data won't get corrupted. Again Tom, if there is an update based on index scan, then it takes the tupleid and updates the wrong heap data right? The only

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
It does. The point is that the system is set up to limit the bad consequences. You might (will) get wrong query answers, but the heap data won't get corrupted. Tom, if this is our goal - *can return wrong query answers, but should not corrupt the heap data.* and if we make Thick

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
No, what generally happens is it fails to find a matching index entry at all, because the search algorithm concludes there can be no match based on the limited set of comparisons it's done. Transitivity failures lead to searching the wrong subset of the index. Actually Tom, i am not able to

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
Wait a minute. Bingo So for unique checks we are already going to index from Heap. So it is the same thing i am doing with Thick index. So if we can trust our current unique checks, then we should trust the Thick index. Thanks Tom!!! for having this good conversation I think this

Re: [HACKERS] A thought on Index Organized Tables

2010-02-26 Thread Gokulakannan Somasundaram
Actually Tom, i am not able to understand that completely. But what you are saying that in the current scenario, when there is a broken data type based index, then it will return no results, but never will return wrong results. So never the update will corrupt the heap data. But i take it as

Re: [HACKERS] Lock Wait Statistics (next commitfest)

2010-02-26 Thread Gokulakannan Somasundaram
I am just adding my two cents, please ignore it, if its totally irrelevant. While we do performance testing/tuning of any applications, the important things, a standard monitoring requirement from a database are a) Different type of wait events and the time spent in each of them b) Top ten Queries

Re: [HACKERS] Testing of parallel restore with current snapshot

2010-02-26 Thread Gokulakannan Somasundaram
Tom, I just took the patch, but it seems to be in binary format. Can you send me the patch to me? Thanks, Gokul. On Sat, May 30, 2009 at 3:12 AM, Tom Lane t...@sss.pgh.pa.us wrote: Josh Berkus j...@agliodbs.com writes: Tom, Is anyone interested enough to try it if I code it? If

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
Yes. When a bit is cleared, that's OK, because a cleared bit just means you need to check visibility in the heap tuple. When a bit is set, however, it's important that it doesn't hit the disk before the corresponding heap page update. That's why visibilitymap_set() sets the LSN on the page.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
The replay of the heap insert/update/delete record updates the visibility map. So are you planning to make that section, which writes the xlog and updates the visibility map inside a PANIC section right?

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
The replay of the heap insert/update/delete record updates the visibility map. Say a checkpoint has occured in between and flushed the dirty pages into disk, while the updater waits to update the visibility map. Now there will be no replay for the insert/update/delete right?

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
1) transaction information in index This seems like a lot of bloat in indexes. It also means breaking a lot of other optimizations such as being able to read the tuples directly from the heap page without locking. I'm not sure how much those are worth though. But adding 24 bytes to every

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
Wait a second, which idea are we currently talking about? No heap at all, or just the ability to check visibility without visiting the heap? I was talking about the indexes with snapshot If it's a genuine IOT (ie no separate heap), then you are not going to be able to get away without a

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
I disagree with that, Gokul -- if the ordering operators are volatile or just incorrect, during DELETE, you could set xmax in the wrong IndexTuple. Then there will be another IndexTuple that says it's visible, but it points to a non-visible heap tuple. I think you should follow the pointers

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
Tom, Actually, if you need to squeeze a few more bits into that word, the thing to do would be to get rid of storing the tuple length there. This would involve adding the same type of indirection header that we use for HeapTuples, so that the length would be available at need without going

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
First of all, volatility is not the only issue. The ordering ops could also be incorrect, e.g., violate the transitivity property. there is no reliable way to determine if a function is volatile and/or incorrectly specified. No it is the only issue. If you create a datatype with volatile

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
No, it's far from harmless. As soon as that heap TID gets filled with an unrelated tuple, you run the risk of indexscans alighting on and perhaps modifying the wrong tuple. Tom, In the Function based indexes on those functions, which we are suspecting to be a volatile one Or in the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
On Fri, Feb 26, 2010 at 9:54 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: No, it's far from harmless. As soon as that heap TID gets filled with an unrelated tuple, you run the risk of indexscans alighting on and perhaps modifying the wrong tuple. Tom, i think

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
No, we're not going there. That breaks the fundamental page content manipulation algorithms, and falls down for tuples not yet stored in a page (or being examined without a pointer to the page readily at hand), and has no redeeming social value anyway compared to doing it in the proven

Re: [HACKERS] A thought on Index Organized Tables

2010-02-25 Thread Gokulakannan Somasundaram
Proving that a set of comparison operators are consistent just by examining their runtime behavior is probably equivalent to solving the halting problem. I can't see us doing it, or wanting to accept the overhead of checking it even if it could be done. The overhead of checking is very

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
I think Gokul was asking because he wanted to work on it, but wanted to check community approval first. Yes the problem is that we need to come to a consensus on broken data types. As Heikki pointed out, those data types, which is based on a unstable function like time, date, random etc.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
The fragility there is not an issue in a mostly read-only application, but it definitely would be a concern in other cases. While we accept that visibility map is good for read only application, why can't we make it optional? Atleast if there is a way for a person to drop the visibility map

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Forgot to include the group.. On Wed, Feb 24, 2010 at 5:38 PM, Gokulakannan Somasundaram gokul...@gmail.com wrote: I am not familiar with this term broken data types, and I just looked for it in the source code and couldn't find it. What exactly are you referring to? cheers andrew

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
If you have a scenario where the visibility map incurs a measurable overhead, let's hear it. I didn't see any in the tests I performed, but it's certainly possible that if the circumstances are just right it makes a difference. Heikki, The obvious one, i could observe is that it

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
I think you're a barking up the wrong tree. AFAIUI, the need for the visibility map has not very much to do with whether the table has indices, and everything to do with avoiding unnecessary VACUUMs. In any event, you've not shown that the visibility map HAS any overhead, so talking about

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
But adding 24 bytes to every index entry seems pretty unlikely to be a win anyways. We actually wanted to make it optional. Not every index will be like that. More than that we can make it into 16 bytes. Only commands within the same transaction will not be able to do a index only scan.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
With an IOT I don't understand how you get out of index corruption without data loss. That's a showstopper for practical use, I think. For simplicity, say we are storing all the non-leaf pages of the index in a seperate file, then the leaf pages are nothing but the table. So if we can

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
On Wed, Feb 24, 2010 at 10:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: So you are essentially proposing that rather than moving the heap data into the leaf tuples of the index in the index file, you will move the leaf index data into the

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Yes. The visibility map doesn't need any new WAL records to be written. We probably will need to add some WAL logging to close the holes with crash recovery, required for relying on it for index-only-scans, but AFAICS only for VACUUM and probably only one WAL record for a whole bunch of heap

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Missed the group... On Thu, Feb 25, 2010 at 12:34 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: On Thu, Feb 25, 2010 at 12:28 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: That doesn't work because when you split an index page any sequential scan in progress

Fwd: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
Missed the group So basically you want to have index-only scans, but you want them to be really slow? No Robert, i just asked him to make it optional, so that the crash safe visibility map is used only when its need is realized by the DB designer. When there is a table with no indexes/

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
It might be slightly easier given the assumption that you only want to scan leaf tuples. But there's an additional problem I didn't think of before. Currently we optimize index scans by copying all relevant tuples to local memory so we don't need to hold an index lock for an extended time or

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
That doesn't work because when you split an index page any sequential scan in progress will either see the same tuples twice or will miss some tuples depending on where the new page is allocated. Vacuum has a clever trick for solving this but it doesn't work for arbitrarily many concurrent

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
You also need to avoid scanning the same tuple twice. That's not a problem for VACUUM, but it is for full index scans. Heikki, Actually the logic, which i have explained earlier is to avoid scanning tuples twice. Gokul.

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
I haven't thought about whether this is sufficient but if it is then an initial useful thing to add would be to use it for queries where we have a qual that can be checked using the index key even though we can't do a range scan. For example if you have a btree index on a,b,c and you have a

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
The WAL record of the heap insert/update/delete contains a flag indicating that the visibility map needs to be updated too. Thus no need for a separate WAL record. Heikki, Have you considered these cases? a) The current WAL architecture makes sure that the WAL Log is written before

Re: [HACKERS] A thought on Index Organized Tables

2010-02-24 Thread Gokulakannan Somasundaram
On Thu, Feb 25, 2010 at 3:16 AM, Gokulakannan Somasundaram gokul...@gmail.com wrote: I haven't thought about whether this is sufficient but if it is then an initial useful thing to add would be to use it for queries where we have a qual that can be checked using the index key even though we

Re: [HACKERS] A thought on Index Organized Tables

2010-02-23 Thread Gokulakannan Somasundaram
On Tue, Feb 23, 2010 at 10:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp writes: Instead, how about excluding columns in primary keys from table data? How will you implement select * from mytable ? Or even select * from mytable where

Re: [HACKERS] A thought on Index Organized Tables

2010-02-22 Thread Gokulakannan Somasundaram
. On Mon, Feb 22, 2010 at 12:21 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Gokulakannan Somasundaram wrote: Hi, As you all know, Index Organized tables are a way by which we can automatically cluster the data based on the primary key. While i was thinking about

Re: [HACKERS] A thought on Index Organized Tables

2010-02-22 Thread Gokulakannan Somasundaram
Forgot to include the group... On Mon, Feb 22, 2010 at 4:31 PM, Gokulakannan Somasundaram gokul...@gmail.com wrote: These sound like the same point to me. I don't think we're concerned with footprint -- only with how much of that footprint actually needs to be scanned. So if we have

[HACKERS] A thought on Index Organized Tables

2010-02-21 Thread Gokulakannan Somasundaram
Hi, As you all know, Index Organized tables are a way by which we can automatically cluster the data based on the primary key. While i was thinking about an implementation for postgres, it looks like an impossible with the current ideologies. In an IOT, if a record gets updated, we need to

[HACKERS] Precedence of Postfix operators

2010-02-07 Thread Gokulakannan Somasundaram
Hi, Is there any reason why we have given lesser precedence for postfix operator compared to multiplication/division? Usually postfix operators have more precedence than the binary operations. Is this some kind of work around to provide user-defined operators? Can someone help me understand

  1   2   3   >