Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Nicolas Barbier
2013/11/12 Claudio Freire klaussfre...@gmail.com: On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier nicolas.barb...@gmail.com wrote: (Note that K B-trees can be merged by simply scanning all of them concurrently, and merging them just like a merge sort merges runs. Also, all B-trees except

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 12 November 2013 19:54, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Nov 12, 2013 at 7:14 PM, Simon Riggs si...@2ndquadrant.com wrote: I still think we need to look at this from a query perspective though. We need to check whether there is a class of real world queries that are not

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Nicolas Barbier
2013/11/12 Simon Riggs si...@2ndquadrant.com: On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote: Look-up speed is as follows: Each look-up must look through all B-trees. That can be optimised by using a min max approach, so we need only look at sub-trees that may

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote On 5 November 2013 14:28, Leonardo Francalanci lt; m_lists@ gt; wrote: Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random inputs scenario. Please show the

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 06:07, Leonardo Francalanci m_li...@yahoo.it wrote: The use case is pretty simple. Think it as the NSA, as it would be much easier. Collect all calls made/received by any user on a mobile network. (in fact, it's something more than calls, so in fact is 2-10 rows per call).

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Jeremy Harris
On 13/11/13 09:07, Leonardo Francalanci wrote: Problem? having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the performance in insertion after a while. Surely there's good correlation between IMSI IMEI, so have a separate table to translate one to

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote So in the use case you describe, the min max index would require a scan of only 25% of the table, not the 80% described earlier for random inserts. In my experience, people wish to keep data for much longer periods and so the percentage of scan required would drop lower than

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Jeremy Harris wrote Surely there's good correlation between IMSI IMEI, so have a separate table to translate one to (a group of) the others, and halve the indexes on your main table? Yes; unfortunately not always both are available; but it's something we are thinking about (it requires logic

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: Or, in other words: what are you going to write in the minmax index documentation, try and see if they work better for you? That seems like good advice to me. Bacon Aristotle. There is very little written about

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote From our discussions here, IMHO there is a strong case for avoiding btrees completely for larger historical data tables. That isn't something I had even considered as desirable before this conversation but ISTM now that taking that approach will be more fruitful than

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Merlin Moncure
On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: I would like to see some numbers. Alvaro has given me some results for his patch. The figures I have are for a 2GB table. Index Build Time

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: I would like to see some numbers. Alvaro has given me some results for his

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Simon Riggs
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote: Load time MinMax no overhead, same as raw COPY BTree - considerably slower And just as a general comment, the min max index does not slow down COPY as the table gets larger, whereas the btree gets slower as the table gets

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-13 Thread Leonardo Francalanci
Simon Riggs wrote Can you CLUSTER against a minmax index? Not in this release, at least in my understanding. It's not yet possible to do an ordered fetch, so the cluster scan probably won't work. As per the patch I helped writing, CLUSTER should use the sequential heap scan+sort when it

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Simon Riggs
On 5 November 2013 14:28, Leonardo Francalanci m_li...@yahoo.it wrote: Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random inputs scenario. Please show the real world SQL you intend

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Nicolas Barbier
2013/11/2 Simon Riggs si...@2ndquadrant.com: On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: Presumably someone will get around to implementing a btree index insertion buffer one day. I think that would be a particularly compelling optimization for us, because we could avoid

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Nicolas Barbier
2013/11/12 Nicolas Barbier nicolas.barb...@gmail.com: In conclusion, use a “B-forest” when: * The index entries are small (large fan-out). * The insertion throughput is high. * It’s OK for look-ups to be slow. * Extra points when the storage medium has high seek times. Oops, forgot the

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Simon Riggs
On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote: Look-up speed is as follows: Each look-up must look through all B-trees. That can be optimised by using a min max approach, so we need only look at sub-trees that may contain data. Index size: I think (didn’t

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-12 Thread Claudio Freire
On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier nicolas.barb...@gmail.com wrote: (Note that K B-trees can be merged by simply scanning all of them concurrently, and merging them just like a merge sort merges runs. Also, all B-trees except for the first level (of size S) can be compacted 100%

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-11 Thread Jeff Janes
On Tue, Nov 5, 2013 at 9:52 AM, Leonardo Francalanci m_li...@yahoo.itwrote: Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index.

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-11 Thread Jeff Janes
On Thu, Oct 31, 2013 at 12:43 AM, Leonardo Francalanci m_li...@yahoo.itwrote: Jeff Janes wrote True, but that is also true of indexes created in bulk. It all has to reach disk eventually-- [...] If the checkpoint interval is as long as the partitioning period, then hopefully the

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Andres Freund-3 wrote On 2013-11-04 11:27:33 -0500, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire lt; klaussfreire@ gt; wrote: Such a thing would help COPY, so maybe it's worth a look I have little doubt that a deferred insertion buffer of some kind could help

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote Everybody on this thread is advised to look closely at Min Max indexes before starting any further work. MinMax will give us access to many new kinds of plan, plus they are about as close to perfectly efficient, by which I mean almost zero overhead, with regard to inserts

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Simon Riggs
On 5 November 2013 08:32, Leonardo Francalanci m_li...@yahoo.it wrote: Simon Riggs wrote Everybody on this thread is advised to look closely at Min Max indexes before starting any further work. MinMax will give us access to many new kinds of plan, plus they are about as close to perfectly

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are trying to achieve and test them against minmax indexes. That way we can

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Simon Riggs
On 5 November 2013 09:57, Leonardo Francalanci m_li...@yahoo.it wrote: Simon Riggs wrote Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you are

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Simon Riggs wrote On 5 November 2013 09:57, Leonardo Francalanci lt; m_lists@ gt; wrote: While I do believe in testing (since In theory there is no difference between theory and practice. In practice there is), I would like to know the properties of the minmax index before trying it. What

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 6:57 AM, Leonardo Francalanci m_li...@yahoo.it wrote: Simon Riggs wrote Minmax indexes seem to surprise many people, so broad generalisations aren't likely to be useful. I think the best thing to do is to publish some SQL requests that demonstrate in detail what you

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote Min-max indexes always require a sequential scan of the min-max index itself when querying. I'm worried about the number of heap pages that will be scanned. My understanding is that given the random input, the index will not be selective enough, and will end up requiring a

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 11:28 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I get 9000 pages for 49 values out of 50... which means scanning 90% of the table. Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote real data isn't truly random Well, let's try normal_rand??? create table t1 as select trunc(normal_rand(100, 50, 3)) as n, generate_series(1, 100) as i; with cte as (select min(n) as minn, max(n) as maxn, i/100 from t1 group by i/100),

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Jeff Janes
On Tue, Nov 5, 2013 at 12:25 AM, Leonardo Francalanci m_li...@yahoo.itwrote: Andres Freund-3 wrote On 2013-11-04 11:27:33 -0500, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire lt; klaussfreire@ gt; wrote: Such a thing would help COPY, so maybe it's worth a look

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 12:22 PM, Leonardo Francalanci m_li...@yahoo.it wrote: Claudio Freire wrote you haven't really analyzed update cost, which is what we were talking about in that last post. I don't care for a better update cost if the cost to query is a table scan. Otherwise, I'll just

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Claudio Freire wrote Well, of course, they're not magic pixie dust. Of course they aren't. I think they can make a difference in a sequential input scenario. But I'm not the one who said that they are fit to solve the problems me and other people are talking about in this thread. Claudio

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Leonardo Francalanci
Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index. Maybe even less. Cool!!! Do you have any idea/hint on how I could try and

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Claudio Freire
On Tue, Nov 5, 2013 at 2:52 PM, Leonardo Francalanci m_li...@yahoo.it wrote: Jeff Janes wrote Some experiments I did a few years ago showed that applying sorts to the data to be inserted could be helpful even when the sort batch size was as small as one tuple per 5 pages of existing index.

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Greg Stark
On Tue, Nov 5, 2013 at 5:30 PM, Claudio Freire klaussfre...@gmail.comwrote: Maybe there's value in minmax indexes for sequential data, but not for random data, which is the topic of this thread. Well, of course, they're not magic pixie dust. But is your data really random? (or normal?)

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-05 Thread Alvaro Herrera
Greg Stark escribió: I think minmax indexes are more akin to bitmap indexes. They will be very effective for columns with low-cardinality, especially for columns that are very clustered. In the extreme if all the values in some regions of the table are the same then minmax indexes would be

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote: On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't see much interest in insert-efficient indexes. Presumably someone

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 1:09 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote: On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com wrote: Such a thing would help COPY, so maybe it's worth a look I have little doubt that a deferred insertion buffer of some kind could help performance on some workloads, though I suspect the buffer would have to be pretty

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com wrote: Such a thing would help COPY, so maybe it's worth a look I have little doubt that a deferred insertion buffer of some kind could help

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:31 AM, Claudio Freire klaussfre...@gmail.com wrote: On Mon, Nov 4, 2013 at 1:27 PM, Robert Haas robertmh...@gmail.com wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com wrote: Such a thing would help COPY, so maybe it's worth a look I

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Andres Freund
On 2013-11-04 11:27:33 -0500, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:24 AM, Claudio Freire klaussfre...@gmail.com wrote: Such a thing would help COPY, so maybe it's worth a look I have little doubt that a deferred insertion buffer of some kind could help performance on some

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Robert Haas
On Mon, Nov 4, 2013 at 11:32 AM, Andres Freund and...@2ndquadrant.com wrote: I think doing this outside of s_b will make stuff rather hard for physical replication and crash recovery since we either will need to flush the whole buffer at checkpoints - which is hard since the checkpointer

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Gavin Flower
On 05/11/13 05:35, Robert Haas wrote: On Mon, Nov 4, 2013 at 11:32 AM, Andres Freund and...@2ndquadrant.com wrote: I think doing this outside of s_b will make stuff rather hard for physical replication and crash recovery since we either will need to flush the whole buffer at checkpoints - which

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 4 November 2013 16:09, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote: On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Jeff Janes
On Mon, Nov 4, 2013 at 8:09 AM, Robert Haas robertmh...@gmail.com wrote: On Sat, Nov 2, 2013 at 6:07 AM, Simon Riggs si...@2ndquadrant.com wrote: On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote:

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Claudio Freire
On Mon, Nov 4, 2013 at 5:01 PM, Simon Riggs si...@2ndquadrant.com wrote: Of course, it's possible that even we do get a shared memory allocator, a hypothetical person working on this project might prefer to make the data block-structured anyway and steal storage from shared_buffers. So my

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 4 November 2013 19:55, Gavin Flower gavinflo...@archidevsys.co.nz wrote: How about having a 'TRANSIENT INDEX' that only exists in memory, so there is no requirement to write it to disk or to replicate directly? This type of index would be very fast and easier to implement. Recovery would

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-04 Thread Simon Riggs
On 30 October 2013 14:34, Yann Fontana yann.font...@gmail.com wrote: On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote: In terms of generality, do you think its worth a man year of developer effort to replicate what you have already achieved? Who would pay? I work on

Re: [HACKERS] Fast insertion indexes: why no developments

2013-11-02 Thread Simon Riggs
On 29 October 2013 16:10, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't see much interest in insert-efficient indexes. Presumably someone will get around to implementing a btree index insertion buffer one day. I

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Jeff Janes wrote True, but that is also true of indexes created in bulk. It all has to reach disk eventually-- [...] If the checkpoint interval is as long as the partitioning period, then hopefully the active index buffers get re-dirtied while protected in shared_buffers, and only get

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-31 Thread Leonardo Francalanci
Gavin Flower-2 wrote How about being able to mark indexes: 'MEMORY ONLY' to make them not go to disk and 'PERSISTENT | TRANSIENT' to mark if they should be recreated on machine bootup? I would love that. But: 1) I'd like to make some tests with a memory drive, and confirm that in

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a way of reducing the size of indexes, yet still allowing uniqueness

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 07:55, Leonardo Francalanci m_li...@yahoo.it wrote: Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Presumably the data you are inserting isn't actually random. Please describe the use case you are considering in more detail and some view on how frequent that is, with some examples. Once we understand the use case and agree it is important, we might solve problems. Collecting calls data

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 10:35, Leonardo Francalanci m_li...@yahoo.it wrote: Presumably the data you are inserting isn't actually random. Please describe the use case you are considering in more detail and some view on how frequent that is, with some examples. Once we understand the use case and

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
What is the reason for needing such fast access to individual groups of records? Sure sounds like the NSA or similar ;-) Users need to search all calls originated from/to a user or from/to a specific mobile phone to answer/analyze customers' probl... ok, I give up: I work for the NSA ;) In

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Kaare Rasmussen
On 2013-10-30 12:08, Simon Riggs wrote: effort to replicate what you have already achieved? Who would pay? The NSA, obviously ;-) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Simon Riggs
On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote: What is the reason for needing such fast access to individual groups of records? Sure sounds like the NSA or similar ;-) Users need to search all calls originated from/to a user or from/to a specific mobile phone to

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
LSM-trees seem patent free I'm no expert, and I gave it just a look some time ago: it looked to me very complicated to get right... and as far as I remember you don't get that much gain, unless you go multi-level which would complicate things further Please somebody advise patent status of

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Yann Fontana
On 30 October 2013 11:23, Leonardo Francalanci m_li...@yahoo.it wrote: In terms of generality, do you think its worth a man year of developer effort to replicate what you have already achieved? Who would pay? I work on an application that does exactly what Leonardo described. We hit the

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Merlin Moncure
On Wed, Oct 30, 2013 at 9:23 AM, Leonardo Francalanci m_li...@yahoo.it wrote: LSM-trees seem patent free I'm no expert, and I gave it just a look some time ago: it looked to me very complicated to get right... and as far as I remember you don't get that much gain, unless you go multi-level

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 3:35 AM, Leonardo Francalanci m_li...@yahoo.itwrote: Presumably the data you are inserting isn't actually random. Please describe the use case you are considering in more detail and some view on how frequent that is, with some examples. Once we understand the use

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 4:23 AM, Leonardo Francalanci m_li...@yahoo.itwrote: 1) I haven't achieved what I need: realtime indexing. I can't query the current 15 minutes table efficiently. Plus, K*log(N) is not that great when you have a lot of K. Are partitions read-only once time has moved

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote The index insertions should be fast until the size of the active part of the indexes being inserted into exceeds shared_buffers by some amount (what that amount is would depend on how much dirty data the kernel is willing to allow in the page cache before it starts suffering

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote Are partitions read-only once time has moved on, or can stragglers show up that need to be inserted into older partitions? You could periodically merge older partitions into larger tables, index those aggregated tables, then transactionally disinherit the old partitions

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Point being: hardware is marching along pretty fast (after 20+ years of stagnation) and it's dangerous (IMO) to make big software investments based on the situation on the ground *today*. Yes, that's a good point. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Leonardo Francalanci
Jeff Janes wrote You could periodically merge older partitions into larger tables, index those aggregated tables, then transactionally disinherit the old partitions and inherit the new aggregated one. This would keep the value of K down, at the expense of re-writing data multiple times (but

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Jeff Janes
On Wed, Oct 30, 2013 at 9:54 AM, Leonardo Francalanci m_li...@yahoo.itwrote: Jeff Janes wrote The index insertions should be fast until the size of the active part of the indexes being inserted into exceeds shared_buffers by some amount (what that amount is would depend on how much dirty

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Claudio Freire
On Wed, Oct 30, 2013 at 10:53 AM, Simon Riggs si...@2ndquadrant.com wrote: LSM-tree also covers the goal of maintaining just 2 sub-trees and a concurrent process of merging sub-trees. That sounds like it would take a lot of additional time to get right and would need some off-line process to

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-30 Thread Gavin Flower
On 31/10/13 06:46, Jeff Janes wrote: On Wed, Oct 30, 2013 at 9:54 AM, Leonardo Francalanci m_li...@yahoo.it mailto:m_li...@yahoo.it wrote: Jeff Janes wrote The index insertions should be fast until the size of the active part of the indexes being inserted into exceeds

[HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hi, I don't see much interest in insert-efficient indexes. These are the ones I've found: - LSM-tree (used by Cassandra and SQLite4?) - Y-Tree (http://www.bossconsulting.com/oracle_dba/white_papers/DW%20in%20oracle/P23%20(ytree%20index%20structure%20for%20DWs).pdf ) - Fractal indexes

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Craig Ringer
On 10/29/2013 03:53 PM, Leonardo Francalanci wrote: 5) something else??? Quite likely nobody has had the enthusiasm and interest to implement a viable, quality implementation and stick with it long enough to get it committed. There are a great many good ideas for improvements to Pg that just

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes: On 10/29/2013 03:53 PM, Leonardo Francalanci wrote: 5) something else??? Quite likely nobody has had the enthusiasm and interest to implement a viable, quality implementation and stick with it long enough to get it committed. There are a great

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Merlin Moncure
On Tue, Oct 29, 2013 at 2:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: Hi, I don't see much interest in insert-efficient indexes. These are the ones I've found: - LSM-tree (used by Cassandra and SQLite4?) - Y-Tree

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on them.  The non-btree index types that have been getting love are the ones

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Alvaro Herrera
Leonardo Francalanci wrote: Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on them.  The non-btree index types that have

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread k...@rice.edu
On Tue, Oct 29, 2013 at 02:53:37PM +, Leonardo Francalanci wrote: Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Nobody's bothered to add WAL support, let alone do any other real work on

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Leonardo Francalanci m_li...@yahoo.it writes: Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Aren't hash indexes in a poor state because they are not faster than btree in every condition? They

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Another point to add: I don't really see btree as a barrier to performance for most of the problems I face.  The real barriers to database performance are storage, contention, and query planning. Ehm that's true for regular OLTP stuff, which I understand is what most (95%?) of people

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
They should, in theory, be faster than btrees -- O(1) not O(log N) page fetches per lookup.  In practice they don't seem to be faster, and nobody's bothered to find out exactly why.  Again, this isn't a terribly encouraging precedent for implementing some other index type that's supposed to

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Claudio Freire
On Tue, Oct 29, 2013 at 1:10 PM, Peter Geoghegan p...@heroku.com wrote: On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't see much interest in insert-efficient indexes. Presumably someone will get around to implementing a btree index insertion buffer one

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Merlin Moncure
On Tue, Oct 29, 2013 at 10:49 AM, Leonardo Francalanci m_li...@yahoo.it wrote: Another point to add: I don't really see btree as a barrier to performance for most of the problems I face. The real barriers to database performance are storage, contention, and query planning. Ehm that's true

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Peter Geoghegan
On Tue, Oct 29, 2013 at 7:53 AM, Leonardo Francalanci m_li...@yahoo.it wrote: I don't see much interest in insert-efficient indexes. Presumably someone will get around to implementing a btree index insertion buffer one day. I think that would be a particularly compelling optimization for us,

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
I bet you've mis-diagnosed the problem.  Btrees don't have a problem keeping up with 50m records; you're problem is that after a certain point your page cache can't keep up with the pseudo-random i/o patterns and you start seeing faults to storage. [...]  This has nothing to do the btree

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Jeff Janes
On Tue, Oct 29, 2013 at 8:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: Leonardo Francalanci m_li...@yahoo.it writes: Before getting too excited about some new academic index type, it's worth noting the sad state in which hash indexes have languished for years. Aren't hash indexes in a poor

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes: Robert removed the lmgr lock on the meta page by using a retry loop with lightweight locks. I've outlined how to remove the heavyweight lock on the bucket page as well, but it would require an on-disk change to the index so that each page knows how far

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Simon Riggs
On 29 October 2013 07:53, Leonardo Francalanci m_li...@yahoo.it wrote: I don't see much interest in insert-efficient indexes. Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large

Re: [HACKERS] Fast insertion indexes: why no developments

2013-10-29 Thread Leonardo Francalanci
Hmm, you realise Alvaro is working on MinMax indexes in this release? They are very efficient with regard to index inserts and specially designed for use on large tables. Prior work by Heikki on Grouped Item Tuples was a way of reducing the size of indexes, yet still allowing uniqueness