Re: [HACKERS] Is it possible to have a fast-write Index?
On Fri, Jun 19, 2015 at 15:06, Simon Riggs (si...@2ndquadrant.com) wrote:

It doesn't say anything about there being only one index buffer per table, nor do I think it would make sense to do it that way. So ISTM that the foreground process still has to insert serially into N index buffers, with each insert being WAL logged. So the only saving for the foreground process is the random I/O from inserting into the indexes, which means the value of the technique is in the case where you have many very large secondary indexes - which is now covered by BRIN.

I'm still learning how PostgreSQL works, but you're assuming that inserting in bulk into an index requires the same amount of CPU cycles and the same amount of kB written as doing it row by row. Most memory-based indexes have a bulk-load technique that relies on having the data pre-sorted. Sorting pure random data and then bulk-inserting it into the index is faster than classic insertion (less CPU time; no idea about I/O). Database indexes are disk-based, and there are some points regarding I/O performance that are hard for me to fully understand. But it seems logical that it would be faster to scan the index only once from beginning to end and do something like a merge sort between the pre-sorted input and the index. So I guess I missed something. Maybe WAL logging is the problem? If so, could this work for TEMP/UNLOGGED tables? Lots of tables that are heavily written are materialized views (or they behave more or less the same), so they could be refreshed in case of server failure. I hope bulk inserts could double the performance; otherwise, this idea may not be worth it.

About BRIN indexes, I'm really impressed. They are several times faster than I could have imagined, and on selects they perform very well too. I have to test them more, with more complex queries (would they work when used in JOIN clauses?). If select times are good enough even in those cases, then there's no need to do bulk inserts with btree.
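For what it's worth, the bulk-load path described here is what CREATE INDEX already uses in PostgreSQL: it sorts all the data and builds the btree bottom-up, which is why loading first and indexing afterwards is much cheaper than row-by-row insertion. A minimal sketch of that pattern (table, column and file names are hypothetical):

    -- Drop the index, bulk-load, then rebuild: CREATE INDEX sorts the
    -- data and builds the btree bottom-up instead of inserting row by row.
    DROP INDEX IF EXISTS orders_customer_idx;
    COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
    CREATE INDEX orders_customer_idx ON orders (customer_id);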
Re: [HACKERS] Is it possible to have a fast-write Index?
On 19 June 2015 at 14:30, Robert Haas (robertmh...@gmail.com) wrote:

So I really doubt that anyone would have any enthusiasm for saddling btree with a similar mechanism. It's complicated (and has been the cause of multiple bugs); it's hard to figure out when is the optimal time to flush the pending insertions; and it slows down searches in favor of making inserts cheap, which is generally not the way to bet --- if that's the tradeoff you want, why not drop the index altogether?

I'm not sure you're right about that. MySQL has a feature called secondary index buffering: https://dev.mysql.com/doc/refman/5.0/en/innodb-insert-buffering.html Now that might not be exactly what we want to do for one reason or another, but I think it would be silly to think that they implemented that for any reason other than performance, so there may be some performance to be gained there. Consider that on a table with multiple indexes, we've got to insert into all of them. If it turns out that the first leaf page we need isn't in shared buffers, we'll wait for it to be read in. We won't start the second index insertion until we've completed the first one, and so on. So the whole thing is serial. In the system MySQL has implemented, the foreground process would proceed unimpeded and any indexes whose pages were not in the buffer pool would get updated in the background. Ignoring for the moment the complexities of whether they've got the right design and how to implement it, that's sort of cool.

Interesting. Reading that URL, it shows that they would need to write WAL once to insert into the buffer and then again to insert into the index. You might get away with skipping WAL logging on the index buffer if you had a special WAL record to record the event "all indexes updated for xid", but since that would be written lazily it would significantly complicate the lazy update mechanism to track that.

It doesn't say anything about there being only one index buffer per table, nor do I think it would make sense to do it that way. So ISTM that the foreground process still has to insert serially into N index buffers, with each insert being WAL logged. So the only saving for the foreground process is the random I/O from inserting into the indexes, which means the value of the technique is in the case where you have many very large secondary indexes - which is now covered by BRIN.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Is it possible to have a fast-write Index?
On Thu, Jun 11, 2015 at 9:32 PM, Qingqing Zhou (zhouqq.postg...@gmail.com) wrote:

On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane (t...@sss.pgh.pa.us) wrote: So I really doubt that anyone would have any enthusiasm for saddling btree with a similar mechanism. It's complicated (and has been the cause of multiple bugs); it's hard to figure out when is the optimal time to flush the pending insertions; and it slows down searches in favor of making inserts cheap, which is generally not the way to bet --- if that's the tradeoff you want, why not drop the index altogether?

Hint bit updates are also painful in the above case, but that is off-topic here.

Are your records spread out over many transactions, or do you tend to have large batches all with the same xid?

merlin
Re: [HACKERS] Is it possible to have a fast-write Index?
On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

So I really doubt that anyone would have any enthusiasm for saddling btree with a similar mechanism. It's complicated (and has been the cause of multiple bugs); it's hard to figure out when is the optimal time to flush the pending insertions; and it slows down searches in favor of making inserts cheap, which is generally not the way to bet --- if that's the tradeoff you want, why not drop the index altogether?

I'm not sure you're right about that. MySQL has a feature called secondary index buffering: https://dev.mysql.com/doc/refman/5.0/en/innodb-insert-buffering.html Now that might not be exactly what we want to do for one reason or another, but I think it would be silly to think that they implemented that for any reason other than performance, so there may be some performance to be gained there. Consider that on a table with multiple indexes, we've got to insert into all of them. If it turns out that the first leaf page we need isn't in shared buffers, we'll wait for it to be read in. We won't start the second index insertion until we've completed the first one, and so on. So the whole thing is serial. In the system MySQL has implemented, the foreground process would proceed unimpeded and any indexes whose pages were not in the buffer pool would get updated in the background. Ignoring for the moment the complexities of whether they've got the right design and how to implement it, that's sort of cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
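The per-index cost described above can be observed directly by counting buffer touches while inserting into a table with several indexes; a small sketch (table name hypothetical):

    CREATE TABLE m (a int, b int, c int);
    CREATE INDEX ON m (a);
    CREATE INDEX ON m (b);
    CREATE INDEX ON m (c);
    -- Each inserted row must update all three btrees in the same backend,
    -- one after another; BUFFERS shows the pages touched along the way.
    EXPLAIN (ANALYZE, BUFFERS)
    INSERT INTO m SELECT g, g, g FROM generate_series(1, 100000) g;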
Re: [HACKERS] Is it possible to have a fast-write Index?
On 6/12/15 9:17 PM, deavid wrote:

(upper(codagencia::text) );

FWIW, you should consider using citext for these cases; it would let you cut down on the number of indexes (by 50%?).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
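A sketch of that suggestion, assuming (hypothetically) that codagencia is a text column of the deliverynotes table carrying both a plain and an upper() expression index:

    CREATE EXTENSION IF NOT EXISTS citext;
    ALTER TABLE deliverynotes ALTER COLUMN codagencia TYPE citext;
    -- citext compares case-insensitively, so this one index replaces the
    -- pair of plain and upper(...) expression indexes on the same column.
    CREATE INDEX ON deliverynotes (codagencia);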
Re: [HACKERS] Is it possible to have a fast-write Index?
I did another try with BRIN and GIN indexes, and I compared them to regular btree indexes. Now I have 16M rows to run the test on. The numbers seem good. Both GIN and BRIN seem good options for certain tables with more writes than reads (especially BRIN, which is very good).

I want to share my test with you; I used real-world data, but I didn't have time to do anything accurate or with real-world uses. I know the methodology is not enough, and maybe some calculations on the spreadsheet are wrong; I tried to do my best. I'm using an SSD and I'm trying to compare CPU cost, not I/O.

In short, the results (compared to btree) were:
- INSERT: GIN is 50% faster; BRIN is 6x faster. This is the best scenario.
- UPDATE: each case has a winner; for big updates BRIN is 10x faster and GIN is 25x faster. For small updates (most real-world cases) btree is always the winner, but BRIN gives some good results too.
- DELETE: almost no difference between the three.
- SELECT: btree is the winner here. BRIN is 10% slower, and GIN performance seems a bit random.

VACUUM, ANALYZE and other tasks are 6x faster with BRIN, 50% faster with GIN. Index sizes are 50% smaller with GIN, but with BRIN they are very, very small. I hope you find these numbers useful.

On Sat, Jun 13, 2015 at 11:41, deavid (deavidsed...@gmail.com) wrote:

Sorry; because of some misconfiguration, vacuum and analyze weren't working. Now I'm getting better numbers for BRIN indexes where there are zero rows to match.

On Sat, Jun 13, 2015 at 3:17, deavid (deavidsed...@gmail.com) wrote:

So I just ran a test case for hash, btree, gin_btree and brin indexes; also without indexes, and without primary keys.

* Testing the deliverynotes table.
- Definition and use case: it is a table containing real delivery note headers from several years. It consists of 300k rows, 128 columns, 63 indexes, and 243MB of data excluding indexes. Since it is a table visible to users, almost every column can be searched, so we need lots of indexes. We do not need searches to be the fastest possible; we only need to accelerate our user searches a bit, without harming writes too much.
- Things to test:
  - measure index creation times
  - measure index space
  - with indexes but without primary key
  - with everything
  - create fully, delete everything, and insert the data again in blocks
  - test updates for recent data

I attached the logs for every test, if anyone wants to see what exactly I'm testing. This was tested on my i5 laptop with 4GB of RAM and a 120GB SSD (OCZ Agility 3). I'm trying to measure CPU time, not I/O time, so some configurations and tests are specifically meant to avoid as much I/O as I can. I'm using a dev build of PostgreSQL 9.5 downloaded from git sources.

Conclusions:
- Gin_btree seems slower in almost every case. Its writes are marginally better than regular btrees even when using work_mem=160MB (maybe 20% faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects they are maybe a blend between having indexes and not having them. They don't recognize that some values are simply out of the range of indexed values, and that's a pity. If the values we want are packed together, I guess I would get even better results.
- Primary keys and uniqueness checks don't seem to make any difference here.
- Having no indexes at all is faster than I imagined (sometimes it beats BRIN or btree). Maybe because the I/O here is faster than usual.
- Hash indexes: I tried to do something, but they take too much time to build and I don't know why. If creates are slow, updates should be slow too. I'm not going to test them again.

And finally, I don't know why, but I couldn't vacuum or analyze the tables. It always got stalled without doing anything, so I had to comment out every vacuum. Maybe there is a bug in this dev version or I misconfigured something.

On Fri, Jun 12, 2015 at 7:27, Simon Riggs (si...@2ndquadrant.com) wrote:

On 5 June 2015 at 18:07, deavid (deavidsed...@gmail.com) wrote: There are several use cases where an index would be useful, but adding it will slow down inserts and updates too much. For example, when we have 10 million rows in a table, and it's a table with frequent updates, we need several indexes to speed up selects, but then we'll slow down updates a lot, especially when we have 10 or more indexes. Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overhead they add whenever we write to the table. I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind. Some people already asked for delayed-write indexes, but the idea got discarded because the index could get out of sync, so it could omit results, and this is unacceptable. But I think maybe that could be fixed in several ways and we could have a fast and reliable index (but maybe not so fast on selects).

This is exactly the use case and mechanism for BRIN indexes.
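For anyone wanting to reproduce a slice of these numbers, a minimal sketch of the insert-cost comparison (table names and row counts hypothetical; run in psql):

    \timing on
    CREATE TABLE t_btree (id bigint, ts timestamptz);
    CREATE INDEX ON t_btree USING btree (ts);
    INSERT INTO t_btree
      SELECT g, now() + g * interval '1 second'
      FROM generate_series(1, 1000000) g;

    -- Same data, BRIN instead of btree; compare the INSERT timings.
    CREATE TABLE t_brin (LIKE t_btree);
    CREATE INDEX ON t_brin USING brin (ts);
    INSERT INTO t_brin
      SELECT g, now() + g * interval '1 second'
      FROM generate_series(1, 1000000) g;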
Re: [HACKERS] Is it possible to have a fast-write Index?
deavid wrote:

I did another try with BRIN and GIN indexes, and I compared them to regular btree indexes. Now I have 16M rows to run the test on. The numbers seem good. Both GIN and BRIN seem good options for certain tables with more writes than reads (especially BRIN, which is very good).

Thanks, that's very nice to hear. There are two important details regarding BRIN that you may or may not have realized.

One is that when you insert into a table, the values are not immediately put into the index, but only as part of a later vacuum or a brin_summarize_new_values() function call. Either of those scans the not-already-summarized part of the table and inserts the summarized values into the index. If the new values are not in the index, then a query has to read that part of the table completely instead of excluding it from the scan.

The other thing is that in BRIN you can tell the system to make the summary information very detailed or coarsely detailed -- say, one summary tuple for every page, or one summary tuple for every 128 pages. The latter is the default. Obviously, the more detailed it is, the more you can skip when scanning the table. If the values are perfectly correlated, then there's no point to the extra detail, but if there's some variability then it could be worthwhile. You change this by specifying WITH (pages_per_range=16) in the CREATE INDEX command, or by doing ALTER INDEX SET (pages_per_range=16) and then REINDEXing it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
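In SQL terms, the two details above look roughly like this (index and table names are hypothetical, following the sketch earlier in the thread):

    -- Summarize the not-yet-indexed tail of the table without waiting
    -- for the next vacuum:
    SELECT brin_summarize_new_values('t_brin_ts_idx');

    -- Finer-grained summaries: one tuple per 16 pages instead of 128.
    CREATE INDEX t_brin_ts_idx16 ON t_brin USING brin (ts)
      WITH (pages_per_range = 16);

    -- Or change an existing index and rebuild it:
    ALTER INDEX t_brin_ts_idx SET (pages_per_range = 16);
    REINDEX INDEX t_brin_ts_idx;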
Re: [HACKERS] Is it possible to have a fast-write Index?
On 2015-06-05, deavid (deavidsed...@gmail.com) wrote:

Mode 3: on aminsert, put the new entry into a second btree, leaving the first one untouched. Because the second btree is new, it will be small, and writes should be faster. When doing an index scan, read tuples from both at the same time (like a merge sort). On vacuum, merge the second btree into the first. In this mode, the index is sorted and there's no need for a recheck.

You might be interested in reading the thread "Fast insertion indexes: why no developments" (2013), starting at 1383033222.73186.yahoomail...@web172602.mail.ir2.yahoo.com.

That thread talks mostly about reducing the (delayed) I/O caused by inserting into a super-big index at a continuously high rate, while you seem more interested in the delay problem caused by the CPU time used when inserting into multiple indexes (which should be quite fast already, as most of the writing is delayed).

Your problem (if it is really about delay and not about continuous throughput) might be better served by a delay-reducing solution such as writing a logical "row X is inserted, please make sure that all indexes are up-to-date" record to the WAL, instead of the physical "row X is inserted into table A, part Y of index Z must be updated like this, part Q of index S must be updated like so, etc." as is done now. Updating the indexes (not only the writing itself) would then be performed in a delayed fashion. Reading an index must then always additionally scan the in-memory queue of logical WAL records that is kept.

Of course, switching the WAL wholesale from a physical description of the changes that must be performed to a logical description is probably not feasible. Therefore, one could think about some new kind of "logical WAL" that gets logged separately (or even mixed in with the normal, physical WAL), where first the logical WAL is written ("insert row X into table A"), after which other operations can continue and the logical WAL is converted to physical WAL asynchronously (by "performing" the changes as is currently done, but by a background process). Recovery would then first need to replay the physical WAL, and then replay the logical WAL.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
Re: [HACKERS] Is it possible to have a fast-write Index?
Sorry; because of some misconfiguration, vacuum and analyze weren't working. Now I'm getting better numbers for BRIN indexes where there are zero rows to match.

On Sat, Jun 13, 2015 at 3:17, deavid (deavidsed...@gmail.com) wrote:

So I just ran a test case for hash, btree, gin_btree and brin indexes; also without indexes, and without primary keys.

* Testing the deliverynotes table.
- Definition and use case: it is a table containing real delivery note headers from several years. It consists of 300k rows, 128 columns, 63 indexes, and 243MB of data excluding indexes. Since it is a table visible to users, almost every column can be searched, so we need lots of indexes. We do not need searches to be the fastest possible; we only need to accelerate our user searches a bit, without harming writes too much.
- Things to test:
  - measure index creation times
  - measure index space
  - with indexes but without primary key
  - with everything
  - create fully, delete everything, and insert the data again in blocks
  - test updates for recent data

I attached the logs for every test, if anyone wants to see what exactly I'm testing. This was tested on my i5 laptop with 4GB of RAM and a 120GB SSD (OCZ Agility 3). I'm trying to measure CPU time, not I/O time, so some configurations and tests are specifically meant to avoid as much I/O as I can. I'm using a dev build of PostgreSQL 9.5 downloaded from git sources.

Conclusions:
- Gin_btree seems slower in almost every case. Its writes are marginally better than regular btrees even when using work_mem=160MB (maybe 20% faster than btree). They are smaller than I thought.
- BRIN indexes seem very fast for writes. For selects they are maybe a blend between having indexes and not having them. They don't recognize that some values are simply out of the range of indexed values, and that's a pity. If the values we want are packed together, I guess I would get even better results.
- Primary keys and uniqueness checks don't seem to make any difference here.
- Having no indexes at all is faster than I imagined (sometimes it beats BRIN or btree). Maybe because the I/O here is faster than usual.
- Hash indexes: I tried to do something, but they take too much time to build and I don't know why. If creates are slow, updates should be slow too. I'm not going to test them again.

And finally, I don't know why, but I couldn't vacuum or analyze the tables. It always got stalled without doing anything, so I had to comment out every vacuum. Maybe there is a bug in this dev version or I misconfigured something.

On Fri, Jun 12, 2015 at 7:27, Simon Riggs (si...@2ndquadrant.com) wrote:

On 5 June 2015 at 18:07, deavid (deavidsed...@gmail.com) wrote: There are several use cases where an index would be useful, but adding it will slow down inserts and updates too much. For example, when we have 10 million rows in a table, and it's a table with frequent updates, we need several indexes to speed up selects, but then we'll slow down updates a lot, especially when we have 10 or more indexes. Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overhead they add whenever we write to the table. I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind. Some people already asked for delayed-write indexes, but the idea got discarded because the index could get out of sync, so it could omit results, and this is unacceptable. But I think maybe that could be fixed in several ways and we could have a fast and reliable index (but maybe not so fast on selects).

This is exactly the use case and mechanism for BRIN indexes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Is it possible to have a fast-write Index?
On 6/5/15 6:54 PM, deavid wrote:

So the problem is: I see low iowait, and CPU time for one core is at 80-90% most of the time. I can buy more RAM, better disks, or CPUs with more cores. But one CPU core will have more or less the same speed no matter how much money I invest. Wanting a delayed-write index is similar to setting synchronous_commit = off. We want to give an OK to the backend as soon as possible and do this work in the background. But we also want some reliability against crashes. Also, if the task is done in the background it may be done by another backend, so several indexes could probably be packed at once using different backend processes. We could use the entire CPU if our index writes weren't tied to the session that wrote the row.

Something that might help here would be doing the index maintenance in parallel via background workers. There's now enough parallelism infrastructure that it shouldn't be too hard to hack up a quick test of that.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com
Re: [HACKERS] Is it possible to have a fast-write Index?
On 5 June 2015 at 18:07, deavid (deavidsed...@gmail.com) wrote:

There are several use cases where an index would be useful, but adding it will slow down inserts and updates too much. For example, when we have 10 million rows in a table, and it's a table with frequent updates, we need several indexes to speed up selects, but then we'll slow down updates a lot, especially when we have 10 or more indexes. Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overhead they add whenever we write to the table. I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind. Some people already asked for delayed-write indexes, but the idea got discarded because the index could get out of sync, so it could omit results, and this is unacceptable. But I think maybe that could be fixed in several ways and we could have a fast and reliable index (but maybe not so fast on selects).

This is exactly the use case and mechanism for BRIN indexes.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training Services
Re: [HACKERS] Is it possible to have a fast-write Index?
On Fri, Jun 5, 2015 at 10:59 AM, Tom Lane (t...@sss.pgh.pa.us) wrote:

So I really doubt that anyone would have any enthusiasm for saddling btree with a similar mechanism. It's complicated (and has been the cause of multiple bugs); it's hard to figure out when is the optimal time to flush the pending insertions; and it slows down searches in favor of making inserts cheap, which is generally not the way to bet --- if that's the tradeoff you want, why not drop the index altogether?

I have seen a case with a major fact table with up to 7 indices, loading large amounts of data every 15-60 minutes, with concurrent seeks against the indices at the same time. We can play with partitioning, or sacrifice some application semantics, to alleviate the pressure, but it would be good to see if we can improve: sorting and batching inserts into the btree is helpful for better I/O and locking behavior. So can we guard against the cases that are hard to handle, e.g., the indices enforcing some constraints (like uniqueness), and improve the loading scenario?

Hint bit updates are also painful in the above case, but that is off-topic here.

Thanks,
Qingqing
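The sorting-and-batching idea can already be approximated from the loading side: pre-sorting each batch by the index key turns random leaf-page access into mostly sequential access on the main index. A sketch (table and column names hypothetical):

    -- Loading a batch pre-sorted by the indexed key touches btree leaf
    -- pages in order, improving cache hits and reducing random I/O.
    INSERT INTO fact_sales (sale_ts, amount)
    SELECT sale_ts, amount FROM staging_sales ORDER BY sale_ts;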
Re: [HACKERS] Is it possible to have a fast-write Index?
On Fri, Jun 05, 2015 at 11:54:01PM +0000, deavid wrote:

Thanks to everybody for answering. I wasn't expecting this attention; this is a great community :-)

Jim asked me about something real. Well, the problem is this showed up more than five years ago, and it keeps popping up from time to time in different circumstances. I solved it in a different way each time, depending on the exact use case. I wanted to generalize, because it seems a good feature for several situations, and I don't expect a solution for me, as each time I hit this I found some way to sort it out. As Jim said, what we need here are figures for real examples, and I don't have them yet. I'll do my homework and email back with exact problems and exact timings. Give me a week or two.

Also, some of you are talking about I/O. Well, it's hard to say without the figures, but I'm pretty sure I'm hitting CPU time only. We use SSDs on those big databases, and in my tests I also tried setting fsync=off. So the problem is: I see low iowait, and CPU time for one core is at 80-90% most of the time. I can buy more RAM, better disks, or CPUs with more cores. But one CPU core will have more or less the same speed no matter how much money I invest.

Wanting a delayed-write index is similar to setting synchronous_commit = off. We want to give an OK to the backend as soon as possible and do this work in the background. But we also want some reliability against crashes. Also, if the task is done in the background it may be done by another backend, so several indexes could probably be packed at once using different backend processes. We could use the entire CPU if our index writes weren't tied to the session that wrote the row.

PS: I'm very interested in existing approaches like GIN or BRIN (this one is new to me). Thanks a lot; I'll try them in my tests.

Hi David,

Here is an interesting read comparing LSM and Fractal Tree indexing: http://highscalability.com/blog/2014/8/6/tokutek-white-paper-a-comparison-of-log-structured-merge-lsm.html

Regards,
Ken
[HACKERS] Is it possible to have a fast-write Index?
There are several use cases where an index would be useful, but adding it would slow down inserts and updates too much. For example, when we have 10 million rows in a table, and it's a table with frequent updates, we need several indexes to speed up selects, but then we'll slow down updates a lot, especially when we have 10 or more indexes. Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overhead they add whenever we write to the table. I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind.

Some people already asked for delayed-write indexes, but the idea got discarded because the index could get out of sync, so it could omit results, and this is unacceptable. But I think maybe that could be fixed in several ways and we could have a fast and reliable index (but maybe not so fast on selects). Since I do not know every internal of Postgres, I feel it's simpler to share this here and ask which things can or cannot be done.

Let's imagine there is a new type of index called weird_btree, where we trade off simplicity for speed. In almost every mode, we rely on VACUUM to put our index into its optimal state.

Mode 1: on aminsert, mark the index as INVALID. So, if you modified the table, you need to run REINDEX/CREATE INDEX CONCURRENTLY before doing a SELECT. This is almost the same as CREATE INDEX CONCURRENTLY; the main difference is that you don't have to remember to drop the index before writing. (I don't see much benefit here.)

Mode 2: on aminsert, put the new entry in a plain, unordered list instead of the btree. Inserting at the end of a list should be faster than into a big btree, and you'll know later which entries you missed indexing.

Mode 2.a: on index scan (amrescan, amgettuple), pretend that the list comes after the btree and output every row, out of order. You will have to tell Postgres that our index isn't sorted, and it will have to recheck every row.

Mode 2.b: mark the index invalid instead. During the next vacuum, sort the list and insert it into the btree in a bulk operation. If that's OK, mark the index valid.

Mode 3: on aminsert, put the new entry into a second btree, leaving the first one untouched. Because the second btree is new, it will be small, and writes should be faster. When doing an index scan, read tuples from both at the same time (like a merge sort). On vacuum, merge the second btree into the first. In this mode, the index is sorted and there's no need for a recheck.

Does anyone think this would be an interesting feature for PostgreSQL? Did I miss something?

PS: Maybe it's also possible to take advantage of clustering and have indexes whose entries are ranges of TIDs; but I'm not sure if this is too exotic, or if it would make a difference.

Sincerely,
David.
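Mode 3 can be crudely approximated today at the SQL level, with a small side table whose fresh btree plays the role of the second index; this is only a sketch of the idea, not the proposed aminsert change, and all names are hypothetical:

    CREATE TABLE events_main (id bigint, payload text);
    CREATE INDEX ON events_main (id);
    -- The small, hot side table: a new btree is cheap to insert into.
    CREATE UNLOGGED TABLE events_hot (LIKE events_main INCLUDING INDEXES);

    -- Readers see both parts, like the merge-read over the two btrees:
    CREATE VIEW events AS
      SELECT * FROM events_main
      UNION ALL
      SELECT * FROM events_hot;

    -- Periodic merge, playing the role VACUUM has in mode 3:
    WITH moved AS (DELETE FROM events_hot RETURNING *)
    INSERT INTO events_main SELECT * FROM moved;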
Re: [HACKERS] Is it possible to have a fast-write Index?
On 06/06/15 04:07, deavid wrote:

There are several use cases where an index would be useful, but adding it would slow down inserts and updates too much. For example, when we have 10 million rows in a table, and it's a table with frequent updates, we need several indexes to speed up selects, but then we'll slow down updates a lot, especially when we have 10 or more indexes. Other cases involve indexes for text search, which are used only for user search and aren't that important, so we want to have them, but we don't want the overhead they add whenever we write to the table. I know different approaches that already solve some of those problems in some ways (table partitioning, partial indexes, etc.), but I don't feel they are the solution to every problem of this kind. Some people already asked for delayed-write indexes, but the idea got discarded because the index could get out of sync, so it could omit results, and this is unacceptable. But I think maybe that could be fixed in several ways and we could have a fast and reliable index (but maybe not so fast on selects). Since I do not know every internal of Postgres, I feel it's simpler to share this here and ask which things can or cannot be done. Let's imagine there is a new type of index called weird_btree, where we trade off simplicity for speed. In almost every mode, we rely on VACUUM to put our index into its optimal state. Mode 1: on aminsert, mark the index as INVALID. So, if you modified the table, you need to run REINDEX/CREATE INDEX CONCURRENTLY before doing a SELECT. This is almost the same as CREATE INDEX CONCURRENTLY; the main difference is that you don't have to remember to drop the index before writing. (I don't see much benefit here.) Mode 2: on aminsert, put the new entry in a plain, unordered list instead of the btree. Inserting at the end of a list should be faster than into a big btree, and you'll know later which entries you missed indexing. Mode 2.a: on index scan (amrescan, amgettuple), pretend that the list comes after the btree and output every row, out of order. You will have to tell Postgres that our index isn't sorted, and it will have to recheck every row. Mode 2.b: mark the index invalid instead. During the next vacuum, sort the list and insert it into the btree in a bulk operation. If that's OK, mark the index valid. Mode 3: on aminsert, put the new entry into a second btree, leaving the first one untouched. Because the second btree is new, it will be small, and writes should be faster. When doing an index scan, read tuples from both at the same time (like a merge sort). On vacuum, merge the second btree into the first. In this mode, the index is sorted and there's no need for a recheck. Does anyone think this would be an interesting feature for PostgreSQL? Did I miss something? PS: Maybe it's also possible to take advantage of clustering and have indexes whose entries are ranges of TIDs; but I'm not sure if this is too exotic, or if it would make a difference. Sincerely, David.

How about a hybrid indexing system, with 2 parts:

(1) The existing index system, which is checked first and has been mostly optimised for read speed. If there are only a few inserts/updates and the system is not heavily loaded, then it gets modified immediately. The thresholds for being too busy, and for few enough changes, could be configurable.

(2) An overflow index optimised for writing, possibly in memory and not backed by permanent storage. A crash would require a complete index rebuild - but only when there were entries in it (or at least more than some configurable threshold, to allow for cases where some missing index entries are acceptable).

So where the index is needed for a query, part 1 is checked first, and then part 2 if necessary. Have a background process that moves entries from part 2 to part 1 when the system is less busy.

Cheers,
Gavin
Re: [HACKERS] Is it possible to have a fast-write Index?
Thanks to everybody for answering. I wasn't expecting this attention; this is a great community :-)

Jim asked me about something real. Well, the problem is this showed up more than five years ago, and it keeps popping up from time to time in different circumstances. I solved it in a different way each time, depending on the exact use case. I wanted to generalize, because it seems a good feature for several situations, and I don't expect a solution for me, as each time I hit this I found some way to sort it out. As Jim said, what we need here are figures for real examples, and I don't have them yet. I'll do my homework and email back with exact problems and exact timings. Give me a week or two.

Also, some of you are talking about I/O. Well, it's hard to say without the figures, but I'm pretty sure I'm hitting CPU time only. We use SSDs on those big databases, and in my tests I also tried setting fsync=off. So the problem is: I see low iowait, and CPU time for one core is at 80-90% most of the time. I can buy more RAM, better disks, or CPUs with more cores. But one CPU core will have more or less the same speed no matter how much money I invest.

Wanting a delayed-write index is similar to setting synchronous_commit = off. We want to give an OK to the backend as soon as possible and do this work in the background. But we also want some reliability against crashes. Also, if the task is done in the background it may be done by another backend, so several indexes could probably be packed at once using different backend processes. We could use the entire CPU if our index writes weren't tied to the session that wrote the row.

PS: I'm very interested in existing approaches like GIN or BRIN (this one is new to me). Thanks a lot; I'll try them in my tests.
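The knob this is being compared against can be tried per session; a minimal illustration:

    -- Session level; safe, unlike fsync=off. Commits now return before
    -- their WAL is flushed to disk: a crash can lose the last few
    -- transactions, but the database is never corrupted.
    SET synchronous_commit = off;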