Re: [HACKERS] Schedule for 8.1 feature freeze
I'll do my best to submit bitmap index AM patch next week for your review. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 8: explain analyze is your friend
[HACKERS] make distclean keeps some files
Playing with diffs, I've noticed, that after `make distclean` command some files are still present in the source tree. They're were not there before ./configure make. They are: src/backend/bootstrap/bootparse.c src/backend/bootstrap/bootscanner.c src/backend/bootstrap/bootstrap_tokens.h src/backend/parser/gram.c src/backend/parser/parse.h src/backend/parser/scan.c src/backend/utils/misc/guc-file.c src/bin/psql/psqlscan.c src/bin/psql/sql_help.h src/interfaces/ecpg/preproc/pgc.c src/interfaces/ecpg/preproc/preproc.c src/interfaces/ecpg/preproc/preproc.h src/interfaces/libpq/blibpqdll.def src/interfaces/libpq/libpqddll.def src/interfaces/libpq/libpqdll.def src/interfaces/libpq/libpq.rc src/pl/plpgsql/src/pl_gram.c src/pl/plpgsql/src/pl_scan.c src/pl/plpgsql/src/pl.tab.h Are they kept intentionally? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] adding new pages bulky way
* Tom Lane [EMAIL PROTECTED] [07.06.2005 07:59]: Why bother? Just write each page when you need to --- there's no law that says you must use P_NEW. This means 2 things: 1) I cannot mix P_NEW and exact-number ReadBuffer() calls; 2) thus, I have to track next-block-number myself. Is it so? BTW, are there any differences in buffer seeking speed, if buffer block-numbers are mixed and if they're not (i.e. P_NEW is used)? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] adding new pages bulky way
* Alvaro Herrera [EMAIL PROTECTED] [08.06.2005 00:39]: Huh, why? You need to grab the relation extension block (LockRelationForExtension in CVS tip). Really? Didn't knew that. Consider: 1) I add 2 pages to the newly-created relation using P_NEW as BlockNumber; 2) then I do LockRelationForExtension; ReadBuffer(135) and UnockRelationForExtension. What BlockNumber will be assigned to the buffer, if I call ReadBuffer(P_NEW) now? 136? BTW, is it OK to say BlockNumber is assigned to buffer? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] adding new pages bulky way
I need your advice. For on-disk bitmap I run a list of TIDs. TIDs are stored in pages as an array, page's opaque data holds an array of bits, indicating whether corresponding TID has been deleted and should be skipped during the scan. Pages, that contain TIDs list, are organized in extents, each extent has 2^N pages, where N is extent's number (i.e. 2nd extent will occupy 4 pages). Given that I know number of TIDs, that fit into one page, and the TID's sequential number, I can easily calculate: - extent number TID belongs to; - page offset inside that extent, and; - TID place in the page. At the moment, I store BlockNumber of the extent's first page in the metapage and allocate all pages that belongs to that extent sequentially. I need to do so to minimize number of page reads when searching for the TID in the list; I'll need to read 1 page at most to find out TID at given position during the scan. I hope you understood the idea. This also means, that while extent's pages are being added this way, no other pages can be added to the index. And the higher is extent's number, the more time it'll take to allocate all pages. The question is: allocating pages this way is really ugly, I understand. Is there some API that would allow allocating N pages in the bulk way? Maybe this is a know problem, that has been already solved before? Any other ideas? Thanks in advance! -- Victor Y. Yegorov ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[HACKERS] Best way to scan on-disk bitmaps
Greetings. I have questions on how to implement on-disk bitmap scan. I've been working on on-disk bitmaps as an ordinary Index Access Method, but now it's clear to me, that it'll lose all it's strengths this way. One on-disk bitmap has exactly one list of indexed table's TIDs and potentially unlimited number of bitmaps (number of index attributes multiplied by attribute's cardinality, to be precise). So, for better performance, one should first retrieve all the needed bitmaps from the index, then join all bitmaps according to the query clauses and, as the last phase, retrieve TIDs from index, that matches final bitmap. According to the docs the index access method is responsible for regurgitating the TIDs ..., but for on-disk bitmaps index scan is devided into 3 phases. So, to perform the scan in phases, to my mind, executor should be involved. (I'd like to mention again, that this is the first time I got so deep inside postgres code). I wanted to use Tom's nodeBitmap* stuff, but it's problematic a bit. Bitmaps in nodeBitmap* are built upon list of TIDs retrieved during relation scan. For on-disk bitmap indexes, there's no need for that, as all bitmaps are already inside the index. The question is: Is it possible to extend nodeBitmap functionality in such a way, that Executor can either build bitmap after list of TIDs, obtained from RelationScan, or ask index access method to give bitmaps it contain (and TIDs at given position in the map later)? This will, probably, require more functions in the pg_am catalog. Or should I create a completely new node for on-disk bitmaps? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Best way to scan on-disk bitmaps
* Tom Lane [EMAIL PROTECTED] [12.05.2005 23:09]: 1. Be sure that all the indexable WHERE conditions are passed to the indexscan as indexquals. This might be, say, WHERE a = 42 and b = 'foo' If I have on-disk bitmap ON (a, b, c) will the planner pick an index scan for WHERE a = 42 AND b = 'foo' (i.e. only part of the index attributes are involved)? Any modifications needed to achieve this functionality? To my mind, bitmap scan even for 1 attribute of a multi-column index would be a win, though I haven't tested this yet. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Comparing Datum's at aminsert() stage
* Tom Lane [EMAIL PROTECTED] [19.04.2005 19:48]: That's probably what you *have* to use, since the normal deconstructors assume they are working with heap tuples, which are different. But I don't understand why you are waiting till after the index tuple is formed. The aminsert function gets an array of Datums to start with. Why not do it there? Well, I need that exactly in aminsert. Each value is stored only once in the index (along with it's own series-of-bits). Thus, I need to compare each Datum from aminsert()'s array with the existing ones. Also, I cannot form tuple the ordinary way (I need all values separated), so I copy each TuplDesc-attrs[i] into temporary TupleDesc (1 attribute big) and call heap_fill_tuple(). Actually, I'm not sure this is the right way... I think, storing some kind of hash-value from the Datum is a good idea, but it's need to be unique. Is it possible with any existing API? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Comparing Datum's at aminsert() stage
Hi! For bitmaps, I need to search each value being inserted (for each column of the index) in the list of already existing values (stored in index's header area). To do that I need: 1) create Datum from PageItem's value (I store each value in it's own PageItem); 2) compare newly inserted Datum with on-disk existing one. For hash access method (maybe others too, haven't checked), this is done via index_keytest() function. But it uses ScanKey, there's no such at aminsert() stage. So, I'd like to ask -- what is the reverse function for heap_fill_tuple(), is it OK to use index_getattr()? And how do I compare 2 Datums? I need FmgrInfo pointer for the equality operator of the corresponding data type. Are there any API calls to obtain one? Thank you. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [HACKERS] bitmap AM design
Some more thoughts and questions. The main idea above bitmaps is narrowing some data sets' possible values to yes and no (i.e. 1 and 0) and then organize the data in the series of bits, where bit's position determines values to consider. In the cases, where several indexed attributes are used in WHERE clause, it's possible to do logical AND/OR on bitmaps before returning any results to the caller. For large tables with high number of low-cardinality attributes using bitmaps can result in certain speed-up. For on-disk bitmaps I'm working on, each value of each indexed attribute has it's own bitmap (i.e. series of bits, with bits set to 1 for rows with corresponding fields having value of that bitmap). Scanning the bitmap, we end up with an array of 1-bits' positions and need to convert those positions to CTIDs, as executor is expecting. So, index should also keep a CTID table, that corresponds to the bitmap's data. This CTID table will be the same for all bitmap indexes, created for one table, thus having 2 bitmap indexes will mean you're wasting some amount of disk space, storing absolutely identical data. So, to save space, we have 2 possibilities: 1) create a new relkind for the CTID table (maybe used not only for on-disk bitmaps); 2) make all create index ... using bitmap statements actually create/extend existing bitmap index. This also implies, that planner/executor should try using multicolumn bitmap index when at least one indexed field is present in the WHERE clause. I'm working on the 2nd case, because 1st one requires more work not only in the access method + executor + planner area. It is also possible to keep things as is and make a note in the documentation, that it is better to have 1 multicolumn bitmap index, then several single column ones, and that planner will still use multicolumn index even if not all columns are involved. Any comments/ideas here? After implementing bitmap index access method, it'll be necessary to teach planner and executor to use multicolumn bitmaps for any number of scan-attributes. Also, I cannot say in what circumstances planner should prefer bitmap scan to seqscan; I thought of cases, when it estimates return set being about 60% of the relation. What community has to say here? Also, as Tom is planning to work on in-memory bitmaps (maybe something is done already, don't know), I thought that it can be possible to cooperate. As I have no clue at the moment how in-memory bitmaps are going to work, is it possible to hear from you some draft of the forthcoming implementation? And what prerequisites would be required to join both bitmaps somehow? Waiting for your thoughts/comments. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] bitmap AM design
* Tom Lane [EMAIL PROTECTED] [01.03.2005 09:37]: The other stuff you mentioned implies that an insertion therefore requires exclusive lock on the index (because you may have to relocate everything in sight to add one more CTID slot). No, exclusive lock on index is worst thing to do. All lists (list of ctids, bitmaps) will only grow, no data will be deleted, as deletes will require relocation and possibly exclusive lock on the index. Extending lists will need only a short-term exclusive locks on the pages in the tails of each list. I can't believe you are seriously suggesting that it's OK to force every VACUUM to rebuild the index from scratch. We already get far too many complaints about the time needed for VACUUM. I don't think we really need any more fundamentally nonconcurrent index types :-( Well, I misunderstood the purpose of ambulkdelete function, my fault. Of course, no index rebuild will take place, instead, only flags in the list of CTIDs will be updated for deleted tuples. I have counter question for you: you've mentioned, that you want bitmaps in the 8.1. What kind of bitmaps you were speaking about? On-disk bitmaps (this is how I call new index access method I'm working on) or in-memory bitmaps, as in here http://archives.postgresql.org/pgsql-hackers/2005-01/msg01001.php Thanks for your reply. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] bitmap AM design
* Tom Lane [EMAIL PROTECTED] [01.03.2005 21:07]: Hmm, you seem to be envisioning that these are actually lists, not arrays --- that is, to find the N'th page in a list requires traversing list links starting at the first page. That doesn't sound workable. If you can't access all the entries in roughly constant time then the thing is going to have problems with big tables. Bitmaps are arays, you're right. But they are only accessed either when data is inserted and gets added to the end (there're direct pointers to the last page in each bitmap in the list of values), or during index scan. Scanning index implies visiting all pages that forms the bitmap. After scanning the bitmaps (and performing all logical operations as requested), we end up with bit positions and need to return CTID from the list, that resides in the given position in the list-of-CTIDs (yes, it's actually one huge array, that occupies many pages). List-of-CTIDs is organized into extents, each extent contains a known number of pages and all pages for the extent are allocated sequentially. ID of the first page of each extent is stored in the metapage. Also, it is known at compile time how many CTIDs can be stored into one page. Given all that, it is possible to compute ID of the page and CTID offset inside that page by CTID offset, obtained at bitmap scan step. Each extent has 2,4,8,16,32,etc. pages, so the metapage has enough space to store an array of ID's for the first page of each extent. Updating list-of-CTIDs is also cheap operation, as direct reference to the last page in the list-of-CTIDs is stored in metapage. The only list, that have drawbacks you've mentioned, is list-of-values. But, as it contains only attributes' related data and pointers to start page of corresponding bitmap, there won't be many pages in this list. Maybe, there are some more insufficiencies I've missed? -- Victor Y. Yegorov ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[HACKERS] bitmap AM design
Here's the design of bitmap AM I'm planning to implement. I've discussed it with Neil, but I'm willing to get more feedback on it. There are going to be 1 metapage, 1 list of CTIDs (LOC), one list of attribute values (LOV, including attributes for multi-column indexes) and a bitmap for each entry in the LOV. Metapage will contain pointers to the LOC, LOV will always start at page-1 and is organized as a 1-way chained list. Neil suggested a very good way how to handle updates. Of course, it's not necessary to strictly follow tuple layout in the table's heap, as I wanted to do initially. All that's needed, is that bit positions in bitmaps would be tied with CTID positions in LOC. So, we can do simple insert (generally, that's how MVCC works for tuple updates): when some tuple is updated, new CTID is added to the LOC and each bitmap is extended by 1 bit. On the other side (as Neil pointed), after VACUUM, we need to do some maintenance of bitmap index to avoid situations when index contains duplicate entries (in LOC and bitmaps) for the same CTID (value before marking tuple for reuse and after actually reusing it). So far, the fastest way would be recreating index, because the whole index designed for faster search/insert operations and lacks effective reverse mapping (CTID-bit position-bit value) functionality. List of CTIDs is organized into an array of extents. Each extent has 2**i pages ('i' is extent number in array). All pages for extent are allocated at once, ensuring their IDs are sequential. So, we need to store only BufferNumber of the first page in extent. LOC pages contain array of ItemPointerData, it's size is detected at compile time. So, CTID for given bit position can be obtained via only one ReadBuffer() call. LOV pages store arrays of value-descriptors, each descriptor has a pointer to the head of value's bitmap. Bitmaps are organized as 1-way chained list, bitmap contents is compressed using Word-Aligned Hybryd method (similar to RLE). Neil suggested the other way of organizing bitmap storage: all bits for given position are stored in one page (if it lacks space, new page is added at the bootom), so we'll have a 2-dimensional bitmap storage. This reduces amount of page reads during index scan, but for low-cardinality indexes, we'll waste a lot of space per page, if each CTIDs slice is stored in one page. On the other hand, it'll be hard to extend bitmap if we'll store several CTID slices per page and some new value will be inserted (i.e. CTID slice needs to be increased). At the moment, I would implement the easiest method -- equality encoding (as it's called in papers, bitmap per value). Neil's suggested techniques are called range encoding. Josh Berkus on the irc suggested implementing several encoding schemes as an option to the create index sql command. What do you think? I haven't looked at planner/executor yet. If you have some questions, please, ask. Also, I'd like to tell that this is my first time coding for PostgreSQL, thus I may use incorrect terminology. Also, it takes much time to write anything, for the same reason. And yes, I would really appreciate all kind of criticism on this design. I've started to implement AM, I need to register am* functions, what OIDs should use to register them in include/catalog/pg_proc.h? Waiting for your feedback. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
[HACKERS] Implementing Bitmap Indexes
Hello. I'd like to implement bitmap indexes and want your comments. Here is an essence of what I've found regarding bitmaps for the last month. Consider the following table So, the bitmap for attribute A will be the with 1 attribute A(int2): following: # | A Val | Bitmap(s) +--- -+--- 1 | 1 1 | 11011001 0111 2 | 1 2 | 00100100 1000 3 | 2 3 | 0010 4 | 1 5 | 1 6 | 2 7 | 3 8 | 1 9 | 2 10 | 1 11 | 1 12 | 1 Some points: 1) If some new value will be inserted (say, 4) at some point of time, a new bitmap for it will be added. Same for NULLs (if atrribute has no NOT NULL contraint) --- one more bitmap. Or should we restrict NOT NULL for bitmap'ed attributes?; 2) Queries, like where A = 1 or where A != 2 will require only 1 scan of the index, while where A 3 will require 2 stages: 1st create a list of values lesser then 3, 2nd --- do OR of all bitmaps for that values. For high cardinality attributes, this can take a lot of time; 3) Each bitmap is only a bitmap, so there should be an array of corresponding ctids pointers. Maybe, some more arrays (pages, don't know). For 2)nd --- there are techniques, allowing better performance for A 3 queries via increased storage space (see here for details: http://delab.csd.auth.gr/papers/ADBIS03mmnm.pdf) and increased reaction time for simple queries. I don't know, if they should be implemented, may later. The most tricky part will be combinig multiple index scans on several attributes --- as Neil Conway said on #postrgesql, this will be tricky, as some modifications will be needed in the index scan api. I remember, Tom Lane suggested on-disk bitmaps --- implementing bitmap index access method would be of much use not only for bitmap indexes, I think. WAH compressing method should be used for bitmaps (to my mind). Also, there is a method of reordering heap tuples for better compression of bitmaps, I thought it may be possible to implement it as some option to the existing CLUSTER command, papers: WAH: http://www-library.lbl.gov/docs/LBNL/496/26/PDF/LBNL-49626.pdf CLUSTER: http://www.cse.ohio-state.edu/~hakan/publications/reordering.pdf I'd like to hear from you, before starting to do something. -- Victor ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Implementing Bitmap Indexes
* Tom Lane [EMAIL PROTECTED] [29.01.2005 18:24]: Victor Y. Yegorov [EMAIL PROTECTED] writes: I remember, Tom Lane suggested on-disk bitmaps I have suggested no such thing, and in fact believe that the sort of index structure you are proposing would be of very little use. Why? I thought they would be useful for data warehouse databases. Maybe I said something the wrong way, but what I'm trying to implement is exactly what is said about in the first link you've posted below: http://archives.postgresql.org/pgsql-hackers/2004-10/msg00439.php Or am I misunderstanding the point? What I've been hoping to look into is *in memory* bitmaps used as an interface between index scans and the subsequent heap lookups. Sorry, that was what I've been speaking of. Anyway, bitmap indexes API could be used for in-memory bitmaps you're speaking of. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [HACKERS] Implementing Bitmap Indexes
* Pawe Niewiadomski [EMAIL PROTECTED] [29.01.2005 17:45]: I'd like to implement bitmap indexes and want your comments. Here is an essence of what I've found regarding bitmaps for the last month. Do you think it would be possible to work on it as a team? Yes, why not. But everything depends on the community, may bitmaps will be realized as a contrib or pgfoundry module. The only thing --- I don't know, if that is possible for indexes. -- Victor Y. Yegorov ---(end of broadcast)--- TIP 8: explain analyze is your friend