[HACKERS] Vacuuming big btree indexes without pages with deleted items
Hi all. I have described [0] a problem with replicas falling behind after vacuuming a relation with a big btree index. Replay gets stuck on a WAL record of type XLOG_BTREE_VACUUM like this one (with lastBlockVacuumed 0): rmgr: Btree len (rec/tot): 20/52, tx: 0, lsn: 4115/56126DC0, prev 4115/56126D90, bkp: , desc: vacuum: rel 1663/16420/16796; blk 31222118, lastBlockVacuumed 0. The master writes this record to xlog in the btvacuumscan [1] function after vacuuming all index pages, and if no pages contained deleted items the record carries lastBlockVacuumed 0. While applying this record in btree_xlog_vacuum [2], the replica reads every block from lastBlockVacuumed to the last block of the index, because the buffer manager has no API to tell whether a page is unpinned without reading it. So if the index is quite big (200+ GB in the described case) this takes a long time. So the questions are: 1. Is there any API in the buffer manager to find out that a page is not in shared_buffers without reading it? 2. Is it possible to skip writing the record with lastBlockVacuumed 0 in some cases, for example when no pages were deleted? Or maybe there are better ways of improving this situation? [0] http://www.postgresql.org/message-id/fe82a9a7-0d52-41b5-a9ed-967f6927c...@simply.name [1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtree.c;hb=refs/heads/REL9_4_STABLE#l813 [2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/nbtree/nbtxlog.c;hb=refs/heads/REL9_4_STABLE#l482 -- May the force be with you… https://simply.name
[HACKERS] Implementing a join algorithm in Postgres
Hi, I have written a C program which reads from 3 files (each file is a table with 2 columns and thousands of rows). The program joins those 3 tables, and the algorithm I have written works only for those 3 files. Now I want to test this program with Postgres. Can someone tell me how to do that — where should I put this program so that I can run it as part of Postgres? Thank you. -- Regards, K.Ravikiran
Re: [HACKERS] WIP Patch for GROUPING SETS phase 1
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 18-03-2015 17:18, Svenne Krap wrote:
> I still need to check against the standard and I will run it against a non-trivial production load... hopefully I will finish up my review shortly after the weekend...

I am still on it, but a little delayed. I hope to get it done this weekend. Svenne -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Providing catalog view to pg_hba.conf file - Patch submission
On Fri, Mar 13, 2015 at 1:33 PM, Peter Eisentraut pete...@gmx.net wrote:
> On 3/4/15 1:34 AM, Haribabu Kommi wrote:
>> On Wed, Mar 4, 2015 at 12:35 PM, Haribabu Kommi kommi.harib...@gmail.com wrote:
>>> + foreach(line, parsed_hba_lines)
>>> In the above for loop it is better to add check_for_interrupts to avoid it looping if the parsed_hba_lines are more.
>> Updated patch is attached with the addition of check_for_interrupts in the for loop.
>
> I tried out your latest patch. I like that it updates even in running sessions when the file is reloaded.

Thanks for the review. Sorry for the late reply.

> The permission checking is faulty, because unprivileged users can execute pg_hba_settings() directly.

Corrected.

> Check the error messages against the style guide (especially capitalization).

Corrected.

> I don't like that there is a hard-coded limit of 16 options 5 pages away from where it is actually used. That could be done better.

Changed to 12 instead of 16.

> I'm not sure about the name pg_hba_settings. Why not just pg_hba or pg_hba_conf, if you're going for a representation that is close to the file (as opposed to pg_settings, which is really a lot more abstract than any particular file)?

Changed to pg_hba_conf.

> I would put the line_number column first.

Changed.

> I continue to think that it is a serious mistake to stick special values like 'all' and 'replication' into the arrays without additional decoration. That makes this view useless or at least dangerous for editing tools or tools that want to reassemble the original file. Clearly at least one of those has to be a use case. Otherwise we can just print out the physical lines without interpretation.

It is possible to provide more than one keyword for databases or users. Is it fine to use a text array for keyword databases and keyword users?

> The mask field can go, because address is of type inet, which can represent masks itself. (Or should it be cidr then? Who knows.) The preferred visual representation of masks in pg_hba.conf has been address/mask for a while now, so we should preserve that. Additionally, you can then use the existing inet/cidr operations to do things like checking whether some IP address is contained in an address specification.

Removed.

> I can't tell from the documentation what the compare_method field is supposed to do. I see it in the code, but that is not a natural representation of pg_hba.conf. In fact, this just supports my earlier statement. Why are special values in the address field special, but not in the user or database fields?
>
> uaImplicitReject is not a user-facing authentication method, so it shouldn't be shown (or showable).

Removed.

> I would have expected options to be split into keys and values. All that code to reassemble the options from the parsed struct representation seems crazy to me. Surely, we could save the strings as we parse them?

I didn't get this point clearly. Can you explain it a bit more?

> I can't make sense of the log message "pg_hba.conf not reloaded, pg_hba_settings may show stale information". If load_hba() failed, the information isn't stale, is it? In any case, printing this to the server log seems kind of pointless, because someone using the view is presumably doing so because they can't or don't want to read the server log. The proper place might be a warning when the view is actually called.

Changed accordingly: if the reload fails, further selects on the view throw a warning that the view data may not be correct, like this: "There was some failure in reloading pg_hba.conf file. The pg_hba.conf settings data may contain stale information."

Here I attached the latest patch with the fixes, other than the keyword columns. I will provide the patch with the keyword columns and documentation changes later.
Regards, Hari Babu Fujitsu Australia Catalog_view_to_HBA_settings_patch_V8.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Pluggable Parser
What is a Pluggable Parser: It is an option for users to select a different kind of parser, so they can evaluate PostgreSQL for their business logic without much manual effort. Why do we need it? As of now, migration from other databases to PostgreSQL requires the manual effort of translating SQL and PL/SQL to the PostgreSQL-equivalent SQL queries. Because of this manual script conversion, migration to PostgreSQL is considered very slow and is sometimes demotivating. So if we allow different database syntaxes to be plugged into PostgreSQL, it will be a strong motivation for many DBAs to try PostgreSQL. How to do it? This can be realized by supporting the SQL/stored-procedure syntax of each other database in its own parser, and plugging in the one to be evaluated by putting the corresponding .so file in dynamic_library_path. The default will be PostgreSQL syntax. Parser interface: /* Hook for plugins to get control in the parser */ typedef List *(*parser_hook_type)(const char *str); parser_hook_type parser_hook = NULL; extern PGDLLIMPORT parser_hook_type parser_hook; Make the parser entry point (raw_parser) a function pointer which can be loaded based on the parser type. The parser_hook will be initialized with the proper function during shared library loading, which happens only at server startup. This way it is ensured that only the one parser provided by the user is used in PostgreSQL. Each database-syntax parser can be implemented as a contrib module to keep it separate from the core code. To start with, I am planning to (for the 2015-06 commitfest): 1. Support the infrastructure to allow plugging. 2. Support Oracle SQL query syntax. Please let me know if the community would be interested in this, or if there has already been any discussion about this in the past. Please provide your opinion/suggestions. Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] WIP: Split of hash index bucket
Antonin Houska a...@cybertec.at wrote:
> I'm still testing it, especially the concurrent access. There are probably bugs in the code, but it can help understand what I mean.

I've traced the most important cases of concurrent insertion into a bucket whose split is in progress. A few related bugs fixed. Some tool to check the index structure is still needed before performance testing makes sense. That might include enhancement of contrib/pageinspect.

-- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at

diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
new file mode 100644
index 24b06a5..149bbcf
*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
*************** loop_top:
*** 572,577 ****
--- 572,599 ----
  		opaque = (HashPageOpaque) PageGetSpecialPointer(page);
  		Assert(opaque->hasho_bucket == cur_bucket);
  
+ 		/*
+ 		 * If the bucket participates in a split, give up.
+ 		 *
+ 		 * (Unlike the metapage copy, the flags at bucket level should
+ 		 * always be up-to-date.)
+ 		 *
+ 		 * TODO
+ 		 *
+ 		 * 1. Analyze if both buckets participating in the split impose
+ 		 * too severe restrictions, and if it makes sense to introduce
+ 		 * separate flags for old and new bucket. Also, would such a
+ 		 * restricted VACUUM still make sense?
+ 		 *
+ 		 * 2. Consider how statistics should reflect the fact that some
+ 		 * buckets are skipped because of split.
+ 		 */
+ 		if (opaque->hasho_flag & LH_BUCKET_SPLIT)
+ 		{
+ 			_hash_relbuf(rel, buf);
+ 			break;
+ 		}
+ 
  		/* Scan each tuple in page */
  		maxoffno = PageGetMaxOffsetNumber(page);
  		for (offno = FirstOffsetNumber;
diff --git a/src/backend/access/hash/hashinsert.c b/src/backend/access/hash/hashinsert.c
new file mode 100644
index 63aaec9..de445c9
*** a/src/backend/access/hash/hashinsert.c
--- b/src/backend/access/hash/hashinsert.c
*************** _hash_doinsert(Relation rel, IndexTuple
*** 37,42 ****
--- 37,43 ----
  	Page		page;
  	HashPageOpaque pageopaque;
  	Size		itemsz;
+ 	uint16		buckets_total;
  	bool		do_expand;
  	uint32		hashkey;
  	Bucket		bucket;
*************** _hash_doinsert(Relation rel, IndexTuple
*** 123,129 ****
  			 */
  			BlockNumber nextblkno = pageopaque->hasho_nextblkno;
  
! 			if (BlockNumberIsValid(nextblkno))
  			{
  				/*
  				 * ovfl page exists; go get it.  if it doesn't have room, we'll
--- 124,131 ----
  			 */
  			BlockNumber nextblkno = pageopaque->hasho_nextblkno;
  
! 			if (BlockNumberIsValid(nextblkno) &&
! 				!(pageopaque->hasho_flag & LH_BUCKET_SPLIT_LAST))
  			{
  				/*
  				 * ovfl page exists; go get it.  if it doesn't have room, we'll
*************** _hash_doinsert(Relation rel, IndexTuple
*** 136,142 ****
  			else
  			{
  				/*
! 				 * we're at the end of the bucket chain and we haven't found a
  				 * page with enough room.  allocate a new overflow page.
  				 */
--- 138,145 ----
  			else
  			{
  				/*
! 				 * we're at the end of the bucket chain, or (during a split) right
! 				 * before redirection to the old bucket, and we haven't found a
  				 * page with enough room.  allocate a new overflow page.
  				 */
*************** _hash_doinsert(Relation rel, IndexTuple
*** 151,157 ****
  		Assert(PageGetFreeSpace(page) >= itemsz);
  	}
  	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
! 	Assert(pageopaque->hasho_flag == LH_OVERFLOW_PAGE);
  	Assert(pageopaque->hasho_bucket == bucket);
  }
--- 154,160 ----
  		Assert(PageGetFreeSpace(page) >= itemsz);
  	}
  	pageopaque = (HashPageOpaque) PageGetSpecialPointer(page);
! 	Assert(pageopaque->hasho_flag & LH_OVERFLOW_PAGE);
  	Assert(pageopaque->hasho_bucket == bucket);
  }
*************** _hash_doinsert(Relation rel, IndexTuple
*** 173,180 ****
  	metap->hashm_ntuples += 1;
  
  	/* Make sure this stays in sync with _hash_expandtable() */
! 	do_expand = metap->hashm_ntuples >
! 		(double) metap->hashm_ffactor * (metap->hashm_maxbucket + 1);
  
  	/* Write out the metapage and drop lock, but keep pin */
  	_hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
--- 176,184 ----
  	metap->hashm_ntuples += 1;
  
  	/* Make sure this stays in sync with _hash_expandtable() */
! 	buckets_total = metap->hashm_maxbucket + 1 + metap->hashm_split_count;
! 	do_expand = metap->hashm_split_count < HASH_MAX_SPLITS &&
! 		metap->hashm_ntuples > (double) metap->hashm_ffactor * buckets_total;
  
  	/* Write out the metapage and drop lock, but keep pin */
  	_hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
diff --git a/src/backend/access/hash/hashovfl.c b/src/backend/access/hash/hashovfl.c
new file mode 100644
index b775164..4345f29
*** a/src/backend/access/hash/hashovfl.c
--- b/src/backend/access/hash/hashovfl.c
*** 21,27 ****
  #include "utils/rel.h"
  
- static Buffer _hash_getovflpage(Relation rel, Buffer metabuf);
  static uint32 _hash_firstfreebit(uint32
[HACKERS] GSoC 2015: SP-GIST for geometrical objects
Hello, I am Dmitrii, a student at the Moscow Institute of Physics and Technology. Abstract: I chose the project "Indexing prolonged geometrical objects (i.e. boxes, circles, polygons, not points) with SP-GiST by mapping to 4d-space". According to the presentation https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf, SP-GiST is 3 times faster than GiST in some cases, but only GiST currently supports the geometrical data types box, circle and polygon with the standard geometric operators (<<, >>, &&, <@, @>, ~= and so on). The popular spatial extension PostGIS doesn't include SP-GiST either, though it has a lot of geometrical features. Project details: After meeting with Alexander Korotkov, I wrote up a plan. Using a k-d-tree or quadtree to index geometrical data types can increase search speed in some cases. The main idea is to represent each 2-D geometrical object by its bounding box; the set of 2-D boxes forms a 4-D space. The new _ops (for example kd_box_ops and quad_circle_ops) will work with points from this 4-D space and will support all geometrical operators. After converting every object to its bounding box, the algorithm has a set of tuples (x1, y1, x2, y2). Our goal is to partition this space as evenly as possible. With a k-d-tree, the first step splits the space into 2 parts by the first coordinate, the next step by the second coordinate, and so on; after the 4th coordinate the cycle repeats. At the end we have an index on the geometrical objects, and every search operator is answered by a tree traversal. PostgreSQL already implements the MBR idea in gist/gistproc.c, so I will carry that implementation over to the other kinds of tree. Of course, I accept that SP-GiST may not be the best solution to this problem, so after testing these straightforward methods I will try to find a more effective approach, maybe a combination of different spatial tree structures. Project Schedule: until May 25 — read documentation and source code, clarify details of implementation.
1st month — implement the new '_ops' with all geometrical operators for box, circle and polygon. 2nd month — research new methods to increase the speed of geometrical queries. 3rd month — final refactoring, testing and submitting a patch. Links:
http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GiST
https://toster.ru/q/27135#answer_110197 - people need SP-GiST for cubes
http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working with geo objects
Re: [HACKERS] Pluggable Parser
Rajeev rastogi rajeev.rast...@huawei.com writes:
> What is Pluggable Parser: It is an option for users to select a different kind of parser to evaluate PostgreSQL for their business logic without much manual effort. Why do we need? As of now migration from other databases to PostgreSQL requires manual effort of translating SQL PL/SQL to PostgreSQL equivalent SQL queries. Because of this manual effort in converting scripts, migration to PostgreSQL considered to be very slow and sometime de-motivating also. So if we allow to plug different database syntaxes with PostgreSQL, then it will be one of the strong motivating result for many DBAs to try PostgreSQL.

While I don't have any strong reason to object to putting a hook where you suggest, I think that the above represents an enormous oversale of the benefits, which in actual fact are likely to be near zero. Replacing gram.y as you suggest will not allow more than the most trivial, cosmetic grammar changes, because you'd still have to produce the same raw parse trees as before. Alternatively you could consider replacing both raw_parser() and parse_analyze(), but then you're talking about maintaining a duplicate copy of just about the whole of src/backend/parser/, which doesn't sound terribly practical; it certainly would be unpleasant to maintain such a thing across multiple PG releases. And there will still be pretty strong constraints on whether you could implement say a MySQL-workalike, because you're still having to work with the Postgres execution engine. In short, this proposal sounds a lot like a solution looking for a problem. I think it would be useful for you to pick a small number of concrete alternative-grammar problems and think about how you could provide hooks that would allow solving those issues without duplicating 90% of src/backend/parser/ first. Here is an example of such a concrete problem: Oracle, I believe, uses parentheses () rather than brackets [] for array subscripting expressions.
How would you allow that notation in Postgres? Your original proposal cannot do it because it's not a problem of the raw grammar, but rather one for parse analysis --- you have to disambiguate array references from function calls at the time of doing semantic analysis. Another example you might consider is Oracle-style outer join notation, which again will require significant surgery in parse analysis, not just modifying the raw grammar. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bug fix for missing years in make_date()
2015-03-26 23:26 GMT+01:00 David Fetter da...@fetter.org:
> Folks, For reasons unclear, dates before the Common Era are disallowed in make_date(), even though about 2/3 of the underlying data type's range up until the present time fits that description. Please find attached a patch fixing same. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate

+1

Pavel

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2015: SP-GIST for geometrical objects
On Mar 27, 2015 11:08 AM, Dima Ivanovskiy dima...@mail.ru wrote:
> [full proposal quoted above]

Nice proposal. Dynamic k-d-trees can perform badly, as the splitting median can get way off as updates come in. What are your thoughts about that? Also, what's up with the 4-D space? I don't quite get it. These types are 2 or 3 dimensions.
[HACKERS] Doubt about AccessExclusiveLock in ALTER TABLE .. SET ( .. );
Hi all, I'm tweaking some autovacuum settings on a table with high write usage, but with ALTER TABLE .. SET ( .. ) this task was impossible, so I did a catalog update (pg_class) to change reloptions. Maybe it's a stupid doubt, but why do we need to get an AccessExclusiveLock on the relation to set reloptions if we just touch pg_class tuples (RowExclusiveLock)? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] GSoC 2015 proposal. Bitmap Index-only Count
On Wed, Mar 25, 2015 at 11:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Anastasia Lubennikova lubennikov...@gmail.com writes:
>> 2015-03-24 18:01 GMT+04:00 Tom Lane t...@sss.pgh.pa.us:
>>> I wonder whether it'd be possible to teach GIN to support index_getnext instead. Initially it would probably work only for cases where the index didn't have to return any columns ... but if we did it, maybe the door would be open to cases where GIN could reconstruct actual values.
>> Another idea is to write index_getnext() for GIN which would return some fake tuple. Since there is no difference for COUNT aggregate what the tuple contains. COUNT just wants to know whether we have tuple that satisfy the qual.
>
> Well, yeah, that would be the idea (at least initially). You don't have to return any real data unless you claim you can do so via amcanreturn. The planner is still capable of selecting an index-only scan as long as the query retrieves no columns. The trick would be to not return the same heap TID more than once per scan. A zero-order implementation would be to construct the same bitmap we do now and then just provide a gingetnext function that scans through that. That would be pretty awful in terms of scan startup time, so doing better would be nice; but perhaps it would be useful even in that form.

My ideal picture for FTS using GIN looks like this: 1) Have lexeme offsets stored in GIN with the item pointers. 2) Calculate relevance using only GIN information, without touching the heap. 3) Sort results by relevance in either GIN itself or an executor node. 4) Get both the top-N most relevant rows and the total row count (using an index-only scan) from a single GIN index scan. Implementing index_getnext() for GIN looks like a step forward to me because it allows index-only count and could potentially be used for ordered output. However, it's unclear to me whether #4 is feasible: could we return the top-N results and the total count from a single GIN index scan? -- With best regards, Alexander Korotkov.
Re: [HACKERS] GSoC 2015: SP-GIST for geometrical objects
> On Mar 27, 2015 11:08 AM, Dima Ivanovskiy dima...@mail.ru wrote:
>> [full proposal quoted above]
>
> Nice proposal. Dynamic Kdtrees can perform badly as the splitting median can get way off as updates are coming. What are your thoughts about that? Also what's up with the 4d space? I don't quite get it. These types are 2 or 3 dimensions.

I read the spgist README one more time. I didn't find any mechanism for maintaining good balance after updates. I think we can use the Bkd-tree, https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf, though it may not be the best solution; I have included research time in the 2nd month of the timeline. About the 4-D space: all these types are 2-dimensional. Just as in an R-tree, each object is approximated by its MBR. The MBR of a 2-D object can be mapped to a 4-D point. More generally, the MBR of an n-dimensional object can be mapped to a point in 2n dimensions.
[HACKERS] Rounding to even for numeric data type
Hi all, A couple of days ago a bug showed up regarding rounding of floats, here: http://www.postgresql.org/message-id/flat/20150320194337.2573.72...@wrigleys.postgresql.org#20150320194337.2573.72...@wrigleys.postgresql.org The result is that the version of rint() shipped in src/port is not IEEE-compliant when rounding to even (at least MSVC 2013 uses it), leading to inconsistent results depending on whether the platform uses src/port's rint() or its own. During this thread, Tom raised as well that rounding for numeric is not IEEE-compliant: http://www.postgresql.org/message-id/22366.1427313...@sss.pgh.pa.us For example:

=# SELECT round(2.5::numeric), round(1.5::numeric), round(0.5::numeric), round(-2.5::numeric);
 round | round | round | round
-------+-------+-------+-------
     3 |     2 |     1 |    -3
(1 row)

=# SELECT round(2.5::float), round(1.5::float), round(0.5::float), round(-2.5::float);
 round | round | round | round
-------+-------+-------+-------
     2 |     2 |     0 |    -2
(1 row)

It sounds appealing to switch the default behavior to something more IEEE-compliant, and not only for scale == 0. Now one can argue as well that changing the default is risky for existing applications, or the other way around that other RDBMSes (?) are more compliant than us for their equivalent numeric data type, and people get confused when switching to Postgres. An idea, from Dean, would be to have a new specific version of round() able to do IEEE-compliant rounding to even as well... Opinions? Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans with btree_gist
On 03/26/2015 10:31 PM, Heikki Linnakangas wrote:
> I've pushed Anastasia's patch to support index-only scans with GiST, and it's time to add opclass support for all the opclasses that are not lossy. I think at least all the btree_gist opclasses need to be supported; it would be pretty surprising if they didn't support index-only scans while some more complex opclasses did. Attached is a WIP patch for that. It covers all the varlen types that btree_gist supports, and int2, int4 and oid. The rest of the fixed-width types should be just a matter of copy-pasting. I'll continue adding those, but wanted to let people know I'm working on this.

Would it also be worth doing the same for the inet_ops class for inet/cidr? I have hacked a quick WIP patch which I believe should work, but have not looked into the index-only scan code enough to be sure.

-- Andreas Karlsson

diff --git a/src/backend/utils/adt/network_gist.c b/src/backend/utils/adt/network_gist.c
index 14dd62b..0133032 100644
--- a/src/backend/utils/adt/network_gist.c
+++ b/src/backend/utils/adt/network_gist.c
@@ -105,6 +105,27 @@ typedef struct GistInetKey
 #define SET_GK_VARSIZE(dst) \
 	SET_VARSIZE_SHORT(dst, offsetof(GistInetKey, ipaddr) + gk_ip_addrsize(dst))
 
+Datum
+inet_gist_fetch(PG_FUNCTION_ARGS)
+{
+	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+	GistInetKey *key = DatumGetInetKeyP(entry->key);
+	GISTENTRY  *retval;
+	inet	   *dst;
+
+	dst = (inet *) palloc0(sizeof(inet));
+
+	ip_family(dst) = gk_ip_family(key);
+	ip_bits(dst) = gk_ip_minbits(key);
+	memcpy(ip_addr(dst), gk_ip_addr(key), ip_addrsize(dst));
+	SET_INET_VARSIZE(dst);
+
+	retval = palloc(sizeof(GISTENTRY));
+	gistentryinit(*retval, InetPGetDatum(dst), entry->rel, entry->page,
+				  entry->offset, FALSE);
+
+	PG_RETURN_POINTER(retval);
+}
 
 /*
  * The GiST query consistency check
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 612a9d2..3b87f90 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -411,6 +411,7 @@
 DATA(insert (	3550	869 869 4 3556 ));
 DATA(insert (	3550	869 869 5 3557 ));
 DATA(insert (	3550	869 869 6 3558 ));
 DATA(insert (	3550	869 869 7 3559 ));
+DATA(insert (	3550	869 869 9 3573 ));
 
 /* sp-gist */
 DATA(insert (	3474	3831 3831 1 3469 ));
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 77b7717..a96d369 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2240,6 +2240,8 @@
 DATA(insert OID = 3555 ( inet_gist_compress PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ inet_gist_compress _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3556 ( inet_gist_decompress PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ inet_gist_decompress _null_ _null_ _null_ ));
 DESCR("GiST support");
+DATA(insert OID = 3573 ( inet_gist_fetch PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2281 "2281" _null_ _null_ _null_ _null_ inet_gist_fetch _null_ _null_ _null_ ));
+DESCR("GiST support");
 DATA(insert OID = 3557 ( inet_gist_penalty PGNSP PGUID 12 1 0 0 0 f f f f t f i 3 0 2281 "2281 2281 2281" _null_ _null_ _null_ _null_ inet_gist_penalty _null_ _null_ _null_ ));
 DESCR("GiST support");
 DATA(insert OID = 3558 ( inet_gist_picksplit PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 2281 "2281 2281" _null_ _null_ _null_ _null_ inet_gist_picksplit _null_ _null_ _null_ ));
diff --git a/src/include/utils/inet.h b/src/include/utils/inet.h
index 6694688..2d2bae4 100644
--- a/src/include/utils/inet.h
+++ b/src/include/utils/inet.h
@@ -123,6 +123,7 @@ extern int bitncommon(const unsigned char *l, const unsigned char *r, int n);
 /*
  * GiST support functions in network_gist.c
  */
+extern Datum inet_gist_fetch(PG_FUNCTION_ARGS);
 extern Datum inet_gist_consistent(PG_FUNCTION_ARGS);
 extern Datum inet_gist_union(PG_FUNCTION_ARGS);
 extern Datum inet_gist_compress(PG_FUNCTION_ARGS);

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rounding to even for numeric data type
Michael Paquier <michael.paqu...@gmail.com> writes:
> It sounds appealing to switch the default behavior to something that is
> more IEEE-compliant, and not only for scale == 0. Now one can argue as
> well that changing the default is risky for existing applications, or
> the other way around that other RDBMSs (?) are more compliant than we
> are for their equivalent numeric data type, and people get confused
> when switching to Postgres. An idea, from Dean, would be to have a new
> specific version of round() able to do IEEE-compliant rounding to even
> as well...

I think confining the change to round() would be a fundamental error. The
main reason why round-to-nearest-even is the IEEE standard is that it
reduces error accumulation over long chains of calculations, such as in
numeric's power and trig functions; if we go to the trouble of
implementing such a behavior, we certainly want to use it there.

I think the concern over backwards compatibility here is probably
overblown; but if we're sufficiently worried about it, a possible
compromise is to invent a numeric_rounding_mode GUC, so that people could
get back the old behavior if they really care.

			regards, tom lane
Re: [HACKERS] GSoC 2015: SP-GIST for geometrical objects
On Mar 27, 2015 6:41 PM, Dima Ivanovskiy <dima...@mail.ru> wrote:
> > On Mar 27, 2015 11:08 AM, Dima Ivanovskiy <dima...@mail.ru> wrote:
> > > Hello, I am Dmitrii, a student at the Moscow Institute of Physics
> > > and Technology.
> > >
> > > Abstract: I chose the project "Indexing prolonged geometrical
> > > objects (i.e. boxes, circles, polygons, not points) with SP-GiST by
> > > mapping to 4d-space". According to the presentation
> > > https://www.pgcon.org/2011/schedule/attachments/197_pgcon-2011.pdf
> > > SP-GiST is 3 times faster than GiST in some cases, but only GiST
> > > supports the geometrical data types box, circle and polygon, with
> > > operators such as <<, &<, &&, &>, >>, <@, @>, ~=. The popular
> > > spatial extension PostGIS doesn't include SP-GiST, but has a lot of
> > > geometrical features.
> > >
> > > Project details: After a meeting with Alexander Korotkov, I wrote
> > > up a plan. Using a K-D-tree or quadtree to build an index for
> > > geometrical data types can increase search speed in some cases. The
> > > main idea is representing 2-D geometrical objects by their bounding
> > > boxes; a set of 2-D boxes is a 4-D space. New '_ops' (for example
> > > kd_box_ops, quad_circle_ops) will work with points in 4-D space and
> > > will support all geometrical operators. After converting each
> > > object to its bounding box, the algorithm has a set of tuples
> > > (x1, y1, x2, y2). Our goal is to partition this space as evenly as
> > > possible. With a K-D-tree, the first step splits the space into two
> > > parts by the first coordinate, the next step by the second
> > > coordinate, and so on; after the 4th coordinate the cycle repeats.
> > > At the end we have an index on the geometrical objects and traverse
> > > the tree for every search operator. PostgreSQL already implements
> > > the MBR idea in gist/gistproc.c, so I will transfer that
> > > implementation to the other type of tree. Of course, SP-GiST may
> > > not be the best solution to this problem, so after testing these
> > > straightforward methods I will try to find a more effective way,
> > > perhaps using a combination of different spatial tree structures.
> > >
> > > Project schedule:
> > > until May 25: read documentation and source code, clarify details
> > > of the implementation.
> > > 1st month: implement new '_ops' with all geometrical operators for
> > > box, circle, polygon.
> > > 2nd month: research new methods to increase the speed of
> > > geometrical queries.
> > > 3rd month: final refactoring, testing and submitting a patch.
> > >
> > > Links:
> > > http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html - about GiST
> > > https://toster.ru/q/27135#answer_110197 - people need SP-GiST for cubes
> > > http://www.slideshare.net/profyclub_ru/o-lt - presentation about indexes
> > > http://pgconf.ru/static/presentations/2015/korotkov_spatial.pdf - working with geo objects
> >
> > Nice proposal. Dynamic K-D-trees can perform badly because the
> > splitting median can drift way off as updates come in. What are your
> > thoughts about that? Also, what's up with the 4-D space? I don't
> > quite get it; these types are 2 or 3 dimensional.
>
> I read the SP-GiST README one more time and didn't find a mechanism for
> maintaining good balance after updates. I think we can use the
> Bkd-tree, https://www.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf,
> but it may not be the best solution; I have included research time in
> the 2nd month of the timeline.
>
> About the 4-D space: all these types are 2-dimensional. Just as in an
> R-tree, each object is approximated by its MBR, and the MBR of a 2-D
> object can be mapped to a 4-D point. More generally, the MBR of an n-D
> object can be mapped to a 2n-D point.

The reason I said that is that you pointed to performance as one
motivating factor (and the lack of balancing properties can degrade
performance really fast on larger indexes). The Bkd variant seems
interesting, but I don't think SP-GiST provides enough abstraction to
implement it. A bounding box can still be inserted/queried with a 2-D
k-d-tree, so I don't know why you call it 4-D; I assume it's a matter of
naming. Overall the proposal seems totally doable, although I have doubts
about its usefulness. But I'm no one... You should have some feedback from
the mentors soon.
Re: [HACKERS] Rounding to even for numeric data type
>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:

 Tom> I think the concern over backwards compatibility here is probably
 Tom> overblown; but if we're sufficiently worried about it, a possible
 Tom> compromise is to invent a numeric_rounding_mode GUC, so that
 Tom> people could get back the old behavior if they really care.

I only see one issue with this, but it's a nasty one: do we really want
to make all numeric operations that might do rounding stable rather than
immutable?

--
Andrew (irc:RhodiumToad)