Re: [HACKERS] Index-only scans for GiST.
On 03/02/2015 12:43 AM, Heikki Linnakangas wrote: On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote: I add MemoryContext listCxt to avoid memory leak. listCxt is created once in gistrescan (only for index-only scan plan ) and reseted when scan of the leaf page is finished. I do not sure if the problem was completely solved, so I wait for feedback. Yeah, I think that solves it. On further testing, there was actually still a leak with kNN-searches. Fixed. I spent a little time cleaning this up. I reverted that pageData change so that it's an array again, put back the gist_indexonly.sql and expected output files that were missing from your latest version, removed a couple of unused local variables that gcc complained about. I refactored gistFetchTuple a bit, because it was doing IMHO quite bogus things with NULLs. It was passing NULLs to the opclass' fetch function, but it didn't pass the isNull flag correctly. I changed it so that the fetch function is not called at all for NULLs. I think this is pretty close to being committable. I'll make a round of editorializing over the docs, and the code comments as well. The opr_sanity regression test is failing, there's apparently something wrong with the pg_proc entries of the *canreturn functions. I haven't looked into that yet; could you fix that? I have pushed this, after fixing the opr_sanity failure, some bug fixes, and documentation and comment cleanup. Thanks for the patch! - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GiST.
On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote: I add MemoryContext listCxt to avoid memory leak. listCxt is created once in gistrescan (only for index-only scan plan ) and reseted when scan of the leaf page is finished. I do not sure if the problem was completely solved, so I wait for feedback. Yeah, I think that solves it. * What's the reason for turning GISTScanOpaqueData.pageData from an array to a List? This array is field of structure GISTScanOpaqueData. Memory for that structure allocated in function gistbeginscan(). The array is static so it's declared only one time in structure: GISTSearchHeapItem pageData [BLCKSZ/sizeof(IndexTupleData)] But how could we know size of array if we don't know what data would be returned? I mean type and amount. You're only adding a pointer to the IndexTuple to GISTSearchHeapItem. The GISTSearchHeapItem struct itself is still of constant size. I spent a little time cleaning this up. I reverted that pageData change so that it's an array again, put back the gist_indexonly.sql and expected output files that were missing from your latest version, removed a couple of unused local variables that gcc complained about. I refactored gistFetchTuple a bit, because it was doing IMHO quite bogus things with NULLs. It was passing NULLs to the opclass' fetch function, but it didn't pass the isNull flag correctly. I changed it so that the fetch function is not called at all for NULLs. I think this is pretty close to being committable. I'll make a round of editorializing over the docs, and the code comments as well. The opr_sanity regression test is failing, there's apparently something wrong with the pg_proc entries of the *canreturn functions. I haven't looked into that yet; could you fix that? - Heikki diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index db2a452..53750da 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -1404,6 +1404,14 @@ initGISTstate(Relation index) else giststate-distanceFn[i].fn_oid = InvalidOid; + /* opclasses are not required to provide a Fetch method */ + if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC))) + fmgr_info_copy((giststate-fetchFn[i]), + index_getprocinfo(index, i + 1, GIST_FETCH_PROC), + scanCxt); + else + giststate-fetchFn[i].fn_oid = InvalidOid; + /* * If the index column has a specified collation, we should honor that * while doing comparisons. However, we may have a collatable storage @@ -1426,6 +1434,22 @@ initGISTstate(Relation index) return giststate; } +/* + * Gistcanreturn is supposed to be true if ANY FetchFn method is defined. + * If FetchFn exists it would be used in index-only scan + * Thus the responsibility rests with the opclass developer. + */ + +Datum +gistcanreturn(PG_FUNCTION_ARGS) { + Relation index = (Relation) PG_GETARG_POINTER(0); + int i = PG_GETARG_INT32(1); + if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC))) + PG_RETURN_BOOL(true); + else + PG_RETURN_BOOL(false); +} + void freeGISTstate(GISTSTATE *giststate) { diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index 717cb85..80532c8 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -229,6 +229,8 @@ gistindex_keytest(IndexScanDesc scan, * we're doing a plain or ordered indexscan. For a plain indexscan, heap * tuple TIDs are returned into so-pageData[]. For an ordered indexscan, * heap tuple TIDs are pushed into individual search queue items. + * If index-only scan is possible, heap tuples themselves are returned + * into so-pageData or into search queue. * * If we detect that the index page has split since we saw its downlink * in the parent, we push its new right sibling onto the queue so the @@ -241,6 +243,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, GISTScanOpaque so = (GISTScanOpaque) scan-opaque; Buffer buffer; Page page; + GISTSTATE *giststate = so-giststate; + Relation r = scan-indexRelation; GISTPageOpaque opaque; OffsetNumber maxoff; OffsetNumber i; @@ -288,6 +292,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, } so-nPageData = so-curPageData = 0; + if (so-pageDataCxt) + MemoryContextReset(so-pageDataCxt); /* * check all tuples on page @@ -326,10 +332,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, else if (scan-numberOfOrderBys == 0 GistPageIsLeaf(page)) { /* - * Non-ordered scan, so report heap tuples in so-pageData[] + * Non-ordered scan, so report tuples in so-pageData[] */ so-pageData[so-nPageData].heapPtr = it-t_tid; so-pageData[so-nPageData].recheck = recheck; + + /* + * If index-only scan is possible add the data fetched from index. + */ + if (scan-xs_want_itup) + { +oldcxt =
Re: [HACKERS] Index-only scans for GiST.
I add MemoryContext listCxt to avoid memory leak. listCxt is created once in gistrescan (only for index-only scan plan ) and reseted when scan of the leaf page is finished. I do not sure if the problem was completely solved, so I wait for feedback. * What's the reason for turning GISTScanOpaqueData.pageData from an array to a List? This array is field of structure GISTScanOpaqueData. Memory for that structure allocated in function gistbeginscan(). The array is static so it's declared only one time in structure: GISTSearchHeapItem pageData [BLCKSZ/sizeof(IndexTupleData)] But how could we know size of array if we don't know what data would be returned? I mean type and amount. I asked Alexander about that and he offered me to use List instead of Array. indexonlyscan_gist_2.2.patch Description: Binary data indexonlyscan_gist_docs.patch Description: Binary data test_performance.sql Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GiST.
Thanks for answer. Now it seems to be applied correctly. 2015-02-12 3:12 GMT+04:00 Thom Brown t...@linux.com: On 11 February 2015 at 22:50, Anastasia Lubennikova lubennikov...@gmail.com wrote: Finally there is a new version of patch (in attachments). It provides multicolumn index-only scan for GiST indexes. - Memory leak is fixed. - little code cleanup - example of performance test in attachmens - function OIDs have debugging values (*) just to avoid merge conflicts while testing patch Wiki page of the project is https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Waiting for feedback. Hi Anastasia. Thanks for the updated patch. I've just tried applying it to head and it doesn't appear to apply cleanly. $ patch -p1 ~/Downloads/indexonlyscan_gist_2.0.patch (Stripping trailing CRs from patch; use --binary to disable.) patching file src/backend/access/gist/gist.c Hunk #1 succeeded at 1404 (offset 9 lines). Hunk #2 succeeded at 1434 (offset 9 lines). (Stripping trailing CRs from patch; use --binary to disable.) patching file src/backend/access/gist/gistget.c Hunk #1 succeeded at 227 (offset 1 line). Hunk #2 succeeded at 243 (offset 1 line). Hunk #3 succeeded at 293 (offset -4 lines). Hunk #4 succeeded at 330 (offset -4 lines). Hunk #5 succeeded at 365 (offset -5 lines). Hunk #6 succeeded at 444 (offset -27 lines). Hunk #7 succeeded at 474 (offset -27 lines). Hunk #8 FAILED at 518. Hunk #9 succeeded at 507 (offset -28 lines). Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines). Hunk #11 FAILED at 601. 2 out of 11 hunks FAILED -- saving rejects to file src/backend/access/gist/gistget.c.rej ... -- Thom -- Best regards, Lubennikova Anastasia diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c index db2a452..53750da 100644 --- a/src/backend/access/gist/gist.c +++ b/src/backend/access/gist/gist.c @@ -1404,6 +1404,14 @@ initGISTstate(Relation index) else giststate-distanceFn[i].fn_oid = InvalidOid; + /* opclasses are not required to provide a Fetch method */ + if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC))) + fmgr_info_copy((giststate-fetchFn[i]), + index_getprocinfo(index, i + 1, GIST_FETCH_PROC), + scanCxt); + else + giststate-fetchFn[i].fn_oid = InvalidOid; + /* * If the index column has a specified collation, we should honor that * while doing comparisons. However, we may have a collatable storage @@ -1426,6 +1434,22 @@ initGISTstate(Relation index) return giststate; } +/* + * Gistcanreturn is supposed to be true if ANY FetchFn method is defined. + * If FetchFn exists it would be used in index-only scan + * Thus the responsibility rests with the opclass developer. + */ + +Datum +gistcanreturn(PG_FUNCTION_ARGS) { + Relation index = (Relation) PG_GETARG_POINTER(0); + int i = PG_GETARG_INT32(1); + if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC))) + PG_RETURN_BOOL(true); + else + PG_RETURN_BOOL(false); +} + void freeGISTstate(GISTSTATE *giststate) { diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index 717cb85..0925e56 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan, * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap * tuples should be reported directly into the bitmap. If they are NULL, * we're doing a plain or ordered indexscan. For a plain indexscan, heap - * tuple TIDs are returned into so-pageData[]. For an ordered indexscan, + * tuple TIDs are returned into so-pageData. For an ordered indexscan, * heap tuple TIDs are pushed into individual search queue items. + * If index-only scan is possible, heap tuples themselves are returned + * into so-pageData or into search queue. * * If we detect that the index page has split since we saw its downlink * in the parent, we push its new right sibling onto the queue so the @@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, GISTScanOpaque so = (GISTScanOpaque) scan-opaque; Buffer buffer; Page page; + GISTSTATE *giststate = so-giststate; + Relation r = scan-indexRelation; + boolisnull[INDEX_MAX_KEYS]; + GISTSearchHeapItem *tmpListItem; GISTPageOpaque opaque; OffsetNumber maxoff; OffsetNumber i; @@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, MemoryContextSwitchTo(oldcxt); } - so-nPageData = so-curPageData = 0; - /* * check all tuples on page */ @@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances, else if (scan-numberOfOrderBys == 0 GistPageIsLeaf(page)) { /* - * Non-ordered scan, so report heap tuples in so-pageData[] + * Non-ordered scan, so report tuples in so-pageData + */ + + /* form tmpListItem and
Re: [HACKERS] Index-only scans for GiST.
On 12 February 2015 at 10:40, Anastasia Lubennikova lubennikov...@gmail.com wrote: Thanks for answer. Now it seems to be applied correctly. (please avoid top-posting) Thanks for the updated patch. I can confirm that it now cleanly applies and compiles fine. I've run the tested in the SQL file you provided, and it's behaving as expected. Thom
Re: [HACKERS] Index-only scans for GiST.
On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote: Thanks for answer. Now it seems to be applied correctly. * Documentation is missing. Anastasia provided a documentation patch in the first email. Thom
Re: [HACKERS] Index-only scans for GiST.
On Thu, Feb 12, 2015 at 3:07 PM, Thom Brown t...@linux.com wrote: On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote: Thanks for answer. Now it seems to be applied correctly. * Documentation is missing. Anastasia provided a documentation patch in the first email. Yeah, but it's a good practice send all the patches together. Will make the life of the reviewers better ;-) Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog: http://fabriziomello.github.io Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello Github: http://github.com/fabriziomello
Re: [HACKERS] Index-only scans for GiST.
On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote: Thanks for answer. Now it seems to be applied correctly. Thanks, it would be great to get this completed. This still leaks memory with the same test query as earlier. The leak seems to be into a different memory context now; it used to be to the GiST scan context, but now it's to the ExecutorState context. See attached patch which makes the leak evident. Looking at my previous comments from August: * What's the reason for turning GISTScanOpaqueData.pageData from an array to a List? * Documentation is missing. - Heikki diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c index 0925e56..3768c9c 100644 --- a/src/backend/access/gist/gistget.c +++ b/src/backend/access/gist/gistget.c @@ -526,6 +526,12 @@ gistgettuple(PG_FUNCTION_ARGS) * It's always head of so-pageData */ so-pageData = list_delete_first(so-pageData); +{ + static int lastreport = 0; + if ((lastreport++) % 1 == 0) + MemoryContextStats(CurrentMemoryContext); +} + PG_RETURN_BOOL(TRUE); } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GiST.
On 11 February 2015 at 22:50, Anastasia Lubennikova lubennikov...@gmail.com wrote: Finally there is a new version of patch (in attachments). It provides multicolumn index-only scan for GiST indexes. - Memory leak is fixed. - little code cleanup - example of performance test in attachmens - function OIDs have debugging values (*) just to avoid merge conflicts while testing patch Wiki page of the project is https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Waiting for feedback. Hi Anastasia. Thanks for the updated patch. I've just tried applying it to head and it doesn't appear to apply cleanly. $ patch -p1 ~/Downloads/indexonlyscan_gist_2.0.patch (Stripping trailing CRs from patch; use --binary to disable.) patching file src/backend/access/gist/gist.c Hunk #1 succeeded at 1404 (offset 9 lines). Hunk #2 succeeded at 1434 (offset 9 lines). (Stripping trailing CRs from patch; use --binary to disable.) patching file src/backend/access/gist/gistget.c Hunk #1 succeeded at 227 (offset 1 line). Hunk #2 succeeded at 243 (offset 1 line). Hunk #3 succeeded at 293 (offset -4 lines). Hunk #4 succeeded at 330 (offset -4 lines). Hunk #5 succeeded at 365 (offset -5 lines). Hunk #6 succeeded at 444 (offset -27 lines). Hunk #7 succeeded at 474 (offset -27 lines). Hunk #8 FAILED at 518. Hunk #9 succeeded at 507 (offset -28 lines). Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines). Hunk #11 FAILED at 601. 2 out of 11 hunks FAILED -- saving rejects to file src/backend/access/gist/gistget.c.rej ... -- Thom
Re: [HACKERS] Index-only scans for GIST
On 18 August 2014 09:05, Heikki Linnakangas hlinnakan...@vmware.com wrote: On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote: 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com: * I'm getting two regression failures with this (opr_sanity and join). opr_sanity failure is corrected. But there is remain question with join. I check the latest version of my github repo and there's no fail in join regression test All 145 tests passed. To tell the truth, I don't understand which changes could led to this failure. Could you show me regression diffs? Sure, here you go. It seems like a change in a plan. At a quick glance it seems harmless: the new plan is identical except that the left and right side of a join have been reversed. But I don't understand either why this patch would change that, so it needs to be investigated. * The regression test queries that use LIMIT are not guaranteed to always return the same rows, hence they're not very good regression test cases. I'd suggest using more restricting WHERE clauses, so that each query only returns a handful of rows. Thank you for comment, I rewrote wrong queries. But could you explain why LIMIT queries may return different results? Is it happens because of different query optimization? Imagine that you have a table with two rows, A and B. If you run a query like SELECT * FROM table LIMIT 1, the system can legally return either row A or B, because there's no ORDER BY. Now, admittedly you have a similar problem even without the LIMIT - the system can legally return the rows in either order - but it's less of an issue because at least you can quickly see from the diff that the result set is in fact the same, the rows are just in different order. You could fix that by adding an ORDER BY to all test queries, but we haven't done that in the regression suite because then we would not have any test coverage for cases where you don't have an ORDER BY. As a compromise, test cases are usually written without an ORDER BY, but if e.g. the buildfarm starts failing because of differences in the result set order across platforms, then we add an ORDER BY to make it stable. * I think it's leaking memory, in GIST scan context. I tested this with a variant of the regression tests: insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)), point(0.05*i, 0.05*i) FROM generate_series(0, 1000) as i; CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p); set enable_seqscan=off; set enable_bitmapscan=off; explain analyze select p from gist_tbl where p @ box(point(0,0), point(999,999)) and length(p::text) 10; while the final query runs, 'top' shows constantly increasing memory usage. I don't think it's memory leak. After some increasing, memory using remain the same. It works similar without using indexonlyscan. No, it's definitely a leak caused by the patch. Test with the attached patch, which prints out to the server log the amount of memory used by the GiST scan memory context every 1 rows. It clearly shows increasing memory usage all the way to the end of the query. It's cleaned up at the end of the query, but that's not good enough because for a large query you might accumulate gigabytes of leaked memory until the query has finished. If you (manually) apply the same patch to git master, you'll see that the memory usage stays consistent and small. Hi Anastasia, Do you have time to address the issues brought up in Heikki's review? It would be good if we could your work into PostgreSQL 9.5. Thanks Thom
Re: [HACKERS] Index-only scans for GIST
Updated patch * Compiler, merge and regression fails checked * Regression tests was impoved * GiST and amcanreturn docs updated -- Best regards, Lubennikova Anastasia indexonlyscan_gist2.patch Description: Binary data indexonlyscan_gist_docs.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GIST
On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote: 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com: * I'm getting two regression failures with this (opr_sanity and join). opr_sanity failure is corrected. But there is remain question with join. I check the latest version of my github repo and there's no fail in join regression test All 145 tests passed. To tell the truth, I don't understand which changes could led to this failure. Could you show me regression diffs? Sure, here you go. It seems like a change in a plan. At a quick glance it seems harmless: the new plan is identical except that the left and right side of a join have been reversed. But I don't understand either why this patch would change that, so it needs to be investigated. * The regression test queries that use LIMIT are not guaranteed to always return the same rows, hence they're not very good regression test cases. I'd suggest using more restricting WHERE clauses, so that each query only returns a handful of rows. Thank you for comment, I rewrote wrong queries. But could you explain why LIMIT queries may return different results? Is it happens because of different query optimization? Imagine that you have a table with two rows, A and B. If you run a query like SELECT * FROM table LIMIT 1, the system can legally return either row A or B, because there's no ORDER BY. Now, admittedly you have a similar problem even without the LIMIT - the system can legally return the rows in either order - but it's less of an issue because at least you can quickly see from the diff that the result set is in fact the same, the rows are just in different order. You could fix that by adding an ORDER BY to all test queries, but we haven't done that in the regression suite because then we would not have any test coverage for cases where you don't have an ORDER BY. As a compromise, test cases are usually written without an ORDER BY, but if e.g. the buildfarm starts failing because of differences in the result set order across platforms, then we add an ORDER BY to make it stable. * I think it's leaking memory, in GIST scan context. I tested this with a variant of the regression tests: insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)), point(0.05*i, 0.05*i) FROM generate_series(0, 1000) as i; CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p); set enable_seqscan=off; set enable_bitmapscan=off; explain analyze select p from gist_tbl where p @ box(point(0,0), point(999,999)) and length(p::text) 10; while the final query runs, 'top' shows constantly increasing memory usage. I don't think it's memory leak. After some increasing, memory using remain the same. It works similar without using indexonlyscan. No, it's definitely a leak caused by the patch. Test with the attached patch, which prints out to the server log the amount of memory used by the GiST scan memory context every 1 rows. It clearly shows increasing memory usage all the way to the end of the query. It's cleaned up at the end of the query, but that's not good enough because for a large query you might accumulate gigabytes of leaked memory until the query has finished. If you (manually) apply the same patch to git master, you'll see that the memory usage stays consistent and small. - Heikki *** /home/heikki/git-sandbox/postgresql/src/test/regress/expected/join.out 2014-08-18 09:38:55.146171394 +0300 --- /home/heikki/git-sandbox/postgresql/src/test/regress/results/join.out 2014-08-18 10:28:30.326491898 +0300 *** *** 2579,2590 --- Sort Sort Key: t1.q1, t1.q2 !- Hash Left Join ! Hash Cond: (t1.q2 = t2.q1) Filter: (1 = (SubPlan 1)) ! - Seq Scan on int8_tbl t1 - Hash !- Seq Scan on int8_tbl t2 SubPlan 1 - Limit - Result --- 2579,2590 --- Sort Sort Key: t1.q1, t1.q2 !- Hash Right Join ! Hash Cond: (t2.q1 = t1.q2) Filter: (1 = (SubPlan 1)) ! - Seq Scan on int8_tbl t2 - Hash !- Seq Scan on int8_tbl t1 SubPlan 1 - Limit - Result *** *** 3589,3603 lateral (values(x.q1,y.q1,y.q2)) v(xq1,yq1,yq2); q1|q2 |q1|q2 | xq1| yq1|yq2 --+---+--+---+--+--+--- - 123 | 456 | | | 123 | | - 123 | 4567890123456789 | 4567890123456789
Re: [HACKERS] Index-only scans for GIST
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com: * I'm getting two regression failures with this (opr_sanity and join). opr_sanity failure is corrected. But there is remain question with join. I check the latest version of my github repo and there's no fail in join regression test All 145 tests passed. To tell the truth, I don't understand which changes could led to this failure. Could you show me regression diffs? I want to understand what's wrong with the patch. * The regression test queries that use LIMIT are not guaranteed to always return the same rows, hence they're not very good regression test cases. I'd suggest using more restricting WHERE clauses, so that each query only returns a handful of rows. Thank you for comment, I rewrote wrong queries. But could you explain why LIMIT queries may return different results? Is it happens because of different query optimization? * I think it's leaking memory, in GIST scan context. I tested this with a variant of the regression tests: insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)), point(0.05*i, 0.05*i) FROM generate_series(0, 1000) as i; CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p); set enable_seqscan=off; set enable_bitmapscan=off; explain analyze select p from gist_tbl where p @ box(point(0,0), point(999,999)) and length(p::text) 10; while the final query runs, 'top' shows constantly increasing memory usage. I don't think it's memory leak. After some increasing, memory using remain the same. It works similar without using indexonlyscan. -- Best regards, Lubennikova Anastasia
Re: [HACKERS] Index-only scans for GIST
On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. Thanks! Some comments: * I got this compiler warning: gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] ListCell *tmpPageData = so-curPageData; ^ * I'm getting two regression failures with this (opr_sanity and join). * After merging with master, build fails because of duplicate OIDs. * The regression test queries that use LIMIT are not guaranteed to always return the same rows, hence they're not very good regression test cases. I'd suggest using more restricting WHERE clauses, so that each query only returns a handful of rows. * What's the reason for turning GISTScanOpaqueData.pageData from an array to a List? * I think it's leaking memory, in GIST scan context. I tested this with a variant of the regression tests: insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 0.05*i)), point(0.05*i, 0.05*i) FROM generate_series(0, 1000) as i; CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p); set enable_seqscan=off; set enable_bitmapscan=off; explain analyze select p from gist_tbl where p @ box(point(0,0), point(999,999)) and length(p::text) 10; while the final query runs, 'top' shows constantly increasing memory usage. It includes index-only scans for multicolumn GIST and new regression test. Fetch() method is realized for box and point opclasses. Can we have Fetch functions for all the datatypes in btree_gist contrib module, please? Do other contrib modules contain GiST opclasses that could have Fetch functions? Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about query with OR condition. It is the last query in regression test gist_indexonly. It doesn't fail but it doensn't use index-only scans. Could someone explain to me how it works? It seems to depend on build_paths_for_OR http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2 function. But I couldn't understand how. The query is: select * from gist_tbl where b @ box(point(5,5), point(6,6)) or p @ box(point(0,0), point(100,100)) limit 10; It cannot use an index(-only) scan for this, because a single index scan can only return rows based on one key. In this case, you need to do two scans, and then return the rows returned by either scan, removing duplicates. A bitmap scan is possible, because it can remove the duplicates, but the planner can't produce a plain index scan plan that would do the same. A common trick when that happens in a real-world application is to re-write the query using UNION: select * from gist_tbl where b @ box(point(5,5), point(6,6)) UNION select * from gist_tbl where p @ box(point(0,0), point(100,100)) limit 10; Although that doesn't seem to actually work: ERROR: could not identify an equality operator for type box LINE 1: select * from gist_tbl ^ but that's not your patch's fault, the same happens with unpatched master. IOW, you don't need to worry about that case. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Index-only scans for GIST
On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova lubennikov...@gmail.com wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes index-only scans for multicolumn GIST and new regression test. Fetch() method is realized for box and point opclasses. Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about query with OR condition. It is the last query in regression test gist_indexonly. It doesn't fail but it doensn't use index-only scans. Could someone explain to me how it works? It seems to depend on build_paths_for_OR function. But I couldn't understand how. Very nice... please add your patch to the next commit fest [1]. Regards, [1] https://commitfest.postgresql.org/action/commitfest_view?id=23 -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Index-only scans for GIST
Thank you for comment Patch is already added in Performance topic. 2014-08-01 20:25 GMT+04:00 Fabrízio de Royes Mello fabriziome...@gmail.com : On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova lubennikov...@gmail.com wrote: Hi, hackers! I work on a GSoC project Index-only scans for GIST https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014 Repository is https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments. It includes index-only scans for multicolumn GIST and new regression test. Fetch() method is realized for box and point opclasses. Documentation is not updated yet, but I'm going to do it till the end of GSoC. I've got one question about query with OR condition. It is the last query in regression test gist_indexonly. It doesn't fail but it doensn't use index-only scans. Could someone explain to me how it works? It seems to depend on build_paths_for_OR function. But I couldn't understand how. Very nice... please add your patch to the next commit fest [1]. Regards, [1] https://commitfest.postgresql.org/action/commitfest_view?id=23 -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello -- Best regards, Lubennikova Anastasia
Re: [HACKERS] Index-only scans for GIST
On Sun, May 25, 2014 at 6:12 AM, Anastasia Lubennikova lubennikov...@gmail.com wrote: Hi, hackers! There are first results of my work on GSoC project Index-only scans for GIST. Cool. 1. Version of my project code is in forked repository https://github.com/lubennikovaav/postgres/tree/indexonlygist2 Patch is in attachments - This version is only for one-column indexes That's probably a limitation that needs to be fixed before this can be committed. - fetch() method is realized only for box opclass (because it's trivial) That might not need to be fixed before this can be committed. 2. I test Index-only scans with SQL script box_test.sql and it works really faster. (results in box_test_out) I'll be glad to get your feedback about this feature. Since this is a GSoC project, it would be nice if one of the people who is knowledgeable about GIST (Heikki, Alexander, etc.) could weigh in on this before too much time goes by, so that Anastasia can press forward with this work. I don't know enough to offer too many substantive comments, but I think you should remove all of the //-style comments (most of which are debugging leftovers) and add some more comments describing what you're actually doing and, more importantly, why. This comment doesn't appear to make sense: + /* +* The offset number on tuples on internal pages is unused. For historical +* reasons, it is set 0x. +*/ The reason this doesn't make sense is because the tuple in question is not on an internal page, or indeed any page at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers