Re: [HACKERS] Index-only scans for GiST.

2015-03-26 Thread Heikki Linnakangas

On 03/02/2015 12:43 AM, Heikki Linnakangas wrote:

On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote:

I add MemoryContext listCxt to avoid memory leak. listCxt is created once
in gistrescan (only for index-only scan plan ) and reseted when scan of the
leaf page is finished.

I do not sure if the problem was completely solved, so I wait for feedback.


Yeah, I think that solves it.


On further testing, there was actually still a leak with kNN-searches. 
Fixed.



I spent a little time cleaning this up. I reverted that pageData change
so that it's an array again, put back the gist_indexonly.sql and
expected output files that were missing from your latest version,
removed a couple of unused local variables that gcc complained about. I
refactored gistFetchTuple a bit, because it was doing IMHO quite bogus
things with NULLs. It was passing NULLs to the opclass' fetch function,
but it didn't pass the isNull flag correctly. I changed it so that the
fetch function is not called at all for NULLs.

I think this is pretty close to being committable. I'll make a round of
editorializing over the docs, and the code comments as well.

The opr_sanity regression test is failing, there's apparently something
wrong with the pg_proc entries of the *canreturn functions. I haven't
looked into that yet; could you fix that?


I have pushed this, after fixing the opr_sanity failure, some bug fixes, 
and documentation and comment cleanup.


Thanks for the patch!

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GiST.

2015-03-01 Thread Heikki Linnakangas

On 02/27/2015 04:19 PM, Anastasia Lubennikova wrote:

I add MemoryContext listCxt to avoid memory leak. listCxt is created once
in gistrescan (only for index-only scan plan ) and reseted when scan of the
leaf page is finished.

I do not sure if the problem was completely solved, so I wait for feedback.


Yeah, I think that solves it.


* What's the reason for turning GISTScanOpaqueData.pageData from an array
to a List?

This array is field of structure GISTScanOpaqueData. Memory for that
structure allocated in function gistbeginscan(). The array is static so
it's declared only one time in structure:
GISTSearchHeapItem  pageData [BLCKSZ/sizeof(IndexTupleData)]

But how could we know size of array if we don't know what data would be
returned? I mean type and amount.


You're only adding a pointer to the IndexTuple to GISTSearchHeapItem. 
The GISTSearchHeapItem struct itself is still of constant size.


I spent a little time cleaning this up. I reverted that pageData change 
so that it's an array again, put back the gist_indexonly.sql and 
expected output files that were missing from your latest version, 
removed a couple of unused local variables that gcc complained about. I 
refactored gistFetchTuple a bit, because it was doing IMHO quite bogus 
things with NULLs. It was passing NULLs to the opclass' fetch function, 
but it didn't pass the isNull flag correctly. I changed it so that the 
fetch function is not called at all for NULLs.


I think this is pretty close to being committable. I'll make a round of 
editorializing over the docs, and the code comments as well.


The opr_sanity regression test is failing, there's apparently something 
wrong with the pg_proc entries of the *canreturn functions. I haven't 
looked into that yet; could you fix that?


- Heikki

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate-distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy((giststate-fetchFn[i]),
+		 index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+		   scanCxt);
+		else
+			giststate-fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in index-only scan
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..80532c8 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -229,6 +229,8 @@ gistindex_keytest(IndexScanDesc scan,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
  * tuple TIDs are returned into so-pageData[].  For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so-pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan-opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so-giststate;
+	Relation r = scan-indexRelation;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -288,6 +292,8 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	}
 
 	so-nPageData = so-curPageData = 0;
+	if (so-pageDataCxt)
+		MemoryContextReset(so-pageDataCxt);
 
 	/*
 	 * check all tuples on page
@@ -326,10 +332,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan-numberOfOrderBys == 0  GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so-pageData[]
+			 * Non-ordered scan, so report tuples in so-pageData[]
 			 */
 			so-pageData[so-nPageData].heapPtr = it-t_tid;
 			so-pageData[so-nPageData].recheck = recheck;
+
+			/*
+			 * If index-only scan is possible add the data fetched from index.
+			 */
+			if (scan-xs_want_itup)
+			{
+oldcxt = 

Re: [HACKERS] Index-only scans for GiST.

2015-02-27 Thread Anastasia Lubennikova
I add MemoryContext listCxt to avoid memory leak. listCxt is created once
in gistrescan (only for index-only scan plan ) and reseted when scan of the
leaf page is finished.

I do not sure if the problem was completely solved, so I wait for feedback.

* What's the reason for turning GISTScanOpaqueData.pageData from an array
to a List?

This array is field of structure GISTScanOpaqueData. Memory for that
structure allocated in function gistbeginscan(). The array is static so
it's declared only one time in structure:
GISTSearchHeapItem  pageData [BLCKSZ/sizeof(IndexTupleData)]

But how could we know size of array if we don't know what data would be
returned? I mean type and amount.

I asked Alexander about that and he offered me to use List instead of Array.


indexonlyscan_gist_2.2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data


test_performance.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Anastasia Lubennikova
Thanks for answer.
Now it seems to be applied correctly.

2015-02-12 3:12 GMT+04:00 Thom Brown t...@linux.com:

 On 11 February 2015 at 22:50, Anastasia Lubennikova 
 lubennikov...@gmail.com wrote:

 Finally there is a new version of patch (in attachments).
 It provides multicolumn index-only scan for GiST indexes.

 - Memory leak is fixed.
 - little code cleanup
 - example of performance test in attachmens
 - function OIDs have debugging values (*) just to avoid merge
 conflicts while testing patch

 Wiki page of the project is

 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

 Waiting for feedback.


 Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
 to head and it doesn't appear to apply cleanly.

 $ patch -p1  ~/Downloads/indexonlyscan_gist_2.0.patch
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gist.c
 Hunk #1 succeeded at 1404 (offset 9 lines).
 Hunk #2 succeeded at 1434 (offset 9 lines).
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gistget.c
 Hunk #1 succeeded at 227 (offset 1 line).
 Hunk #2 succeeded at 243 (offset 1 line).
 Hunk #3 succeeded at 293 (offset -4 lines).
 Hunk #4 succeeded at 330 (offset -4 lines).
 Hunk #5 succeeded at 365 (offset -5 lines).
 Hunk #6 succeeded at 444 (offset -27 lines).
 Hunk #7 succeeded at 474 (offset -27 lines).
 Hunk #8 FAILED at 518.
 Hunk #9 succeeded at 507 (offset -28 lines).
 Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
 Hunk #11 FAILED at 601.
 2 out of 11 hunks FAILED -- saving rejects to file
 src/backend/access/gist/gistget.c.rej
 ...

 --
 Thom




-- 
Best regards,
Lubennikova Anastasia
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate-distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy((giststate-fetchFn[i]),
+		 index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+		   scanCxt);
+		else
+			giststate-fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in index-only scan
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..0925e56 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan,
  * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap
  * tuples should be reported directly into the bitmap.  If they are NULL,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
- * tuple TIDs are returned into so-pageData[].  For an ordered indexscan,
+ * tuple TIDs are returned into so-pageData. For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so-pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan-opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so-giststate;
+	Relation r = scan-indexRelation;
+	boolisnull[INDEX_MAX_KEYS];
+	GISTSearchHeapItem *tmpListItem;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	so-nPageData = so-curPageData = 0;
-
 	/*
 	 * check all tuples on page
 	 */
@@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan-numberOfOrderBys == 0  GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so-pageData[]
+			 * Non-ordered scan, so report tuples in so-pageData
+			 */
+
+			/* form tmpListItem and 

Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 10:40, Anastasia Lubennikova lubennikov...@gmail.com
 wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


(please avoid top-posting)

Thanks for the updated patch.  I can confirm that it now cleanly applies
and compiles fine.  I've run the tested in the SQL file you provided, and
it's behaving as expected.

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Thom Brown
On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


 * Documentation is missing.


Anastasia provided a documentation patch in the first email.

Thom


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Fabrízio de Royes Mello
On Thu, Feb 12, 2015 at 3:07 PM, Thom Brown t...@linux.com wrote:

 On 12 February 2015 at 16:40, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

 Thanks for answer.
 Now it seems to be applied correctly.


 * Documentation is missing.


 Anastasia provided a documentation patch in the first email.


Yeah, but it's a good practice send all the patches together. Will make the
life of the reviewers better ;-)

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Heikki Linnakangas

On 02/12/2015 12:40 PM, Anastasia Lubennikova wrote:

Thanks for answer.
Now it seems to be applied correctly.


Thanks, it would be great to get this completed.

This still leaks memory with the same test query as earlier. The leak 
seems to be into a different memory context now; it used to be to the 
GiST scan context, but now it's to the ExecutorState context. See 
attached patch which makes the leak evident.


Looking at my previous comments from August:

* What's the reason for turning GISTScanOpaqueData.pageData from an
array to a List?

* Documentation is missing.

- Heikki
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 0925e56..3768c9c 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -526,6 +526,12 @@ gistgettuple(PG_FUNCTION_ARGS)
  * It's always head of so-pageData
  */
 so-pageData = list_delete_first(so-pageData);
+{
+	static int lastreport = 0;
+	if ((lastreport++) % 1 == 0)
+		MemoryContextStats(CurrentMemoryContext);
+}
+
 PG_RETURN_BOOL(TRUE);
 			}
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GiST.

2015-02-11 Thread Thom Brown
On 11 February 2015 at 22:50, Anastasia Lubennikova lubennikov...@gmail.com
 wrote:

 Finally there is a new version of patch (in attachments).
 It provides multicolumn index-only scan for GiST indexes.

 - Memory leak is fixed.
 - little code cleanup
 - example of performance test in attachmens
 - function OIDs have debugging values (*) just to avoid merge
 conflicts while testing patch

 Wiki page of the project is

 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

 Waiting for feedback.


Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
to head and it doesn't appear to apply cleanly.

$ patch -p1  ~/Downloads/indexonlyscan_gist_2.0.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/access/gist/gist.c
Hunk #1 succeeded at 1404 (offset 9 lines).
Hunk #2 succeeded at 1434 (offset 9 lines).
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/access/gist/gistget.c
Hunk #1 succeeded at 227 (offset 1 line).
Hunk #2 succeeded at 243 (offset 1 line).
Hunk #3 succeeded at 293 (offset -4 lines).
Hunk #4 succeeded at 330 (offset -4 lines).
Hunk #5 succeeded at 365 (offset -5 lines).
Hunk #6 succeeded at 444 (offset -27 lines).
Hunk #7 succeeded at 474 (offset -27 lines).
Hunk #8 FAILED at 518.
Hunk #9 succeeded at 507 (offset -28 lines).
Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
Hunk #11 FAILED at 601.
2 out of 11 hunks FAILED -- saving rejects to file
src/backend/access/gist/gistget.c.rej
...

-- 
Thom


Re: [HACKERS] Index-only scans for GIST

2014-10-27 Thread Thom Brown
On 18 August 2014 09:05, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:

 2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:

 * I'm getting two regression failures with this (opr_sanity and join).



 opr_sanity failure is corrected.
 But there is remain question with join.
 I check the latest version of my github repo and there's no fail in join
 regression test
 All 145 tests passed.
 To tell the truth, I don't understand which changes could led to this
 failure.
 Could you show me regression diffs?


 Sure, here you go. It seems like a change in a plan. At a quick glance it
 seems harmless: the new plan is identical except that the left and right
 side of a join have been reversed. But I don't understand either why this
 patch would change that, so it needs to be investigated.

  * The regression test queries that use LIMIT are not guaranteed to always

 return the same rows, hence they're not very good regression test cases.
 I'd suggest using more restricting WHERE clauses, so that each query only
 returns a handful of rows.


 Thank you for comment, I rewrote wrong queries. But could you explain why
 LIMIT queries may return different results? Is it happens because of
 different query optimization?


 Imagine that you have a table with two rows, A and B. If you run a query
 like SELECT * FROM table LIMIT 1, the system can legally return either
 row A or B, because there's no ORDER BY.

 Now, admittedly you have a similar problem even without the LIMIT - the
 system can legally return the rows in either order - but it's less of an
 issue because at least you can quickly see from the diff that the result
 set is in fact the same, the rows are just in different order. You could
 fix that by adding an ORDER BY to all test queries, but we haven't done
 that in the regression suite because then we would not have any test
 coverage for cases where you don't have an ORDER BY. As a compromise, test
 cases are usually written without an ORDER BY, but if e.g. the buildfarm
 starts failing because of differences in the result set order across
 platforms, then we add an ORDER BY to make it stable.

  * I think it's leaking memory, in GIST scan context. I tested this with a

 variant of the regression tests:

 insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
 0.05*i)),
   point(0.05*i, 0.05*i) FROM generate_series(0,
 1000) as i;
 CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

 set enable_seqscan=off;
 set enable_bitmapscan=off;

 explain analyze  select p from gist_tbl where p @ box(point(0,0),
 point(999,999)) and length(p::text)  10;

 while the final query runs, 'top' shows constantly increasing memory
 usage.


 I don't think it's memory leak. After some increasing, memory using remain
 the same. It works similar without using indexonlyscan.


 No, it's definitely a leak caused by the patch. Test with the attached
 patch, which prints out to the server log the amount of memory used by the
 GiST scan memory context every 1 rows. It clearly shows increasing
 memory usage all the way to the end of the query.

 It's cleaned up at the end of the query, but that's not good enough
 because for a large query you might accumulate gigabytes of leaked memory
 until the query has finished. If you (manually) apply the same patch to git
 master, you'll see that the memory usage stays consistent and small.


Hi Anastasia,

Do you have time to address the issues brought up in Heikki's review?  It
would be good if we could your work into PostgreSQL 9.5.

Thanks

Thom


Re: [HACKERS] Index-only scans for GIST

2014-08-18 Thread Anastasia Lubennikova
Updated patch
* Compiler, merge and regression fails checked
* Regression tests was impoved
* GiST and amcanreturn docs updated
-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GIST

2014-08-18 Thread Heikki Linnakangas

On 08/17/2014 07:15 PM, Anastasia Lubennikova wrote:

2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:

* I'm getting two regression failures with this (opr_sanity and join).




opr_sanity failure is corrected.
But there is remain question with join.
I check the latest version of my github repo and there's no fail in join
regression test
All 145 tests passed.
To tell the truth, I don't understand which changes could led to this
failure.
Could you show me regression diffs?


Sure, here you go. It seems like a change in a plan. At a quick glance 
it seems harmless: the new plan is identical except that the left and 
right side of a join have been reversed. But I don't understand either 
why this patch would change that, so it needs to be investigated.



* The regression test queries that use LIMIT are not guaranteed to always

return the same rows, hence they're not very good regression test cases.
I'd suggest using more restricting WHERE clauses, so that each query only
returns a handful of rows.


Thank you for comment, I rewrote wrong queries. But could you explain why
LIMIT queries may return different results? Is it happens because of
different query optimization?


Imagine that you have a table with two rows, A and B. If you run a query 
like SELECT * FROM table LIMIT 1, the system can legally return either 
row A or B, because there's no ORDER BY.


Now, admittedly you have a similar problem even without the LIMIT - the 
system can legally return the rows in either order - but it's less of an 
issue because at least you can quickly see from the diff that the result 
set is in fact the same, the rows are just in different order. You could 
fix that by adding an ORDER BY to all test queries, but we haven't done 
that in the regression suite because then we would not have any test 
coverage for cases where you don't have an ORDER BY. As a compromise, 
test cases are usually written without an ORDER BY, but if e.g. the 
buildfarm starts failing because of differences in the result set order 
across platforms, then we add an ORDER BY to make it stable.



* I think it's leaking memory, in GIST scan context. I tested this with a

variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
0.05*i)),
  point(0.05*i, 0.05*i) FROM generate_series(0,
1000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p @ box(point(0,0),
point(999,999)) and length(p::text)  10;

while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's memory leak. After some increasing, memory using remain
the same. It works similar without using indexonlyscan.


No, it's definitely a leak caused by the patch. Test with the attached 
patch, which prints out to the server log the amount of memory used by 
the GiST scan memory context every 1 rows. It clearly shows 
increasing memory usage all the way to the end of the query.


It's cleaned up at the end of the query, but that's not good enough 
because for a large query you might accumulate gigabytes of leaked 
memory until the query has finished. If you (manually) apply the same 
patch to git master, you'll see that the memory usage stays consistent 
and small.


- Heikki

*** /home/heikki/git-sandbox/postgresql/src/test/regress/expected/join.out  
2014-08-18 09:38:55.146171394 +0300
--- /home/heikki/git-sandbox/postgresql/src/test/regress/results/join.out   
2014-08-18 10:28:30.326491898 +0300
***
*** 2579,2590 
  ---
   Sort
 Sort Key: t1.q1, t1.q2
!-  Hash Left Join
!  Hash Cond: (t1.q2 = t2.q1)
   Filter: (1 = (SubPlan 1))
!  -  Seq Scan on int8_tbl t1
   -  Hash
!-  Seq Scan on int8_tbl t2
   SubPlan 1
 -  Limit
   -  Result
--- 2579,2590 
  ---
   Sort
 Sort Key: t1.q1, t1.q2
!-  Hash Right Join
!  Hash Cond: (t2.q1 = t1.q2)
   Filter: (1 = (SubPlan 1))
!  -  Seq Scan on int8_tbl t2
   -  Hash
!-  Seq Scan on int8_tbl t1
   SubPlan 1
 -  Limit
   -  Result
***
*** 3589,3603 
lateral (values(x.q1,y.q1,y.q2)) v(xq1,yq1,yq2);
  q1|q2 |q1|q2 
|   xq1|   yq1|yq2
  
--+---+--+---+--+--+---
-   123 |   456 |  |   
|  123 |  |  
-   123 |  4567890123456789 | 4567890123456789 

Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:

* I'm getting two regression failures with this (opr_sanity and join).


opr_sanity failure is corrected.
But there is remain question with join.
I check the latest version of my github repo and there's no fail in join
regression test
All 145 tests passed.
To tell the truth, I don't understand which changes could led to this
failure.
Could you show me regression diffs?
I want to understand what's wrong with the patch.

* The regression test queries that use LIMIT are not guaranteed to always
 return the same rows, hence they're not very good regression test cases.
 I'd suggest using more restricting WHERE clauses, so that each query only
 returns a handful of rows.


Thank you for comment, I rewrote wrong queries. But could you explain why
LIMIT queries may return different results? Is it happens because of
different query optimization?

* I think it's leaking memory, in GIST scan context. I tested this with a
 variant of the regression tests:

 insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
 0.05*i)),
  point(0.05*i, 0.05*i) FROM generate_series(0,
 1000) as i;
 CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

 set enable_seqscan=off;
 set enable_bitmapscan=off;

 explain analyze  select p from gist_tbl where p @ box(point(0,0),
 point(999,999)) and length(p::text)  10;

 while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's memory leak. After some increasing, memory using remain
the same. It works similar without using indexonlyscan.



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-08-06 Thread Heikki Linnakangas

On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:

Hi, hackers!
I work on a GSoC project Index-only scans for GIST
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.


Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]

 ListCell *tmpPageData = so-curPageData;
 ^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to 
always return the same rows, hence they're not very good regression test 
cases. I'd suggest using more restricting WHERE clauses, so that each 
query only returns a handful of rows.


* What's the reason for turning GISTScanOpaqueData.pageData from an 
array to a List?


* I think it's leaking memory, in GIST scan context. I tested this with 
a variant of the regression tests:


insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i, 
0.05*i)),
 point(0.05*i, 0.05*i) FROM generate_series(0, 
1000) as i;

CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze  select p from gist_tbl where p @ box(point(0,0), 
point(999,999)) and length(p::text)  10;


while the final query runs, 'top' shows constantly increasing memory usage.


It includes index-only scans for multicolumn GIST and new regression test.
Fetch() method is realized for box and point opclasses.


Can we have Fetch functions for all the datatypes in btree_gist contrib 
module, please? Do other contrib modules contain GiST opclasses that 
could have Fetch functions?



Documentation is not updated yet, but I'm going to do it till the end of
GSoC.

I've got one question about query with OR condition. It is the last query
in regression test gist_indexonly. It doesn't fail but it doensn't use
index-only scans. Could someone explain to me how it works?
It seems to depend on build_paths_for_OR
http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2
function.
But I couldn't understand how.


The query is:

select * from gist_tbl
where b @ box(point(5,5),  point(6,6))
or p @ box(point(0,0),  point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan 
can only return rows based on one key. In this case, you need to do two 
scans, and then return the rows returned by either scan, removing 
duplicates. A bitmap scan is possible, because it can remove the 
duplicates, but the planner can't produce a plain index scan plan that 
would do the same.


A common trick when that happens in a real-world application is to 
re-write the query using UNION:


select * from gist_tbl
where b @ box(point(5,5),  point(6,6))
UNION
select * from gist_tbl
where p @ box(point(0,0),  point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR:  could not identify an equality operator for type box
LINE 1: select * from gist_tbl
   ^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GIST

2014-08-01 Thread Fabrízio de Royes Mello
On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova 
lubennikov...@gmail.com wrote:

 Hi, hackers!
 I work on a GSoC project Index-only scans for GIST

https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

 Repository is
 https://github.com/lubennikovaav/postgres/tree/indexonlygist2
 Patch is in attachments.

 It includes index-only scans for multicolumn GIST and new regression test.
 Fetch() method is realized for box and point opclasses.

 Documentation is not updated yet, but I'm going to do it till the end of
GSoC.

 I've got one question about query with OR condition. It is the last query
in regression test gist_indexonly. It doesn't fail but it doensn't use
index-only scans. Could someone explain to me how it works?
 It seems to depend on build_paths_for_OR function. But I couldn't
understand how.


Very nice... please add your patch to the next commit fest [1].

Regards,


[1] https://commitfest.postgresql.org/action/commitfest_view?id=23

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog sobre TI: http://fabriziomello.blogspot.com
 Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello


Re: [HACKERS] Index-only scans for GIST

2014-08-01 Thread Anastasia Lubennikova
Thank you for comment
Patch is already added in Performance topic.


2014-08-01 20:25 GMT+04:00 Fabrízio de Royes Mello fabriziome...@gmail.com
:


 On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova 
 lubennikov...@gmail.com wrote:
 
  Hi, hackers!
  I work on a GSoC project Index-only scans for GIST
 
 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
 
  Repository is
  https://github.com/lubennikovaav/postgres/tree/indexonlygist2
  Patch is in attachments.
 
  It includes index-only scans for multicolumn GIST and new regression
 test.
  Fetch() method is realized for box and point opclasses.
 
  Documentation is not updated yet, but I'm going to do it till the end of
 GSoC.
 
  I've got one question about query with OR condition. It is the last
 query in regression test gist_indexonly. It doesn't fail but it doensn't
 use index-only scans. Could someone explain to me how it works?
  It seems to depend on build_paths_for_OR function. But I couldn't
 understand how.
 

 Very nice... please add your patch to the next commit fest [1].

 Regards,


 [1] https://commitfest.postgresql.org/action/commitfest_view?id=23

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello




-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-05-29 Thread Robert Haas
On Sun, May 25, 2014 at 6:12 AM, Anastasia Lubennikova
lubennikov...@gmail.com wrote:
 Hi, hackers!
 There are first results of my work on GSoC project Index-only scans for
 GIST.

Cool.

 1. Version of my project code is in forked repository
 https://github.com/lubennikovaav/postgres/tree/indexonlygist2
 Patch is in attachments
 - This version is only for one-column indexes

That's probably a limitation that needs to be fixed before this can be
committed.

 - fetch() method is realized only for box opclass (because it's trivial)

That might not need to be fixed before this can be committed.

 2. I test Index-only scans with SQL script box_test.sql
 and it works really faster. (results in box_test_out)

 I'll be glad to get your feedback about this feature.

Since this is a GSoC project, it would be nice if one of the people
who is knowledgeable about GIST (Heikki, Alexander, etc.) could weigh
in on this before too much time goes by, so that Anastasia can press
forward with this work.

I don't know enough to offer too many substantive comments, but I
think you should remove all of the //-style comments (most of which
are debugging leftovers) and add some more comments describing what
you're actually doing and, more importantly, why.

This comment doesn't appear to make sense:

+   /*
+* The offset number on tuples on internal pages is unused.
For historical
+* reasons, it is set 0x.
+*/

The reason this doesn't make sense is because the tuple in question is
not on an internal page, or indeed any page at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers