[HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-18 Thread Anastasia Lubennikova
Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2014/lubennikovaav/5629499534213120)
Any suggestions and comments are welcome.

*Project name*

Support for index-only scans for GIST index



*Brief review*

Currently the GiST index doesn't have index-only scan functionality. Index-only
scans are a major performance feature added to Postgres 9.2. They allow
certain types of queries to be satisfied just by retrieving data from
indexes, and not from tables. This feature for B-trees (implemented since
PostgreSQL 9.2) makes certain types of queries significantly faster.
This is achieved by reducing the amount of I/O necessary to satisfy
queries. I think it will be useful to implement this feature for user-defined
data types that use a GiST index.



*Benefits to the PostgreSQL Community*

Faster GiST index search would be valuable for many PostgreSQL applications
(for example, some GIS systems).


 *Quantifiable results*

Acceleration of GiST index search.


*Project details*

1. The GiST is a balanced tree structure like a B-tree, containing (key,
pointer) pairs. A GiST key is a member of a user-defined class, and
represents some property that is true of all data items reachable from the
pointer associated with the key. The GiST makes it possible to create custom
data types with indexed access methods and an extensible set of queries.

There are seven methods that an index operator class for GiST must provide,
and an eighth that is optional.

-consistent

-union

-compress

-decompress

-penalty

-picksplit

-equal

-distance (optional)

I'm going to create a new optional method, fetch(), in addition to the
existing ones. With it, a user can define a way of retrieving data from the
index without loss. It will be used when performing search queries to speed
up data retrieval.
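
For illustration, here is a hedged sketch of what such a fetch() support
function could look like for an opclass whose compress method is lossless
(like box), assuming the new method would mirror the existing decompress
calling convention (GISTENTRY in, GISTENTRY out). The function name is
illustrative, not part of the proposal.

#include "postgres.h"
#include "fmgr.h"
#include "access/gist.h"

PG_FUNCTION_INFO_V1(gist_box_fetch_sketch);

/*
 * Sketch of a fetch() support function for a lossless opclass: the datum
 * stored in the leaf entry is already the original value, so it can simply
 * be handed back for index-only scans.
 */
Datum
gist_box_fetch_sketch(PG_FUNCTION_ARGS)
{
	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);

	PG_RETURN_POINTER(entry);
}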



2. gistget algorithm (several parts omitted):

Check the key
gistindex_keytest() - does this index tuple satisfy the scan key(s)?

Scan all items on the GiST index page and insert them into the queue (or
directly to output areas)

plain scan

Heap tuple TIDs are returned into so->pageData[]

ordered scan

Heap tuple TIDs are pushed into individual search queue items

If fetch() is specified by the developer, the algorithm can use it to retrieve
the data directly to the output areas at this stage, without reference to the
heap.


3. Create a function gistcanreturn to check whether fetch() is specified by
the user.

amcanreturn - function to check whether the index supports index-only scans,
or zero if none.

There is a question of supporting index-only scans for multicolumn indexes.
It would probably require extending the interface to add per-column checking.

To solve this question I need the community's help.


4. Add details for index only scans into gistcostestimate function



 *Links*

1) Hellerstein J. M., Naughton J. F., Pfeffer A. Generalized search
trees for database systems. - September, 1995.

2) http://www.sai.msu.su/~megera/postgres/gist/

3) PostgreSQL 9.3.3 Documentation: chapters 11. Indexes, 55. GiST
Indexes.

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] GSoC proposal. Index-only scans for GIST

2014-03-25 Thread Anastasia Lubennikova
2014-03-18 18:47 GMT+04:00 Robert Haas robertmh...@gmail.com


  If the fetch() is specified by the developer, then using it, algorithm
 can
  retrieve the data directly to output areas at this stage, without
 reference
  to the heap.

 This seems to be the crux of your proposal, but it seems vague: what
 exactly do you mean by retrieve the data directly to output areas?
  What data are you going to retrieve and where are you going to put it?


 I meant the Datum that is stored in GiST-tree nodes. Now gistgettuple()
returns xs_ctup.t_self (the item pointer). I'm going to add index-only scan
functionality: gistgettuple() will return the pointer and the Datum itself as
xs_itup. So the queue will contain both the pointer and the Datum. If
visibilitymap_test returns true then the Datum from xs_itup would be added
into the queue. Otherwise the heap page must be scanned.
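
To make the visibility check concrete: index-only scans consult the visibility
map, and only when the heap page is all-visible can the indexed Datum be
returned without touching the heap. Below is a hedged sketch of that decision,
assuming the visibilitymap_test() interface of that era; the helper name is
illustrative and not part of the patch.

#include "postgres.h"
#include "access/visibilitymap.h"
#include "storage/itemptr.h"
#include "utils/rel.h"

/*
 * Sketch: return true if the heap tuple identified by 'tid' lies on an
 * all-visible page, i.e. the value reconstructed from the index tuple can
 * be returned directly and the heap does not have to be visited.
 */
static bool
indexonly_result_usable(Relation heapRel, ItemPointer tid, Buffer *vmbuf)
{
	return visibilitymap_test(heapRel,
							  ItemPointerGetBlockNumber(tid),
							  vmbuf);
}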

Another question to consider is: which operator classes do you
 anticipate that this will work for and which ones do you anticipate
 that it will not work for?  Any operator class that lossifies that
 input data before storing it in the index is presumably doomed, but
 which ones do that, and which do not?


About amcanreturn:
I'm going to create a function gistcanreturn() that returns true if fetch()
is defined for all indexed columns.

And the last point of my project is to implement fetch() for the existing
opclasses based on GiST.

-- 
Best regards,
Lubennikova Anastasia


[HACKERS] Index-only scans for GIST

2014-05-25 Thread Anastasia Lubennikova
Hi, hackers!
Here are the first results of my work on the GSoC project Index-only scans
for GIST.

1. Version of my project code is in forked repository
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments
- This version is only for one-column indexes
- The fetch() method is implemented only for the box opclass (because it's trivial)

2. I tested index-only scans with the SQL script box_test.sql
and they work noticeably faster (results in box_test_out).

I'll be glad to get your feedback about this feature.

-- 
Best regards,
Lubennikova Anastasia


indexonlygist_2.0.patch
Description: Binary data


box_test.sql
Description: Binary data


box_test_out.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Changes in amcanreturn() interface to support multicolumn indexes

2014-06-26 Thread Anastasia Lubennikova
Changes in amcanreturn() interface to support multicolumn indexes.
Hi, hackers
I work on GSoC project Support_for_Index-only_scans_for_GIST_GSoC_2014
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
There is a question of supporting multicolumn index-only scans for GiST.
I need help with this problem.

bool amcanreturn() - a function to check whether the index supports index-only
scans.
Now it's defined for:
- B-tree. It always returns true, so there are no questions.
- SP-GiST. It doesn't support multicolumn indexes, so there are no problems
in spgcanreturn either.

- GiST. In the first version it works only for one-column indexes.
gistcanreturn() checks whether the fetch() method is defined for the column's
data type.

There is a question of supporting multicolumn index-only scans for GiST.
gistcanreturn() can return true if fetch is implemented for all indexed
columns and false otherwise.
But that's not a very good approach for multicolumn indexes.

I think it requires extending the interface to add per-column checking,
but I can't understand what kind of changes are required
and how they would affect the previous amcanreturn interface.


-- 
Regards,
Lubennikova Anastasia


[HACKERS] Index-only scans for multicolumn GIST

2014-07-21 Thread Anastasia Lubennikova
Hi, hackers!
Here are new results of my work on the GSoC project Index-only scans for
GIST.
Previous post is here:
http://postgresql.1045698.n5.nabble.com/Index-only-scans-for-GIST-td5804892.html

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.
It includes index-only scans for multicolumn GiST. It works correctly -
the results are the same as with other scan plans.
The fetch() method is implemented for the box and circle opclasses.
The improvement for the circle opclass is not as distinct as for the box
opclass, because of recheck.

I remember that all elog calls and other rough comments must be fixed before
this patch can be committed.

Any comments are welcome
-- 
Best regards,
Lubennikova Anastasia


indexonlygist_2.1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index-only scans for GIST

2014-08-01 Thread Anastasia Lubennikova
Hi, hackers!
I work on a GSoC project Index-only scans for GIST
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Repository is
https://github.com/lubennikovaav/postgres/tree/indexonlygist2
Patch is in attachments.

It includes index-only scans for multicolumn GiST and a new regression test.
The fetch() method is implemented for the box and point opclasses.

The documentation is not updated yet, but I'm going to do it before the end of
GSoC.

I've got one question about a query with an OR condition. It is the last query
in the regression test gist_indexonly. It doesn't fail, but it doesn't use
index-only scans. Could someone explain to me how it works?
It seems to depend on build_paths_for_OR
http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2
function.
But I couldn't understand how.

-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GIST

2014-08-01 Thread Anastasia Lubennikova
Thank you for comment
Patch is already added in Performance topic.


2014-08-01 20:25 GMT+04:00 Fabrízio de Royes Mello fabriziome...@gmail.com
:


 On Fri, Aug 1, 2014 at 4:58 AM, Anastasia Lubennikova 
 lubennikov...@gmail.com wrote:
 
  Hi, hackers!
  I work on a GSoC project Index-only scans for GIST
 
 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
 
  Repository is
  https://github.com/lubennikovaav/postgres/tree/indexonlygist2
  Patch is in attachments.
 
  It includes index-only scans for multicolumn GIST and new regression
 test.
  Fetch() method is realized for box and point opclasses.
 
  Documentation is not updated yet, but I'm going to do it till the end of
 GSoC.
 
  I've got one question about query with OR condition. It is the last
 query in regression test gist_indexonly. It doesn't fail but it doensn't
 use index-only scans. Could someone explain to me how it works?
  It seems to depend on build_paths_for_OR function. But I couldn't
 understand how.
 

 Very nice... please add your patch to the next commit fest [1].

 Regards,


 [1] https://commitfest.postgresql.org/action/commitfest_view?id=23

 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira: http://www.timbira.com.br
  Blog sobre TI: http://fabriziomello.blogspot.com
  Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
  Twitter: http://twitter.com/fabriziomello




-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas hlinnakan...@vmware.com:

* I'm getting two regression failures with this (opr_sanity and join).


The opr_sanity failure is corrected.
But a question remains about join.
I checked the latest version of my GitHub repo and there's no failure in the
join regression test: all 145 tests passed.
To tell the truth, I don't understand which changes could have led to this
failure.
Could you show me the regression diffs?
I want to understand what's wrong with the patch.

* The regression test queries that use LIMIT are not guaranteed to always
 return the same rows, hence they're not very good regression test cases.
 I'd suggest using more restricting WHERE clauses, so that each query only
 returns a handful of rows.


Thank you for the comment, I rewrote the wrong queries. But could you explain
why LIMIT queries may return different results? Does it happen because of
different query optimization?

* I think it's leaking memory, in GIST scan context. I tested this with a
 variant of the regression tests:

 insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
 0.05*i)),
  point(0.05*i, 0.05*i) FROM generate_series(0,
 1000) as i;
 CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

 set enable_seqscan=off;
 set enable_bitmapscan=off;

 explain analyze  select p from gist_tbl where p <@ box(point(0,0),
 point(999,999)) and length(p::text) < 10;

 while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's a memory leak. After some increase, memory usage remains
the same. It behaves similarly without using an index-only scan.



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GIST

2014-08-18 Thread Anastasia Lubennikova
Updated patch
* Compiler, merge and regression failures checked
* Regression tests were improved
* GiST and amcanreturn docs updated
-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index-only scans for GiST.

2015-02-11 Thread Anastasia Lubennikova
Finally there is a new version of the patch (in attachments).
It provides multicolumn index-only scans for GiST indexes.

- Memory leak is fixed.
- Little code cleanup.
- Example of a performance test in attachments.
- Function OIDs have debugging values (*) just to avoid merge conflicts
while testing the patch.

Wiki page of the project is
https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

Waiting for feedback.
-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist_2.0.patch
Description: Binary data


test_performance.sql
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index-only scans for GiST.

2015-02-12 Thread Anastasia Lubennikova
Thanks for the answer.
Now the patch seems to apply correctly.

2015-02-12 3:12 GMT+04:00 Thom Brown t...@linux.com:

 On 11 February 2015 at 22:50, Anastasia Lubennikova 
 lubennikov...@gmail.com wrote:

 Finally there is a new version of patch (in attachments).
 It provides multicolumn index-only scan for GiST indexes.

 - Memory leak is fixed.
 - little code cleanup
 - example of performance test in attachmens
 - function OIDs have debugging values (*) just to avoid merge
 conflicts while testing patch

 Wiki page of the project is

 https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014

 Waiting for feedback.


 Hi Anastasia.  Thanks for the updated patch.  I've just tried applying it
 to head and it doesn't appear to apply cleanly.

 $ patch -p1 < ~/Downloads/indexonlyscan_gist_2.0.patch
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gist.c
 Hunk #1 succeeded at 1404 (offset 9 lines).
 Hunk #2 succeeded at 1434 (offset 9 lines).
 (Stripping trailing CRs from patch; use --binary to disable.)
 patching file src/backend/access/gist/gistget.c
 Hunk #1 succeeded at 227 (offset 1 line).
 Hunk #2 succeeded at 243 (offset 1 line).
 Hunk #3 succeeded at 293 (offset -4 lines).
 Hunk #4 succeeded at 330 (offset -4 lines).
 Hunk #5 succeeded at 365 (offset -5 lines).
 Hunk #6 succeeded at 444 (offset -27 lines).
 Hunk #7 succeeded at 474 (offset -27 lines).
 Hunk #8 FAILED at 518.
 Hunk #9 succeeded at 507 (offset -28 lines).
 Hunk #10 succeeded at 549 with fuzz 1 (offset -28 lines).
 Hunk #11 FAILED at 601.
 2 out of 11 hunks FAILED -- saving rejects to file
 src/backend/access/gist/gistget.c.rej
 ...

 --
 Thom




-- 
Best regards,
Lubennikova Anastasia
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index db2a452..53750da 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1404,6 +1404,14 @@ initGISTstate(Relation index)
 		else
 			giststate->distanceFn[i].fn_oid = InvalidOid;
 
+		/* opclasses are not required to provide a Fetch method */
+		if (OidIsValid(index_getprocid(index, i + 1, GIST_FETCH_PROC)))
+			fmgr_info_copy(&(giststate->fetchFn[i]),
+						   index_getprocinfo(index, i + 1, GIST_FETCH_PROC),
+						   scanCxt);
+		else
+			giststate->fetchFn[i].fn_oid = InvalidOid;
+
 		/*
 		 * If the index column has a specified collation, we should honor that
 		 * while doing comparisons.  However, we may have a collatable storage
@@ -1426,6 +1434,22 @@ initGISTstate(Relation index)
 	return giststate;
 }
 
+/*
+ * Gistcanreturn is supposed to be true if ANY FetchFn method is defined.
+ * If FetchFn exists it would be used in index-only scan
+ * Thus the responsibility rests with the opclass developer.
+ */
+
+Datum
+gistcanreturn(PG_FUNCTION_ARGS) {
+	Relation index = (Relation) PG_GETARG_POINTER(0);
+	int i = PG_GETARG_INT32(1);
+	if (OidIsValid(index_getprocid(index, i+1, GIST_FETCH_PROC)))
+		PG_RETURN_BOOL(true);
+	else
+		PG_RETURN_BOOL(false);
+}
+
 void
 freeGISTstate(GISTSTATE *giststate)
 {
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 717cb85..0925e56 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -227,8 +227,10 @@ gistindex_keytest(IndexScanDesc scan,
  * If tbm/ntids aren't NULL, we are doing an amgetbitmap scan, and heap
  * tuples should be reported directly into the bitmap.  If they are NULL,
  * we're doing a plain or ordered indexscan.  For a plain indexscan, heap
- * tuple TIDs are returned into so->pageData[].  For an ordered indexscan,
+ * tuple TIDs are returned into so->pageData. For an ordered indexscan,
  * heap tuple TIDs are pushed into individual search queue items.
+ * If index-only scan is possible, heap tuples themselves are returned
+ * into so->pageData or into search queue.
  *
  * If we detect that the index page has split since we saw its downlink
  * in the parent, we push its new right sibling onto the queue so the
@@ -241,6 +243,10 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	Buffer		buffer;
 	Page		page;
+	GISTSTATE *giststate = so->giststate;
+	Relation r = scan->indexRelation;
+	bool		isnull[INDEX_MAX_KEYS];
+	GISTSearchHeapItem *tmpListItem;
 	GISTPageOpaque opaque;
 	OffsetNumber maxoff;
 	OffsetNumber i;
@@ -287,8 +293,6 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		MemoryContextSwitchTo(oldcxt);
 	}
 
-	so->nPageData = so->curPageData = 0;
-
 	/*
 	 * check all tuples on page
 	 */
@@ -326,11 +330,20 @@ gistScanPage(IndexScanDesc scan, GISTSearchItem *pageItem, double *myDistances,
 		else if (scan->numberOfOrderBys == 0 && GistPageIsLeaf(page))
 		{
 			/*
-			 * Non-ordered scan, so report heap tuples in so->pageData[]
+			 * Non-ordered scan, so report tuples in so->pageData
+			 */
+
+			/* form tmpListItem

Re: [HACKERS] GSoC 2015 - mentors, students and admins.

2015-02-11 Thread Anastasia Lubennikova
I'm interested to participate as student again.


-- 
Best regards,
Lubennikova Anastasia


[HACKERS] GSoC 2015 proposal. Bitmap Index-only Count

2015-03-24 Thread Anastasia Lubennikova
Hi, hackers!
Here is the text of my proposal which I've applied to GSoC.
(and link
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/lubennikovaav/5657382461898752
 )
Any suggestions and comments are welcome.


*Project name*

Bitmap Index-only Count



*Brief review*

There is a problem of slow counting in PostgreSQL [1]. The reason why this
is slow is related to the *MVCC* implementation in PostgreSQL. Index-only
scans (implemented since PostgreSQL 9.2) provide some performance
improvements where the *visibility map* of the table allows it. That’s
good. But it works only for access methods which provide the amgettuple
method. Unfortunately GIN supports only BitmapIndexScan and has no
implementation of the index_getnext() interface [2]. The idea of the proposal
is to implement a Bitmap Index-Only Count method that allows counting
elements without reference to the heap.



*Benefits to the PostgreSQL Community*

Faster count(*) would be valuable for GIN. It would probably accelerate
count(*) for other access methods too, but I am not sure whether the
difference would be significant.



*Quantifiable results*

Acceleration of count(*) queries with WHERE clause which use GIN.



*Project details*

Let’s look at a count(*) query:

EXPLAIN SELECT count(*) from tbl_mytbl where val @> '{"b":2}';



Now the query plan looks like this:

Aggregate

   ->  Bitmap Heap Scan on tbl_mytbl

         Recheck Cond: (val @> '{"b": 2}'::jsonb)

         ->  Bitmap Index Scan on idx_mytbl

               Index Cond: (val @> '{"b": 2}'::jsonb)



The idea of the proposal is to implement a Bitmap Index-Only Count method
that allows counting elements without reference to the heap.

Conditions:


   - all tuples are visible (the same problem as for index-only count(*));

   - the result TIDBitmap is not lossy, i.e. nchunks = 0
     (int nchunks; /* number of lossy entries in pagetable */
     http://doxygen.postgresql.org/structTIDBitmap.html#a85d5883056bad6863cb47a15446581c7);

   - pages in the TIDBitmap don’t require recheck
     (/* recheck is used only on exact pages --- it indicates that although
       * only the stated tuples need be checked, the full index qual condition
       * must be checked for each (ie, these are candidate matches). */)



If all conditions are true, it's possible to call the aggregate COUNT
function for each tuple from the TIDBitmap returned by the Bitmap Index Scan
(or BitmapAnd/BitmapOr nodes), and there's no need to perform a Bitmap Heap
Scan.
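
As a concrete illustration of that check, here is a hedged sketch (not part of
the proposal's patch) of counting the tuples in a TIDBitmap without touching
the heap; the function name is illustrative. It bails out as soon as it sees a
lossy chunk or a page that needs recheck, in which case the regular Bitmap
Heap Scan path is still required.

#include "postgres.h"
#include "nodes/tidbitmap.h"

static int64
count_exact_bitmap_tuples(TIDBitmap *tbm)
{
	TBMIterator *iterator = tbm_begin_iterate(tbm);
	TBMIterateResult *tbmres;
	int64		count = 0;

	while ((tbmres = tbm_iterate(iterator)) != NULL)
	{
		/* lossy page (ntuples < 0) or recheck required: cannot count here */
		if (tbmres->ntuples < 0 || tbmres->recheck)
		{
			count = -1;
			break;
		}
		count += tbmres->ntuples;
	}
	tbm_end_iterate(iterator);
	return count;
}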





As a GSoC student I will create a new node, “Bitmap Index-Only Scan”, which
would catch tuples from the Bitmap Index Scan node and pass them to the
Aggregate node. Thus, the new query plan will be as follows:



Aggregate

   ->  Bitmap Index-only Count

         ->  Bitmap Index Scan on idx_mytbl

               Index Cond: (val @> '{"b": 2}'::jsonb)





*Project Schedule*

until May 25

Read documentation and source code, clarify details of implementation.

until June 30

Implement Bitmap Index-only Count node.

1 July – 31 July

Add Bitmap Index-only Count node to Planner.

1 August -15 August

Final refactoring, testing and submitting a patch.



*Links*


   1. https://wiki.postgresql.org/wiki/Slow_Counting
   2. http://postgresql.nabble.com/Todo-item-Support-amgettuple-in-GIN-td5780810.html



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] GSoC 2015 proposal. Bitmap Index-only Count

2015-03-25 Thread Anastasia Lubennikova
2015-03-24 18:01 GMT+04:00 Tom Lane t...@sss.pgh.pa.us:

 Anastasia Lubennikova lubennikov...@gmail.com writes:
  There is a problem of slow counting in PostgreSQL [1]. The reason why
 this
  is slow is related to the *MVCC* implementation in PostgreSQL. Index-only
  scans (implemented since PostgreSQL-9.2) providing some performance
  improvements where the *visibility map* of the table allows it. That’s
  good. But it works only for access methods which provide amgettuple
 method.
  Unfortunately GIN supports only BitmapIndexScan and has no implementation
  of index_getnext() interface [2].

 Right ...

  As a GSoC student I will create new Node “Bitmap Index-Only Scan”, which
  would catch tuples from Bitmap Index Scan node and pass them to Aggregate
  node. Thus, new query plan will be as follow:

 I'm pretty hesitant about adding a whole new plan node type (which will
 require quite a lot of infrastructure) for such a narrow use-case.
 I think the odds are good that if you proceed down this path, you will
 end up with something that never gets committed to Postgres.


Thanks a lot for the reply. It was just an approximate idea. I thought it
wasn't very good.

I wonder whether it'd be possible to teach GIN to support index_getnext
 instead.  Initially it would probably work only for cases where the
 index didn't have to return any columns ... but if we did it, maybe the
 door would be open to cases where GIN could reconstruct actual values.

 Another idea is to write index_getnext() for GIN which would return some
fake tuple, since it makes no difference to the COUNT aggregate what the tuple
contains. COUNT just wants to know whether we have a tuple that satisfies the
qual.
Is this idea better? Is it possible for the planner to use index_getnext() for
GIN only with the COUNT aggregate?


-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Index-only scans for GiST.

2015-02-27 Thread Anastasia Lubennikova
I added a MemoryContext, listCxt, to avoid the memory leak. listCxt is created
once in gistrescan (only for an index-only scan plan) and reset when the scan
of the leaf page is finished.

I am not sure whether the problem is completely solved, so I am waiting for
feedback.

* What's the reason for turning GISTScanOpaqueData.pageData from an array
to a List?

This array is a field of the structure GISTScanOpaqueData. Memory for that
structure is allocated in the function gistbeginscan(). The array is static,
so it's declared only once in the structure:
GISTSearchHeapItem  pageData[BLCKSZ / sizeof(IndexTupleData)]

But how could we know the size of the array if we don't know what data would
be returned? I mean the type and amount.

I asked Alexander about that and he suggested using a List instead of an array.


indexonlyscan_gist_2.2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data


test_performance.sql
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Wrong Assert in PageIndexMultiDelete?

2015-05-19 Thread Anastasia Lubennikova
Hi, hackers!

I am trying to create a new index access method.
And I found a strange Assert in the PageIndexMultiDelete function
(http://doxygen.postgresql.org/bufpage_8c_source.html#l00791):

Assert(nitems < MaxIndexTuplesPerPage);

Is the '<' sign correct? I thought it should be '<='.
Is it a bug or just my misunderstanding?

-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-08-03 Thread Anastasia Lubennikova


On Mon, Aug 3, 2015 at 12:27 PM, Anastasia Lubennikova 
a.lubennik...@postgrespro.ru 
wrote:


1) Test and results are in attachments. Everything seems to work
as expected.
2) I dropped these notices. It was done only for debug purposes.
Updated patch is attached.
3) fixed


Good! Another couple of notes from me:
1) I think gistvacuumpage() and gistkillitems() need function-level 
comments.
2) ItemIdIsDead() can be used in index scan like it's done 
in _bt_checkkeys().


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company

I've added some comments.
ItemIdIsDead() is used now (dead tuples are simply skipped as not matching
the quals).
And there is one more check of the LSN in gistkillitems to make sure that
the page was not changed between reads.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***
*** 36,42  static bool gistinserttuples(GISTInsertState *state, 
GISTInsertStack *stack,
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! 
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
--- 36,42 
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
***
*** 209,214  gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
--- 209,225 
 * because the tuple vector passed to gistSplit won't include this 
tuple.
 */
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 
+   /*
+* If leaf page is full, try at first to delete dead tuples. And then
+* check again.
+*/
+   if ((is_split) && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+   {
+   gistvacuumpage(rel, page, buffer);
+   is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+   }
+ 
if (is_split)
{
/* no space for insertion */
***
*** 1440,1442  freeGISTstate(GISTSTATE *giststate)
--- 1451,1522 
/* It's sufficient to delete the scanCxt */
MemoryContextDelete(giststate-scanCxt);
  }
+ 
+ /*
+  * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+  */
+ static void
+ gistvacuumpage(Relation rel, Page page, Buffer buffer)
+ {
+   OffsetNumber deletable[MaxOffsetNumber];
+   int ndeletable = 0;
+   OffsetNumber offnum,
+   minoff,
+   maxoff;
+ 
+   /*
+* Scan over all items to see which ones need to be deleted according to
+* LP_DEAD flags.
+*/
+   maxoff = PageGetMaxOffsetNumber(page);
+   for (offnum = FirstOffsetNumber;
+        offnum <= maxoff;
+offnum = OffsetNumberNext(offnum))
+   {
+   ItemId  itemId = PageGetItemId(page, offnum);
+ 
+   if (ItemIdIsDead(itemId))
+   deletable[ndeletable++] = offnum;
+   }
+ 
+   if (ndeletable > 0)
+   {
+   START_CRIT_SECTION();
+ 
+   PageIndexMultiDelete(page, deletable, ndeletable);
+ 
+   /*
+        * Mark the page as not containing any LP_DEAD items.  This is not
+        * certainly true (there might be some that have recently been marked,
+        * but weren't included in our target-item list), but it will almost
+        * always be true and it doesn't seem worth an additional page scan to
+        * check it. Remember that F_HAS_GARBAGE is only a hint anyway.
+*/
+   GistClearPageHasGarbage(page);
+ 
+   MarkBufferDirty(buffer);
+ 
+   /* XLOG stuff */
+   if (RelationNeedsWAL(rel))
+   {
+   XLogRecPtr  recptr;
+ 
+            recptr = gistXLogUpdate(rel->rd_node, buffer,
+                                    deletable, ndeletable,
+                                    NULL, 0, InvalidBuffer);
+ 
+   PageSetLSN(page, recptr);
+   }
+   else

[HACKERS] [PATCH] Microvacuum for gist.

2015-07-30 Thread Anastasia Lubennikova
Hi,

I have written microvacuum support for the gist access method.
Briefly, microvacuum includes two steps:
1. When a search tells us that a tuple is invisible to all transactions, it
is marked LP_DEAD and the page is marked as having dead tuples.
2. Then, when an insert touches a full page which has dead tuples, it calls
microvacuum instead of splitting the page.
You can find a kind of review here [1].

[1]
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120

Patch is in attachments. Please review it.
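
For illustration, here is a hedged sketch of step 1 (marking an item dead
during a scan), loosely modelled on what the btree code does in
_bt_killitems(). Buffer locking, WAL considerations and the patch's real flag
handling are omitted; GistMarkPageHasGarbage() is assumed to be the patch's
new page-flag macro and is not part of core PostgreSQL.

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"
#include "storage/itemid.h"

static void
gist_mark_item_dead_sketch(Buffer buffer, OffsetNumber offnum)
{
	Page	page = BufferGetPage(buffer);
	ItemId	itemId = PageGetItemId(page, offnum);

	/* the tuple is invisible to everyone: remember that in the line pointer */
	ItemIdMarkDead(itemId);

	/* GistMarkPageHasGarbage(page);   assumed patch macro: hint for inserts */

	/* hint-bit style change: no WAL record, just mark the buffer dirty */
	MarkBufferDirtyHint(buffer, true);
}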

-- 
Best regards,
Lubennikova Anastasia


microvacuum_for_gist.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-08-03 Thread Anastasia Lubennikova



30.07.2015 16:33, Alexander Korotkov пишет:

Hi!

On Thu, Jul 30, 2015 at 2:51 PM, Anastasia Lubennikova 
lubennikov...@gmail.com wrote:


I have written microvacuum support for gist access method.
Briefly microvacuum includes two steps:
1. When search tells us that the tuple is invisible to all
transactions it is marked LP_DEAD and page is marked as has dead
tuples,
2. Then, when insert touches full page which has dead tuples it
calls microvacuum instead of splitting page.
You can find a kind of review here [1].

[1]

http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120

Patch is in attachements. Please review it.


Nice!

Some notes about this patch.

1) Could you give some test case demonstrating that microvacuum really 
works with the patch? Finally, we should see the index growing less with 
microvacuum.


2) Generating notices for every dead tuple would be too noisy. I 
suggest to replace notice with one of debug levels.


+ elog(NOTICE, "gistkillitems. Mark Item Dead offnum %hd, blkno %d", 
offnum, BufferGetBlockNumber(buffer));



3) Please, recheck coding style. For instance, this line needs more 
spaces and open brace should be on the next line.


+ if ((scan->kill_prior_tuple) && (so->curPageData > 0) && 
(so->curPageData == so->nPageData)) {


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company
1) Test and results are in attachments. Everything seems to work as 
expected.
2) I dropped these notices. It was done only for debug purposes. Updated 
patch is attached.

3) fixed

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***
*** 36,42  static bool gistinserttuples(GISTInsertState *state, 
GISTInsertStack *stack,
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! 
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
--- 36,42 
 bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
GISTSTATE *giststate, List *splitinfo, bool 
releasebuf);
! static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
  
  #define ROTATEDIST(d) do { \
SplitedPageLayout 
*tmp=(SplitedPageLayout*)palloc(sizeof(SplitedPageLayout)); \
***
*** 209,214  gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
--- 209,225 
 * because the tuple vector passed to gistSplit won't include this 
tuple.
 */
is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 
+   /*
+* If leaf page is full, try at first to delete dead tuples. And then
+* check again.
+*/
+   if ((is_split) && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+   {
+   gistvacuumpage(rel, page, buffer);
+   is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+   }
+ 
if (is_split)
{
/* no space for insertion */
***
*** 1440,1442  freeGISTstate(GISTSTATE *giststate)
--- 1451,1519 
/* It's sufficient to delete the scanCxt */
MemoryContextDelete(giststate-scanCxt);
  }
+ 
+ static void
+ gistvacuumpage(Relation rel, Page page, Buffer buffer)
+ {
+   OffsetNumber deletable[MaxOffsetNumber];
+   int ndeletable = 0;
+   OffsetNumber offnum,
+   minoff,
+   maxoff;
+ 
+   /*
+* Scan over all items to see which ones need to be deleted according to
+* LP_DEAD flags.
+*/
+   maxoff = PageGetMaxOffsetNumber(page);
+   for (offnum = FirstOffsetNumber;
+        offnum <= maxoff;
+offnum = OffsetNumberNext(offnum))
+   {
+   ItemId  itemId = PageGetItemId(page, offnum);
+ 
+   if (ItemIdIsDead(itemId))
+   deletable[ndeletable++] = offnum;
+   }
+ 
+   if (ndeletable > 0)
+   {
+   START_CRIT_SECTION();
+ 
+   PageIndexMultiDelete(page, deletable, ndeletable);
+ 
+   /*
+* Mark the page as not containing any LP_DEAD items.  This is 
not
+* certainly true (there might be some that have recently been 
marked,
+* but weren't included in our target-item list), but it will 
almost
+* always be true and it doesn't seem worth an additional

Re: [HACKERS] How to compare different datums within from a tuple?

2015-08-11 Thread Anastasia Lubennikova

 Can someone tell me, how I can compare two datum fields, when I do not
 know the data type in advance inside an executor function?


In a nutshell, there is no way to compare Datums directly.
Datum is an abstract data type. It's the backend-internal representation of
a single value of any SQL data type.
The code using a Datum has to know which type it is, since the Datum itself
doesn't contain that information.


 For example, x less than y where x and y are of various types that form
 intervals. I have found the method ExecTuplesMatch, but it is only for
 equality comparison, I think. Another one is ApplySortComparator... maybe
 that's the correct way to go?

 Some code to make things clearer...

 Datum x = heap_getattr(out-tts_tuple,
 node-xpos,
 out-tts_tupleDescriptor,
 isNull1);
 Datum y = slot_getattr(curr, node-ypos, isNull2);

 if (compareDatumWithCorrectMethod(x,y)  0)
 {
  /* do something */
 }

 Thx,
 Peter


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


Maybe you can use a DatumGetXXX function to extract the value, for example
DatumGetInt32:
http://doxygen.postgresql.org/postgres_8h.html#aacbc8a3ac6d52d85feaf0b7ac1b1160c
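
If you need an ordering comparison ("x less than y") rather than just the raw
value, a common pattern is to look up the type's default btree comparison
function through the type cache. A hedged sketch with an illustrative helper
name follows; the caller must know the type OID (e.g. from the TupleDesc) and
the collation.

#include "postgres.h"
#include "fmgr.h"
#include "utils/typcache.h"

/* returns <0, 0 or >0, like strcmp */
static int32
compare_datums(Oid typid, Oid collation, Datum x, Datum y)
{
	TypeCacheEntry *typentry;

	typentry = lookup_type_cache(typid, TYPECACHE_CMP_PROC_FINFO);
	if (!OidIsValid(typentry->cmp_proc_finfo.fn_oid))
		elog(ERROR, "no btree comparison function for type %u", typid);

	return DatumGetInt32(FunctionCall2Coll(&typentry->cmp_proc_finfo,
										   collation, x, y));
}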
-- 
Best regards,
Lubennikova Anastasia


[HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
Hi,

I'm working on microvacuum for the gist access method.
Briefly, microvacuum includes two steps:
1. When a search tells us that a tuple is invisible to all transactions, it
is marked LP_DEAD and the page is marked as having dead tuples.
2. Then, when an insert touches a full page which has dead tuples, it calls
microvacuum instead of splitting the page.
You can find a kind of review here [1].

While writing the patch, I found a strange GISTPageOpaqueData flag -
F_TUPLES_DELETED
(http://doxygen.postgresql.org/gist_8h.html#a23812efd70313b9b10ae61376e2594f6).
Its description looks the same as that of BTP_HAS_GARBAGE
(http://doxygen.postgresql.org/nbtree_8h.html#a3b7c77849276ff8617edc1f84441c230):

#define F_TUPLES_DELETED   (1 << 2) /* some tuples on the page are dead */

#define BTP_HAS_GARBAGE   (1 << 6) /* page has LP_DEAD tuples */

But they are definitely not the same thing. I found only two mentions of this
flag.
The function GistMarkTuplesDeleted
(http://doxygen.postgresql.org/gist_8h.html#a96fc3c6bb5aecfc8d2818b7010d68aac)
sets the flag after dead tuple deletion.

Does anyone need it at all? I found no place where this flag is checked.
Is this a correct use of the flag?

I need advice on what would be better:
- to add a new flag like F_HAS_GARBAGE,
- or to delete all mentions of F_TUPLES_DELETED and reuse it for gist
microvacuum.


[1]
http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/ivanitskiy_ilya/5629499534213120
-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] Microvacuum for gist. Question about GISTPageOpaqueData flag

2015-07-27 Thread Anastasia Lubennikova
2015-07-27 20:05 GMT+04:00 Heikki Linnakangas hlinn...@iki.fi:

 On 07/27/2015 06:46 PM, Teodor Sigaev wrote:

 I need an advice, what would be better:
 - to add new flag like F_HAS_GARBAGE,
 - or to delete all mentions of F_TUPLES_DELETED and use it in gist
 microvacuum.


 According to commit message:
 commit 2effb72e682a7dbdc9a8a60a80c22ec1fa9d8079
 Author: Heikki Linnakangas heikki.linnakan...@iki.fi
 Date:   Fri Nov 7 15:03:46 2014 +0200
 ..
   The code that generated a record to clear the F_TUPLES_DELETED flag
 hasn't
   existed since we got rid of old-style VACUUM FULL. I kept the code
 that sets
   the flag, although it's not used for anything anymore, because it
 might
   still be interesting information for debugging purposes that some
 tuples
   have been deleted from a page.
 ..

 If Heikki doesn't change his opinion then introduce new flag. Although I
 don't
 think that we need to keep F_TUPLES_DELETED.


 It's certainly not needed for anything at the moment, although conceivably
 we might reintroduce code that needs it in the future. There are plenty of
 flag bits available, so let's use a new flag. If there was a shortage, I
 wouldn't blink reusing F_TUPLES_DELETED.

 - Heikki


Thanks for the quick reply

-- 
Best regards,
Lubennikova Anastasia


[HACKERS]WIP: Covering + unique indexes.

2015-10-08 Thread Anastasia Lubennikova


Hi hackers,

I'm working on a patch that allows combining covering and unique 
functionality for btree indexes.


_Previous discussion was here:_
1) Proposal thread 
<http://www.postgresql.org/message-id/55f2ccd0.7040...@postgrespro.ru>
2) Message with proposal clarification 
<http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru>


In a nutshell, the feature allows creating an index with "key" columns and 
"included" columns.
"Key" columns can be used as scan keys. The unique constraint relates only 
to "key" columns.

"Included" columns may be used as scan keys if they have a suitable opclass.
Both "key" and "included" columns can be returned from the index by 
IndexOnlyScan.


Btree is the default index and it's used everywhere, so it requires 
proper testing. Volunteers are welcome)


_Use case:_
- We have a table (c1, c2, c3, c4);
- We need to have an unique index on (c1, c2).
- We would like to have a covering index on all columns to avoid reading 
of heap pages.


Old way:
CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);

What's wrong?
The two indexes contain repeated data, adding overhead to data manipulation 
operations and database size.


New way:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached.
In 'test.sql' you can find a test with detailed comments on each step, 
and comparison of old and new indexes.


New feature has following syntax:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
The keyword INCLUDING defines the "included" columns of the index. These 
columns are not part of the unique constraint.
Also, they are not stored in index inner pages, which allows decreasing the 
index size.


_Results:_
1) An additional covering index is not required anymore.
2) The new index can use an IndexOnlyScan for queries where the old index can't.

For example,
explain analyze select c1, c2 from newt where c1<1 and c3<20;

*more examples in 'test.sql'

_Future work:_
Make opclasses for "included" columns optional.

CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000;
SELECT * FROM tbl WHERE c1=1000;

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





test.sql
Description: application/sql
diff --git a/src/backend/access/common/indextuple.c b/src/backend/access/common/indextuple.c
index dc588d7..83f24c3 100644
--- a/src/backend/access/common/indextuple.c
+++ b/src/backend/access/common/indextuple.c
@@ -19,6 +19,7 @@
 #include "access/heapam.h"
 #include "access/itup.h"
 #include "access/tuptoaster.h"
+#include "utils/rel.h"
 
 
 /* 
@@ -441,3 +442,30 @@ CopyIndexTuple(IndexTuple source)
 	memcpy(result, source, size);
 	return result;
 }
+
+/*
+ * Reform index tuple. Truncate nonkey (INCLUDED) attributes.
+ */
+IndexTuple
+index_reform_tuple(Relation idxrel, IndexTuple olditup, int natts, int nkeyatts)
+{
+	TupleDesc   itupdesc = RelationGetDescr(idxrel);
+	Datum   values[INDEX_MAX_KEYS];
+	boolisnull[INDEX_MAX_KEYS];
+	IndexTuple	newitup;
+
+	Assert(natts <= INDEX_MAX_KEYS);
+	Assert(nkeyatts > 0);
+	Assert(nkeyatts <= natts);
+
+	index_deform_tuple(olditup, itupdesc, values, isnull);
+
+	/* form new tuple that will contain only key attributes */
+	itupdesc->natts = nkeyatts;
+	newitup = index_form_tuple(itupdesc, values, isnull);
+	newitup->t_tid = olditup->t_tid;
+
+	itupdesc->natts = natts;
+
+	return newitup;
+}
diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..d14df12 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -108,18 +108,23 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 			 IndexUniqueCheck checkUnique, Relation heapRel)
 {
 	bool		is_unique = false;
-	int			natts = rel->rd_rel->relnatts;
+	int			nkeyatts = rel->rd_rel->relnatts;
 	ScanKey		itup_scankey;
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
 
+	Assert (rel->rd_index != NULL);
+	Assert(rel->rd_index->indnatts != 0);
+	Assert(rel->rd_index->indnkeyatts != 0);
+	nkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
 
 top:
 	/* find the first page containing this key */
-	stack = _bt_search(rel, natts, itup_scankey, false, &buf, BT_WRITE);
+	stack = _bt_search(rel, nkeyatts, itup_scankey, false, &buf, BT_WRITE);
 
 	offset = InvalidOffset

Re: [HACKERS]WIP: Covering + unique indexes.

2015-10-09 Thread Anastasia Lubennikova



08.10.2015 19:31, Thom Brown пишет:

On 8 October 2015 at 16:18, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Hi hackers,

I'm working on a patch that allows to combine covering and unique
functionality for btree indexes.

Previous discussion was here:
1) Proposal thread
2) Message with proposal clarification

In a nutshell, the feature allows to create index with "key" columns and
"included" columns.
"key" columns can be used as scan keys. Unique constraint relates only to
"key" columns.
"included" columns may be used as scan keys if they have suitable opclass.
Both "key" and "included" columns can be returned from index by
IndexOnlyScan.

Btree is the default index and it's used everywhere. So it requires properly
testing.  Volunteers are welcome)

Use case:
- We have a table (c1, c2, c3, c4);
- We need to have an unique index on (c1, c2).
- We would like to have a covering index on all columns to avoid reading of
heap pages.

Old way:
CREATE UNIQUE INDEX olduniqueidx ON oldt USING btree (c1, c2);
CREATE INDEX oldcoveringidx ON oldt USING btree (c1, c2, c3, c4);

What's wrong?
Two indexes contain repeated data. Overhead to data manipulation operations
and database size.

New way:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);

The patch is attached.
In 'test.sql' you can find a test with detailed comments on each step, and
comparison of old and new indexes.

New feature has following syntax:
CREATE UNIQUE INDEX newidx ON newt USING btree (c1, c2) INCLUDING (c3, c4);
Keyword INCLUDING defines the "included" columns of index. These columns
aren't concern to unique constraint.
Also, them are not stored in index inner pages. It allows to decrease index
size.

Results:
1) Additional covering index is not required anymore.
2) New index can use IndexOnlyScan on queries, where old index can't.

For example,
explain analyze select c1, c2 from newt where c1<1 and c3<20;

*more examples in 'test.sql'

Future work:
To do opclasses for "included" columns optional.

CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass on
it.
But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000;
SELECT * FROM tbl WHERE c1=1000;

The definition output needs a space after "INCLUDING":

# SELECT 
pg_get_indexdef('people_first_name_last_name_email_idx'::regclass::oid);
  pg_get_indexdef
--
  CREATE UNIQUE INDEX people_first_name_last_name_email_idx ON people
USING btree (first_name, last_name) INCLUDING(email)
(1 row)


There is also no collation output:

# CREATE UNIQUE INDEX test_idx ON people (first_name COLLATE "en_GB",
last_name) INCLUDING (email COLLATE "pl_PL");
CREATE INDEX

# SELECT pg_get_indexdef('test_idx'::regclass::oid);
pg_get_indexdef
-
  CREATE UNIQUE INDEX test_idx ON people USING btree (first_name
COLLATE "en_GB", last_name) INCLUDING(email)
(1 row)


As for functioning, it works as described:

# EXPLAIN SELECT email FROM people WHERE (first_name,last_name) =
('Paul','Freeman');
 QUERY PLAN
--
  Index Only Scan using people_first_name_last_name_email_idx on people
  (cost=0.28..1.40 rows=1 width=21)
Index Cond: ((first_name = 'Paul'::text) AND (last_name = 'Freeman'::text))
(2 rows)


Typo:

"included columns must not intersects with key columns"

should be:

"included columns must not intersect with key columns"


Thank you for testing. Mentioned issues are fixed.


One thing I've noticed you can do with your patch, which you haven't
mentioned, is have a non-unique covering index:

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
CREATE INDEX

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
QUERY PLAN
-
  Index Only Scan using covering_idx on people  (cost=0.28..1.44 rows=4 
width=13)
Index Cond: (first_name = 'Paul'::text)
(2 rows)

But this appears to behave as if it were a regular multi-column index,
in that it will use the index for ordering rather than sort after
fetching from the index.  So is this really stored the same as a
multi-column index?  The index sizes aren't identical, so some

[HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-08-31 Thread Anastasia Lubennikova

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in the B-tree 
index.
The main idea is to implement posting lists and posting trees for B-tree 
index pages, as is already done for GIN.


In a nutshell, effective storing of duplicates in GIN is organised as 
follows.
The index stores a single index tuple for each unique key. That index tuple 
points to a posting list which contains pointers to heap tuples (TIDs). If 
too many rows have the same key, multiple pages are allocated for the 
TIDs, and these constitute a so-called posting tree.
You can find wonderful detailed descriptions in gin readme 
<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> 
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes it possible to apply a compression algorithm to the posting 
list/tree and significantly decrease the index size. Read more in the 
presentation (part 1) 
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.


Now a new B-tree index tuple must be inserted for each table row that we 
index.
It can possibly cause a page split. Because of MVCC, even a unique index 
could contain duplicates.

Storing duplicates in a posting list/tree helps to avoid superfluous splits.

So it seems to be a very useful improvement. Of course it requires a lot 
of changes in the B-tree implementation, so I need approval from the community.


1. Compatibility.
It's important to preserve compatibility with older index versions.
I'm going to change BTREE_VERSION to 3,
and use the new (posting) features for v3, keeping the old implementation for v2.
Any objections?

2. There are several tricks to handle non-unique keys in B-tree.
More info in btree readme 
<https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/README> 
(chapter - Differences to the Lehman & Yao algorithm).

In the new version they'll become useless. Am I right?

3. Microvacuum.
Killed items are marked LP_DEAD and can be deleted from the page at the 
time of insertion.
Now this is fine, because each item corresponds to a separate TID. But the 
posting list implementation requires another way. I've got two ideas:

The first is to mark LP_DEAD only those tuples where all TIDs are not visible.
The second is to add an LP_DEAD flag to each TID in the posting list (tree). 
This way requires a bit more space, but allows microvacuum of the posting 
list/tree.

Which one is better?

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Adding since-version tags to the docs?

2015-08-31 Thread Anastasia Lubennikova

31.08.2015 14:06, Shulgin, Oleksandr пишет:

Hello,

I often find it a pity that our docs are missing any information on 
when a certain GUC setting, SQL-level command or function was 
introduced.


Clicking through the "this page in other versions" links at the top of 
a webpage does help, but you still need to do some guessing (binary 
search?) with the clicks. :-)


It would be nice if we could make a script that would parse the sgml 
files and for every symbol it finds it would add a tag like "Since 
version 9.x".  Such a script could start by checking out REL9_0_STABLE 
and looking through all symbols it can find, tagging them "Since 
9.0".  Then it could commit the result, check out the next version 
branch and apply said commit (some manual effort to merge it might be 
required), and repeat the process, assuming all newly found symbols 
must be introduced in this new version.


That is for the lists, tabular representation might require adding a 
new column, I'm not sure what would be the best format.


After this process is done once, we can have a requirement that every 
newly introduced symbol/command be tagged manually by the patch author.


Do you think such approach will work?  Is there interest in having 
this done?


--
Alex

I think that you're looking for Feature matrix 
<http://www.postgresql.org/about/featurematrix/>.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Should \o mean "everything?"

2015-09-01 Thread Anastasia Lubennikova



31.08.2015 23:25, David Fetter пишет:

On Mon, Aug 31, 2015 at 07:18:02PM +, Kevin Grittner wrote:

David Fetter <da...@fetter.org> wrote:


In a failed attempt to send the output of \pset to a pipe, I
noticed that for reasons I find difficult to explain, not every
output gets redirected with \o.

At first blush, I'd consider this inconsistency as a bug.

What have I missed?

The documentation says:

| Arranges to save future query results to the file filename or
| pipe future results to the shell command command. If no argument
| is specified, the query output is reset to the standard output.
|
| "Query results" includes all tables, command responses, and
| notices obtained from the database server, as well as output of
| various backslash commands that query the database (such as \d),
| but not error messages.

Are you seeing anything inconsistent with the documentation?  If
so, what?

Perhaps an example would help clarify...

postgres=# \o | perl -pE 's/^/PREFIXED!/'
postgres=# \dt
postgres=# PREFIXED!No relations found.

postgres=# \set
AUTOCOMMIT = 'on'
ON_ERROR_STOP = ''
PROMPT1 = '%/%R%# '
PROMPT2 = '%/%R%# '
PROMPT3 = '>> '
VERBOSITY = 'default'
VERSION = 'PostgreSQL 9.4.4 on x86_64-unknown-linux-gnu, compiled by gcc 
(Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit'
DBNAME = 'postgres'
USER = 'postgres'
HOST = '/var/run/postgresql'
PORT = '5432'
ENCODING = 'UTF8'
PSQL_EDITOR = '"/usr/local/bin/vim"'

Cheers,
David.

Thanks for your test case. It made the question clear.
That's definitely not a bug, just not a use case of the "\o" command.
Maybe the documentation is a bit vague here.

| "Query results" includes <...> output of various backslash commands 
that query the database (such as \d),

| but not error messages.

As I understand it, this phrase means ONLY those commands which start with 
'\d' (such as \dt, \di, \des etc.)


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] Effective storage of duplicates in B-tree index.

2015-09-03 Thread Anastasia Lubennikova



01.09.2015 21:23, Peter Geoghegan:

On Mon, Aug 31, 2015 at 12:41 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Now new B-tree index tuple must be inserted for each table row that we
index.
It can possibly cause page split. Because of MVCC even unique index could
contain duplicates.
Storing duplicates in posting list/tree helps to avoid superfluous splits.

I'm glad someone is thinking about this, because it is certainly
needed. I thought about working on it myself, but there is always
something else to do. I should be able to assist with review, though.

Thank you)

So it seems to be very useful improvement. Of course it requires a lot of
changes in B-tree implementation, so I need approval from community.

1. Compatibility.
It's important to save compatibility with older index versions.
I'm going to change BTREE_VERSION to 3.
And use new (posting) features for v3, saving old implementation for v2.
Any objections?

It might be better to just have a flag bit for pages that are
compressed -- there are IIRC 8 free bits in the B-Tree page special
area flags variable. But no real opinion on this from me, yet. You
have plenty of bitspace to work with to mark B-Tree pages, in any
case.

Hmm.. If we are talking about storing duplicates in posting lists (and 
trees) as in GIN, I don't see how to apply it to some pages while not 
applying it to others. See some notes below.



2. There are several tricks to handle non-unique keys in B-tree.
More info in btree readme (chapter - Differences to the Lehman & Yao
algorithm).
In the new version they'll become useless. Am I right?

I think that the L algorithm makes assumptions for the sake of
simplicity, rather than because they really believed that there were
real problems. For example, they say that deletion can occur offline
or something along those lines, even though that's clearly
impractical. They say that because they didn't want to write a paper
about deletion within B-Trees, I suppose.

See also, my opinion of how they claim to not need read locks [1].
Also, note that despite the fact that the GIN README mentions "Lehman
& Yao style right links", it doesn't actually do the L trick of
avoiding lock coupling -- the whole point of L -- so that remark is
misleading. This must be why B-Tree has much better concurrency than
GIN in practice.


Yes, thanks for the extensive explanation.
I mean tricks such as the move-right logic in _bt_findinsertloc(), for example.

/*--
 * If we will need to split the page to put the item on this page,
 * check whether we can put the tuple somewhere to the right,
 * instead.  Keep scanning right until we
 *(a) find a page with enough free space,
 *(b) reach the last page where the tuple can legally go, or
 *(c) get tired of searching.
 * (c) is not flippant; it is important because if there are many
 * pages' worth of equal keys, it's better to split one of the early
 * pages than to scan all the way to the end of the run of equal keys
 * on every insert.  We implement "get tired" as a random choice,
 * since stopping after scanning a fixed number of pages wouldn't work
 * well (we'd never reach the right-hand side of previously split
 * pages).  Currently the probability of moving right is set at 0.99,
 * which may seem too high to change the behavior much, but it does an
 * excellent job of preventing O(N^2) behavior with many equal keys.
 *--
 */

If there are no multiple tuples with the same key, we don't need to care 
about this at all. It would be possible to skip these steps in the "effective 
B-tree" implementation. That's why I want to change btree_version.



  So I'm really talking about a slightly
different thing -- prefix compression, rather than handling
duplicates. Whether or not you should do prefix compression instead of
deduplication is certainly not clear to me, but it should be
considered. Also, I always imagined that prefix compression would use
the highkey as the thing that is offset for each "real" IndexTuple,
because it's there anyway, and that's simple. However, I suppose that
that means that duplicate handling can't really work in a way that
makes duplicates have a fixed cost, which may be a particularly
important property to you.


You're right, those are two different techniques.
1. Effective storage of duplicates, which I propose, works with equal 
keys and allows us to eliminate the repeated ones.

Index tuples are stored like this:

IndexTupleData + Attrs (key) | IndexTupleData + Attrs (key) | 
IndexTupleData + Attrs (key)


If all Attrs are equal, it seems reasonable not to repeat them. So we 
can store them in the following structure:


MetaData + Attrs (key) | IndexTupleData | IndexTupleData | IndexTupleData

This is a posting list. It doesn't require significant changes to the index 
page layout, because we can use an ordinary IndexTupleData for the meta 
information.
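To get a feeling for how much space repeated keys take with the current 
layout, one can build a deliberately duplicate-heavy index; a rough sketch 
(the table name and row counts are arbitrary):

CREATE TABLE dup_test AS
    SELECT (g % 10) AS key FROM generate_series(1, 1000000) g;
CREATE INDEX dup_test_idx ON dup_test (key);
-- today the key is stored once per heap row; the posting-list layout
-- above would store each distinct key only once
SELECT pg_size_pretty(pg_relation_size('dup_test_idx'));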

Re: [HACKERS] PATCH: index-only scans with partial indexes

2015-09-04 Thread Anastasia Lubennikova



25.08.2015 20:19, Jeff Janes writes:
On Fri, Jul 10, 2015 at 11:29 AM, Tomas Vondra 
<tomas.von...@2ndquadrant.com <mailto:tomas.von...@2ndquadrant.com>> 
wrote:


Hi,

currently partial indexes end up not using index only scans in
most cases, because check_index_only() is overly conservative, as
explained in this comment:

 * XXX this is overly conservative for partial indexes, since we will
 * consider attributes involved in the index predicate as required
even
 * though the predicate won't need to be checked at runtime. (The same
 * is true for attributes used only in index quals, if we are certain
 * that the index is not lossy.)  However, it would be quite expensive
 * to determine that accurately at this point, so for now we take the
 * easy way out.

In other words, unless you include columns from the index
predicate to the index, the planner will decide index only scans
are not possible. Which is a bit unfortunate, because those
columns are not needed at runtime, and will only increase the
index size (and the main benefit of partial indexes is size
reduction).

The attached patch fixes this by only considering clauses that are
not implied by the index predicate. The effect is simple:

create table t as select i as a, i as b from
  generate_series(1,1000) s(i);

create index tidx_partial on t(b) where a > 1000 and a < 2000;

vacuum freeze t;
analyze t;

explain analyze select count(b) from t where a > 1000 and a < 2000;



However, "explain analyze select sum(b) from t where a > 1000 and a < 
1999;" still doesn't use the index only

scan.  Isn't that also implied by the predicate?



In this example it is correct that it doesn't use an IndexOnlyScan. If I 
understand partial indexes right, when the index predicate and the search 
clause are not equal, the index scan must recheck the values while fetching 
them. 'tidx_partial' in the example above has no information about the 'a' 
attribute besides index->indpred, so it is impossible to recheck the qual 
without referencing the table.


By contrast, in this example:
create index tidx_partial on t(a) where a > 1000 and a < 2000;
explain analyze select sum(a) from t where a > 1000 and a < 1999;
it can use an IndexOnlyScan, because the qual on 'a' can be rechecked from 
the index itself.
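For reference, a sketch of the plan one would hope for here (hypothetical 
output, abridged; the exact shape and estimates will differ):

EXPLAIN (COSTS OFF) SELECT sum(a) FROM t WHERE a > 1000 AND a < 1999;
--  Aggregate
--    ->  Index Only Scan using tidx_partial on t
--          Index Cond: ((a > 1000) AND (a < 1999))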

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-07 Thread Anastasia Lubennikova

04.09.2015 15:11, Teodor Sigaev:

Some notices

1 gistvacuumpage():
OffsetNumber deletable[MaxOffsetNumber];
  Seems, MaxOffsetNumber is too much, MaxIndexTuplesPerPage is enough


Fixed.


2 Loop in gistkillitems() for searching heap pointer. It seems to me that
it could be a performance problem. To fix it, it's needed to remember 
index tuple's offset number somewhere near 
GISTScanOpaqueData->killedItems. And
gistkillitems() will loop over offsets and compare heap pointer from 
killedItems and index tuple, if they doesn't match then just skip this 
index tuple.
Thanks for the suggestion. I've rewritten this function. Now killedItems[] 
contains only the OffsetNumbers of the tuples we are going to delete. 
The PageLSN check is enough to ensure that nothing has changed on the page, 
so rechecking the heap pointer is unnecessary. (That recheck is really 
important for btree, where a tuple can be inserted in the middle of a page, 
but such a situation can't arise on a GiST index page.)

It works 50% faster than before.


3 Connected with previous, could you show some performance tests?


A performance test is attached.
The test is as follows: a portion of the tuples is deleted, and the same 
range is then selected several times.
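The attached script isn't inlined here; a minimal test in the same spirit 
could look like this (a sketch only; the table name, data size and 
selectivities are arbitrary):

-- a GiST index over random points
CREATE TABLE gist_test AS
    SELECT point(random(), random()) AS p
    FROM generate_series(1, 1000000);
CREATE INDEX gist_test_idx ON gist_test USING gist (p);
VACUUM ANALYZE gist_test;

-- delete a slice of the indexed tuples
DELETE FROM gist_test WHERE p <@ box '((0,0),(0.3,0.3))';

-- force plain index scans so that dead entries get LP_DEAD hints set
SET enable_seqscan = off;
SET enable_bitmapscan = off;
\timing on
-- run the same query several times: with microvacuum the first run marks
-- the dead entries and the following runs skip them
SELECT count(*) FROM gist_test WHERE p <@ box '((0,0),(0.5,0.5))';
SELECT count(*) FROM gist_test WHERE p <@ box '((0,0),(0.5,0.5))';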


Without microvacuum, all 'select' queries take about the same time:
Time: 360,468 ms
Time: 243,197 ms
Time: 238,310 ms
Time: 238,018 ms

With microvacuum, the first 'select' invokes gistkillitems() and is a bit 
slower than before, but the following queries are significantly faster than 
without microvacuum:

Time: 368,780 ms
Time: 69,769 ms
Time: 9,545 ms
Time: 12,427 ms

Please, review the patch again. I could have missed something.

P.S. Do I need to write any documentation update?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

*** a/src/backend/access/gist/gist.c
--- b/src/backend/access/gist/gist.c
***
*** 36,41  static bool gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
--- 36,42 
   bool unlockbuf, bool unlockleftchild);
  static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
  GISTSTATE *giststate, List *splitinfo, bool releasebuf);
+ static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
  
  
  #define ROTATEDIST(d) do { \
***
*** 209,214  gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
--- 210,226 
  	 * because the tuple vector passed to gistSplit won't include this tuple.
  	 */
  	is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 
+ 	/*
+ 	 * If leaf page is full, try at first to delete dead tuples. And then
+ 	 * check again.
+ 	 */
+ 	if (is_split && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+ 	{
+ 		gistvacuumpage(rel, page, buffer);
+ 		is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+ 	}
+ 
  	if (is_split)
  	{
  		/* no space for insertion */
***
*** 1440,1442  freeGISTstate(GISTSTATE *giststate)
--- 1452,1523 
  	/* It's sufficient to delete the scanCxt */
  	MemoryContextDelete(giststate->scanCxt);
  }
+ 
+ /*
+  * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+  */
+ static void
+ gistvacuumpage(Relation rel, Page page, Buffer buffer)
+ {
+ 	OffsetNumber deletable[MaxOffsetNumber];
+ 	int			ndeletable = 0;
+ 	OffsetNumber offnum,
+ minoff,
+ maxoff;
+ 
+ 	/*
+ 	 * Scan over all items to see which ones need to be deleted according to
+ 	 * LP_DEAD flags.
+ 	 */
+ 	maxoff = PageGetMaxOffsetNumber(page);
+ 	for (offnum = FirstOffsetNumber;
+ 		 offnum <= maxoff;
+ 		 offnum = OffsetNumberNext(offnum))
+ 	{
+ 		ItemId		itemId = PageGetItemId(page, offnum);
+ 
+ 		if (ItemIdIsDead(itemId))
+ 			deletable[ndeletable++] = offnum;
+ 	}
+ 
+ 	if (ndeletable > 0)
+ 	{
+ 		START_CRIT_SECTION();
+ 
+ 		PageIndexMultiDelete(page, deletable, ndeletable);
+ 
+ 		/*
+ 		 * Mark the page as not containing any LP_DEAD items.  This is not
+ 		 * certainly true (there might be some that have recently been marked,
+ 		 * but weren't included in our target-item list), but it will almost
+ 		 * always be true and it doesn't seem worth an additional page scan to
+ 		 * check it. Remember that F_HAS_GARBAGE is only a hint anyway.
+ 		 */
+ 		GistClearPageHasGarbage(page);
+ 
+ 		MarkBufferDirty(buffer);
+ 
+ 		/* XLOG stuff */
+ 		if (RelationNeedsWAL(rel))
+ 		{
+ 			XLogRecPtr	recptr;
+ 
+ 			recptr = gistXLogUpdate(rel->rd_node, buffer,
+ 	deletable, ndeletable,
+ 	NULL, 0, InvalidBuffer);
+ 
+ 			PageSetLSN(page, recptr);
+ 		}
+ 		else
+ 			PageSetLSN(page, gistGetFakeLSN(rel));
+ 
+ 		END_CRIT_SECTION();
+ 	}
+ 
+ 	/*
+ 	 * Note: if we didn't find any LP_DEAD items, then the page's
+ 	 * F_HAS_GARBAGE hint bit is falsely set.  We do not bother expending a
+ 	 * separate write to clear it, however.  We will clear it when we split
+ 	 * the page.
+ 	 */
+ }
*** a/src/backend/access/gist/gistget.

Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-08 Thread Anastasia Lubennikova


Fixed patch is attached.

08.09.2015 13:47, Teodor Sigaev:

BTW, I slightly modify your test to provide more stable results.


Thank you, I tried it; the results are nearly the same.

Without microvacuum
Time: 312,955 ms
Time: 264,597 ms
Time: 243,286 ms
Time: 243,679 ms

With microvacuum:
Time: 354,097 ms
Time: 82,206 ms
Time: 11,714 ms
Time: 11,277 ms


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0e49959..bb48782 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -36,6 +36,7 @@ static bool gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
  bool unlockbuf, bool unlockleftchild);
 static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
 GISTSTATE *giststate, List *splitinfo, bool releasebuf);
+static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
 
 
 #define ROTATEDIST(d) do { \
@@ -209,6 +210,17 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 	 * because the tuple vector passed to gistSplit won't include this tuple.
 	 */
 	is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+
+	/*
+	 * If leaf page is full, try at first to delete dead tuples. And then
+	 * check again.
+	 */
+	if (is_split && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+	{
+		gistvacuumpage(rel, page, buffer);
+		is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+	}
+
 	if (is_split)
 	{
 		/* no space for insertion */
@@ -1440,3 +1452,70 @@ freeGISTstate(GISTSTATE *giststate)
 	/* It's sufficient to delete the scanCxt */
 	MemoryContextDelete(giststate->scanCxt);
 }
+
+/*
+ * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+ */
+static void
+gistvacuumpage(Relation rel, Page page, Buffer buffer)
+{
+	OffsetNumber deletable[MaxOffsetNumber];
+	int			ndeletable = 0;
+	OffsetNumber offnum, maxoff;
+
+	/*
+	 * Scan over all items to see which ones need to be deleted according to
+	 * LP_DEAD flags.
+	 */
+	maxoff = PageGetMaxOffsetNumber(page);
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemId		itemId = PageGetItemId(page, offnum);
+
+		if (ItemIdIsDead(itemId))
+			deletable[ndeletable++] = offnum;
+	}
+
+	if (ndeletable > 0)
+	{
+		START_CRIT_SECTION();
+
+		PageIndexMultiDelete(page, deletable, ndeletable);
+
+		/*
+		 * Mark the page as not containing any LP_DEAD items.  This is not
+		 * certainly true (there might be some that have recently been marked,
+		 * but weren't included in our target-item list), but it will almost
+		 * always be true and it doesn't seem worth an additional page scan to
+		 * check it. Remember that F_HAS_GARBAGE is only a hint anyway.
+		 */
+		GistClearPageHasGarbage(page);
+
+		MarkBufferDirty(buffer);
+
+		/* XLOG stuff */
+		if (RelationNeedsWAL(rel))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = gistXLogUpdate(rel->rd_node, buffer,
+	deletable, ndeletable,
+	NULL, 0, InvalidBuffer);
+
+			PageSetLSN(page, recptr);
+		}
+		else
+			PageSetLSN(page, gistGetFakeLSN(rel));
+
+		END_CRIT_SECTION();
+	}
+
+	/*
+	 * Note: if we didn't find any LP_DEAD items, then the page's
+	 * F_HAS_GARBAGE hint bit is falsely set.  We do not bother expending a
+	 * separate write to clear it, however.  We will clear it when we split
+	 * the page.
+	 */
+}
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 20f695c..8a8d121 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -24,6 +24,74 @@
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
+/*
+ * gistkillitems() -- set LP_DEAD state for items an indexscan caller has
+ * told us were killed.
+ *
+ * We re-read page here, so it's important to check page LSN. If the page
+ * has been modified since the last read (as determined by LSN), we cannot
+ * flag any entries because it is possible that the old entry was vacuumed
+ * away and the TID was re-used by a completely different heap tuple.
+ */
+static void
+gistkillitems(IndexScanDesc scan)
+{
+	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
+	Buffer		buffer;
+	Page		page;
+	OffsetNumber	offnum;
+	ItemId		iid;
+	int		i;
+	bool		killedsomething = false;
+
+	Assert(so->curBlkno != InvalidBlockNumber);
+	Assert(so->killedItems != NULL);
+
+	buffer = ReadBuffer(scan->indexRelation, so->curBlkno);
+	if (!BufferIsValid(buffer))
+		return;
+
+	LockBuffer(buffer, GIST_SHARE);
+	gistcheckpage(scan->indexRelation, buffer);
+	page = BufferGetPage(buffer);
+
+	/*
+	 * If page LSN differs it means that the page was modified since the last read.
+	 * killedItems could be not valid so LP_DEAD hints applying is not safe.
+	 */
+	if(PageGetLSN(page) != so->curPageLSN)
+	{
+		UnlockReleaseBuffer(buffer);
so->

Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-03 Thread Anastasia Lubennikova



Hi,

I don't know too much about gist, but did a quick read through. Mostly
spotting some stylistic issues. Please fix those making it easier for
the next reviewer.

Thank you for the review! All the mentioned issues are fixed.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 0e49959..f658831 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -36,6 +36,7 @@ static bool gistinserttuples(GISTInsertState *state, GISTInsertStack *stack,
  bool unlockbuf, bool unlockleftchild);
 static void gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
 GISTSTATE *giststate, List *splitinfo, bool releasebuf);
+static void gistvacuumpage(Relation rel, Page page, Buffer buffer);
 
 
 #define ROTATEDIST(d) do { \
@@ -209,6 +210,17 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
 	 * because the tuple vector passed to gistSplit won't include this tuple.
 	 */
 	is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+
+	/*
+	 * If leaf page is full, try at first to delete dead tuples. And then
+	 * check again.
+	 */
+	if (is_split && GistPageIsLeaf(page) && GistPageHasGarbage(page))
+	{
+		gistvacuumpage(rel, page, buffer);
+		is_split = gistnospace(page, itup, ntup, oldoffnum, freespace);
+	}
+
 	if (is_split)
 	{
 		/* no space for insertion */
@@ -1440,3 +1452,72 @@ freeGISTstate(GISTSTATE *giststate)
 	/* It's sufficient to delete the scanCxt */
 	MemoryContextDelete(giststate->scanCxt);
 }
+
+/*
+ * gistvacuumpage() -- try to remove LP_DEAD items from the given page.
+ */
+static void
+gistvacuumpage(Relation rel, Page page, Buffer buffer)
+{
+	OffsetNumber deletable[MaxOffsetNumber];
+	int			ndeletable = 0;
+	OffsetNumber offnum,
+minoff,
+maxoff;
+
+	/*
+	 * Scan over all items to see which ones need to be deleted according to
+	 * LP_DEAD flags.
+	 */
+	maxoff = PageGetMaxOffsetNumber(page);
+	for (offnum = FirstOffsetNumber;
+		 offnum <= maxoff;
+		 offnum = OffsetNumberNext(offnum))
+	{
+		ItemId		itemId = PageGetItemId(page, offnum);
+
+		if (ItemIdIsDead(itemId))
+			deletable[ndeletable++] = offnum;
+	}
+
+	if (ndeletable > 0)
+	{
+		START_CRIT_SECTION();
+
+		PageIndexMultiDelete(page, deletable, ndeletable);
+
+		/*
+		 * Mark the page as not containing any LP_DEAD items.  This is not
+		 * certainly true (there might be some that have recently been marked,
+		 * but weren't included in our target-item list), but it will almost
+		 * always be true and it doesn't seem worth an additional page scan to
+		 * check it. Remember that F_HAS_GARBAGE is only a hint anyway.
+		 */
+		GistClearPageHasGarbage(page);
+
+		MarkBufferDirty(buffer);
+
+		/* XLOG stuff */
+		if (RelationNeedsWAL(rel))
+		{
+			XLogRecPtr	recptr;
+
+			recptr = gistXLogUpdate(rel->rd_node, buffer,
+	deletable, ndeletable,
+	NULL, 0, InvalidBuffer);
+
+			PageSetLSN(page, recptr);
+		}
+		else
+			PageSetLSN(page, gistGetFakeLSN(rel));
+
+		END_CRIT_SECTION();
+	}
+
+	/*
+	 * Note: if we didn't find any LP_DEAD items, then the page's
+	 * F_HAS_GARBAGE hint bit is falsely set.  We do not bother expending a
+	 * separate write to clear it, however.  We will clear it when we split
+	 * the page.
+	 */
+}
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index 20f695c..e655a05 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -24,6 +24,97 @@
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
+/*
+ * gistkillitems() -- set LP_DEAD state for items an indexscan caller has
+ * told us were killed.
+ *
+ * We match items by heap TID before mark them. If an item has moved off
+ * the current page due to a split, we'll fail to find it and do nothing
+ * (this is not an error case --- we assume the item will eventually get
+ * marked in a future indexscan).
+ *
+ * We re-read page here, so it's important to check page LSN. If the page
+ * has been modified since the last read (as determined by LSN), we cannot
+ * flag any entries because it is possible that the old entry was vacuumed
+ * away and the TID was re-used by a completely different heap tuple.
+ */
+static void
+gistkillitems(IndexScanDesc scan)
+{
+	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
+	Buffer		buffer;
+	Page		page;
+	OffsetNumber minoff;
+	OffsetNumber maxoff;
+	int			i;
+	bool		killedsomething = false;
+
+	Assert(so->curBlkno != InvalidBlockNumber);
+
+	buffer = ReadBuffer(scan->indexRelation, so->curBlkno);
+	if (!BufferIsValid(buffer))
+		return;
+
+	LockBuffer(buffer, GIST_SHARE);
+	gistcheckpage(scan->indexRelation, buffer);
+	page = BufferGetPage(buffer);
+
+	/*
+	 * If page LSN differs it means that the page was modified since the last read.
+	 * killedItems could be not valid so LP_DEAD hints applying is not safe.

Re: [HACKERS] [PATCH] Microvacuum for gist.

2015-09-16 Thread Anastasia Lubennikova



16.09.2015 07:30, Jeff Janes:


The commit of this patch seems to have created a bug in which updated 
tuples can disappear from the index, while remaining in the table.


It looks like the bug depends on going through a crash-recovery cycle, 
but I am not sure of that yet.


I've looked through the commit diff and don't see anything obviously 
wrong.  I notice index tuples are marked dead with only a buffer 
content share lock, and the page is defragmented with only a buffer 
exclusive lock (as opposed to a super-exclusive buffer clean up 
lock).  But as far as I can tell, both of those should be safe on an 
index.  Also, if that was the bug, it should happen without 
crash-recovery.


The test is pretty simple.  I create a 10,000 row table with a 
unique-by-construction id column with a btree_gist index on it and a 
counter column, and fire single-row updates of the counter for random 
ids in high concurrency (8 processes running flat out).  I force the 
server to crash frequently with simulated torn-page writes in which 
md.c writes a partial page and then PANICs.  Eventually (1 to 3 hours) 
the updates start indicating they updated 0 rows.  At that point, a 
forced table scan will find the row, but the index doesn't.


Any hints on how to proceed with debugging this?  If I can't get it to 
reproduce the problem in the absence of crash-recovery cycles with an 
overnight run, then I think my next step will be to run it over 
hot-standby and see if WAL replay in the absence of crashes might be 
broken as well.




I've found the bug. It comes from mixing PageIndexMultiDelete() in
gistvacuumpage() with PageIndexTupleDelete() in
gistRedoPageUpdateRecord().
These functions conflict: PageIndexTupleDelete() shifts the remaining line 
pointers after each deletion, so replaying the offsets logged for a 
multi-delete one by one removes the wrong items.

I've fixed my patch by changing MultiDelete to TupleDelete in 
gistvacuumpage(), adjusting each logged offset for the items already deleted. 
The patch is attached.
But it seems to me that it would be better to rewrite all mentions of 
TupleDelete to MultiDelete in the gist code.

I'm working on it.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 4edc5a7..1d02e1b 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -1478,14 +1478,20 @@ gistvacuumpage(Relation rel, Page page, Buffer buffer)
 		ItemId		itemId = PageGetItemId(page, offnum);
 
 		if (ItemIdIsDead(itemId))
-			deletable[ndeletable++] = offnum;
+		{
+			deletable[ndeletable] = offnum - ndeletable;
+			ndeletable++;
+		}
 	}
 
 	if (ndeletable > 0)
 	{
+		int i;
+
 		START_CRIT_SECTION();
 
-		PageIndexMultiDelete(page, deletable, ndeletable);
+		for (i = 0; i < ndeletable; i++)
+			PageIndexTupleDelete(page, deletable[i]);
 
 		/*
 		 * Mark the page as not containing any LP_DEAD items.  This is not



Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-15 Thread Anastasia Lubennikova

Proposal clarification.
I see that the discussion has become too complicated, so I'd like to clarify 
what we are talking about.


We are discussing two different index improvements:
one is the "partially unique index" and the other is the "index with included 
columns".

Let's look at an example.

- We have a table tbl(f1, f2, f3, f4).
- We want to have a unique index on (f1, f2).
- We want to have an index on (f1, f2, f3) which allows us to use the index 
for complex "where" clauses.
- We would like to have a covering index on all columns to avoid reading 
heap pages.


What we are doing now:
CREATE UNIQUE INDEX on tbl(f1,f2);
CREATE INDEX on tbl(f1, f2, f3, f4);

What's wrong with that?
- Two indexes with repeated data, which adds overhead to data manipulation 
operations and to the database size.

- We don't need f4 as an index key, but we are forced to make it one.
- Related to the previous point: it's possible that f4 has no opclass for 
our index type, so there's no way to include it in the index at all,

even though we don't need any opclass for it.

Suggestion:
CREATE INDEX idx ON tbl (f1, f2, f3) [UNIQUE ON (f1, f2)] [INCLUDE (f4)];
Let's review it step by step.

1. "partially unique index"
CREATE INDEX idx ON tbl (f1, f2, f3) UNIQUE ON (f1, f2);
It means that we want to have columns (f1, f2, f3) as index keys in our 
index.

But we want enforce uniqueness only on first two.
It allows us insert (1,1,1), (1,2,1) and restricts insert (1,1,1), (1,1,2).

It doesn't affect "select" queries.
Following query can use index-only scan.
SELECT f1,f2, f3 where f1 ... and f2 ... and f3 ;

We haven't to maintain two indexes now. Just one!
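A sketch of how that would behave under the proposed (not yet implemented) 
syntax:

-- proposed syntax, shown for illustration only
CREATE INDEX idx ON tbl (f1, f2, f3) UNIQUE ON (f1, f2);

INSERT INTO tbl (f1, f2, f3) VALUES (1, 1, 1);  -- accepted
INSERT INTO tbl (f1, f2, f3) VALUES (1, 2, 1);  -- accepted, (f1, f2) differs
INSERT INTO tbl (f1, f2, f3) VALUES (1, 1, 2);  -- rejected, (1, 1) already exists

-- selects are unaffected; this can still use an index-only scan
SELECT f1, f2, f3 FROM tbl WHERE f1 = 1 AND f2 = 2 AND f3 = 1;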

_bt_isequal() 
<http://doxygen.postgresql.org/nbtinsert_8c.html#abcfb7a3dcd40a5d1759652239f3a0115> 
works with (f1, f2);
_bt_compare() 
<http://doxygen.postgresql.org/nbtsearch_8c.html#af689dabb25e99f551b68aa9b7a1e6ea6> 
works with (f1, f2, f3).


2. "index with included columns" It goes well with both unique and 
non-unique indexes.

CREATE INDEX idx ON tbl (f1, f2, f3) INCLUDE (f4);
What we get here:
- f4 is not a search key.
- f4 need not have an opclass for our index type.
- f4 is stored only in the leaf pages, so it doesn't bloat the internal 
nodes of the B-tree.

- f4 can still be retrieved without going to the heap, so including it
in the leaf pages makes index-only scans possible more often.

The following query can use an index-only scan:
SELECT f1, f2, f3, f4 WHERE f1 ... AND f2 ... AND f3 ...;

And this one wouldn't use an index-only scan, because a recheck on f4 is required:
SELECT f1, f2, f3, f4 WHERE f1 ... AND f2 ... AND f3 ... AND f4 ...;


Catalog changes:
Now:
pg_index 
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>

int16 indnatts; /* number of columns in index */
bool indisunique; /* is this a unique index? */

New:
pg_index
int16 ind_n_unique_atts; /* number of unique columns in the index, counted 
from the beginning; 0 means the index is not unique */
int16 ind_n_key_atts; /* number of key columns in the index, counted from 
the beginning */

int16 ind_n_total_atts; /* total number of columns in the index */

In our case:
ind_n_unique_atts = 2; // f1, f2
ind_n_key_atts = 3; // f1, f2, f3
ind_n_total_atts = 4; // f1, f2, f3, f4
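If the catalog were extended this way, the values for idx above could be read 
back with something like this (the column names are the proposed ones, they 
don't exist yet):

SELECT indexrelid::regclass,
       ind_n_unique_atts,  -- 2: (f1, f2)
       ind_n_key_atts,     -- 3: (f1, f2, f3)
       ind_n_total_atts    -- 4: (f1, f2, f3, f4)
FROM pg_index
WHERE indexrelid = 'idx'::regclass;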


P.S. I use many ideas from the discussion without quoting them, just to 
keep this message readable. Thanks to everyone.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-15 Thread Anastasia Lubennikova

15.09.2015 12:11, Vik Fearing:

On 09/15/2015 10:57 AM, David Rowley wrote:

I agree, that form

CREATE UNIQUE INDEX i ON t (f1, f2, f3) INCLUDE (f4)
is clear. f4 will be used in row compare and actually planner will be able
to use it as unique index (f1, f2, f3) with additional f4 or as
as unique index (f1, f2, f3, f4), because uniqueness on (f1, f2, f3) gives
warranty of uniqueness on (f1, f2, f3, f4)



I'd vote for this too. However, INCLUDE does not seem to be a reserved word
at the moment.

What about CREATE UNIQUE INDEX i ON t (f1, f2, f3) WITH (f4); ?


WITH seems ambiguous to me. It suggests a CTE, so I expect a kind of query 
expression to follow it. But maybe that's just a matter of habit.


BTW, that's the first syntax change I've worked on.
Is there any convention in PostgreSQL about adding new keywords and so on? 
Where can I find it?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





[HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-11 Thread Anastasia Lubennikova

Hi, hackers!

Use case:
Index-only scans are a wonderful feature that speeds up select queries on 
indexed columns.
Therefore some users want to create multicolumn indexes on columns which 
are queried often.
But if there's a unique constraint on some column, they have to maintain 
another unique index, even if the column is already part of the covering 
index.
This adds overhead to data manipulation operations and to the database size.

I've started work on a patch that allows combining the covering and unique 
functionality.
The main idea is to allow users to create multicolumn indexes in which only 
a definite number of leading columns is unique.

For example (don't mind the SQL syntax here, please):
CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c2);
The created index has three columns, but only the first two of them carry 
the unique constraint.


This idea has an obvious restriction: we can enforce uniqueness only on the 
leading index columns.

There is no clear way to maintain the following index:
CREATE INDEX index ON table (c1, c2, c3) UNIQUE ON (c1, c3);

So I suggest the following syntax:
CREATE [UNIQUE {ON FIRST {COLUMN | n_unique_column COLUMNS}}] INDEX ON 
table_name (column_name1, column_name2 ...);


Examples:
CREATE UNIQUE INDEX ON table_name (c1, c2, c3); // (c1,c2, c3) must be 
UNIQUE. That's how it works now.


CREATE UNIQUE ON FIRST COLUMN INDEX ON table_name (c1, c2, c3); // (c1) 
must be UNIQUE
CREATE UNIQUE ON FIRST 2 COLUMNS INDEX ON table_name (c1, c2, c3); // 
(c1,c2) must be UNIQUE
CREATE UNIQUE ON FIRST 3 COLUMNS INDEX ON table_name (c1, c2, c3); // 
(c1,c2, c3) must be UNIQUE


The next issue is the pg_index changes.
Now there's only a boolean flag

 * bool indisunique; /* is this a unique index? */

But the new algorithm requires storing a single number

 * uint16 n_unique_columns; /* number of leading columns of the index
   which carry the unique constraint */

I think that the attribute numbers themselves are not needed. Is that 
right?


I'd like to see your suggestions about syntax changes.
And of course any other comments are welcome.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS][PROPOSAL] Covering + unique indexes.

2015-09-15 Thread Anastasia Lubennikova

15.09.2015 12:18, David Rowley:
On 12 September 2015 at 00:45, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> 
wrote:


I've started work on a patch that allows to combine covering and
unique functionality.


Great to hear someone is working on this!


Thank you! It looks like a very interesting project to me :)


Next issue is pg_index changes.
Now there's only a boolean flag

  * bool indisunique; /* is this a unique index? */

But new algorithm requires to store a single number

  * unit16n_unique_columns; /* number of first columns of index
which has unique constrains. */

I think, that numbers of all attributes themselves are not needed.
Is it right?


I think the total number of attributes is already in indnatts.
I imagine you're planning to keep the indexed columns at the start of 
the indkey[] array, and just use n_unique_columns to mark how many of 
the indkey[] attributes to check as indexed columns. I'd imagine the 
change would be fairly simple from a planner point of view as you'd 
just need to check columns 1 to  n_unique_columns instead of 1 to 
indnatts. Although I have a tendency to under estimate these things :(


I've asked that for the same reason. I'm not too deep in the access method 
and btree code, so I don't want to miss any hidden dependencies.
As I see it, changes are required in _bt_isequal() 
<http://doxygen.postgresql.org/nbtinsert_8c.html#abcfb7a3dcd40a5d1759652239f3a0115>. 
Instead of

for (i = 1; i <= keysz; i++) {} // where keysz is actually natts = 
rel->rd_rel->relnatts;

(look at _bt_check_unique 
<http://doxygen.postgresql.org/nbtinsert_8c.html#a96eb8c53ffdf53f139b037194a9721a3> 
and pg_class 
<http://doxygen.postgresql.org/pg__class_8h.html#ac8bf924d36feee5f3ca4c36aa01c75ec> 
for more info), the loop should be

for (i = 1; i <= nunique; i++) {} // where nunique is the value of 
rel->rd_index->indnunique

indnunique is a new field, which I suggest adding to pg_index 
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a> 
instead of indisunique (or maybe in addition to it).


But I have doubts about the problem which Jim mentioned:

> Are we certain that no index type could ever support an index on (f1, 
f2, f3) UNIQUE(f1, f3)? Even if it doesn't make sense for btree, 
perhaps some other index could handle it.


If that is really possible, we certainly have to keep in pg_index 
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a> 
not just the number of unique columns (indnunique), but their attribute 
numbers themselves (an array indnuniqueatts, on the model of indnatts).


I imagine you don't want to name the new column n_unique_columns, 
since it does not fit too well with non-unique indexes.


Hmm, I think that it would be quite clear to set it to zero for 
non-unique indexes:

(nunique == 0) is equivalent to (indisunique == false).

But maybe I've missed some reason why we should keep the indisunique flag.

Perhaps just indindexedatts, or something slightly along those lines. 
But perhaps it would be a good idea to also rename "ncolumns" in code, 
to ensure that any non-updated code does not even compile. Perhaps 
including "tot" or "total" in there might help indicate it's new meaning.
I hope that all these changes are not needed: just continue to use 
ncolumns for all indexed columns, and the new variable nunique (or 
something like that) holds the number of unique columns in the index.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS]WIP: Covering + unique indexes.

2015-12-04 Thread Anastasia Lubennikova



03.12.2015 04:03, Robert Haas writes:

On Tue, Dec 1, 2015 at 7:53 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

If we don't need c4 as an index scankey, we don't need any btree opclass on
it.
But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included
attributes.

OK, I don't get it.  Why have an opclass here at all, even optionally?


We don't have an opclass on c4 and there's no need to have one.
But right now (without the patch) it's impossible to create a covering 
index which contains columns that have no btree opclass.


test=# create index on tbl using btree (c1, c4);
ERROR:  data type box has no default operator class for access method 
"btree"


ComputeIndexAttrs() processes the list of index attributes and tries to 
get an opclass for each of them via GetIndexOpClass().
The patch drops this check for included attributes, which makes it possible 
to store any datatype in the btree and take advantage of IndexOnlyScan.
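With both patches applied, something along these lines is expected to work 
(a sketch, not actual output from the patch):

create index on tbl using btree (c1) including (c4);  -- c4 needs no btree opclass
explain select c4 from tbl where c1 = 1000;           -- expected: Index Only Scan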


I hope that this helps to clarify.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS]WIP: Covering + unique indexes.

2015-12-01 Thread Anastasia Lubennikova

Finally, the completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread, regression 
tests and a docs update.

I think it's quite ready for review.

Future work:
Besides that, I'd like to get feedback about the attached patch 
"optional_opclass_3.0.patch". It should be applied on top of 
"covering_unique_3.0.patch".

Actually, this patch is the first step towards making opclasses for 
"included" columns optional and implementing real covering indexes.

Example:
CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in the covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included 
attributes.

To see the difference, look at the explain analyze output:

explain analyze select * from tbl where c1=2 and c4 && box '(0,0,1,1)';
  QUERY PLAN
---
 Index Only Scan using idx on tbl  (cost=0.13..4.15 rows=1 width=36) 
(actual time=0.010..0.013 rows=1 loops=1)

   Index Cond: (c1 = 2)
   Filter: (c4 && '(1,1),(0,0)'::box)

"Index Cond" shows the index ScanKey conditions and "Filter" is for 
conditions which are used after index scan. Anyway it is faster than 
SeqScan that we had before, because IndexOnlyScan avoids extra heap fetches.


As I already said, this patch is just a WIP, so the included column's 
opclass is not "optional" but actually "ignored". The following example 
therefore works worse than without the patch; please don't worry about it.


CREATE TABLE tbl2 (c1 int, c2 int);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2);
explain analyze select * from tbl2 where c1<20 and c2<5;
  QUERY PLAN
---
 Index Only Scan using idx2 on tbl2  (cost=0.28..4.68 rows=10 width=8) 
(actual time=0.055..0.066 rows=9 loops=1)

   Index Cond: (c1 < 20)
   Filter: (c2 < 5)

The question is more about suitable syntax.
We have two different optimizations here:
1. INCLUDED columns
2. Optional opclasses
It's logical to provide optional opclasses only for included columns.
Is it OK to handle both with the same syntax and resolve all opclass 
conflicts at CREATE INDEX time?


CREATE TABLE tbl2 (c1 int, c2 int, c4 box);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2, c4);
CREATE UNIQUE INDEX idx3 ON tbl2 USING btree (c1) INCLUDING (c4, c2);

Of course, the order of attributes is important.
Attributes which have an opclass and are meant to be used in a ScanKey must 
be placed before the others:
idx2 will use c2 in the Index Cond, while idx3 will not. But I think that's 
the DBA's job.
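Sketched with EXPLAIN (hypothetical plans; the point is only where the qual 
on c2 lands):

explain select * from tbl2 where c1 < 20 and c2 < 5;
-- with idx2 (c2 before c4), c2 can be an index qual:
--   Index Only Scan using idx2 on tbl2
--     Index Cond: ((c1 < 20) AND (c2 < 5))
-- with idx3 (c2 after c4), c2 can only be applied as a filter:
--   Index Only Scan using idx3 on tbl2
--     Index Cond: (c1 < 20)
--     Filter: (c2 < 5)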


If you see any related changes needed in the planner, please mention them. 
I haven't explored that part of the code yet and could have missed something.


--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b4ea227..4973e1b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcanincluding
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3704,6 +3711,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..0287c62 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,11 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   An access method that supports this feature sets pg_am.amcanincluding true.
+   (At present, only b-tree supports them.)
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/i

Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-13 Thread Anastasia Lubennikova



13.01.2016 04:47, David Rowley :
On 13 January 2016 at 06:47, Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>> wrote:



Why is omit_opclass a separate patch?  If the included columns now
never participate in the index ordering, shouldn't it be an inherent
property of the main patch that you can "cover" things without btree
opclasses?


I don't personally think the covering_unique_4.0.patch is that close 
to being too big to review, I think things would make more sense of 
the omit_opclass_4.0.patch was included together with this.




I agree that these patches should be merged; it'll be fixed in the next 
update.
I kept them separate only for historical reasons, since it was more 
convenient for me to debug them. Furthermore, I wanted to show the 
performance degradation caused by "omit_opclass" and give a way to 
reproduce it by running the test with and without the patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-13 Thread Anastasia Lubennikova

13.01.2016 04:27, David Rowley:

I've also done some testing:

create table ab (a int, b int);
insert into ab select a,b from generate_Series(1,10) a(a), 
generate_series(1,1) b(b);

set enable_bitmapscan=off;
set enable_indexscan=off;

select * from ab where a = 1 and b=1;
 a | b
---+---
 1 | 1
(1 row)

set enable_indexscan = on;
select * from ab where a = 1 and b=1;
 a | b
---+---
(0 rows)

This is broken. I've not looked into why yet, but from looking at the 
EXPLAIN output I was a bit surprised to see b=1 as an index condition. 
I'd have expected a Filter maybe, but I've not looked at the EXPLAIN 
code to see how those are determined yet.


Hmm... Are you using both patches?
And could you provide the index definition? I can't reproduce the problem 
assuming the index is created by the statement

CREATE INDEX idx ON ab (a) INCLUDING (b);

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova


04.01.2016 11:49, David Rowley:
On 2 December 2015 at 01:53, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> 
wrote:


Finally, completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread,
regression tests and docs update.
I think it's quite ready for review.


Hi Anastasia,

I've maybe mentioned before that I think this is a great feature and I 
think it will be very useful to have, so I've signed up to review the 
patch, and below is the results of my first pass from reading the 
code. Apologies if some of the things seem like nitpicks, I've 
basically just listed everything I've noticed during, no matter how small.


First of all, I would like to thank you for writing such a detailed review.
All mentioned style problems, comments and typos are fixed in the patch 
v4.0.
+   An access method that supports this feature sets 
pg_am.amcanincluding true.


I don't think this belongs under the "Index Uniqueness Checks" title. 
I think the "Columns included with clause INCLUDING  aren't used to 
enforce uniqueness." that you've added before it is a good idea, but 
perhaps the details of amcanincluding are best explained elsewhere.

agree


+This clause specifies additional columns to be appended to 
the set of index columns.
+Included columns don't support any constraints 
(UNIQUE, PRMARY KEY, EXCLUSION CONSTRAINT).
+These columns can improve the performance of some queries 
 through using advantages of index-only scan
+(Or so called covering indexes. 
Covering index is the index that
+covers all columns required in the query and prevents a table 
access).
+Besides that, included attributes are not stored in index 
inner pages.
+It allows to decrease index size and furthermore it provides 
a way to extend included
+columns to store atttributes without suitable opclass (not 
implemented yet).
+This clause could be applied to both unique and nonunique 
indexes.
+It's possible to have non-unique covering index, which 
behaves as a regular

+multi-column index with a bit smaller index-size.
+Currently, only the B-tree access method supports this feature.

"PRMARY KEY" should be "PRIMARY KEY". I ended up rewriting this 
paragraph as follows.


"An optional INCLUDING clause allows a list of columns to 
be specified which will be included in the index, in the non-key 
portion of the index. Columns which are part of this clause cannot 
also exist in the indexed columns portion of the index, and vice 
versa. The INCLUDING columns exist solely to allow more 
queries to benefit from index only scans by including 
certain columns in the index, the value of which would otherwise have 
to be obtained by reading the table's heap. Having these columns in 
the INCLUDING clause in some cases allows 
PostgreSQL to skip the heap read completely. This also 
allows UNIQUE indexes to be defined on one set of columns, 
which can include another set of column in the INCLUDING 
clause, on which the uniqueness is not enforced upon. This can also be 
useful for non-unique indexes as any columns which are not required 
for the searching or ordering of records can defined in the 
INCLUDING clause, which can often reduce the size of the 
index."


Maybe not perfect, but maybe it's an improvement?



Yes, this explanation is much better. I've just added a couple of notes.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..a102391 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included wit

Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-12 Thread Anastasia Lubennikova
for the data type.

create index on tbl (a) including (b);
CREATE INDEX

This functionality is provided by the attached patch "omit_opclass_4.0", 
which must be applied over covering_unique_4.0.patch.


I see what you were confused about; I had the same question at the very 
beginning of the discussion of this patch.
Now it seems a bit clearer to me. INCLUDING columns are not used for 
searching or ordering of records, so there is no need to check whether 
they have an opclass. INCLUDING columns perform as expected, which agrees 
with experience from other databases. And this patch is complete.


But it definitely isn't perfect... I found a test case that shows why; see 
below.
That's why we need the optional_opclass functionality, which will use the 
opclass where possible and omit it in other cases.
This idea has already been described in the message Re: [PROPOSAL] 
Covering + unique indexes 
<http://www.postgresql.org/message-id/55f84df4.5030...@postgrespro.ru> as 
a "partially unique index".
I suggest separating the optional_opclass task to ease the syntax discussion 
and the following review, and I'll implement it in the next patch a bit later.


Test case:
1) patch covering_unique_4.0 + test_covering_unique_4.0
If the included columns' opclasses are used, the new query plan is the same 
as the old one and has nearly the same execution time:

 QUERY PLAN

 Index Only Scan using oldcoveringidx on oldt  (cost=0.43..301.72 
rows=1 width=8) (actual time=0.021..0.676 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.101 ms
 Execution time: 0.697 ms
(5 rows)

 QUERY PLAN

 Index Only Scan using newidx on newt  (cost=0.43..276.51 rows=1 
width=8) (actual time=0.020..0.665 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.082 ms
 Execution time: 0.687 ms
(5 rows)

2) patch covering_unique_4.0 + patch omit_opclass_4.0 + 
test_covering_unique_4.0
In this case the new query cannot use the included column in the Index Cond 
and uses a Filter instead, which slows the query down significantly.

 QUERY PLAN

 Index Only Scan using oldcoveringidx on oldt  (cost=0.43..230.39 
rows=1 width=8) (actual time=0.021..0.722 rows=6 loops=1)

   Index Cond: ((c1 < 1) AND (c3 < 20))
   Heap Fetches: 0
 Planning time: 0.091 ms
 Execution time: 0.744 ms
(5 rows)

 QUERY PLAN

 Index Only Scan using newidx on newt  (cost=0.43..374.68 rows=1 
width=8) (actual time=0.018..2.595 rows=6 loops=1)

   Index Cond: (c1 < 1)
   Filter: (c3 < 20)
   Rows Removed by Filter: 9993
   Heap Fetches: 0
 Planning time: 0.078 ms
 Execution time: 2.612 ms

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company


diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..a102391 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,10 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   (At present, only b-tree supports them.)
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..44dddb6 100644

[HACKERS] Some refactoring of index structures .

2016-02-10 Thread Anastasia Lubennikova

Hi, hackers.

Long story short, I'd like to do some refactoring of the code related to 
indexes.


I'm working on the patch which provides the INCLUDING columns functionality 
[1]. The patch has been reviewed again and again and I fixed many issues, 
but we still aren't sure enough that all occurrences of rd_index->indnatts 
have been replaced with rd_index->indnkeyatts accurately. I have no idea how 
to verify that without a thorough reading of the code, so that is what I am 
doing now.

I have already replaced rd_index->indnatts with the macro 
IndexRelationGetNumberOfAttributes and rd_index->indnkeyatts with the macro 
IndexRelationGetNumberOfKeyAttributes. But there are still a lot 
of places with vague naming.
For example, indnatts, natts, ncolumns and so on are all used for the very 
same rd_index->indnatts; I changed them to indnatts.
Or indexStruct, index, idxForm, index_form for index->rd_index. They 
definitely should be unified. I'd prefer indexForm. Any objections?


One more point is indkey in FormData_pg_index 
<http://doxygen.postgresql.org/pg__index_8h.html#a5065be0408f03576083a30c97b43a09a>. 
The same goes for indkeys 
<http://doxygen.postgresql.org/struct__indxInfo.html#a66d6eab0efe4bcb9fcbb472d403fa981>, 
ii_KeyAttrNumbers 
<http://doxygen.postgresql.org/structIndexInfo.html#a7723798af613a780d505231bad72c0a3>, 
and indexkeys 
<http://doxygen.postgresql.org/structIndexOptInfo.html#ab95b0da6ff1e527506239dab1a480b82>. 
With my patch an index can have key and non-key (included) columns, so 
indkey is no longer just the key: it contains both key and included columns, 
and its length is not indnkeyatts, as one might think, but indnatts. 
I suppose it's worth renaming it somehow, but I can't find a proper name. 
Do you have any ideas?


If you see any related changes, please, let me know.

1. 
http://www.postgresql.org/message-id/flat/56168952.4010...@postgrespro.ru#56168952.4010...@postgrespro.ru


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

28.01.2016 20:03, Thom Brown:
On 28 January 2016 at 16:12, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru <mailto:a.lubennik...@postgrespro.ru>> 
wrote:



28.01.2016 18:12, Thom Brown:


On 28 January 2016 at 14:06, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru
<mailto:a.lubennik...@postgrespro.ru>> wrote:


    31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate
keys in B-tree index.
The main idea is to implement posting lists and posting
trees for B-tree index pages as it's already done for GIN.

In a nutshell, effective storing of duplicates in GIN is
organised as follows.
Index stores single index tuple for each unique key. That
index tuple points to posting list which contains pointers
to heap tuples (TIDs). If too many rows having the same key,
multiple pages are allocated for the TIDs and these
constitute so called posting tree.
You can find wonderful detailed descriptions in gin readme

<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes possible to apply compression algorithm to
posting list/tree and significantly decrease index size.
Read more in presentation (part 1)
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.

Now new B-tree index tuple must be inserted for each table
row that we index.
It can possibly cause page split. Because of MVCC even
unique index could contain duplicates.
Storing duplicates in posting list/tree helps to avoid
superfluous splits.


I'd like to share the progress of my work. So here is a WIP
patch.
It provides effective duplicate handling using posting lists
the same way as GIN does it.

Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) +
key, TID (ip_blkid, ip_posid) + key
with patch:
TID (N item pointers, posting list offset) + key, TID
(ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID
(ip_blkid, ip_posid)

It seems that backward compatibility works well without any
changes. But I haven't tested it properly yet.

Here are some test results. They are obtained by test
functions test_btbuild and test_ginbuild, which you can find
in attached sql file.
i - number of distinct values in the index. So i=1 means that
all rows have the same key, and i=1000 means that all
keys are different.
The other columns contain the index size (MB).

i   B-tree Old  B-tree New  GIN
1   214,234375  87,7109375  10,2109375
10  214,234375  87,7109375  10,71875
100 214,234375  87,4375 15,640625
1000214,234375  86,2578125  31,296875
1   214,234375  78,421875   104,3046875
10  214,234375  65,359375   49,078125
100 214,234375  90,140625   106,8203125
1000214,234375  214,234375  534,0625


You can note that the last row contains the same index sizes
for B-tree, which is quite logical - there is no compression
if all the keys are distinct.
The other cases look really nice to me.
The next thing to say is that I haven't implemented posting list
compression yet, so there is still potential to decrease the size
of the compressed btree.

I'm almost sure there are still some tiny bugs and missing
functions, but on the whole, the patch is ready for testing.
I'd like to get feedback about testing the patch on some
real datasets. Any bug reports and suggestions are welcome.

Here are a couple of useful queries to inspect the data inside
the index pages:
create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral
bt_page_stats('idx', n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral
bt_page_items('idx', n) as bt;

And at last, the list of items I'm going to complete in the
near future:
1. Add storage_parameter 'enable_compression' for btree
access method which specifies whether the index handles
duplicates. default is 'off'
2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly
slower with the compressed btree, which is obviously not what we
want.
4. Clean the code and comments, add related documentation.

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-28 Thread Anastasia Lubennikova

28.01.2016 18:12, Thom Brown:
On 28 January 2016 at 14:06, Anastasia Lubennikova 
<a.lubennik...@postgrespro.ru> 
wrote:



31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in
B-tree index.
The main idea is to implement posting lists and posting trees for
B-tree index pages as it's already done for GIN.

In a nutshell, effective storing of duplicates in GIN is
organised as follows.
Index stores single index tuple for each unique key. That index
tuple points to posting list which contains pointers to heap
tuples (TIDs). If too many rows have the same key, multiple
pages are allocated for the TIDs, and these constitute a so-called
posting tree.
You can find wonderful detailed descriptions in gin readme

<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README>
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes it possible to apply a compression algorithm to the posting
list/tree and significantly decrease the index size. Read more in
presentation (part 1)
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.

Now a new B-tree index tuple must be inserted for each table row
that we index.
This can cause page splits. Because of MVCC, even a unique
index can contain duplicates.
Storing duplicates in a posting list/tree helps to avoid
superfluous splits.


I'd like to share the progress of my work. So here is a WIP patch.
It provides effective duplicate handling using posting lists the
same way as GIN does it.

Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key,
TID (ip_blkid, ip_posid) + key
with patch:
TID (N item pointers, posting list offset) + key, TID (ip_blkid,
ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)

It seems that backward compatibility works well without any
changes. But I haven't tested it properly yet.

Here are some test results. They are obtained by test functions
test_btbuild and test_ginbuild, which you can find in attached sql
file.
i - number of distinct values in the index. So i=1 means that all
rows have the same key, and i=10000000 means that all keys are
different.
The other columns contain the index size (MB).

i           B-tree Old    B-tree New    GIN
1           214,234375    87,7109375    10,2109375
10          214,234375    87,7109375    10,71875
100         214,234375    87,4375       15,640625
1000        214,234375    86,2578125    31,296875
10000       214,234375    78,421875     104,3046875
100000      214,234375    65,359375     49,078125
1000000     214,234375    90,140625     106,8203125
10000000    214,234375    214,234375    534,0625


You can note that the last row contains the same index sizes for
B-tree, which is quite logical - there is no compression if all
the keys are distinct.
The other cases look really nice to me.
The next thing to say is that I haven't implemented posting list
compression yet, so there is still potential to decrease the size of
the compressed btree.

I'm almost sure there are still some tiny bugs and missing
functions, but on the whole, the patch is ready for testing.
I'd like to get feedback about testing the patch on some real
datasets. Any bug reports and suggestions are welcome.

Here are a couple of useful queries to inspect the data inside the
index pages:
create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral
bt_page_stats('idx', n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral
bt_page_items('idx', n) as bt;

And at last, the list of items I'm going to complete in the near
future:
1. Add storage_parameter 'enable_compression' for btree access
method which specifies whether the index handles duplicates.
default is 'off'
2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly slower
with the compressed btree, which is obviously not what we want.
4. Clean the code and comments, add related documentation.


This doesn't apply cleanly against current git head. Have you caught 
up past commit 65c5fcd35?


Thank you for the notice. New patch is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9673fe0..0c8e4fb 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

29.01.2016 19:01, Thom Brown:

On 29 January 2016 at 15:47, Aleksander Alekseev
<a.aleks...@postgrespro.ru> wrote:

I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying the patch. Besides that, everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.
Thank you for testing. I rechecked that, and insertions are really very 
very very slow. It seems like a bug.

Okay, now for some badness.  I've restored a database containing 2
tables, one 318MB, another 24kB.  The 318MB table contains 5 million
rows with a sequential id column.  I get a problem if I try to delete
many rows from it:
# delete from contacts where id % 3 != 0 ;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


  ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom


Thank you for this report.
I tried to reproduce it, but I couldn't. Debugging will be much easier now.

I hope I'll fix these issues within the next few days.

BTW, I found a silly mistake: the previous patch contained some unrelated
changes. I fixed it in the new version (attached).


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..3908cc1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,7 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
 
 typedef struct
 {
@@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel,
   ScanKey scankey,
   IndexTuple newtup,
   BTStack stack,
-  Relation heapRel);
+  Relation heapRel,
+  bool *updposing);
 static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf,
 			   BTStack stack,
 			   IndexTuple itup,
@@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	bool updposting = false;
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -162,8 +165,9 @@ top:
 	{
 		TransactionId xwait;
 		uint32		speculativeToken;
+		bool fakeupdposting = false; /* Never update posting in unique index */
 
-		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false, &fakeupdposting);
 		xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey,
 								 checkUnique, &is_unique, &speculativeToken);
 
@@ -200,8 +204,54 @@ top:
 		CheckForSerializableConflictIn(rel, NULL, buf);
 		/* do the insertion */
 		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup,
-		  stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+		  stack, heapRel, &updposting);
+
+		if (IsSystemRelation(rel))
+			updposting = false;
+
+		/*
+		 * New tuple has the same key with tuple at the page.
+		 * Unite them into one posting.
+		 */
+		if (updposting)
+		{
+			Page		page;
+			IndexTuple olditup, newitup;
+			ItemPointerData *ipd;
+			int nipd;
+
+			page = BufferGetPage(buf);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if (BtreeTupleIsPosting(olditup))
+				nipd = BtreeGetNPosting(olditup);
+			else
+				nipd = 1;
+
+			ipd = palloc0(sizeof(ItemPointerData)*(nipd + 1));
+			/* copy item pointers from old tuple into ipd */
+			if (BtreeTupleIsPosting(olditup))
+				memcpy(ipd, BtreeGetPosting(olditup), sizeof(ItemPointerData)*nipd);
+			else
+				memcpy(ipd, olditup, sizeof(ItemPointerData));
+
+			/* add item pointer of the new tuple into ipd */
+			memcpy(ipd+nipd, itup, sizeof(ItemPointerData));
+
+			/*
+			 * Form posting tuple, 

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-02-02 Thread Anastasia Lubennikova



29.01.2016 20:43, Thom Brown:

On 29 January 2016 at 16:50, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

29.01.2016 19:01, Thom Brown:

On 29 January 2016 at 15:47, Aleksander Alekseev
<a.aleks...@postgrespro.ru>  wrote:

I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying the patch. Besides that, everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.

Thank you for testing. I rechecked that, and insertions are really very very
very slow. It seems like a bug.


Okay, now for some badness.  I've restored a database containing 2
tables, one 318MB, another 24kB.  The 318MB table contains 5 million
rows with a sequential id column.  I get a problem if I try to delete
many rows from it:
# delete from contacts where id % 3 != 0 ;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


   ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom

Thank you for this report.
I tried to reproduce it, but I couldn't. Debugging will be much easier now.

I hope I'll fix these issues within the next few days.

BTW, I found a silly mistake: the previous patch contained some unrelated
changes. I fixed it in the new version (attached).

Thanks.  Well I've tested this latest patch, and the warnings are no
longer generated.  However, the index sizes show that the patch
doesn't seem to be doing its job, so I'm wondering if you removed too
much from it.


Huh, this patch seems to be enchanted :) It works fine for me. Did you
perform "make distclean"?

Anyway, I'll send a new version soon.
I'm just writing to say that I haven't disappeared and I do remember about
the issue.
I have even almost fixed the insert speed problem, but I'm very busy
this week.

I'll send an updated patch next week as soon as possible.

Thank you for your attention to this work.

--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-28 Thread Anastasia Lubennikova

31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate keys in 
B-tree index.
The main idea is to implement posting lists and posting trees for 
B-tree index pages as it's already done for GIN.


In a nutshell, effective storing of duplicates in GIN is organised as 
follows.
Index stores single index tuple for each unique key. That index tuple 
points to posting list which contains pointers to heap tuples (TIDs). 
If too many rows have the same key, multiple pages are allocated for
the TIDs, and these constitute a so-called posting tree.
You can find wonderful detailed descriptions in gin readme 
<https://github.com/postgres/postgres/blob/master/src/backend/access/gin/README> 
and articles <http://www.cybertec.at/gin-just-an-index-type/>.
It also makes it possible to apply a compression algorithm to the posting
list/tree and significantly decrease the index size. Read more in
presentation (part 1) 
<http://www.pgcon.org/2014/schedule/attachments/329_PGCon2014-GIN.pdf>.


Now a new B-tree index tuple must be inserted for each table row that we
index.
This can cause page splits. Because of MVCC, even a unique index
can contain duplicates.

Storing duplicates in a posting list/tree helps to avoid superfluous splits.


I'd like to share the progress of my work. So here is a WIP patch.
It provides effective duplicate handling using posting lists the same 
way as GIN does it.


Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) + key, TID 
(ip_blkid, ip_posid) + key

with patch:
TID (N item pointers, posting list offset) + key, TID (ip_blkid, 
ip_posid), TID (ip_blkid, ip_posid), TID (ip_blkid, ip_posid)
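[Editor's note: a quick back-of-the-envelope comparison may help to see what
this layout buys and when it can hurt. The little program below is only a
sketch under simplified assumptions (an 8-byte IndexTupleData header, 6-byte
ItemPointerData, no alignment padding and no page overhead); it is not part
of the patch.]

#include <stdio.h>

/*
 * Rough space estimate for N duplicates of one key, ignoring alignment,
 * line pointers and page headers.  Assumed sizes: IndexTupleData header
 * (which embeds the heap TID) is 8 bytes, ItemPointerData is 6 bytes.
 */
static long
plain_layout_bytes(long ndup, long keysize)
{
    /* old layout: a full "header + key" tuple per duplicate */
    return ndup * (8 + keysize);
}

static long
posting_layout_bytes(long ndup, long keysize)
{
    /* new layout: one header (reused for metadata) + one key + TID array */
    return 8 + keysize + ndup * 6;
}

int
main(void)
{
    long keysize = 8;               /* e.g. a MAXALIGNed integer key */
    long ndups[] = {2, 10, 100, 1000};

    for (int i = 0; i < 4; i++)
        printf("%5ld duplicates: plain %7ld bytes, posting %7ld bytes\n",
               ndups[i],
               plain_layout_bytes(ndups[i], keysize),
               posting_layout_bytes(ndups[i], keysize));
    return 0;
}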


It seems that backward compatibility works well without any changes. But 
I haven't tested it properly yet.


Here are some test results. They are obtained by test functions 
test_btbuild and test_ginbuild, which you can find in attached sql file.
i - number of distinct values in the index. So i=1 means that all rows
have the same key, and i=10000000 means that all keys are different.

The other columns contain the index size (MB).

i           B-tree Old    B-tree New    GIN
1           214,234375    87,7109375    10,2109375
10          214,234375    87,7109375    10,71875
100         214,234375    87,4375       15,640625
1000        214,234375    86,2578125    31,296875
10000       214,234375    78,421875     104,3046875
100000      214,234375    65,359375     49,078125
1000000     214,234375    90,140625     106,8203125
10000000    214,234375    214,234375    534,0625


You can note that the last row contains the same index sizes for B-tree,
which is quite logical - there is no compression if all the keys are
distinct.

The other cases look really nice to me.
The next thing to say is that I haven't implemented posting list compression
yet, so there is still potential to decrease the size of the compressed btree.


I'm almost sure there are still some tiny bugs and missing functions,
but on the whole, the patch is ready for testing.
I'd like to get feedback about testing the patch on some real
datasets. Any bug reports and suggestions are welcome.


Here are a couple of useful queries to inspect the data inside the index
pages:

create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral bt_page_stats('idx', 
n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral 
bt_page_items('idx', n) as bt;


And at last, the list of items I'm going to complete in the near future:
1. Add storage_parameter 'enable_compression' for btree access method 
which specifies whether the index handles duplicates. default is 'off'

2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly slower with
the compressed btree, which is obviously not what we want.

4. Clean the code and comments, add related documentation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index 77c2fdf..3b61e8f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,7 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
 
 typedef struct
 {
@@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel,
   ScanKey scankey,
   IndexTuple newtup,
   BTStack stack,
-  Relation heapRel);
+  Relation heapRel,
+  bool *updposing);
 static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf,
 			   BTStack stack,
 			   IndexTuple itup,
@@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	bool updposting = false;
 
 	/

Re: [HACKERS]WIP: Covering + unique indexes.

2016-02-29 Thread Anastasia Lubennikova

25.02.2016 21:39, Jeff Janes:

As promised, here's the new version of the patch "including_columns_4.0".
I fixed all issues except some points mentioned below.

Thanks for the update patch.  I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'


Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key
(a,b) including (c);
ERROR:  syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key
using index foobar_a_b_c_idx;

The description of the table's index knows to include the including column:

jjanes=# \d foobar
 Table "public.foobar"
  Column |  Type   | Modifiers
+-+---
  a  | integer | not null
  b  | integer | not null
  c  | integer |
Indexes:
 "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)


Since the machinery appears to all be in place to have primary keys
with INCLUDING columns, it would be nice if the syntax for adding
primary keys allowed one to implement them directly.

Is this something or future expansion, or could it be added at the
same time as the main patch?


Good point.
At a quick glance, this looks easy to implement. The only problem is
that there are many places in the code which must be updated.
I'll try to do it, and if there are difficulties, it's fine with me
to delay this feature to future work.


I found one more thing to do: pg_dump does not handle included columns
yet. I will fix it in the next version of the patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-21 Thread Anastasia Lubennikova

30.12.2015 16:01, Aleksander Alekseev:

Here is a clean version of the patch. Step 1 is an optimization. Step 2
refactors this:

  HTAB *
  ShmemInitHash(const char *name, /* table string name for shmem index */
- long init_size,   /* initial table size */
+ long init_size,   /* initial table size XXX ignored,
  refactor! */


Hi, I found that the patch is still not reviewed, so I've looked it over.
I haven't dived into this subject before, so my comments will be more
general than specific to your investigation.
Sorry if some things seem like nitpicks; I just think that a cleaner patch
would be more convenient for reviewers.


First of all, why not merge both patches into one? They aren't too big
anyway.
Also, this thread has become too tangled. I think it's worth writing a new
message with the patch, the test script, some results, and a brief overview
of how it really works. It will make further review much easier.


+   /* number of entries in hash table */
+   longnentries[NUM_LOCK_PARTITIONS];
+   /* linked list of free elements */
+   HASHELEMENT *freeList[NUM_LOCK_PARTITIONS];

I think these comments should be changed to be more informative.

+   if (IS_PARTITIONED(hashp->hctl))
+   partitions_number = NUM_LOCK_PARTITIONS;
+   else
+   partitions_number = 1;
+
+   nelem_alloc = ((int) nelem) / partitions_number;
+   if (nelem_alloc == 0)
+   nelem_alloc = 1;

Add a comment here too.

-/* Number of partitions of the shared buffer mapping hashtable */
-#define NUM_BUFFER_PARTITIONS  128
-
 /* Number of partitions the shared lock tables are divided into */
-#define LOG2_NUM_LOCK_PARTITIONS  4
+#define LOG2_NUM_LOCK_PARTITIONS  7
 #define NUM_LOCK_PARTITIONS  (1 << LOG2_NUM_LOCK_PARTITIONS)

+/* Number of partitions of the shared buffer mapping hashtable */
+#define NUM_BUFFER_PARTITIONS NUM_LOCK_PARTITIONS
+

I'm sure it was discussed above, but these definitions do not look
obvious at all.

Maybe you should explain this magic number 7 in the comment above?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-01-21 Thread Anastasia Lubennikova


20.01.2016 17:55, Konstantin Knizhnik:

Hi,


Hi, I'm glad to see that you are interested in this too.
I think this is a good feature and it will be very useful to
have.
I have already mentioned some related problems and possible
improvements in my presentation:
http://www.slideshare.net/AnastasiaLubennikova/indexes-dont-mean-slow-inserts

The last two slides concern this thread. Briefly, I've suggested
thinking about an insertion buffer. Something very similar is already
implemented in BRIN: it does not index the last data from the heap while the
number of trailing pages is less than pages_per_range.


Do you mean GIN-like usage of an insertion buffer (there it is called
the "pending list")?
So that we have to combine a search in the main tree with a search in the
insert buffer?
Actually this is what I want to avoid (because, at least in the case of
GIN, the pending list causes significant performance degradation,

while an up-to-date state of the full-text index is rarely required).



What I meant is more like a BRIN-like combination of an index scan and 
heap scan.

Maybe it could be called "deferred inserts" or "temporary read-only index"
Maybe it's similar with mysql insert buffer 
http://dev.mysql.com/doc/refman/5.7/en/innodb-insert-buffering.html

I think it'll be clearer with an example. Please don't mind the syntax.

CREATE TABLE tbl (c1 int);
CREATE INDEX idx on tbl(c1);

SET enable_deferred_insert(idx) = on;
At this moment, we save the last_indexed_item (its TID) somewhere in 
index metapage.


Since that moment, the data inserted into the table doesn't touch the index.
We perform some heavy insert and then go back to the normal index behavior.

SET enable_deferred_insert(idx) = off;
This command takes all the data between the last_indexed_item and the
end of the table, and inserts it into the index all at once.


Of course there are new problems to deal with, but it's really useful
for the use case of balancing an irregular heavy write load, isn't it?
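[Editor's note: to make the proposed catch-up step a bit more concrete, here
is a tiny self-contained toy in C. It has nothing to do with real PostgreSQL
code: the "heap" and the "index" are plain arrays, and every name is invented
purely to illustrate the idea of remembering a high-water mark and indexing
the tail of the table in one batch.]

#include <stdio.h>

#define NROWS 10

static int heap[NROWS] = {5, 3, 8, 1, 9, 7, 2, 6, 4, 0};  /* table contents */
static int indexed_upto = 4;   /* rows [0, 4) were indexed before deferral */

/* Stand-in for inserting one entry into the index. */
static void
toy_index_insert(int row, int value)
{
    printf("index_insert(row=%d, value=%d)\n", row, value);
}

/* "SET enable_deferred_insert(idx) = off": index everything after the mark. */
static void
toy_catchup(void)
{
    for (int row = indexed_upto; row < NROWS; row++)
        toy_index_insert(row, heap[row]);
    indexed_upto = NROWS;      /* move the high-water mark to the end */
}

int
main(void)
{
    toy_catchup();
    return 0;
}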


BTW, could you explain what the reason is to copy data into the pending
list and then copy it again while flushing the pending list into the index?
Why not read this data directly from the table? I feel that I've missed
something important here.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-26 Thread Anastasia Lubennikova

25.01.2016 03:32, Jeff Janes:

On Fri, Jan 22, 2016 at 7:19 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

Done. I hope that my patch is close to the commit too.


Thanks for the update.

I've run into this problem:

create table foobar (x text, w text);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;

ERROR:  index "foobar_pkey" does not have default sorting behavior
LINE 1: alter table foobar add constraint foobar_pkey primary key us...
^
DETAIL:  Cannot create a primary key or unique constraint using such an index.
Time: 1.577 ms


If I instead define the table as
create table foobar (x int, w xml);

Then I can create the index and then the primary key the first time I
do this in a session.  But then if I drop the table and repeat the
process, I get "does not have default sorting behavior" error even for
this index that previously succeeded, so I think there is some kind of
problem with the backend syscache or catcache.

create table foobar (x int, w xml);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;
drop table foobar ;
create table foobar (x int, w xml);
create unique index foobar_pkey on foobar (x) including (w);
alter table foobar add constraint foobar_pkey primary key using index
foobar_pkey;
ERROR:  index "foobar_pkey" does not have default sorting behavior
LINE 1: alter table foobar add constraint foobar_pkey primary key us...
^
DETAIL:  Cannot create a primary key or unique constraint using such an index.


Great, I've fixed that. Thank you for the tip about the cache.

I've also found and fixed a related bug in copying tables with indexes:
create table tbl2 (like tbl including all);
And there's one more tiny fix in get_pkey_attnames in the dblink module.

including_columns_3.0 is the latest version of the patch.
The changes relative to the previous version are attached as a separate
patch, just to ease review and debugging.


I've changed the size of the pg_index.indclass array. It contains indnkeyatts
elements now,
while pg_index.indkey still contains all attributes. And the query from
Retrieve primary key columns
<https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns> now gives a
pretty non-obvious result. Is that acceptable behavior here, or are some
changes required? Do you know of any similar queries?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..ef6aee5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2058,7 +2058,7 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
+			*numatts = index->indnkeyatts;
 			if (*numatts > 0)
 			{
 result = (char **) palloc(*numatts * sizeof(char *));
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 412c845..c201f8b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -3515,6 +3515,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 5f7befb..aaed5a7 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -111,6 +111,8 @@ typedef struct IndexAmRoutine
 boolamclusterable;
 /* does AM handle predicate locks? */
 boolampredlocks;
+/* does AM support columns included with clause INCLUDING? */
+boolamcaninclude;
 /* type of data stored in index, or InvalidOid if variable */
 Oid amkeytype;
 
@@ -852,7 +854,8 @@ amrestrpos (IndexScanDesc scan);
using unique indexes, which are indexes that disallow
multiple entries with identical keys.  An access method that supports this
feature sets amcanunique true.
-   (At present, only b-tree supports it.)
+   (At present, only b-tree supports it.) Columns which are present in the
+   INCLUDING clause are not used to enforce uniqueness.
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..09d4e6b 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Indexes can also be used to enforce uniqueness of a column's value,
or the uniqueness of the combined values of more than one column.
 
-CREATE UNIQUE INDEX name ON table (column , ...);
+CREATE UNIQUE INDEX name ON ta

Re: [HACKERS] Batch update of indexes

2016-01-20 Thread Anastasia Lubennikova
tion plan while 
it is not needed in this case: search condition is exactly the same as 
partial index condition.

Optimal plan should be:

   Index Scan using idx1 on t (cost=0.00..4.13 rows=12263 width=0)
   Index Cond: (c1 < '10'::double precision)


There was a discussion of a patch for partial indexes:
http://postgresql.nabble.com/PATCH-index-only-scans-with-partial-indexes-td5857568.html
I haven't followed it closely, but it seems to be open still. I think
it'll be interesting to you.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-19 Thread Anastasia Lubennikova



18.01.2016 01:02, David Rowley пишет:
On 14 January 2016 at 08:24, David Rowley 
<david.row...@2ndquadrant.com> 
wrote:


I will try to review the omit_opclass_4.0.patch soon.


Hi, as promised, here's my review of the omit_opclass_4.0.patch patch.


Thank you again. All the points you mentioned are fixed, and the patches are merged.
I hope it's all right now. Please check the comments one more time; I rather
doubt that I wrote everything correctly.
Also this makes me think that the name ii_KeyAttrNumbers is now 
out-of-date, as it contains
the including columns too by the looks of it. Maybe it just needs to 
drop the "Key" and become
"ii_AttrNumbers". It would be interesting to hear what others think of 
that.


I'm also wondering if indexkeys is still a good name for the 
IndexOptInfo struct member.
Including columns are not really keys, but I feel renaming that might 
cause a fair bit of code churn, so I'd be interested to hear what 
other's have to say.


I agree that KeyAttrNumbers and indexkeys are somewhat confusing names, but
I'd like to keep them, at least in this patch.

It may be worth doing an "index structures refactoring" as a separate patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 97ef618..d17a06c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcaninclude
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3714,6 +3721,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..d01af17 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -767,7 +767,8 @@ amrestrpos (IndexScanDesc scan);
using unique indexes, which are indexes that disallow
multiple entries with identical keys.  An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   (At present, only b-tree supports it.) Columns included with clause
+   INCLUDING  aren't used to enforce uniqueness.
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 23bbec6..09d4e6b 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX test3_desc_index ON test3 (id DESC NULLS LAST);
Indexes can also be used to enforce uniqueness of a column's value,
or the uniqueness of the combined values of more than one column.
 
-CREATE UNIQUE INDEX name ON table (column , ...);
+CREATE UNIQUE INDEX name ON table (column , ...)
+INCLUDING (column , ...);
 
Currently, only B-tree indexes can be declared unique.
   
@@ -642,7 +643,9 @@ CREATE UNIQUE INDEX name ON table
+   Columns included with clause INCLUDING aren't used to enforce constraints (UNIQUE,
+   PRIMARY KEY, etc).
   
 
   
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index ce36a1b..8360bb6 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -23,6 +23,7 @@ PostgreSQL documentation
 
 CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
 ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
+[ INCLUDING ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
 [ WITH ( storage_parameter = value [, ... ] ) ]
 [ TABLESPACE tablespace_name ]
 [ WHERE predicate ]
@@ -139,6 +140,32 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 
  
+  INCLUDING
+  
+   
+An optional INCLUDING clause allows a list of columns to be
+specified which will be included in the index, in the non-key portion of
+the index. Columns which are part of this clause cannot also exist in the
+key columns portion of the index, and vice versa. The
+INCLUDING columns exist solely to allow more queries to benefit
+from index-only scans by including certain columns in the
+index, the value of which would otherwise have to be obtained by reading
+the table's heap. Having these columns in the INCLUDING clause
+in some cases allows PostgreSQL to skip the heap read
+completely. This also allows UNIQUE indexes to be defined on
+one set of columns, which can include another set of co

Re: [HACKERS]WIP: Covering + unique indexes.

2016-01-19 Thread Anastasia Lubennikova



12.01.2016 20:47, Jeff Janes:

It looks like the "covering" patch, with or without the "omit_opclass"
patch, does not support expressions as included columns:

create table foobar (x text, y xml);
create index on foobar (x) including  (md5(x));
ERROR:  unrecognized node type: 904
create index on foobar (x) including  ((y::text));
ERROR:  unrecognized node type: 911

I think we would probably want it to work with those (or at least to
throw a better error message).
Thank you for the notice. I couldn't fix it quickly, so I added a stub in
the latest patch.

But I'll try to fix it and add expression support a bit later.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: fix lock contention for HASHHDR.mutex

2016-01-27 Thread Anastasia Lubennikova



22.01.2016 13:48, Aleksander Alekseev:

Then, this thread became too tangled. I think it's worth to write a
new message with the patch, the test script, some results and brief
overview of how does it really works. It will make following review
much easier.

Sure.

HASHHDR represents a hash table. It could be usual or partitioned.
Partitioned table is stored in a shared memory and accessed by multiple
processes simultaneously. To prevent data corruption hash table is
partitioned and each process has to acquire a lock for a corresponding
partition before accessing data in it. Number of partition is determine
by lower bits of key's hash value. Most tricky part is --- dynahash
knows nothing about these locks, all locking is done on calling side.

Since shared memory is pre-allocated and can't grow to allocate memory
in a hash table we use freeList. Also we use nentries field to count
current number of elements in a hash table. Since hash table is used by
multiple processes these fields are protected by a spinlock. Which as
it turned out could cause lock contention and create a bottleneck.

After trying a few approaches I discovered that using one spinlock per
partition works quite well. Here are last benchmark results:

http://www.postgresql.org/message-id/20151229184851.1bb7d1bd@fujitsu

Note that "no locks" solution cant be used because it doesn't guarantee
that all available memory will be used in some corner cases.

You can find a few more details and a test script in the first message
of this thread. If you have any other questions regarding this patch
please don't hesitate to ask.
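[Editor's note: for readers who haven't looked at dynahash before, the shape
of the change described above can be sketched roughly like this. It is only
an illustration, not the real HASHHDR definition: the spinlock is reduced to
a plain int, the partition count is hard-coded, and all other fields are
omitted.]

#define NUM_PARTITIONS 128          /* i.e. 1 << LOG2_NUM_LOCK_PARTITIONS */

typedef struct ToyElement
{
    struct ToyElement *link;        /* next element on the freelist */
} ToyElement;

/* Before: one counter, one freelist, one lock shared by every backend. */
typedef struct ToyHashHeaderOld
{
    int         mutex;              /* single spinlock: the bottleneck */
    long        nentries;           /* entries in the whole table */
    ToyElement *freeList;           /* shared list of free elements */
} ToyHashHeaderOld;

/*
 * After: the counter, the freelist and the spinlock are split per
 * partition, so backends working on different partitions no longer
 * contend on the same spinlock (or the same cache line).
 */
typedef struct ToyHashHeaderNew
{
    int         mutex[NUM_PARTITIONS];
    long        nentries[NUM_PARTITIONS];
    ToyElement *freeList[NUM_PARTITIONS];
} ToyHashHeaderNew;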

InitLocks
/*
 * Compute init/max size to request for lock hashtables.  Note these
 * calculations must agree with LockShmemSize!
 */

This comment certainly requires some changes.
BTW, could you explain why init_table_size is half of
max_table_size?

That said, I don't see any problems with your changes.


-hctl->nentries = 0;
-hctl->freeList = NULL;

Why did you delete these two lines? I wonder if you should rewrite them 
instead?


+ * This particular number of partitions significantly reduces lock 
contention

+ * in partitioned hash tables, almost if partitioned tables didn't use any
+ * locking at all

As far as I understood, this number was obtained experimentally? Maybe 
you should note that in the comment.



And the last, but not least.

+nelem_alloc = ((int) nelem) / partitions_number;
+if (nelem_alloc == 0)
+nelem_alloc = 1;
+
+for (i = 0; i < partitions_number; i++)
+if (!element_alloc(hashp, nelem_alloc, i))
+ereport(ERROR,
+(errcode(ERRCODE_OUT_OF_MEMORY),
+ errmsg("out of memory")));


It looks like you forgot to round up the result of the division.
For example, if you have nelem=25 and partitions_number=6,
then 25 / 6 = 4, so you allocate only 24 elements and 1 is lost.
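[Editor's note: one possible way to address that - a sketch of the idea only,
not the actual change from the thread - is to make the division round up, so
that the per-partition allocations always cover all requested elements:]

    /* allocate ceil(nelem / partitions_number) elements per partition,
     * so that partitions_number * nelem_alloc >= nelem */
    nelem_alloc = (nelem + partitions_number - 1) / partitions_number;
    if (nelem_alloc == 0)
        nelem_alloc = 1;        /* defensive; only possible when nelem == 0 */

With the numbers above, nelem=25 and partitions_number=6 give nelem_alloc = 5,
so 30 elements are reserved instead of 24.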

As for performance, I did a test on my laptop. The patched version
shows a small performance improvement.


pgbench -j 1 -c 1 -f pgbench.sql -T 300 postgres
  base: 127 tps    patched: 145 tps
pgbench -j 8 -c 8 -f pgbench.sql -T 300 postgres
  base: 247 tps    patched: 270 tps

But I don't have any big machine to run the tests where the patch is
supposed to make a significant difference.

So I would rely on your results and those of the other reviewers.

Apart from the notes mentioned above, I think the patch is good enough to pass to
a reviewer.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-02-18 Thread Anastasia Lubennikova

18.02.2016 20:18, Anastasia Lubennikova:

04.02.2016 20:16, Peter Geoghegan:

On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

I fixed it in the new version (attached).


Thank you for the review.
At last, there is a new patch version, 3.0. After some refactoring it
looks much better.
I described all the details of the compression in this document:
https://goo.gl/50O8Q0 (the same text without pictures is attached in
btc_readme_1.0.txt).
Consider it a rough draft of the readme. It contains some notes about
tricky implementation details and questions about future work.

Please don't hesitate to comment on it.


Sorry, previous patch was dirty. Hotfix is attached.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Compression. To be precise, it's not actually compression, but just an effective layout of ItemPointers on an index page.
compressed tuple = IndexTuple (with metadata in the TID field + key) + posting list


1. A GIN index fits extremely well for really large sets of repeating keys, but on the other hand, it completely fails to handle unique keys. For btree it is essential to have good performance and concurrency in any corner case, with any number of duplicates. That's why we can't just copy the GIN implementation of item pointer compression. The first difference is that the btree algorithm performs compression (or, in other words, changes the index tuple layout) only if there is more than one tuple with a given key. This allows us to avoid the overhead of storing useless metadata for mostly distinct keys (see picture below). It seems that compression could be useful for unique indexes under heavy write/update load (because of MVCC copies), but I'm not sure whether this use case really exists; those tuples should be deleted by microvacuum as soon as possible. Anyway, I think it's worth adding a storage parameter for btree which enables/disables compression for each particular index, and setting compression of unique indexes to off by default. System indexes do not support compression for several reasons. First of all, because of the WIP state of the patch (debugging the system catalog isn't a big pleasure). The next reason is that I know many places in the code where hardcoded values or some non-obvious syscache routines are used, and I do not feel brave enough to change that code. And last but not least, I don't see good reasons to do it.

2. If the index key is very small (smaller than the metadata) and the number of duplicates is small, compression could lead to index bloat instead of an index size decrease (see picture below). I'm not sure whether it's worth handling this case separately, because it's really rare, and I consider it the DBA's job to disable compression on such indexes. But if you see any clear way to do this, it would be great.

3. For GIN indexes, if a posting list is too large, a posting tree is created. This relies on the following assumptions:
- Indexed keys are never deleted. It makes all the tree algorithms much easier.
- There are always many duplicates. Otherwise, GIN becomes really inefficient.
- There is no high concurrency. In order to add a new entry into a posting tree, we hold a lock on its root, so only one backend at a time can perform an insertion.

In btree we can't rely on these assumptions, so we have to handle big posting lists in another way: if there are too many ItemPointers to fit into a single posting list, we just create another one. The overhead of this approach is that we have to store a duplicate of the key and the metadata. This leads to the problem of big keys: if the key size is close to BTMaxItemSize, compression will give us a really small benefit, if any at all (see picture below).

4. The more item pointers fit into a single posting list, the more rarely we have to split it and repeat the key. Therefore, the bigger BTMaxItemSize is, the better. The comment in nbtree.h says: “We actually need to be able to fit three items on every page, so restrict any one item to 1/3 the per-page available space.” That is quite right for regular items, but a compressed index tuple already contains more than one item. Taking that into account, we can assert that BTMaxItemSize ~ ⅓ of the page size for regular items, and ~ ½ of the page size for compressed items. Are there any objections? I wonder if we can increase BTMaxItemSize under some other assumption? The problem I see here is that a varlena high key could be as big as the compressed tuple.

5. CREATE INDEX. _bt_load. The btree build algorithm is the following: do the heap scan, add tuples into a spool, sort the data, insert the ordered data from the spool into leaf index pages (_bt_load), then build the inner pages and the root. The main changes are applied to the _bt_load function. While loading tuples, we do not insert them one by one; instead, we compare each tuple with the previous one, and if they are equal we put them into a posting list. If the posting list is large 
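[Editor's note: the grouping step described in point 5 is essentially a
run-length pass over the already-sorted spool. The toy program below is only
an illustration in plain C and has nothing to do with the real _bt_load: it
walks a sorted stream of (key, tid) pairs and starts a new posting list
whenever the key changes or an arbitrary size cap is reached.]

#include <stdio.h>

#define MAX_TIDS_PER_POSTING 4      /* stand-in for the real size limit */

typedef struct { int key; int tid; } SpoolItem;

/* Pretend output; the real code would form and insert an index tuple here. */
static void
flush_posting(int key, const int *tids, int ntids)
{
    printf("key %d -> posting list of %d tid(s):", key, ntids);
    for (int i = 0; i < ntids; i++)
        printf(" %d", tids[i]);
    printf("\n");
}

int
main(void)
{
    /* sorted spool: duplicates are adjacent, as after the tuplesort step */
    SpoolItem spool[] = {
        {1, 10}, {1, 11}, {1, 12}, {1, 13}, {1, 14}, {2, 20}, {3, 30}, {3, 31}
    };
    int nspool = sizeof(spool) / sizeof(spool[0]);
    int tids[MAX_TIDS_PER_POSTING];
    int ntids = 0, curkey = spool[0].key;

    for (int i = 0; i < nspool; i++)
    {
        /* key change or full posting list: emit what we have, start over */
        if (spool[i].key != curkey || ntids == MAX_TIDS_PER_POSTING)
        {
            flush_posting(curkey, tids, ntids);
            curkey = spool[i].key;
            ntids = 0;
        }
        tids[ntids++] = spool[i].tid;
    }
    flush_posting(curkey, tids, ntids);     /* flush the final group */
    return 0;
}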

Re: [HACKERS]WIP: Covering + unique indexes.

2016-02-11 Thread Anastasia Lubennikova

02.02.2016 15:50, Anastasia Lubennikova:

31.01.2016 11:04, David Rowley:

On 27 January 2016 at 03:35, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

including_columns_3.0 is the latest version of patch.
And changes regarding the previous version are attached in a 
separate patch.

Just to ease the review and debug.

Hi,

I've made another pass over the patch. There's still a couple of
things that I think need to be looked at.

Thank you again.
I'm just writing to say that I haven't disappeared and I do remember
about the issue.
But I'm very busy this week. I'll send an updated patch next week,
as soon as possible.




As promised, here's the new version of the patch "including_columns_4.0".
I fixed all issues except some points mentioned below.
Besides, I did some refactoring:
- use macros IndexRelationGetNumberOfAttributes, 
IndexRelationGetNumberOfKeyAttributes where possible. Use macro 
RelationGetNumberOfAttributes. Maybe that's a bit unrelated changes, but 
it'll make development much easier in future.

- rename related variables to indnatts,  indnkeyatts.

I'm also not that keen on index_reform_tuple() in general. I wonder if
there's a way we can just keep the Datum/isnull arrays a bit longer,
and only form the tuple when needed. I've not looked into this in
detail, but it does look like reforming the tuple is not going to be
cheap.
It is used in splits, for example. There is no datum array there; we just
move a tuple key from a child page to a parent page, or something like that.
And according to the INCLUDING algorithm, we need to truncate the non-key
attributes.

If we do need to keep this function, I think a better name might be
index_trim_tuple() and I don't think you need to pass the original
length. It might make sense to Assert() that the trim length is
smaller than the tuple size


As regards performance, I don't think that it's a big problem here.
Do you suggest doing it in the following way: memcpy(oldtup, newtup,
newtuplength)?


I've tested it some more, and still didn't find any performance issues.


in gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
   scan->indexRelation->rd_opcintype[attno - 1],
   -1, 0);
}

Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
be sized by the number of key columns, but this loop goes over the
number of attribute columns.
Perhaps this is not a big problem since GIST does not support
INCLUDING columns, but it does seem wrong still.


GiST doesn't support the INCLUDING clause, so natts and nkeyatts are
always equal. I don't see any problem here.
And I think that it's extra work beyond this patch. Maybe I or someone
else will add this feature to other access methods later.


Still the same.

Which brings me to the fact that I've spent a bit of time trying to
look for places where you've forgotten to change natts to nkeyatts. I
did find this one, but I don't have much confidence that there's not
lots more places that have been forgotten. Apart from this one, how
confident are you that you've found all the places? I'm getting
towards being happy with the code that I see that's been changed, but
I'm hesitant to mark as "Ready for committer" due to not being all
that comfortable that all the code that needs to be updated has been
updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep,
and replaced (or expanded) them with nkeyatts where it was necessary.

As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough 
inspection, so I'll recheck it again. But I'm almost sure that it's okay.


I rechecked everything again and fixed a couple of omissions. Thank you
for being an exacting reviewer :)
I don't know how to ensure that everything is OK, but I have no idea
what else I can do.



I wondering if hacking the code so that each btree index which is
created with > 1 column puts all but the first column into the
INCLUDING columns, then run the regression tests to see if there are
any crashes. I'm really not that sure of how else to increase the
confidence levels on this. Do you have ideas?


Do I understand correctly that you suggest replacing all multicolumn
indexes with (1 key column) + included columns?
I don't think it's a good idea. The INCLUDING clause brings some
disadvantages. For example, included columns must be filtered after
the search, while key columns can be used in the scan key directly. I
already mentioned this in the test example:


explain analyze select c1, c2 from tbl where c1<1 and c3<20;
If the columns' opclasses are used, the new query plan uses them in Index
Cond: ((c1 < 1) AND (c3 < 20))
Otherwise, the new query cannot use the included column in the Index Cond and
uses a filter instead:

Index Cond: (c1 < 1)
Filter: (c3 < 20)
Rows Removed by Filter: 999
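[Editor's note: for reference, a self-contained version of that comparison
could look like the following. This is only a sketch: it uses the patch's
proposed INCLUDING syntax, and the table contents, thresholds and index names
are invented here for illustration.]

-- hypothetical setup; names and numbers are made up
create table tbl (c1 int, c2 int, c3 int);
insert into tbl select i, i, i from generate_series(1, 100000) i;

-- variant A: c3 and c2 as included (non-key) columns; c3 < 20 can only be a Filter
create index tbl_incl_idx on tbl (c1) including (c3, c2);

-- variant B: all three as key columns; c3 < 20 can go into the Index Cond
-- (drop the other index before comparing plans)
create index tbl_key_idx on tbl (c1, c3, c2);

vacuum analyze tbl;
explain analyze select c1, c2 from tbl where c1 < 10000 and c3 < 20;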

Re: [HACKERS]WIP: Covering + unique indexes.

2016-02-02 Thread Anastasia Lubennikova

31.01.2016 11:04, David Rowley:

On 27 January 2016 at 03:35, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

including_columns_3.0 is the latest version of patch.
And changes regarding the previous version are attached in a separate patch.
Just to ease the review and debug.

Hi,

I've made another pass over the patch. There's still a couple of
things that I think need to be looked at.

Thank you again.
I'm just writing to say that I haven't disappeared and I do remember about
the issue.
But I'm very busy this week. I'll send an updated patch next week,
as soon as possible.



Do we need the "b (included)" here? The key is (a) = (1). Having
irrelevant details might be confusing.

postgres=# create table a (a int not null, b int not null);
CREATE TABLE
postgres=# create unique index on a (a) including(b);
CREATE INDEX
postgres=# insert into a values(1,1);
INSERT 0 1
postgres=# insert into a values(1,1);
ERROR:  duplicate key value violates unique constraint "a_a_b_idx"
DETAIL:  Key (a, b (included))=(1, 1) already exists.
I thought that it could be strange if the user inserts two values and then
sees only one of them in the error message.
But now I see that you're right. I'll also look at the same functionality
in other DBs and fix it.



In index_reform_tuple() I find it a bit scary that you change the
TupleDesc's number of attributes then set it back again once you're
finished reforming the shortened tuple.
Maybe it would be better to modify index_form_tuple() to accept a new
argument with a number of attributes, then you can just Assert that
this number is never higher than the number of attributes in the
TupleDesc.

Good point.
I agree that this function is a bit strange. I have to set
tupdesc->natts to support compatibility with index_form_tuple().
I didn't want to add either a new field to TupleDesc or a new
parameter to index_form_tuple(), because they are widely used.

I'm also not that keen on index_reform_tuple() in general. I wonder if
there's a way we can just keep the Datum/isnull arrays a bit longer,
and only form the tuple when needed. I've not looked into this in
detail, but it does look like reforming the tuple is not going to be
cheap.
It is used in splits, for example. There is no datum array there; we just move
a tuple key from a child page to a parent page, or something like that.

And according to the INCLUDING algorithm, we need to truncate the non-key attributes.

If we do need to keep this function, I think a better name might be
index_trim_tuple() and I don't think you need to pass the original
length. It might make sense to Assert() that the trim length is
smaller than the tuple size


As regards performance, I don't think that it's a big problem here.
Do you suggest doing it in the following way: memcpy(oldtup, newtup,
newtuplength)?

I will

in gistrescan() there is some code:

for (attno = 1; attno <= natts; attno++)
{
TupleDescInitEntry(so->giststate->fetchTupdesc, attno, NULL,
   scan->indexRelation->rd_opcintype[attno - 1],
   -1, 0);
}

Going by RelationInitIndexAccessInfo() rd_opcintype[] is allocated to
be sized by the number of key columns, but this loop goes over the
number of attribute columns.
Perhaps this is not a big problem since GIST does not support
INCLUDING columns, but it does seem wrong still.


GiST doesn't support the INCLUDING clause, so natts and nkeyatts are always
equal. I don't see any problem here.
And I think that it's extra work beyond this patch. Maybe I or someone
else will add this feature to other access methods later.

Which brings me to the fact that I've spent a bit of time trying to
look for places where you've forgotten to change natts to nkeyatts. I
did find this one, but I don't have much confidence that there's not
lots more places that have been forgotten. Apart from this one, how
confident are you that you've found all the places? I'm getting
towards being happy with the code that I see that's been changed, but
I'm hesitant to mark as "Ready for committer" due to not being all
that comfortable that all the code that needs to be updated has been
updated. I'm not quite sure of a good way to find all these places.
I found all mentions of natts and other related variables with grep, and
replaced (or expanded) them with nkeyatts where it was necessary.

As mentioned before, I didn't change other AMs.
I strongly agree that any changes related to btree require thorough 
inspection, so I'll recheck it again. But I'm almost sure that it's okay.



I wondering if hacking the code so that each btree index which is
created with > 1 column puts all but the first column into the
INCLUDING columns, then run the regression tests to see if there are
any crashes. I'm really not that sure of how else to increase the
confidence levels on this. Do you have ideas?


Do I understand correctly that you suggest replacing all multicolumn
indexes with (1 key column) + included columns?
I don't think it's a good idea.

[HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-14 Thread Anastasia Lubennikova

14.03.2016 16:23, David Steele:

On 2/25/16 4:44 PM, Vitaly Burovoy wrote:


Added to the commitfest 2016-03.

[CF] https://commitfest.postgresql.org/9/540/


This looks like a fairly straight-forward bug fix (the size of the 
patch is deceptive because there a lot of new tests included).  It 
applies cleanly.


Anastasia, I see you have signed up to review.  Do you have an idea 
when you will get the chance to do that?


Thanks,


I've read the patch thoroughly and haven't found any problems. I think
that the patch is in very good shape.

It fixes a bug and has an excellent set of tests.

There is an issue, mentioned in the thread above:


postgres=# select
postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
  to_char  |  to_char  |  to_char
-----------+-----------+-----------
 monday    | monday    | thursday
(1 row)



From 4714-12-28 BC and into the past, detection of when a week starts
is broken (because it is the boundary of ISO years -4713 and -4712).
Is it worth breaking the undocumented range, or should we leave it as is?


But I suppose that the behavior of undocumented dates is not essential.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [PATCH] Integer overflow in timestamp[tz]_part() and date/time boundaries check

2016-03-15 Thread Anastasia Lubennikova

15.03.2016 03:21, Vitaly Burovoy:

On 3/14/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:

14.03.2016 16:23, David Steele:

On 2/25/16 4:44 PM, Vitaly Burovoy wrote:


Added to the commitfest 2016-03.

[CF] https://commitfest.postgresql.org/9/540/

This looks like a fairly straight-forward bug fix (the size of the
patch is deceptive because there are a lot of new tests included).  It
applies cleanly.

Anastasia, I see you have signed up to review.  Do you have an idea
when you will get the chance to do that?

Thanks,

I've read the patch thoroughly and haven't found any problems. I think
that the patch is in very good shape.
It fixes a bug and has an excellent set of tests.

There is an issue, mentioned in the thread above:


postgres=# select
postgres-#  to_char(date_trunc('week', '4713-01-01 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-29 BC'::date),'day')
postgres-# ,to_char(date_trunc('week', '4714-12-28 BC'::date),'day');
  to_char  |  to_char  |  to_char
-----------+-----------+-----------
 monday    | monday    | thursday
(1 row)
From 4714-12-28 BC and into the past, detection of when a week starts
is broken (because it is the boundary of ISO years -4713 and -4712).
Is it worth breaking the undocumented range, or should we leave it as is?

But I suppose that the behavior of undocumented dates is not essential.

I'm sorry... What should I do with "Waiting on Author" state if you
don't have complaints?



I was going to set it to "Ready for Committer", but then I noticed the
message from Mark Dilger and changed my mind.

Now that you have answered him, I have no objections.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-15 Thread Anastasia Lubennikova

14.03.2016 16:02, David Steele:

Hi Anastasia,

On 2/18/16 12:29 PM, Anastasia Lubennikova wrote:

18.02.2016 20:18, Anastasia Lubennikova:

04.02.2016 20:16, Peter Geoghegan:

On Fri, Jan 29, 2016 at 8:50 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru>  wrote:

I fixed it in the new version (attached).


Thank you for the review.
At last, there is a new patch version 3.0. After some refactoring it
looks much better.
I described all details of the compression in this document
https://goo.gl/50O8Q0 (the same text without pictures is attached in
btc_readme_1.0.txt).
Consider it as a rough copy of readme. It contains some notes about
tricky moments of implementation and questions about future work.
Please don't hesitate to comment it.


Sorry, previous patch was dirty. Hotfix is attached.


This looks like an extremely valuable optimization for btree indexes 
but unfortunately it is not getting a lot of attention. It still 
applies cleanly for anyone interested in reviewing.




Thank you for the attention.
I would be indebted to any reviewers who can try this patch on real data
and workloads (except WAL for now).

The B-tree code needs a great deal of testing.

It's not clear to me that you answered all of Peter's questions in 
[1].  I understand that you've provided a README but it may not be 
clear if the answers are in there (and where).


I described all the points Peter asked about in the README.
But I see that it'd be better to answer directly.
Thanks for the reminder; I'll do it tomorrow.


Also, at the end of the README it says:

13. Xlog. TODO.

Does that mean the patch is not yet complete?


Yes, you're right.
Frankly speaking, I assumed that someone would help me with that part,
but now I have almost completed it. I'll send an updated patch in the next message.

I'm still doubtful about some patch details. I mentioned them in the README
(in bold type).

But they are mostly about future improvements.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-11 Thread Anastasia Lubennikova
fer, copy it into local memory allocated by
palloc(), and then immediately release the buffer lock and drop the
pin. This is the same in all instances. There is never more than one
buffer lock or pin held at a time.

We do, on the other hand, have a detailed rationale for why it's okay
that we don't have an ExclusiveLock on the index relation for checks
that span the key space of more than one page by following right links
to compare items across sibling pages. This isn't the same thing as
having an explicit interlock like a pin -- our interlock is one
against *recycling* by vacuum, which is based on recentGlobalXmin.
This rationale requires expert review.


I'm not an expert, but I promise to take a closer look at locking.
I will send another email soon.

Performance
==

Trying to keep the tool as simple as possible, while still making it
do verification that is as useful as possible was my priority here,
not performance. Still, verification completes fairly quickly.
Certainly, it takes far less time than having to REINDEX the index,
and doesn't need too much memory. I think that in practice most
problems that can be detected by the B-Tree checker functions will be
detected with the lighter variant.


I do not see any performance issues.
I'm sure that if someone wants to check whether an index is corrupted,
they can certainly wait a minute.


But I have some concerns about compatibility with my patches.
I've tried to call bt_index_check() over my "including" patch [1] and 
caught a segfault.


LOG:  server process (PID 31794) was terminated by signal 11: 
Segmentation fault

DETAIL:  Failed process was running: select bt_index_check('idx');

I do hope that my patch will be accepted in 9.6, so this conflict looks
really bad.
I think that the error is caused by changes in the page layout. To save some
space, non-key attributes are truncated
when btree copies the indexed value into inner pages or into the high key.
You can look at the index_reform_tuple() calls.


I wonder whether you can add some check of the catalog version to your
module, or does this case require more changes?


With the second patch, which implements compression [2], amcheck causes
another error.

postgres=# insert into tbl select 1 from generate_series(0,5);
INSERT 0 6
postgres=# SELECT * FROM bt_page_items('idx', 1);
 itemoffset |      ctid      | itemlen | nulls | vars | data
------------+----------------+---------+-------+------+------------------------------------------------------------------------------------------------------------------------------------------------
          1 | (2147483664,6) |      56 | f     | f    | 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 02 00 00 00 00 00 03 00 00 00 00 00 04 00 00 00 00 00 05 00 00 00 00 00 06 00 00 00 00 00

postgres=# select bt_index_check('idx');
ERROR:  cannot check index "idx"
DETAIL:  index is not yet ready for insertions

But I'm sure that it's a problem in my patch, so I'll fix it and try again.

[1] https://commitfest.postgresql.org/9/433/
[2] https://commitfest.postgresql.org/9/494/

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-19 Thread Anastasia Lubennikova
Please find the new version of the patch attached. Now it has WAL
functionality.


A detailed description of the feature can be found in the README draft:
https://goo.gl/50O8Q0


This patch is pretty complicated, so I ask everyone who is interested in
this feature

to help with reviewing and testing it. I will be grateful for any feedback.
But please don't complain about code style; it is still a work in progress.

Next things I'm going to do:
1. More debugging and testing. I'm going to attach a couple of SQL scripts
for testing in the next message.

2. Fix NULLs processing.
3. Add a flag to pg_index that allows enabling/disabling compression
for each particular index.
4. Recheck the locking considerations. I tried to write the code to be as
non-invasive as possible, but we need to make sure that the algorithm is
still correct.

5. Change BTMaxItemSize.
6. Bring back the microvacuum functionality.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..72acc0f 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,8 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
+#include "utils/datum.h"
 
 typedef struct
 {
@@ -82,6 +84,7 @@ static bool _bt_pgaddtup(Page page, Size itemsize, IndexTuple itup,
 			 OffsetNumber itup_off);
 static bool _bt_isequal(TupleDesc itupdesc, Page page, OffsetNumber offnum,
 			int keysz, ScanKey scankey);
+
 static void _bt_vacuum_one_page(Relation rel, Buffer buffer, Relation heapRel);
 
 
@@ -113,6 +116,11 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	Page 		page;
+	TupleDesc	itupdesc;
+	int			nipd;
+	IndexTuple 	olditup;
+	Size 		sizetoadd;
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -190,6 +198,7 @@ top:
 
 	if (checkUnique != UNIQUE_CHECK_EXISTING)
 	{
+		bool updposting = false;
 		/*
 		 * The only conflict predicate locking cares about for indexes is when
 		 * an index tuple insert conflicts with an existing lock.  Since the
@@ -201,7 +210,42 @@ top:
 		/* do the insertion */
		_bt_findinsertloc(rel, &buf, &offset, natts, itup_scankey, itup,
 		  stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+
+		/*
+		 * Decide, whether we can apply compression
+		 */
+		page = BufferGetPage(buf);
+
+		if(!IsSystemRelation(rel)
+			&& !rel->rd_index->indisunique
+			&& offset != InvalidOffsetNumber
+			&& offset <= PageGetMaxOffsetNumber(page))
+		{
+			itupdesc = RelationGetDescr(rel);
+			sizetoadd = sizeof(ItemPointerData);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if(_bt_isbinaryequal(itupdesc, olditup,
+	rel->rd_index->indnatts, itup))
+			{
+if (!BtreeTupleIsPosting(olditup))
+{
+	nipd = 1;
+	sizetoadd = sizetoadd*2;
+}
+else
+	nipd = BtreeGetNPosting(olditup);
+
+if ((IndexTupleSize(olditup) + sizetoadd) <= BTMaxItemSize(page)
+	&& PageGetFreeSpace(page) > sizetoadd)
+	updposting = true;
+			}
+		}
+
+		if (updposting)
+			_bt_pgupdtup(rel, buf, offset, itup, olditup, nipd);
+		else
+			_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
 	}
 	else
 	{
@@ -1042,6 +1086,7 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		itemid = PageGetItemId(origpage, P_HIKEY);
 		itemsz = ItemIdGetLength(itemid);
 		item = (IndexTuple) PageGetItem(origpage, itemid);
+
 		if (PageAddItem(rightpage, (Item) item, itemsz, rightoff,
 		false, false) == InvalidOffsetNumber)
 		{
@@ -1072,13 +1117,39 @@ _bt_split(Relation rel, Buffer buf, Buffer cbuf, OffsetNumber firstright,
 		itemsz = ItemIdGetLength(itemid);
 		item = (IndexTuple) PageGetItem(origpage, itemid);
 	}
-	if (PageAddItem(leftpage, (Item) item, itemsz, leftoff,
+
+	if (BtreeTupleIsPosting(item))
+	{
+		Size hikeysize =  BtreeGetPostingOffset(item);
+		IndexTuple hikey = palloc0(hikeysize);
+
+		/* Truncate posting before insert it as a hikey. */
+		memcpy (hikey, item, hikeysize);
+		hikey->t_info &= ~INDEX_SIZE_MASK;
+		hikey->t_info |= hikeysize;
+		ItemPointerSet(&(hikey->t_tid), origpagenumber, P_HIKEY);
+
+		if (PageAddItem(leftpage, (Item) hikey, hikeysize, leftoff,
 	false, false) == InvalidOffsetNumber)
+		{
+			memset(rightpage, 0, BufferGetPageSize(rbuf));
+			elog(ERROR, "failed to add hikey to the left sibling"
+" while splitting block %u of index \"%s\"",
+origpagenumber, RelationGetRelationName(rel));
+		}
+
+		pfree(hikey);
+	}
+	else
 	{
-		memset(rightpage, 0, BufferGetPageSize(rbuf));
-		elog(ERROR, "failed to add hikey to the left siblin

Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-16 Thread Anastasia Lubennikova

15.03.2016 22:28, David Steele:

On 3/4/16 2:56 PM, Vitaly Burovoy wrote:


On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:


I think that you should update documentation. At least description of
epoch on this page:
http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think how to update it.

Vitaly, have you decided how to update this yet?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.

It doesn't matter for me what it is called, it is short enough and
reflects a type on which it is applied.
What would the best name be for it?

Anastasia, any suggestions for a better name, or just leave it as is?

I'm not in favor of the "4", either.  I think I would prefer
JULIAN_MAXYEAR_STAMP.



This point is related to another patch:
https://commitfest.postgresql.org/9/540/.

It was added to this patch just for compatibility.
Unless Tom changes the name of the macro there, I don't see any reason
why we should do it in this patch.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS]WIP: Covering + unique indexes.

2016-03-14 Thread Anastasia Lubennikova

02.03.2016 08:50, Michael Paquier:

On Wed, Mar 2, 2016 at 2:10 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

01.03.2016 19:55, Anastasia Lubennikova:

It is not the final version, because it breaks pg_dump for previous
versions. I need some help from hackers here.
pg_dump.c, line 5466:
if (fout->remoteVersion >= 90400)

What does 'remoteVersion' mean? And what is the right way to change it? Or
does it change between releases?
I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so?
That is totally new to me.

Yes, you got it. That's basically PG_VERSION_NUM as compiled on the
server that has been queried, in this case the server from which a
dump is taken. If you are changing the system catalog layer, you would
need to provide a query at least equivalent to what has been done
until now for your patch, the modify pg_dump as follows:
if (fout->remoteVersion >= 90600)
{
 query = my_new_query;
}
else if (fout->remoteVersion >= 90400)
{
 query = the existing 9.4 query
}
etc.

In short you just need to add a new block so as remote servers newer
than 9.6 will be able to dump objects correctly. pg_upgrade is a good
way to check the validity of pg_dump actually, this explains why some
objects are not dropped in the regression tests. Perhaps you'd want to
do the same with your patch if the current test coverage of pg_dump is
not enough. I have not looked at your patch so I cannot say for sure.


Thank you for the explanation.
The new version of the patch handles pg_dump correctly.
Documentation related to constraints is updated.

I hope that the patch is in good shape now. A brief overview for reviewers:

This patch allows unique indexes to be defined on one set of columns
and to include another set of columns in the INCLUDING clause, on which
uniqueness is not enforced. It allows more queries to benefit
from index-only scans. Currently, only the B-tree access method
supports this feature.

Syntax example:
CREATE TABLE tbl (c1 int, c2 int, c3 box);
CREATE INDEX idx ON tbl (c1) INCLUDING (c2, c3);

In contrast to the key column (c1), included columns (c2, c3) are not used
in index scan keys, neither in "search" scankeys nor in "insertion" scankeys.
Included columns are stored only in leaf pages, which can slightly
reduce index size. Hence, included columns do not require any opclass
for the btree access method. As you can see from the example above, it's
possible to add columns of type "box" to the index.

The most common use-case for this feature is the combination of a UNIQUE or
PRIMARY KEY constraint on columns (a,b) and a covering index on columns
(a,b,c).

So, there is a new syntax for constraints.

CREATE TABLE tblu (c1 int, c2 int, c3 box, UNIQUE (c1,c2) INCLUDING (c3));
The index created for this constraint contains three columns:
"tblu_c1_c2_c3_key" UNIQUE CONSTRAINT, btree (c1, c2) INCLUDING (c3)

CREATE TABLE tblpk (c1 int, c2 int, c3 box, PRIMARY KEY (c1) INCLUDING 
(c3));

The index created for this constraint contains two columns. Note that the
NOT NULL constraint, like the unique constraint, is applied only to the key
column(s).

postgres=# \d tblpk
      Table "public.tblpk"
 Column |  Type   | Modifiers
--------+---------+-----------
 c1     | integer | not null
 c2     | integer |
 c3     | box     |
Indexes:
    "tblpk_pkey" PRIMARY KEY, btree (c1) INCLUDING (c3)

Same for ALTER TABLE statements:
CREATE TABLE tblpka (c1 int, c2 int, c3 box);
ALTER TABLE tblpka ADD PRIMARY KEY (c1) INCLUDING (c3);

pg_dump is updated and seems to work fine with this kind of index.

I see only one problem left (maybe I've mentioned it before):
queries like [1] must be rewritten, because after the catalog changes,
i.indkey contains both key and included attributes.
One more thing to do is some refactoring of names, since "indkey"
looks really confusing to me. But that could be done as a separate patch [2].


[1] https://wiki.postgresql.org/wiki/Retrieve_primary_key_columns
[2] http://www.postgresql.org/message-id/56bb7788.30...@postgrespro.ru

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknuma

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-05 Thread Anastasia Lubennikova

05.04.2016 01:48, Peter Geoghegan :

On Mon, Mar 21, 2016 at 9:53 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:

It's a bit more complicated to add it into index creation algorithm.
There's a trick with a "high key".
 /*
  * We copy the last item on the page into the new page, and then
  * rearrange the old page so that the 'last item' becomes its high
key
  * rather than a true data item.  There had better be at least two
  * items on the page already, else the page would be empty of useful
  * data.
  */
 /*
  * Move 'last' into the high key position on opage
  */

To be consistent with other steps of the algorithm (all high keys must be
truncated tuples), I had to update this high key in place:
delete the old one, and insert a truncated high key.

Hmm. But the high key comparing equal to the Scankey gives insertion
the choice of where to put its IndexTuple (it can go on the page with
the high key, or its right-sibling, according only to considerations
about fillfactor, etc). Is this changed? Does it not matter? Why not?
Is it just worth it?


I would say this is changed, but it doesn't matter.
When performing any search in the btree (including choosing a suitable page
for insertion), we use only key attributes.

We assume that included columns are stored in the index unordered.
A simple example:
create table tbl(id int, data int);
create index idx on tbl (id) including (data);

A SELECT query does not consider included columns in the scan key.
It selects all tuples satisfying the condition on the key column, and only
after that does it apply a filter to remove wrong rows from the result.
If the key attribute doesn't satisfy the query condition, there are no more
tuples to return and we can stop the scan.


You can find more explanations in the attached SQL script,
which contains queries to retrieve detailed information about the index
structure using pageinspect.



The right-most page on every level has no high-key. But you say those
pages have an "imaginary" *positive* infinity high key, just as
internal pages have (non-imaginary) minus infinity downlinks as their
first item/downlink. So tuples in a (say) leaf page are always bound
by the downlink lower bound in parent, while their own high key is an
upper bound. Either (and, rarely, both) could be (positive or
negative) infinity.

Maybe you now see why I talked about special _bt_compare() logic for
this. I proposed special logic that is similar to the existing minus
infinity thing _bt_compare() does (although _bt_binsrch(), an
important caller of _bt_compare() also does special things for
internal .vs leaf case, so I'm not sure any new special logic must go
in _bt_compare()).


In my view, it's the correct way to fix this problem, because the caller is
responsible for passing proper arguments to the function.
Of course I will add a check into bt_compare, but I'd rather make it an
assertion (see the patch attached).

I see what you mean, but I think we need to decide what to do about
the key space when leaf high keys are truncated. I do think that
truncating the high key was the right idea, though, and it nicely
illustrates that nothing special should happen in upper levels. Suffix
truncation should only happen when leaf pages are split, generally
speaking.
As I said, the high key is very similar to the downlinks, in that both
bound the things that go on each page. Together with downlinks they
represent *discrete* ranges *unambiguously*, so INCLUDING truncation
needs to make it clear which page new items go on. As I said,
_bt_binsrch() already takes special actions for internal pages, making
sure to return the item that is < the scankey, not <= the scankey
which is only allowed for leaf pages. (See README, from "Lehman and
Yao assume that the key range for a subtree S is described by Ki < v
<= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent
page...").

To give a specific example, I worry about the case where two sibling
downlinks in a parent page are distinct, but per specific-to-Postgres
"Ki <= v <= Ki+1" thing (which differs from the classic L
invariant), some tuples with all right downlink's attributes matching
end up in left child page, not right child page. I worry that since
_bt_findsplitloc() doesn't consider this (for example), the split
point doesn't *reliably* and unambiguously divide the key space
between the new halves of a page being split. I think the "Ki <= v <=
Ki+1"/_bt_binsrch() thing might save you in common cases where all
downlink attributes are distinct, so maybe that simpler case is okay.
But to be even more specific, what about the more complicated case
where the downlinks *are* fully _bt_compare()-wise equal? This could
happen even though they're constrained to be unique in leaf pages, due
to bloat. Unique indexes aren't special here; they just make it far
less likely that this w

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-07 Thread Anastasia Lubennikova

06.04.2016 23:52, Peter Geoghegan:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.

You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.



The mentioned issues are fixed; the patch is attached.

I'd like to remind you that the commitfest will close very soon, so I'd
like to get your final resolution on the patch.

Not having it in the 9.6 release would be very disappointing.

I agree that the b-tree is a crucial subsystem. But it seems to me that we
lack improvements in this area
not only because of the algorithm's complexity but also because of a lack
of enthusiasts willing to work on it and struggle through endless discussions.
But that's off-topic here. Attention to these development difficulties
will be one of the messages of my PGCon talk.


We have already lost a lot of time discussing various b-tree problems.
Besides that, I am sure that the patch is really in good shape. It has no
open problems left to fix.

And possible subtle bugs can be found during the testing stage of the release.

Looking forward to your reply.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary 

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-08 Thread Anastasia Lubennikova

08.04.2016 15:06, Teodor Sigaev:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.


You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.


Some notes:
- index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts,
                       int indnkeyatts)
  Why do we need indnatts/indnkeyatts? They are already present in the
  idxrel struct.
- Following the code where index_truncate_tuple() is called, it should
  never be called in the case where indnatts == indnkeyatts. So indnkeyatts
  should be strictly less than indnatts; please change the assertion. If
  they are equal, this function

  becomes a complicated variant of CopyIndexTuple().


Good point. These parameters seem to have been there since previous versions
of the function.

But now they are definitely unnecessary. An updated patch is attached.
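(For context, a minimal sketch of what the simplified interface could look
like under Teodor's suggestion of deriving the attribute counts from the
relation. This is illustrative only, not the patch's actual
index_truncate_tuple()/index_reform_tuple() implementation, and it assumes
the IndexRelationGetNumberOfKeyAttributes() macro added by the covering-index
patch.)

#include "postgres.h"

#include "access/itup.h"
#include "access/tupdesc.h"
#include "utils/rel.h"

/*
 * Illustrative sketch only: build a copy of olditup that keeps just the
 * key attributes of idxrel.  The attribute counts are taken from the
 * relation, as suggested above, instead of being passed as parameters.
 */
static IndexTuple
index_truncate_tuple_sketch(Relation idxrel, IndexTuple olditup)
{
	TupleDesc	itupdesc = RelationGetDescr(idxrel);
	TupleDesc	truncdesc;
	Datum		values[INDEX_MAX_KEYS];
	bool		isnull[INDEX_MAX_KEYS];
	int			nkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
	IndexTuple	newitup;

	/* Only called when there really are non-key (INCLUDING) columns */
	Assert(nkeyatts < IndexRelationGetNumberOfAttributes(idxrel));

	/* Deform with the full descriptor, re-form with only the key columns */
	index_deform_tuple(olditup, itupdesc, values, isnull);
	truncdesc = CreateTupleDescCopy(itupdesc);
	truncdesc->natts = nkeyatts;
	newitup = index_form_tuple(truncdesc, values, isnull);

	/* The new tuple can never be larger than the original one */
	Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup));

	newitup->t_tid = olditup->t_tid;
	return newitup;
}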

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(P

Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-04-08 Thread Anastasia Lubennikova
ake it as simple as possible despite significant
effort. It's hard to describe these things in an accessible way. Maybe
someone can try and understand what I've said there, and let me know
how I might have fallen short. I have used a new term, "cousin page"
in this explanation (cf. sibling page). This seems like useful
terminology that makes these difficult explanations a bit more
accessible.


I'm not an expert in btree locking races, but I tested the patch on
different workloads and it works as promised.

I didn't find any mistakes in the code logic either.

As a reviewer, I don't have any objections to marking it "Ready for Committer".
The only note I'd like to add is about its compatibility with
covering indexes.
We already discussed it in the thread
https://commitfest.postgresql.org/9/433/

The tiny attached patch fixes this issue.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/amcheck/amcheck.c b/contrib/amcheck/amcheck.c
index 66d1f4f..91b2bc5 100644
--- a/contrib/amcheck/amcheck.c
+++ b/contrib/amcheck/amcheck.c
@@ -1118,6 +1118,9 @@ invariant_key_less_than_equal_offset(BtreeCheckState *state, ScanKey key,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, state->target, upperbound);
 
 	return cmp <= 0;
@@ -1139,6 +1142,9 @@ invariant_key_greater_than_equal_offset(BtreeCheckState *state, ScanKey key,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, state->target, lowerbound);
 
 	return cmp >= 0;
@@ -1164,6 +1170,9 @@ invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
 	int16		natts = state->rel->rd_rel->relnatts;
 	int32		cmp;
 
+#if (PG_VERSION_NUM >= 90600)
+		natts = state->rel->rd_index->indnkeyatts;
+#endif
 	cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);
 
 	return cmp <= 0;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-08 Thread Anastasia Lubennikova

08.04.2016 15:45, Anastasia Lubennikova:

08.04.2016 15:06, Teodor Sigaev:

On Wed, Apr 6, 2016 at 1:50 PM, Peter Geoghegan <p...@heroku.com> wrote:

Personally, I like documenting assertions, and will sometimes write
assertions that the compiler could easily optimize away. Maybe going
*that* far is more a matter of personal style, but I think an
assertion about the new index tuple size being <= the old one is just
a good idea. It's not about a problem in your code at all.


You should make index_truncate_tuple()/index_reform_tuple() promise to
always do this in its comments/contract with caller as part of this,
IMV.


Some notes:
- index_truncate_tuple(Relation idxrel, IndexTuple olditup, int indnatts,
                       int indnkeyatts)
  Why do we need indnatts/indnkeyatts? They are already present in the
  idxrel struct.
- Following the code where index_truncate_tuple() is called, it should
  never be called in the case where indnatts == indnkeyatts. So indnkeyatts
  should be strictly less than indnatts; please change the assertion. If
  they are equal, this function

  becomes a complicated variant of CopyIndexTuple().


Good point. These parameters seem to have been there since previous
versions of the function.

But now they are definitely unnecessary. An updated patch is attached.


One more improvement: a note about expressions in the documentation.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index b5f67af..61a21a9 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -161,6 +161,8 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] INCLUDING clause, which can slightly reduce the size of the index,
 due to storing included attributes only in leaf index pages.
 Currently, only the B-tree access method supports this feature.
+Expressions as included columns are not supported since they cannot be used
+in index-only scan.

   
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-12 Thread Anastasia Lubennikova
The attached version has the pg_dump fix suggested by Stephen Frost in the
-committers thread:

http://postgresql.nabble.com/pgsql-CREATE-INDEX-INCLUDING-column-td5897653.html
Sooner or later, I'd like to see this patch finished.

For now, it has drawn two complaints:
- support of expressions as included columns.
Frankly, I don't understand why this is a problem for the patch.
The patch is already big enough, and it will be much easier to add
expression support in a follow-up patch, after the first one is stable.

I wonder if anyone has objections to that?
Yes, it's a kind of deferred feature. But should we hold every patch until
it is entirely complete?


- lack of review and testing.
Obviously, I did as much testing as I could.
So if reviewers have any concerns about the patch, I'm looking forward
to seeing them.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i < numatts; i++)
+for (i = 0; i < indnkeyatts; i++)
 {
 	int			colno = index->indkey.values[i];
 
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index d6b60db..3

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-06 Thread Anastasia Lubennikova
 b;
==12310== Invalid read of size 4
==12310==at 0x656615: match_clause_to_indexcol (indxpath.c:2226)
==12310==by 0x656615: match_clause_to_index (indxpath.c:2144)
==12310==by 0x656DBC: match_clauses_to_index (indxpath.c:2115)
==12310==by 0x658054: match_restriction_clauses_to_index (indxpath.c:2026)
==12310==by 0x658054: create_index_paths (indxpath.c:269)
==12310==by 0x64D1DB: set_plain_rel_pathlist (allpaths.c:649)
==12310==by 0x64D1DB: set_rel_pathlist (allpaths.c:427)
==12310==by 0x64D93B: set_base_rel_pathlists (allpaths.c:299)
==12310==by 0x64D93B: make_one_rel (allpaths.c:170)
==12310==by 0x66876C: query_planner (planmain.c:246)
==12310==by 0x669FBA: grouping_planner (planner.c:1666)
==12310==by 0x66D0C9: subquery_planner (planner.c:751)
==12310==by 0x66D3DA: standard_planner (planner.c:300)
==12310==by 0x66D714: planner (planner.c:170)
==12310==by 0x6FD692: pg_plan_query (postgres.c:798)
==12310==by 0x59082D: ExplainOneQuery (explain.c:350)
==12310==  Address 0xbff290c is 2,508 bytes inside a block of size 8,192 alloc'd
==12310==at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==12310==by 0x81B7FA: AllocSetAlloc (aset.c:853)
==12310==by 0x81D257: palloc (mcxt.c:907)
==12310==by 0x4B6F65: RelationGetIndexScan (genam.c:94)
==12310==by 0x4C135D: btbeginscan (nbtree.c:431)
==12310==by 0x4B7A5C: index_beginscan_internal (indexam.c:279)
==12310==by 0x4B7C5A: index_beginscan (indexam.c:222)
==12310==by 0x4B73D1: systable_beginscan (genam.c:379)
==12310==by 0x7E8CF9: ScanPgRelation (relcache.c:341)
==12310==by 0x7EB3C4: RelationBuildDesc (relcache.c:951)
==12310==by 0x7ECD35: RelationIdGetRelation (relcache.c:1800)
==12310==by 0x4A4D37: relation_open (heapam.c:1118)
==12310==
{

Memcheck:Addr4
fun:match_clause_to_indexcol
fun:match_clause_to_index
fun:match_clauses_to_index
fun:match_restriction_clauses_to_index
fun:create_index_paths
fun:set_plain_rel_pathlist
fun:set_rel_pathlist
fun:set_base_rel_pathlists
fun:make_one_rel
fun:query_planner
fun:grouping_planner
fun:subquery_planner
fun:standard_planner
fun:planner
fun:pg_plan_query
fun:ExplainOneQuery
}

Separately, I tried "make installcheck-tests TESTS=index_including"
from Postgres + Valgrind, with Valgrind's --track-origins option
enabled (as it was above). I recommend installing Valgrind, and making
sure that the patch shows no errors. I didn't actually find a Valgrind
issue from just using your regression tests (nor did I find an issue
from separately running the regression tests with
CLOBBER_CACHE_ALWAYS, FWIW).


Thank you for the advice.
Another missed index->ncolumns to index->nkeycolumns replacement was in
match_clause_to_index. Fixed.

I also fixed a couple of typos in the documentation.

Thank you again for the detailed review.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary

Re: [HACKERS] WIP: Covering + unique indexes.

2016-04-06 Thread Anastasia Lubennikova

06.04.2016 16:15, Anastasia Lubennikova :

06.04.2016 03:05, Peter Geoghegan:

* There is some stray whitespace within RelationGetIndexAttrBitmap().
I think you should have updated it with code, though. I don't think
it's necessary for HOT updates to work, but I think it could be
necessary so that we don't need to get a row lock that blocks
non-conflict foreign key locking (see heap_update() callers). I think
you need to be careful for non-key columns within the loop in
RelationGetIndexAttrBitmap(), basically, because it seems to still go
through all columns. UPSERT also must call this code, FWIW.

* I think that a similar omission is also made for the replica
identity stuff in RelationGetIndexAttrBitmap(). Some thought is needed
on how this patch interacts with logical decoding, I guess.


Good point. Indexes are everywhere in the code.
I missed that RelationGetIndexAttrBitmap() is used not only for REINDEX.
I'll discuss it with Teodor and send an updated patch tomorrow.


As promised, the updated patch is attached.
But I'm not an expert in this area, so it needs a critical look.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums, int pknumatts, char **tgt_pkattvals);
@@ -1485,7 +1485,7 @@ PG_FUNCTION_INFO_V1(dblink_get_pkey);
 Datum
 dblink_get_pkey(PG_FUNCTION_ARGS)
 {
-	int16		numatts;
+	int16		indnkeyatts;
 	char	  **results;
 	FuncCallContext *funcctx;
 	int32		call_cntr;
@@ -1511,7 +1511,7 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
 
 		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
+		results = get_pkey_attnames(rel, &indnkeyatts);
 
 		relation_close(rel, AccessShareLock);
 
@@ -1531,9 +1531,9 @@ dblink_get_pkey(PG_FUNCTION_ARGS)
 		attinmeta = TupleDescGetAttInMetadata(tupdesc);
 		funcctx->attinmeta = attinmeta;
 
-		if ((results != NULL) && (numatts > 0))
+		if ((results != NULL) && (indnkeyatts > 0))
 		{
-			funcctx->max_calls = numatts;
+			funcctx->max_calls = indnkeyatts;
 
 			/* got results, keep track of them */
 			funcctx->user_fctx = results;
@@ -2023,10 +2023,10 @@ dblink_fdw_validator(PG_FUNCTION_ARGS)
  * get_pkey_attnames
  *
  * Get the primary key attnames for the given relation.
- * Return NULL, and set numatts = 0, if no primary key exists.
+ * Return NULL, and set indnkeyatts = 0, if no primary key exists.
  */
 static char **
-get_pkey_attnames(Relation rel, int16 *numatts)
+get_pkey_attnames(Relation rel, int16 *indnkeyatts)
 {
 	Relation	indexRelation;
 	ScanKeyData skey;
@@ -2036,8 +2036,8 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 	char	  **result = NULL;
 	TupleDesc	tupdesc;
 
-	/* initialize numatts to 0 in case no primary key exists */
-	*numatts = 0;
+	/* initialize indnkeyatts to 0 in case no primary key exists */
+	*indnkeyatts = 0;
 
 	tupdesc = rel->rd_att;
 
@@ -2058,12 +2058,12 @@ get_pkey_attnames(Relation rel, int16 *numatts)
 		/* we're only interested if it is the primary key */
 		if (index->indisprimary)
 		{
-			*numatts = index->indnatts;
-			if (*numatts > 0)
+			*indnkeyatts = index->indnkeyatts;
+			if (*indnkeyatts > 0)
 			{
-result = (char **) palloc(*numatts * sizeof(char *));
+result = (char **) palloc(*indnkeyatts * sizeof(char *));
 
-for (i = 0; i < *numatts; i++)
+for (i = 0; i < *indnkeyatts; i++)
 	result[i] = SPI_fname(tupdesc, index->indkey.values[i]);
 			}
 			break;
diff --git a/contrib/tcn/tcn.c b/contrib/tcn/tcn.c
index 7352b29..142730a 100644
--- a/contrib/tcn/tcn.c
+++ b/contrib/tcn/tcn.c
@@ -138,9 +138,9 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 		/* we're only interested if it is the primary key and valid */
 		if (index->indisprimary && IndexIsValid(index))
 		{
-			int			numatts = index->indnatts;
+			int			indnkeyatts = index->indnkeyatts;
 
-			if (numatts > 0)
+			if (indnkeyatts > 0)
 			{
 int			i;
 
@@ -150,7 +150,7 @@ triggered_change_notification(PG_FUNCTION_ARGS)
 appendStringInfoCharMacro(payload, ',');
 appendStringInfoCharMacro(payload, operation);
 
-for (i = 0; i <

Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-21 Thread Anastasia Lubennikova

19.03.2016 08:00, Peter Geoghegan:

On Fri, Mar 18, 2016 at 5:15 AM, David Steele <da...@pgmasters.net> wrote:

It looks like this patch should be marked "needs review" and I have done so.

Uh, no it shouldn't. I've posted an extensive review on the original
design thread. See CF entry:

https://commitfest.postgresql.org/9/433/

Marked "Waiting on Author".

Thanks to David;
I'd missed these messages at first.
I'll answer here.


* You truncate (remove suffix attributes -- the "included" attributes)
within _bt_insertonpg():

-   right_item = CopyIndexTuple(item);
+   indnatts = IndexRelationGetNumberOfAttributes(rel);
+   indnkeyatts = IndexRelationGetNumberOfKeyAttributes(rel);
+
+   if (indnatts != indnkeyatts)
+   {
+   right_item = index_reform_tuple(rel, item, indnatts, indnkeyatts);
+   right_item_sz = IndexTupleDSize(*right_item);
+   right_item_sz = MAXALIGN(right_item_sz);
+   }
+   else
+   right_item = CopyIndexTuple(item);
 ItemPointerSet(&(right_item->t_tid), rbkno, P_HIKEY);

I suggest that you do this within _bt_insert_parent(), instead, iff
the original target page is know to be a leaf page. That's where it
needs to happen for conventional suffix truncation, which has special
considerations when determining which attributes are safe to truncate
(or even which byte in the first distinguishing attribute it is okay
to truncate past)


I agree that _bt_insertonpg() is not the right place for truncation.
Furthermore, I've noticed that all internal keys are simply copies of high
keys from the leaf pages, which is pretty logical.
Therefore, if we have already truncated the tuple when it became a high
key, we do not need the same truncation within _bt_insert_parent() or
any other function.
So the only thing to worry about is the high-key truncation. I rewrote
the code; now only _bt_split() cares about truncation.


It's a bit more complicated to add it to the index creation algorithm.
There's a trick with a "high key":
    /*
     * We copy the last item on the page into the new page, and then
     * rearrange the old page so that the 'last item' becomes its high key
     * rather than a true data item.  There had better be at least two
     * items on the page already, else the page would be empty of useful
     * data.
     */
    /*
     * Move 'last' into the high key position on opage
     */

To be consistent with the other steps of the algorithm (all high keys must
be truncated tuples), I had to update this high key in place:
delete the old one, and insert a truncated high key.
The very same logic is used to truncate the posting list of a compressed
tuple in the "btree_compression" patch [1].

I hope both patches will be accepted, and then I'll carefully merge them.


* I think the comparison logic may have a bug.

Does this work with amcheck? Maybe it works with bt_index_check(), but
not bt_index_parent_check()? I think that you need to make sure that
_bt_compare() knows about this, too. That's because it isn't good
enough to let a truncated internal IndexTuple compare equal to a
scankey when non-truncated attributes are equal.


It is a very important issue, but I don't think there is a bug there.
I've read the amcheck sources thoroughly and found that the problem appears
in invariant_key_less_than_equal_nontarget_offset():

static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
											   Page nontarget, ScanKey key,
											   OffsetNumber upperbound)
{
	int16		natts = state->rel->rd_rel->relnatts;
	int32		cmp;

	cmp = _bt_compare(state->rel, natts, key, nontarget, upperbound);

	return cmp <= 0;
}

It uses a scan key built with _bt_mkscankey(), which uses only the key
attributes, but it calls _bt_compare() with the wrong keysz.
If we use nkeyatts = state->rel->rd_index->relnatts; instead of
natts, all the checks pass successfully.


The same applies to invariant_key_greater_than_equal_offset() and
invariant_key_less_than_equal_nontarget_offset().


In my view, this is the correct way to fix the problem, because the caller
is responsible for passing proper arguments to the function.
Of course I will add a check to _bt_compare(), but I'd rather make it an
assertion (see the attached patch).
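
For clarity, here is a minimal sketch of the corrected invariant, assuming the
BtreeCheckState layout quoted above and the IndexRelationGetNumberOfKeyAttributes()
macro from the patch; treat it as an illustration, not the final code.

/*
 * Sketch only: compare using the number of key attributes rather than the
 * total number of index attributes, so truncated "included" columns are
 * never passed to _bt_compare().
 */
static bool
invariant_key_less_than_equal_nontarget_offset(BtreeCheckState *state,
											   Page nontarget, ScanKey key,
											   OffsetNumber upperbound)
{
	int16		nkeyatts = IndexRelationGetNumberOfKeyAttributes(state->rel);
	int32		cmp;

	cmp = _bt_compare(state->rel, nkeyatts, key, nontarget, upperbound);

	return cmp <= 0;
}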


I'll add a flag to distinguish regular and truncated tuples, but it will
not be used in this patch. Please comment if I've missed something.
As you've already mentioned, neither high keys nor tuples on internal
pages use "itup->t_tid.ip_posid", so I'll take one bit of it.


It will definitely require changes in future work on suffix
truncation or something like that, but IMHO it's enough for now.


Do you have any objections or comments?

[1] https://commitfest.postgresql.org/9/494/

--
Anastasia Lubennikova
Pos

Re: [HACKERS][PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-04 Thread Anastasia Lubennikova

27.02.2016 09:57, Vitaly Burovoy:

Hello, Hackers!

I worked on a patch[1] allows "EXTRACT(epoch FROM
+-Inf::timestamp[tz])" to return "+-Inf::float8".
There is an opposite function "to_timestamp(float8)" which now defined as:
SELECT ('epoch'::timestamptz + $1 * '1 second'::interval)


Hi,
thank you for the patches.
Could you explain whether they depend on each other?


Since intervals do not support infinity values, it is impossible to do
something like:

SELECT to_timestamp('infinity'::float8);

... which is not good.

Supporting of such converting is in the TODO list[2] (by "converting
between infinity timestamp and float8").


You mention intervals here, and the TODO item definitely talks about an
'infinity' interval,
while the patch and all the following discussion concern timestamps.
Is it a typo, or have I misunderstood something important?
I assumed that the following query would work, but it doesn't. Could you
clarify that?

select to_timestamp('infinity'::interval);


Proposed patch implements it.

There is an other patch in the CF[3] 2016-03 implements checking of
timestamp[tz] for being in allowed range. Since it is wise to set
(fix) the upper boundary of timestamp[tz]s, I've included the file
"src/include/datatype/timestamp.h" from there to check that an input
value and a result are in the allowed range.

There is no changes in a documentation because allowed range is the
same as officially supported[4] (i.e. until 294277 AD).


I think that you should update the documentation, at least the description of
epoch on this page:

http://www.postgresql.org/docs/devel/static/functions-datetime.html

Here is how you can convert an epoch value back to a time stamp:

SELECT TIMESTAMP WITH TIME ZONE 'epoch' + 982384720.12 * INTERVAL '1 second';

(The to_timestamp function encapsulates the above conversion.)


More thoughts about the patch:

1. When I copy the value from the hints for the min and max values (see the
examples below), it works fine for min, while max still leads to an error.
It comes from the check "if (seconds >= epoch_ubound)". I wonder
whether you should change the hint message.


select to_timestamp(-210866803200.00);
          to_timestamp
---------------------------------
 4714-11-24 02:30:17+02:30:17 BC
(1 row)


select to_timestamp(9224318016000.00);
ERROR:  UNIX epoch out of range: "9224318016000.00"
HINT:  Maximal UNIX epoch value is "9224318016000.00"
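
For illustration, the boundary test under discussion behaves like the small
sketch below; the function and variable names are illustrative, not taken
from the patch.

#include <stdbool.h>

/*
 * Illustrative only: with ">=", the value the hint reports as the maximum
 * is itself rejected, so either the comparison should be ">" or the hint
 * should print the largest value that is actually accepted.
 */
static bool
epoch_in_range(double seconds, double epoch_lbound, double epoch_ubound)
{
	return seconds >= epoch_lbound && seconds < epoch_ubound;
}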

2. There is a comment about JULIAN_MAXYEAR inaccuracy in timestamp.h:

 * IS_VALID_JULIAN checks the minimum date exactly, but is a bit sloppy
 * about the maximum, since it's far enough out to not be especially
 * interesting.

Maybe you can expand on it?
- Does JULIAN_MAXYEAR4STAMPS help to avoid overflow in all possible cases?
- Why do we need to keep both definitions? I suppose it's a matter of
backward compatibility, isn't it?


3. (nitpicking) I'm not sure about the "4STAMPS" suffix. "4" is a nice
abbreviation, but it seems slightly confusing to me.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] WIP: Covering + unique indexes.

2016-03-01 Thread Anastasia Lubennikova


29.02.2016 18:17, Anastasia Lubennikova:

25.02.2016 21:39, Jeff Janes:
As promised, here's the new version of the patch 
"including_columns_4.0".

I fixed all issues except some points mentioned below.

Thanks for the update patch.  I get a compiler warning:

genam.c: In function 'BuildIndexValueDescription':
genam.c:259: warning: unused variable 'tupdesc'


Thank you for the notice, I'll fix it in the next update.

Also, I can't create a primary key INCLUDING columns directly:

jjanes=# create table foobar (a int, b int, c int);
jjanes=# alter table foobar add constraint foobar_pkey primary key
(a,b) including (c);
ERROR:  syntax error at or near "including"

But I can get there using a circuitous route:

jjanes=# create unique index on foobar (a,b) including (c);
jjanes=# alter table foobar add constraint foobar_pkey primary key
using index foobar_a_b_c_idx;

The description of the table's index knows to include the including 
column:


jjanes=# \d foobar
 Table "public.foobar"
  Column |  Type   | Modifiers
+-+---
  a  | integer | not null
  b  | integer | not null
  c  | integer |
Indexes:
 "foobar_pkey" PRIMARY KEY, btree (a, b) INCLUDING (c)


Since the machinery appears to all be in place to have primary keys
with INCLUDING columns, it would be nice if the syntax for adding
primary keys allowed one to implement them directly.

Is this something for future expansion, or could it be added at the
same time as the main patch?


Good point.
At a quick glance, this looks easy to implement. The only problem is
that there are many places in the code which must be updated.
I'll try to do it, and if there are difficulties, it's fine with
me to delay this feature for future work.


I found one more thing to do: pg_dump does not handle included columns
yet. I will fix it in the next version of the patch.




As promised, the fixed patch is attached. It allows the following
statements:


create table utbl (a int, b box);
alter table utbl add unique (a) including(b);
create table ptbl (a int, b box);
alter table ptbl add primary key (a) including(b);

And now they can be dumped/restored successfully.
I used the following commands:
pg_dump --verbose -Fc postgres -f pg.dump
pg_restore -d newdb pg.dump

It is not the final version, because it breaks pg_dump for previous 
versions. I need some help from hackers here.

pg_dump.c, line 5466:
if (fout->remoteVersion >= 90400)

What does 'remoteVersion' mean? And what is the right way to change it?
Does it change between releases?
I guess that 90400 is for 9.4 and 80200 is for 8.2, but is it really so?
That is totally new to me.
BTW, while we are on the subject, maybe it's worth replacing these
magic numbers with a set of macros?
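
For reference, remoteVersion holds the connected server's version encoded as
major*10000 + minor*100, so 90400 is 9.4 and 80200 is 8.2. pg_dump usually
branches on it when a catalog column exists only on newer servers; a rough
sketch of that pattern, with a hypothetical version cutoff and column and a
PQExpBuffer named query, looks like this:

/*
 * Sketch of the usual pg_dump pattern (hypothetical cutoff and column names,
 * not the actual patch code): query indnkeyatts only from servers that have
 * it, and fake it on older ones.
 */
if (fout->remoteVersion >= 90600)
	appendPQExpBufferStr(query,
						 "SELECT indexrelid, indnatts, indnkeyatts "
						 "FROM pg_catalog.pg_index");
else
	appendPQExpBufferStr(query,
						 "SELECT indexrelid, indnatts, "
						 "indnatts AS indnkeyatts "
						 "FROM pg_catalog.pg_index");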


P.S. I'll update documentation for ALTER TABLE in the next patch.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-03-28 Thread Anastasia Lubennikova

25.03.2016 01:12, Peter Geoghegan:

On Thu, Mar 24, 2016 at 7:17 AM, Robert Haas <robertmh...@gmail.com> wrote:

I really like this idea, and the performance results seem impressive,
but I think we should push this out to 9.7.  A btree patch that didn't
have WAL support until two and a half weeks into the final CommitFest
just doesn't seem to me like a good candidate.  First, as a general
matter, if a patch isn't code-complete at the start of a CommitFest,
it's reasonable to say that it should be reviewed but not necessarily
committed in that CommitFest.

You're right.
Frankly, I thought that someone would help me with the patch, but I had to
finish it myself.

*off-topic*
I wonder if we can add a new flag to the commitfest, something like "Needs
assistance",
which would be used to mark big and complicated patches that are still in
progress, while "Needs review" would mean that the patch is almost ready and
only requires a final review.



   This patch has had some review, but I'm
not sure how deep that review is, and I think it's had no code review
at all of the WAL logging changes, which were submitted only a week
ago, well after the CF deadline.  Second, the btree AM is a
particularly poor place to introduce possibly destabilizing changes.
Everybody depends on it, all the time, for everything.  And despite
new tools like amcheck, it's not a particularly easy thing to debug.

Regrettably, I must agree. I don't see a plausible path to commit for
this patch in the ongoing CF.

I think that Anastasia did an excellent job here, and I wish I could
have been of greater help sooner. Nevertheless, it would be unwise to
commit this given the maturity of the code. There have been very few
instances of performance improvements to the B-Tree code for as long
as I've been interested, because it's so hard, and the standard is so
high. The only example I can think of from the last few years is
Kevin's commit 2ed5b87f96 and Tom's commit 1a77f8b63d both of which
were far less invasive, and Simon's commit c7111d11b1, which we just
outright reverted from 9.5 due to subtle bugs (and even that was
significantly less invasive than this patch). Improving nbtree is
something that requires several rounds of expert review, and that's
something that's in short supply for the B-Tree code in particular. I
think that a new testing strategy is needed to make this easier, and I
hope to get that going with amcheck. I need help with formalizing a
"testing first" approach for improving the B-Tree code, because I
think it's the only way that we can move forward with projects like
this. It's *incredibly* hard to push forward patches like this given
our current, limited testing strategy.


Unfortunately, I must agree. This patch seems too far from a final
version to make it before the feature freeze.

I'll move it to a future commitfest.

Anyway, it means that now we have more time to improve the patch.
If you have any ideas related to this patch, such as prefix/suffix
compression, I'll be glad to discuss them.

The same goes for any other ideas about B-tree optimization.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Supporting +-Infinity values by to_timestamp(float8)

2016-03-29 Thread Anastasia Lubennikova

17.03.2016 06:27, Vitaly Burovoy:

On 2016-03-15, David Steele <da...@pgmasters.net> wrote:

On 3/4/16 2:56 PM, Vitaly Burovoy wrote:

On 3/4/16, Anastasia Lubennikova <a.lubennik...@postgrespro.ru> wrote:


I think that you should update documentation. At least description of
epoch on this page:
http://www.postgresql.org/docs/devel/static/functions-datetime.html

Thank you very much for pointing where it is located (I saw only
"to_timestamp(TEXT, TEXT)").
I'll think how to update it.

Vitaly, have you decided how to update this yet?

Yes, there are three versions:
* remove mentioning how to get timestamptz from UNIX stamp;
* leave a note how to get timestamptz and add a note that such
encapsulation existed prior to 9.6;
* replace to the proposed current behavior (without interval).

I decided to implement the third case (but a result there has a time
zone which can be different, so the result can be not exactly same for
a concrete user). If a committer decides to do somehow else, he is
free to choose one from the list above or to do something else.


3. (nitpicking) I don't sure about "4STAMPS" suffix. "4" is nice
abbreviation, but it seems slightly confusing to me.

It doesn't matter for me what it is called, it is short enough and
reflects a type on which it is applied.
What would the best name be for it?

Anastasia, any suggestions for a better name, or just leave it as is?

I'm not in favor of the "4", either.  I think I would prefer
JULIAN_MAXYEAR_STAMP.

It turns out that Tom has changed almost one third of timestamp.h and
now the constant has a name "TIMESTAMP_END_JULIAN".

I've rebased the patch to the current master (5db5146) and changed it
according to the new timestamp.h.

Now it passes both versions (integer and float timestamps).
I deleted test for the upper boundary for integer timestamps, changed
a little comments.

I decided to delete hints about minimum and maximum allowed values
because no one type has such hint.

Please find attached a new version of the patch.



I think I should write something as a reviewer.
I've read the patch again and I don't see any issues with it.
It applies to master and works as expected, so I'll mark it as
"Ready for committer".


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Processes and caches in postgresql

2016-04-27 Thread Anastasia Lubennikova

Hi, hackers.
I have a couple of questions about processes.

I found the EXEC_BACKEND flag while reading the code.
As I understand it, it exists because we have to emulate fork() on WIN32.
It also allows debugging the same behavior on Linux.
Is that right? Are there any other use cases?

Another question is about the "--fork" argument (see the code below).
I didn't find it in the documentation, so I'm a bit confused.
Does it exist only for internal purposes?

#ifdef EXEC_BACKEND
	if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
		SubPostmasterMain(argc, argv);	/* does not return */
#endif

And last but not least: do we have any presentations, articles, READMEs, or
anything else about the caches (src/backend/utils/cache/*) in PostgreSQL?
I found nothing besides the comments in the code.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Leaking memory in text_overlay function

2016-08-04 Thread Anastasia Lubennikova
I found this thread in the open CF. As I see it, the discussion ended
with a decision to reject the patch.


I've just changed the status to "Rejected".


02.07.2016 18:11, Dirk Rudolph:
Well, that's good to know. It just seemed curious that, in my case, I only
saw this in this particular function.


Anyway, I will consider handling the memory the same way, if that is
the recommendation.


Many thanks.

/Closed

On Sat, Jul 2, 2016 at 4:45 PM, Tom Lane <t...@sss.pgh.pa.us 
<mailto:t...@sss.pgh.pa.us>> wrote:


Dirk Rudolph <dirk.rudo...@netcentric.biz
<mailto:dirk.rudo...@netcentric.biz>> writes:
> while implementing my own C function, I mentioned that some
memory is not freed by the text_overlay function in varlena.c

By and large, that's intentional.  SQL-called functions normally run
in short-lived memory contexts, so that any memory they don't
bother to
pfree will be cleaned up automatically at the end of the tuple cycle.
If we did not follow that approach, there would need to be many
thousands
more explicit pfree calls than there are today, and the system
would be
net slower overall because multiple retail pfree's are slower than a
MemoryContextReset.

There are places where it's important to avoid leaking memory,
certainly.
But I don't think this is one of them, unless you can demonstrate a
scenario with query-lifespan or worse leakage.
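
A minimal sketch of that pattern, purely for illustration (the context name
and the allocation are made up, not code from varlena.c):

/*
 * Minimal sketch of the short-lived-context pattern: allocations made while
 * the scratch context is current are released all at once by a single
 * MemoryContextReset(), with no retail pfree() calls.
 */
MemoryContext scratch_cxt = AllocSetContextCreate(CurrentMemoryContext,
												  "scratch context",
												  ALLOCSET_DEFAULT_SIZES);
MemoryContext old_cxt = MemoryContextSwitchTo(scratch_cxt);

text	   *tmp = cstring_to_text("palloc'd in scratch_cxt");

(void) tmp;							/* ... use the value ... */

MemoryContextSwitchTo(old_cxt);
MemoryContextReset(scratch_cxt);	/* frees everything allocated above */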

> Particularly I mean the both substrings allocated by
text_substring (according to the documentation of text_substring
"The result is always a freshly palloc'd datum.") and one of the
concatenation results. I’m aware of the MemoryContext being
deleted in my case but IMHO builtin functions should be memory safe.

That is not the project policy.

regards, tom lane




--

Dirk Rudolph | Senior Software Engineer

Netcentric AG

M: +41 79 642 37 11
D: +49 174 966 84 34

dirk.rudo...@netcentric.biz <mailto:dirk.rudo...@netcentric.biz> | 
www.netcentric.biz <http://www.netcentric.biz/>




--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

Working on page compression and some other issues related to
access methods, I found that the code related to the heap
looks too complicated, much more complicated than it should be.
Since I got into this area anyway, I want to suggest a set of improvements.

There are a number of problems I see in the existing code:

Problem 1. Heap != Relation.

File heapam.c has a note:
 *  This file contains the heap_ routines which implement
 *  the POSTGRES heap access method used for all POSTGRES
 *  relations.
This statement is wrong, since we also have index relations,
which are implemented using other access methods.

I also think that it's quite strange to have a function heap_create()
that is called from index_create(). It has absolutely nothing to do with
the heap access method.


And so on, and so on.

One more thing that requires refactoring is the "RELKIND_RELATION" name.
We have a type of relation called "relation"...

Problem 2.
Some functions are shared between the heap and index access methods.
Maybe that used to be handy, but now, I think, it's time to separate
them, because IndexTuples and HeapTuples have very little in common.

Problem 3. The word "heap" is used in many different contexts.
Heap - a storage where we have tuples and visibility information.
Generally speaking, it can be implemented over any data structure,
not just a plain table as it is done now.

Heap - an access method, that provides a set of routines for plain table
storage, such as insert, delete, update, gettuple, vacuum and so on.
We use this access method not only for user's tables.

There are many types of relations (pg_class.h):
#define RELKIND_RELATION        'r'    /* ordinary table */
#define RELKIND_INDEX           'i'    /* secondary index */
#define RELKIND_SEQUENCE        'S'    /* sequence object */
#define RELKIND_TOASTVALUE      't'    /* for out-of-line values */
#define RELKIND_VIEW            'v'    /* view */
#define RELKIND_COMPOSITE_TYPE  'c'    /* composite type */
#define RELKIND_FOREIGN_TABLE   'f'    /* foreign table */
#define RELKIND_MATVIEW         'm'    /* materialized view */

They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Now, for some reason, we sometimes use the name "heap" for both
"primary storage" and "virtual relations". See src/backend/catalog/heap.c.
But some functions work only with "primary storage". See pgstat_relation().
And we constantly have to check relkind everywhere.
I'd like this to be clear from the code interface and naming.

So here are a couple of patches. They do not cover all the mentioned problems,
but I'd like to get feedback before continuing.

Patch 1:
There is a macro "PageGetItem" in bufpage.h that is used to retrieve an item
from a given page. It is used for both heap and index tuples.
That doesn't seem like a problem until you try to find anything in this
code.


The first patch "PageGetItem_refactoring.patch" is intended to fix it.
It is just renaming:
(IndexTuple) PageGetItem ---> PageGetItemIndex
(HeapTupleHeader) PageGetItem ---> PageGetItemHeap
Other types of tuples (BrinTuple, SpGistInnerTuple, SpGistLeafTuple, 
SpGistDeadTuple)

still use PageGetItem.

I also changed functions that do not access the tuple's data but only need
the HeapTupleHeader fields. They use a macro PageGetItemHeapHeaderOnly.

I do not insist on these particular names, by the way.

Patch 2.
heapam.c, hio.c and src/backend/catalog/heap.c have a lot of confusing names
and outdated comments. The patch is intended to improve them.
Fun fact: I found several "check it later" comments unchanged since
the "Postgres95 1.01 Distribution - Virgin Sources" commit.

I wonder if we can find a better name for relation_drop_with_catalog() since,
again, it's not about all kinds of relations. We use another function to drop
index relations.


I hope these changes will be useful for future development.
The suggested patches are mostly about renaming and rearranging functions
between files, so, if nobody has conceptual objections, all I need from
reviewers is an attentive look to find typos, grammar mistakes, and
overlooked areas.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/pageinspect/btreefuncs.c b/contrib/pageinspect/btreefuncs.c
index 4983bbb..257b609 100644
--- a/contrib/pageinspect/btreefuncs.c
+++ b/contrib/pageinspect/btreefuncs.c

Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-05 Thread Anastasia Lubennikova

30.07.2016 14:00, Andrew Borodin:

Here is BRIN-compatible version of patch. Now function
PageIndexTupleOverwrite consumes tuple size as a parameter, this
behavior is coherent with PageAddItem.
Also, i had to remove asserstion that item has a storage in the loop
that adjusts line pointers. It would be great if someone could check
that place (commented Assert(ItemIdHasStorage(ii)); in
PageIndexTupleOverwrite). I suspect that there might be a case when
linp's should not be adjusted.


Hi, I reviewed the code and I have a couple of questions about the
following code:

	/* may have negative size here if new tuple is larger */
	size_diff = oldsize - alignednewsize;
	offset = ItemIdGetOffset(tupid);

	if (offset < phdr->pd_upper || (offset + size_diff) > phdr->pd_special ||
		offset != MAXALIGN(offset) || size_diff != MAXALIGN(size_diff))
		ereport(ERROR,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("corrupted item offset: offset = %u, size = %u",
						offset, (unsigned int) size_diff)));

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although I'm quite sure that it was already aligned somewhere before.

I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add the check (offset + size_diff < pd_lower).

Apart from that, the patch seems pretty good.

BTW, I'm very surprised that it improves performance so much.
The size is also reduced significantly: 89 MB against 289 MB without the patch.
Could you explain in detail why that happens?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-05 Thread Anastasia Lubennikova

05.08.2016 16:30, Kevin Grittner:

On Fri, Aug 5, 2016 at 2:54 AM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:


They can be logically separated into three categories:
"primary storage" - r, S, t, v. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, m. They have no physical storage, only entries
in caches and catalogs.

Materialized views (relkind == "m") have physical storage, and may have indexes.


Good point. I confused the letters for views (virtual relations) and
materialized views (primary storage).
It should be:

"primary storage" - r, S, t, m. They store data and visibility information.
The only implementation is heapam.c
"secondary index" - i. They store some data and pointers to primary storage.
Various existing AMs and managed by AM API.
"virtual relations" - c, f, v. They have no physical storage, only entries
in caches and catalogs.
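
To illustrate what an explicit interface for this classification could look
like, here is a minimal sketch; the enum and helper below are made-up names
for the example, not part of the proposed patches.

/*
 * Illustrative sketch only: make the category explicit instead of checking
 * relkind characters all over the code.
 */
typedef enum RelStorageClass
{
	REL_PRIMARY_STORAGE,		/* r, S, t, m: data plus visibility info */
	REL_SECONDARY_INDEX,		/* i: managed through the index AM API */
	REL_VIRTUAL_RELATION		/* c, f, v: catalog and cache entries only */
} RelStorageClass;

static RelStorageClass
rel_storage_class(char relkind)
{
	switch (relkind)
	{
		case RELKIND_RELATION:
		case RELKIND_SEQUENCE:
		case RELKIND_TOASTVALUE:
		case RELKIND_MATVIEW:
			return REL_PRIMARY_STORAGE;
		case RELKIND_INDEX:
			return REL_SECONDARY_INDEX;
		default:
			return REL_VIRTUAL_RELATION;
	}
}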


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-08 Thread Anastasia Lubennikova

06.08.2016 19:51, Andrew Borodin:

Anastasia , thank you for your attentive code examine.

2016-08-05 21:19 GMT+05:00 Anastasia Lubennikova <a.lubennik...@postgrespro.ru>:

First of all, shouldn't we use MAXALIGN(oldsize) instead of oldsize?
Although, I'm quite sure that it was already aligned somewhere before.
I doubt that the check (size_diff != MAXALIGN(size_diff)) is necessary.
I'd rather add  the check: (offset+size_diff < pd_lower).

Actually, that's a tricky question. There may be different code expectations.
1. If we expect that not-maxaligned tuple may be placed by any other
routine, we should remove check (size_diff != MAXALIGN(size_diff)),
since it will fail for not-maxaligned tuple.
2. If we expect that noone should use PageIndexTupleOverwrite with
not-maxaligned tuples, than current checks are OK: we will break
execution if we find non-maxaligned tuples. With an almost correct
message.
I suggest that this check may be debug-only assertion: in a production
code routine will work with not-maxaligned tuples if they already
reside on the page, in a dev build it will inform dev that something
is going wrong. Is this behavior Postgres-style compliant?


I agree that pd_lower check makes sense.


In pointing out this comparison, I was worried about a possible call of
MAXALIGN on a negative size_diff value, although I'm not sure whether it can
cause any problem.


Tuples on a page are always maxaligned.
But, as far as I recall, itemid->lp_len can contain a non-maxaligned value.

So I'd suggest you perform MAXALIGN(oldsize) before computing size_diff:
 size_diff = oldsize - alignednewsize;

Since both arguments are then maxaligned, the check of size_diff is not needed.
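
Spelled out, the suggested computation is roughly the following sketch, using
the names from the quoted code (not the final patch):

/*
 * Sketch only: align the old length explicitly so that size_diff is the
 * difference of two MAXALIGNed values (and therefore itself MAXALIGNed,
 * even when it is negative).
 */
Size		alignedoldsize = MAXALIGN(oldsize);
int			size_diff = (int) alignedoldsize - (int) alignednewsize;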


BTW, I'm very surprised that it improves performance so much.
And also size is reduced significantly. 89MB against 289MB without patch.
Could you explain in details, why does it happen?

Size reduction is unexpected for me.
There might be 4 plausible explanations. I'll list them ordered by
descending of probability:
1. Before this patch every update was throwing recently updated tuple
to the end of a page. Thus, in some-how ordered data, recent best-fit
will be discovered last. This can cause increase of MBB's overlap in
spatial index and slightly higher tree. But 3 times size decrease is
unlikely.
How did you obtained those results? Can I look at a test case?


Nothing special, I just ran the test you provided in this thread
and checked the index size via select pg_size_pretty(pg_relation_size('idx'));


2. Bug in PageIndexTupleDelete causing unused space emersion. I've
searched for it, unsuccessfully.
3. Bug in PageIndexTupleOVerwrite. I cannot imagine nature of such a
bug. May be we are not placing something not very important and big on
a page?
4. Magic.
I do not see what one should do with the R-tree to change it's size by
a factor of 3. First three explanations are not better that forth,
actually.
Those 89 MB, they do not include WAL, right?


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [Patch] Temporary tables that do not bloat pg_catalog (a.k.a fast temp tables)

2016-08-08 Thread Anastasia Lubennikova

05.08.2016 19:41, Robert Haas:


2. This inserts additional code in a bunch of really low-level places
like heap_hot_search_buffer, heap_update, heap_delete, etc.  I think
what you've basically done here is create a new, in-memory heap AM and
then, because we don't have an API for adding new storage managers,
you've bolted it onto the existing heapam layer.  That's certainly a
reasonable approach for creating a PoC, but I think we actually need a
real API here.  Otherwise, when the next person comes along and wants
to add a third heap implementation, they've got to modify all of these
same places again.  I don't think this code is reasonably maintainable
in this form.


As I see it, you recommend cleaning up the API of the storage
management code. I strongly agree that it's high time to do so.

So I have started a discussion about refactoring and improving the API
of heapam and heap relations.
You can find it in the commitfest:
https://commitfest.postgresql.org/10/700/

I'll be glad to see your thoughts in that thread.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

05.08.2016 20:56, Alvaro Herrera:

Anastasia Lubennikova wrote:

Working on page compression and some other issues related to
access methods, I found out that the code related to heap
looks too complicated. Much more complicated, than it should be.
Since I anyway got into this area, I want to suggest a set of improvements.

Hm.  I'm hacking on heapam.c pretty heavily ATM.  Your first patch
causes no problem I think, or if it does it's probably pretty minor, so
that's okay.  I'm unsure about the second one -- I think it's okay too,
because it mostly seems to be about moving stuff from heapam.c to hio.c
and shuffling some code around that I don't think I'm modifying.


I'm sure these patches will not cause any problems. The second one is
mostly about moving unrelated stuff from heapam.c to hio.c and renaming a
couple of functions in heap.c.
Anyway, they are not a final solution, just a proof of concept.

By the way, I thought about looking at the mentioned patch and
probably reviewing it, but I didn't find any of your patches in the
current commitfest. Could you point me to the thread?


Also I think that it's quite strange to have a function heap_create(), that
is called
from index_create(). It has absolutely nothing to do with heap access
method.

Agreed.  But changing its name while keeping it in heapam.c does not
really improve things enough.  I'd rather have it moved elsewhere that's
not specific to "heaps" (somewhere in access/common perhaps).  However,
renaming the functions would break third-party code for no good reason.
I propose that we only rename things if we also break it for other
reasons (say because the API changes in a fundamental way).


Yes, I agree that it should be moved to another file.
Just to be more accurate: it's not in heapam.c now, it is in
"src/backend/catalog/heap.c", which would require many more changes
than I made.

Concerning the third-party code: it's a good observation and we
definitely should keep it in mind. Nevertheless, I doubt that there are many
external calls to these functions. And I hope this thread will end up with
fundamental changes introducing a clear API for all the mentioned problems.




One more thing that requires refactoring is "RELKIND_RELATION" name.
We have a type of relation called "relation"...

I don't see us fixing this bit, really.  Historical baggage and all
that.  For example, a lot of SQL queries use "WHERE relkind = 'r'".  If
we change the name, we should probably also change the relkind constant,
and that would break the queries.  If we change the name and do not
change the constant, then we have to have a comment "we call them
RELKIND_TABLE but the char is r because it was called RELATION
previously", which isn't any better.


Good point.
I'd rather change it to RELKIND_BASIC_RELATION or something like that,
which is more specific and still goes well with the 'r' constant. But minor
changes like that can wait until we clarify the API in general.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring of heapam code.

2016-08-08 Thread Anastasia Lubennikova

08.08.2016 03:51, Michael Paquier:

On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:

Anastasia Lubennikova wrote:

So there is a couple of patches. They do not cover all mentioned problems,
but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.

Well, I am interested in the topic. And just had a look at the patches proposed.

+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review, I'll fix these problems in the final version.

By posting the first message I intended to start the discussion. The patches
were attached mainly to illustrate the problem and to be more concrete.

Thank you for the attention and feedback.
Since there are no objections to the idea in general, I'll write an
exhaustive README about the current state of the code and then propose an
API design to discuss the details.

Stay tuned.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Covering + unique indexes.

2016-08-15 Thread Anastasia Lubennikova

14.08.2016 20:11, Andrey Borodin:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   tested, passed
Documentation:tested, passed

Hi hackers!

I've read the patch and here is my code review.

==PURPOSE
I've used this feature from time to time with MS SQL. From my experience 
INCLUDE is a 'sugar on top' feature.
Some MS SQL classes do not even mention INCLUDE despite it's there from 2005 
(though classes do not mention lots of important things, so it's not kind of 
valuable indicator).
But those who use it, use it whenever possible. For example, system view with 
recommended indices rarely list one without INCLUDE columns.
So, this feature is very important from perspective of converting MS SQL DBAs 
to PostgreSQL. This is how I see it.


Thank you for the review, I hope this feature will be useful for many 
people.



SUGGESTIONS==
0. Index build is broken. This script 
https://github.com/x4m/pggistopt/blob/8ad65d2e305e98c836388a07909af5983dba9c73/test.sql
 SEGFAULTs and may cause situation when you cannot insert anything into table 
(I think drop of index would help, but didn't tested this)


Thank you for reporting it. That was a bug caused by high key truncation,
which occurs when the index has more than 3 levels.
Fixed. See the attached file.


1. I think MS SQL syntax INCLUDE instead of INCLUDING would be better (for a 
purpose listed above)


I've chosen this particular name to avoid adding a new keyword. We
already have INCLUDING in Postgres in the context of inheritance, which will
never intersect with covering indexes.

I'm sure it won't be a big problem for migration from MS SQL.


2. Empty line added in ruleutils.c. Is it for a reason?


No, just a missed line.
Fixed.


3. Now we have indnatts and indnkeyatts instead of indnatts. I think it is 
worth considering renaming indnatts to something different from old name. 
Someone somewhere could still suppose it's a number of keys.


I agree that the naming became vague after this patch.
I've already suggested replacing "indkeys[]" with a more specific name, and
AFAIR there was no reaction, so I didn't do that.
But I'm not sure about your suggestion regarding indnatts. Old queries
(and old indexes) can still use it correctly. I don't see a reason to break
compatibility for all users.
Those who use this new feature should ensure that their queries against
pg_index behave as expected.


PERFORMANCE==
Due to suggestion number 0 I could not measure performance of index build. 
Index crashes when there's more than 1.1 million of rows in a table.
Performance test script is here 
https://github.com/x4m/pggistopt/blob/f206b4395baa15a2fa42897eeb27bd555619119a/test.sql
Test scenario is following:
1. Create table, then create index, then add data.
2. Make a query touching data in INCLUDING columns.
This scenario is tested against table with:
A. Table with index, that do not contain touched columns, just PK.
B. Index with all columns in index.
C. Index with PK in keys and INCLUDING all other columns.

Tests were executed 5 times on Ubuntu VM under Hyper-V i5 2500 CPU, 16 Gb of 
RAM, SSD disk.
Time to insert 10M rows:
A. AVG 110 seconds STD 4.8
B. AVG 121 seconds STD 2.0
C. AVG 111 seconds STD 5.7
Inserts to INCLUDING index is almost as fast as inserts to index without extra 
columns.

Time to run SELECT query:
A. AVG 2864 ms STD 794
B. AVG 2329 ms STD 84
C. AVG 2293 ms STD 58
Selects with INCLUDING columns is almost as fast as with full index.

Index size (deterministic measure, STD = 0)
A. 317 MB
B. 509 MB
C. 399 MB
Index size is in the middle between full index and minimal index.

I think this numbers agree with expectation from the feature.

CONCLUSION==
This patch brings useful and important feature. Build shall be repaired; other 
my suggestions are only suggestions.



Best regards, Andrey Borodin, Octonica & Ural Federal University.

The new status of this patch is: Waiting on Author



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 9c8e308..891325d 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -100,7 +100,7 @@ static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
 static void deleteConnection(const char *name);
-static char **get_pkey_attnames(Relation rel, int16 *numatts);
+static char **get_pkey_attnames(Relation rel, int16 *indnkeyatts);
 static char **get_text_array_contents(ArrayType *array, int *numitems);
 static char *get_sql_insert(Relation rel, int *pkattnums, int pknumatts, char **src_pkattvals, char **tgt_pkattvals);
 static char *get_sql_delete(Relation rel, int *pkattnums,

Re: [HACKERS] Refactoring of heapam code.

2016-08-15 Thread Anastasia Lubennikova

08.08.2016 12:43, Anastasia Lubennikova:

08.08.2016 03:51, Michael Paquier:
On Sat, Aug 6, 2016 at 2:56 AM, Alvaro Herrera 
<alvhe...@2ndquadrant.com> wrote:

Anastasia Lubennikova wrote:
So there is a couple of patches. They do not cover all mentioned 
problems,

but I'd like to get a feedback before continuing.

I agree that we could improve things in this general area, but I do not
endorse any particular changes you propose in these patches; I haven't
reviewed your patches.
Well, I am interested in the topic. And just had a look at the 
patches proposed.


+ /*
+ * Split PageGetItem into set of different macros
+ * in order to make code more readable.
+ */
+#define PageGetItemHeap(page, itemId) \
+( \
+   AssertMacro(PageIsValid(page)), \
+   AssertMacro(ItemIdHasStorage(itemId)), \
+   (HeapTupleHeader)(((char *)(page)) + ItemIdGetOffset(itemId)) \
+)
Placing into bufpage.h macros that are specific to heap and indexes is
not that much a good idea. I'd suggest having the heap ones in
heapam.h, and the index ones in a more generic place, as
src/access/common as already mentioned by Alvaro.

+   onpage_tup = PageGetItemHeapHeaderOnly(page, newitemid);
Just PageGetItemHeapHeader would be fine.

@@ -2324,7 +2087,6 @@ FreeBulkInsertState(BulkInsertState bistate)
 pfree(bistate);
  }

-
  /*
   * heap_insert - insert tuple into a heap
Those patches have some noise.

Patch 0002 is doing two things: reorganizing the order of the routines
in heapam.c and move some routines from heapam.c to hio.c. Those two
could be separate patches/

If the final result is to be able to extend custom AMs for tables, we
should have a structure like src/access/common/index.c,
src/access/common/table.c and src/access/common/relation.c for
example, and have headers dedicated to them. But yes, there is
definitely a lot of room for refactoring as we are aiming, at least I
guess so, at having more various code paths for access methods.


Thank you for the review, I'll fix these problems in the final version.
By posting the first message I intended to start the discussion. The patches
were attached mainly to illustrate the problem and to be more concrete.
Thank you for the attention and feedback.
Since there are no objections to the idea in general, I'll write an
exhaustive README about the current state of the code and then propose an
API design to discuss the details.

Stay tuned.



Here is the promised design draft.
https://wiki.postgresql.org/wiki/HeapamRefactoring

Looking forward to your comments.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2016-08-15 Thread Anastasia Lubennikova
alone.  This is somewhat messy in the
case of heap_multi_insert because it returns several items; I think it's
acceptable to return an array of ItemPointers in the same order as the
input tuples.  This works fine for the only caller, which is COPY in
batch mode.  For the other routines, they don't really care where the
TID is returned AFAICS.


Additional noteworthy items:

i) Speculative insertion: the speculative insertion token is no longer
installed directly in the heap tuple by the executor (of course).
Instead, the token becomes part of the slot.  When the tuple_insert
method is called, the insertion routine is in charge of setting the
token from the slot into the storage tuple.  Executor is in charge of
calling method->speculative_finish() / abort() once the insertion has
been confirmed by the indexes.

ii) execTuples has additional accessors for tuples-in-slot, such as
ExecFetchSlotTuple and friends.  I expect to have some of them to return
abstract StorageTuples, others HeapTuple or MinimalTuples (possibly
wrapped in Datum), depending on callers.  We might be able to cut down
on these later; my first cut will try to avoid API changes to keep
fallout to a minimum.

iii) All tuples need to be identifiable by ItemPointers.  Storages that
have different requirements will need careful additional thought across
the board.

iv) System catalogs cannot use pluggable storage.  We continue to use
heap_open etc in the DDL code, in order not to make this more invasive
that it already is.  We may lift this restriction later for specific
catalogs, as needed.

v) Currently, one Buffer may be associated with one HeapTuple living in a
slot; when the slot is cleared, the buffer pin is released.  My current
patch moves the buffer pin to inside the heapam-based storage AM and the
buffer is released by the ->slot_clear_tuple method.  The rationale for
doing this is that some storage AMs might want to keep several buffers
pinned at once, for example, and must not to release those pins
individually but in batches as the scan moves forwards (say a batch of
tuples in a columnar storage AM has column values spread across many
buffers; they must all be kept pinned until the scan has moved past the
whole set of tuples).  But I'm not really sure that this is a great
design.


I welcome comments on these ideas.  My patch for this is nowhere near
completion yet; expect things to change for items that I've overlooked,
but I hope I didn't overlook any major.  If things are handwavy, it is
probably because I haven't fully figured them out yet.



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2016-08-17 Thread Anastasia Lubennikova
es forwards (say a batch of
tuples in a columnar storage AM has column values spread across many
buffers; they must all be kept pinned until the scan has moved past the
whole set of tuples).  But I'm not really sure that this is a great
design.


Frankly, I doubt that it's realistic to implement columnar storage just as
a variant of pluggable storage. It requires a lot of changes in the executor,
the optimizer and so on, which are hardly compatible with the existing
tuple-oriented model. However, I'm not that well-versed in this area, so if
you feel that it's possible, go ahead.


I welcome comments on these ideas.  My patch for this is nowhere near
completion yet; expect things to change for items that I've overlooked,
but I hope I didn't overlook any major.  If things are handwavy, it is
probably because I haven't fully figured them out yet.


Thank you again for starting this big project.
I'm looking forward to the prototype. I think it will make the discussion
more concrete and useful.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Cast jsonb to numeric, int, float, bool

2017-02-01 Thread Anastasia Lubennikova
Currently the simplest way to extract booleans and numbers from json/jsonb
is to cast the value to text and then cast it to the appropriate type:

postgres=# select 'true'::jsonb::text::bool;
 bool
--
 t
postgres=# select '1.0'::jsonb::text::numeric;
 numeric
-
 1.0


This patch implements direct casts from jsonb numeric (jbvNumeric) to 
numeric, int4 and float8,

and from jsonb bool (jbvBool) to bool.

postgres=# select 'true'::jsonb::bool;
 bool
--
 t

postgres=# select '1.0'::jsonb::numeric;
 numeric
-
 1.0


I'm waiting for your feedback.
If you find it useful, I can also add support for json and other types,
such as smallint and bigint.


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index b9bf18f..4bbe81c 100644
--- a/src/backend/utils/adt/jsonb.c
+++ b/src/backend/utils/adt/jsonb.c
@@ -1939,3 +1939,129 @@ jsonb_object_agg_finalfn(PG_FUNCTION_ARGS)
 
 	PG_RETURN_POINTER(out);
 }
+
+Datum
+jsonb_numeric(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_NUMERIC(v.val.numeric);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_int4(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_INT32(DatumGetInt32(DirectFunctionCall1(numeric_int4,
+		  NumericGetDatum(v.val.numeric;
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_float8(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvNumeric)
+			PG_RETURN_FLOAT8(DatumGetFloat8(DirectFunctionCall1(numeric_float8,
+			NumericGetDatum(v.val.numeric;
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json numeric")));
+}
+
+Datum
+jsonb_bool(PG_FUNCTION_ARGS)
+{
+	Jsonb	   *in = PG_GETARG_JSONB(0);
+	JsonbIterator *it;
+	JsonbValue	v;
+
+	if (!JB_ROOT_IS_OBJECT(in)
+		&& !(JB_ROOT_IS_ARRAY(in) && !JB_ROOT_IS_SCALAR(in)))
+	{
+		Assert(JB_ROOT_IS_SCALAR(in));
+
+		it = JsonbIteratorInit(&in->root);
+
+		/*
+		 * A root scalar is stored as an array of one element, so we get the
+		 * array and then its first (and only) member.
+		 */
+		(void) JsonbIteratorNext(&it, &v, true);
+		Assert(v.type == jbvArray);
+		(void) JsonbIteratorNext(&it, &v, true);
+
+		if (v.type == jbvBool)
+			PG_RETURN_BOOL(v.val.boolean);
+	}
+
+	ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("key value must be json boolean")));
+}
\ No newline at end of file
diff --git a/src/include/catalog/pg_cast.h b/src/include/catalog/pg_cast.h
index 80a40ab..0646f99 100644
--- a/src/include/catalog/pg_cast.h
+++ b/src/include/catalog/pg_cast.h
@@ -377,5 +377,9 @@ DATA(insert ( 1700 1700 1703 i f ));
 /* json to/from jsonb */
 DATA(insert (  114 3802    0 a i ));
 DATA(insert ( 3802  114    0 a i ));
+DATA(insert ( 3802	1700   774 e f ));
+DATA(insert ( 3802	23 775 e f ));
+DATA(insert ( 3802	701	776 e f ));
+DATA(insert ( 3802	16 777 e f ));
 
 #endif   /* PG_CAST_H */
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 31c828a..ff7da5b 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2364,6 +2364,15 @@ DESCR("convert int2 to numeric");
 DATA(insert OID = 1783

Re: [HACKERS] IF NOT EXISTS option for CREATE SERVER and CREATE USER MAPPING statements

2017-02-15 Thread Anastasia Lubennikova

13.02.2017 19:34, Andrew Dunstan:


On 01/13/2017 08:36 AM, Anastasia Lubennikova wrote:

I implemented the IF NOT EXISTS option for the CREATE SERVER and CREATE USER
MAPPING statements for one of our customers.
I think other users will also find it useful for scripting and automated tasks.
The patches themselves are small and simple, and the documentation is updated
as well.




This looks good and useful. Please add some regression tests.


Done.
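
Just to illustrate the intended scripting use, here is a small idempotent
provisioning sketch (the FDW, server and role names are made up for the
example, and the user-mapping syntax is assumed to mirror the server one
described above):

CREATE SERVER IF NOT EXISTS pgfdw_srv
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'replica.example.com', dbname 'app');

CREATE USER MAPPING IF NOT EXISTS FOR app_user
    SERVER pgfdw_srv
    OPTIONS (user 'app_user', password 'secret');

Re-running such a script only emits a NOTICE for objects that already exist
instead of aborting.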

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/ref/create_server.sgml b/doc/src/sgml/ref/create_server.sgml
index 734c6c9..6777679 100644
--- a/doc/src/sgml/ref/create_server.sgml
+++ b/doc/src/sgml/ref/create_server.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  
 
-CREATE SERVER server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
+CREATE SERVER [IF NOT EXISTS] server_name [ TYPE 'server_type' ] [ VERSION 'server_version' ]
 FOREIGN DATA WRAPPER fdw_name
 [ OPTIONS ( option 'value' [, ... ] ) ]
 
@@ -56,6 +56,19 @@ CREATE SERVER server_name [ TYPE '<
   Parameters
 
   
+
+  
+IF NOT EXISTS
+
+ 
+  Do not throw an error if a server with the same name already exists.
+  A notice is issued in this case.  Note that there is no guarantee that
+  the existing server is anything like the one that would have been
+  created.
+ 
+
+   
+

 server_name
 
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index 6ff8b69..b4ae5de 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -883,12 +883,25 @@ CreateForeignServer(CreateForeignServerStmt *stmt)
 
 	/*
 	 * Check that there is no other foreign server by this name.
+	 * Do nothing if IF NOT EXISTS was specified.
 	 */
 	if (GetForeignServerByName(stmt->servername, true) != NULL)
-		ereport(ERROR,
-(errcode(ERRCODE_DUPLICATE_OBJECT),
- errmsg("server \"%s\" already exists",
-		stmt->servername)));
+	{
+		if (stmt->if_not_exists)
+		{
+			ereport(NOTICE,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists, skipping",
+			stmt->servername)));
+			heap_close(rel, RowExclusiveLock);
+			return InvalidObjectAddress;
+		}
+		else
+			ereport(ERROR,
+	(errcode(ERRCODE_DUPLICATE_OBJECT),
+	 errmsg("foreign server \"%s\" already exists",
+			stmt->servername)));
+	}
 
 	/*
 	 * Check that the FDW exists and that we have USAGE on it. Also get the
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a4edea0..d007468 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -4655,6 +4655,19 @@ CreateForeignServerStmt: CREATE SERVER name opt_type opt_foreign_server_version
 	n->version = $5;
 	n->fdwname = $9;
 	n->options = $10;
+	n->if_not_exists = false;
+	$$ = (Node *) n;
+}
+| CREATE SERVER IF_P NOT EXISTS name opt_type opt_foreign_server_version
+		 FOREIGN DATA_P WRAPPER name create_generic_options
+{
+	CreateForeignServerStmt *n = makeNode(CreateForeignServerStmt);
+	n->servername = $6;
+	n->servertype = $7;
+	n->version = $8;
+	n->fdwname = $12;
+	n->options = $13;
+	n->if_not_exists = true;
 	$$ = (Node *) n;
 }
 		;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 07a8436..704bc6b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2113,6 +2113,7 @@ typedef struct CreateForeignServerStmt
 	char	   *servertype;		/* optional server type */
 	char	   *version;		/* optional server version */
 	char	   *fdwname;		/* FDW name */
+	bool		if_not_exists;	/* just do nothing if it already exists? */
 	List	   *options;		/* generic options to server */
 } CreateForeignServerStmt;
 
diff --git a/src/test/regress/expected/foreign_data.out b/src/test/regress/expected/foreign_data.out
index 3a9fb8f..17f9f40 100644
--- a/src/test/regress/expected/foreign_data.out
+++ b/src/test/regress/expected/foreign_data.out
@@ -283,7 +283,9 @@ ERROR:  foreign-data wrapper "foo" does not exist
 CREATE FOREIGN DATA WRAPPER foo OPTIONS ("test wrapper" 'true');
 CREATE SERVER s1 FOREIGN DATA WRAPPER foo;
 CREATE SERVER s1 FOREIGN DATA WRAPPER foo;  -- ERROR
-ERROR:  server "s1" already exists
+ERROR:  foreign server "s1" already exists
+CREATE SERVER IF NOT EXISTS s1 FOREIGN DATA WRAPPER foo;	-- No ERROR, just NOTICE
+NOTICE:  foreign server "s1" already exists, skipping
 CREATE SERVER s2 FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b');
 CREATE SERVER s3 TYPE 'oracle' FOREIGN DATA WRAPPER foo;
 CREATE SERVER s4 TYPE 'oracle' FOREIGN DATA WRAPPER foo OPTIONS (host 'a', dbname 'b');
diff --git a/src/test/regress/sql

Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem

2017-01-19 Thread Anastasia Lubennikova

28.12.2016 23:43, Claudio Freire:

Attached v4 patches with the requested fixes.


Sorry for being late, but the tests took a lot of time.

create table t1 as select i, md5(random()::text) from 
generate_series(0,4) as i;
create index md5_idx ON  t1(md5);
update t1 set md5 = md5((random() * (100 + 500))::text);
vacuum;
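
For completeness, the session setup assumed for this run is sketched below;
the exact maintenance_work_mem value is an assumption, the thread only
reports that the patched vacuum used 2.9Gb of it:

SET maintenance_work_mem = '4GB';  -- with the patch, vacuum can actually use it all
VACUUM VERBOSE t1;                 -- the statement shown in the duration logs below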

The patched vacuum used 2.9 GB of memory and vacuumed the index in a single pass,
while the old version needed three passes (1 GB + 1 GB + 0.9 GB).
Vacuum duration results:

vanilla:
LOG: duration: 4359006.327 ms  statement: vacuum verbose t1;
patched:
LOG: duration: 3076827.378 ms  statement: vacuum verbose t1;

We can see a 30% vacuum speedup. I should note that this case can be considered
favorable to vanilla vacuum: the table is not that big, it has just one index,
and the disk used is a fast FusionIO device. We can expect even more gain on
slower disks.


Thank you again for the patch. Hope to see it in 10.0.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] GSoC 2017

2017-01-17 Thread Anastasia Lubennikova

I'm ready to be a mentor.

10.01.2017 12:53, Alexander Korotkov:

Hi all!

In 2016 the PostgreSQL project was not accepted into the GSoC program.  In my
understanding, the reasons for that were the following.


1. We submitted our application to GSoC at the last minute.
2. In 2016 the GSoC application form for mentoring organizations was
changed.  In particular, it required more detailed information about the
proposed projects.


As a result, we didn't manage to prepare a good enough application in time,
and it was declined. See [1] and [2] for details.


I think the right way to manage this in 2017 is to start collecting the
required information in advance. According to the GSoC 2017 timeline [3],
mentoring organizations can submit their applications from January 19 to
February 9. So now is a good time to start collecting project ideas and to
make a call for mentors.  We also need to decide who will be our admin this
year.


In sum, we have the following questions:
1. What project ideas do we have?
2. Who is going to mentor this year?
3. Who is going to be the project admin this year?

BTW, I'm ready to be a mentor this year.  I'm also open to being the admin
if needed.


[1] 
https://www.postgresql.org/message-id/flat/CAA-aLv4p1jfuMpsRaY2jDUQqypkEXUxeb7z8Mp-0mW6M03St7A%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/flat/CALxAEPuGpAjBSN-PTuxHfuLLqDS47BEbO_ZYxUYQR3ud1nwbww%40mail.gmail.com

[3] https://developers.google.com/open-source/gsoc/timeline

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Re: GiST optimizing memmoves in gistplacetopage for fixed-size updates [PoC]

2016-08-16 Thread Anastasia Lubennikova

09.08.2016 19:45, Andrew Borodin:

Here is new version of the patch, now it includes recommendations from
Anastasia Lubennikova.


I've investigated the anomalous index size decrease. The most probable
explanation turned out to be true.
The cube extension, like some others, uses Guttman's polynomial-time split
algorithm. It is known to generate "needle-like" MBBs (MBRs) for sorted data
due to imbalanced splits (splitting 100 tuples as 98 to 2). Due to the
previous throw-to-the-end behavior of GiST this imbalance was further
amplified (after a split, most inserts were going to the bigger part), so
GiST inserts were extremely slow for sorted data. There is no need for exact
sorting to trigger this behavior.
This behavior can be defused by implementing the small-m restriction in the
split (AFAIR this is described in the original R-tree paper from 1984), which
prescribes splitting into parts no smaller than m, where m is around 20% of a
page's capacity (in number of tuples). After applying this patch, performance
for sorted data is roughly the same as for randomized data.


Thank you for the explanation, now it's clear to me. I did some more testing
and found nothing else of note. The declared feature is implemented correctly:
it passes all regression tests and improves performance.
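
For readers who want to picture the sorted-data case described above, here is
a rough sketch of such a workload; it assumes the cube extension and is not
the exact script used in this thread:

CREATE EXTENSION IF NOT EXISTS cube;
CREATE TABLE points (p cube);
CREATE INDEX points_gist_idx ON points USING gist (p);
-- sorted input is what triggers the imbalanced Guttman splits described above
INSERT INTO points SELECT cube(i::float8) FROM generate_series(1, 1000000) i;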

I still have a few minor nitpicks about the patch style.
You can find the style guide at
https://www.postgresql.org/docs/9.6/static/source.html


1) remove extra whitespace in README

2) add whitespace: if(ntup == 1)

3) fix comments in accordance with coding conventions

/* In case of single tuple update GiST calls overwrite
 * In all other cases function gistplacetopage deletes
 * old tuples and place updated at the end
 *  */


+/* Normally here was following assertion
+ * Assert(ItemIdHasStorage(ii));
+ * This assertion was introduced in PageIndexTupleDelete
+ * But if this function will be used from BRIN index
+ * this assertion will fail. Thus, here we do not check that
+ * item has the storage.
+ * */

4) remove unrelated changes
-data += sizeof(OffsetNumber) * xldata->ntodelete;
+data += sizeof(OffsetNumber) *xldata->ntodelete;

5) If the comment is correct, maxalignment is not necessary.
+/* tuples on a page are always maxaligned */
+oldsize = MAXALIGN(oldsize);

6) I'd rather use alignednewsize here.
 +ItemIdSetNormal(tupid, offset + size_diff, newsize);


After the cleanup you can change the status to "Ready for Committer"
without waiting for another response.


If the data is randomized, this effect of Guttman's poly-time split does not
appear; the test from the start of the thread shows no statistically
significant improvement from the patch.
To demonstrate the performance increase in the randomized case I composed a
modified test: https://gist.github.com/x4m/856b2e1a5db745f8265c76b9c195f2e1
Over five runs this test shows the following:
Before patch: Insert Time AVG 78 seconds, STD 9.5
After patch:  Insert Time AVG 68 seconds, STD 7.7
This is a marginal but statistically significant performance improvement.

For sorted data performance is improved by a factor of 3.
Best regards, Andrey Borodin, Octonica & Ural Federal University.



--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Refactoring of heapam code.

2016-09-06 Thread Anastasia Lubennikova

06.09.2016 07:44, Pavan Deolasee:




On Mon, Aug 8, 2016 at 3:13 PM, Anastasia Lubennikova
<a.lubennik...@postgrespro.ru> wrote:




Thank you for the review, I'll fix these problems in final version.

With my first message I intended to start a discussion. The patches were
attached mainly to illustrate the problem and to make it more concrete.


I started looking at the first patch posted above, but it seems you'll
rewrite these patches after the discussion on the heapam refactoring ideas
you posted here (https://wiki.postgresql.org/wiki/HeapamRefactoring)
concludes.




Thank you for the review.


Some comments anyways on the first patch:

1. It does not apply cleanly with git-apply - many white space errors


I'll fix all merge issues and attach it to the following message.

2. I don't understand the difference 
between PageGetItemHeapHeaderOnly() and PageGetItemHeap(). They seem 
to do exactly the same thing and can be used interchangeably.


The only difference between these two macros is that
PageGetItemHeapHeaderOnly() doesn't touch any key fields of the tuple;
it only checks the header fields (see HeapTupleHeaderData). I split it into
two separate macros while I was working on page compression, because
I wanted to avoid unnecessary decompression calls. These names are
just markers that make the code easier to read and improve.

3. While I like the idea of using separate interfaces to get heap/index
tuples from a page, maybe they can internally use a common interface
instead of duplicating what PageGetItem() already does.


I'm not sure I understand correctly. Do you suggest keeping PageGetItem() and
writing PageGetItemHeap() and PageGetItemIndex() as wrappers around it?
That sounds reasonable as long as heap and index pages share a similar layout.

In any case, it will be easy to separate them when necessary.

4. Should we rename PageGetItemHeap() to PageGetHeapTuple() or 
something like that?


I don't feel like doing that, because HeapTuple is a different structure.
What we actually get from the page is a HeapTupleHeaderData structure
followed by the user's data.

5. If we do this, we should probably have corresponding wrapper 
functions/macros for remaining callers of PageGetItem()



There are some calls of PageGetItem() left in the BRIN and SP-GiST indexes;
I can write simple wrappers for them. I left them alone only because they
were of no interest for my compression prototype.



I also looked at the refactoring design doc. Looks like a fine 
approach to me, but the API will need much more elaborate discussions. 
I am not sure if the interfaces as presented are enough to support 
everything that even heapam can do today.


Which features of heapam do you think this API could fail to support?
Maybe I've just missed them.

My point here is that heapam is too complicated for many simpler use cases:
read-only and append-only tables do not need many MVCC-related tricks such as
vacuum and complicated locking, tables with fixed-length attributes can use a
more efficient page layout, and so on.

I suggest a refactoring that will allow us to develop new heap-like access
methods. For the first version, they must provide a method to "heapify" a
tuple, i.e. turn the internal representation of the data into a regular
HeapTuple, for example by putting stubs into the MVCC fields. Of course this
approach has its disadvantages, such as performance overhead. It definitely
won't be enough to write a column store or to implement other, more advanced
data structures, but it lets us avoid depending on the executor's code.

And much more thought will be required to ensure we don't miss things that a
new primary access method may need.



Do you have any particular ideas of new access methods?



A few points regarding how the proposed API maps to heapam:

- How do we support parallel scans on heap?


As far as I understand, a parallel heap scan requires one more API function,
heap_beginscan_parallel(). It uses the same heap_getnext() function, but
unlike an ordinary seq scan it chooses the next page to read in a
synchronized manner.



- Does the primary AM need to support locking of rows?


That's a good question. I'd say it should be an option.

- Would there be separate API to interpret the PAM tuple itself? 
(something that htup_details.h does today). It seems natural that 
every PAM will have its own interpretation of tuple structure and we 
would like to hide that inside PAM implementation.


As I already wrote, for the first prototype I'd use a function to "heapify"
the tuple and not worry about executor issues at all. A more detailed
discussion of this question can be found in another thread [1].

- There are many upper modules that need access to system attributes 
(xmin, xmax for starters). How do you plan to handle that? Do you think
we can provide enough abstraction so that the callers don't need to
know
