Re: GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)

2014-06-18 Thread Amit Langote
On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 01/22/2014 03:39 AM, Tomas Vondra wrote:

 What annoys me a bit is the huge size difference between the index
 updated incrementally (by a sequence of INSERT commands), and the index
 rebuilt from scratch using VACUUM FULL. It's a bit better with the patch
 (2288 vs. 2035 MB), but is there a chance to improve this?


 Hmm. What seems to be happening is that pending item list pages that the
 fast update mechanism uses are not getting recycled. When enough list pages
 are filled up, they are flushed into the main index and the list pages are
 marked as deleted. But they are not recorded in the FSM, so they won't be
 recycled until the index is vacuumed. Almost all of the difference can be
 attributed to deleted pages left behind like that.

 So this isn't actually related to the packed postinglists patch at all. It
 just makes the bloat more obvious, because it makes the actual size of the
 index, excluding deleted pages, smaller. But it can be observed on git
 master as well:

 I created a simple test table and index like this:

 create table foo (intarr int[]);
 create index i_foo on foo using gin(intarr) with (fastupdate=on);

 I filled the table like this:

 insert into foo select array[-1] from generate_series(1, 1000) g;

 postgres=# \dt+
                       List of relations
  Schema | Name | Type  | Owner  |  Size  | Description
 --------+------+-------+--------+--------+-------------
  public | foo  | table | heikki | 575 MB |
 (1 row)

 postgres=# \di+
                           List of relations
  Schema | Name  | Type  | Owner  | Table |  Size  | Description
 --------+-------+-------+--------+-------+--------+-------------
  public | i_foo | index | heikki | foo   | 251 MB |
 (1 row)

 I wrote a little utility that scans all pages in a gin index, and prints out
 the flags indicating what kind of a page it is. The distribution looks like
 this:

      19 DATA
    7420 DATA LEAF
   24701 DELETED
       1 LEAF
       1 META

 I think we need to add the deleted pages to the FSM more aggressively.

 I tried simply adding calls to RecordFreeIndexPage, after the list pages
 have been marked as deleted, but unfortunately that didn't help. The problem
 is that the FSM is organized into a three-level tree, and
 RecordFreeIndexPage only updates the bottom level. The upper levels are not
 updated until the FSM is vacuumed, so the pages are still not visible to
 GetFreeIndexPage calls until the next vacuum. The simplest fix would be to add a
 call to IndexFreeSpaceMapVacuum after flushing the pending list, per
 attached patch. I'm slightly worried about the performance impact of the
 IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which
 isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update
 the upper levels of the FSM in a retail fashion instead.
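
 In essence, the fix (the full context diff appears later in this thread)
 boils down to vacuuming the index FSM right after the pending list has been
 flushed, roughly:

     if (needCleanup)
     {
         ginInsertCleanup(ginstate, false, NULL);

         /*
          * The deleted list pages were put into the bottom FSM level by
          * RecordFreeIndexPage; vacuum the FSM so the upper levels are
          * updated and GetFreeIndexPage can hand the pages out again.
          */
         IndexFreeSpaceMapVacuum(index);
     }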


I wonder if you pursued this further?

You recently added a number of TODO items related to GIN indexes; is it
worth adding this to the list?

--
Amit




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-24 Thread Alexander Korotkov
On Thu, Jan 23, 2014 at 9:27 AM, Alexander Korotkov aekorot...@gmail.com wrote:

 On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 01/22/2014 02:17 PM, Alexander Korotkov wrote:

 We already spent a lot of time on compression. Now we need to figure out
 the result we want to see. I spent quite a long time debugging the varbyte
 encoding without segments. Finally, it passed very many tests without any
 problems. Now it is just a piece of junk. I'm afraid that we will have to
 reimplement everything from scratch another two or three times because the
 code doesn't look perfect. For sure, that's incompatible with getting
 something into 9.4.


 That's a bit harsh :-). But yes, I understand what you're saying. It's
 quite common for large patches like this to be rewritten several times
 before being committed; you just don't know what works best until you've
 tried many designs.


  Remember we also have the subsequent fast-scan patch, which is much needed
 for hstore and other applications.
 I propose making the final decisions now and concentrating our efforts on
 producing a committable patch with those decisions. And not changing those
 decisions even if somebody has an idea for improving code readability 100
 times and potential extensibility 1000 times.
 I propose the following decisions:
 1) Leave the uncompressed area, and allow duplicates in it.
 2) Don't introduce Items on the page.


 Well, I got the insertions to work now without the uncompressed area, so
 in the spirit of Just Getting This Damn Patch Committed, I'm going to go
 ahead with that. We can add the uncompressed area later if performance
 testing shows it to be necessary. And I agree about the Items on page idea
 - we can come back to that too still in 9.4 timeframe if necessary, but
 probably not.

 So, committed. It's the same design as in the most recent patch I posted,
 but with a bunch of bugs fixed, and cleaned up all over. I'm going to move
 to the fast scan patch now, but please do test and review the committed
 version to see if I missed something.


 Great! Thanks a lot!
 The assertion in dataPlaceToPageLeafRecompress is too strong. A page can
 contain GinDataLeafMaxContentSize bytes. Patch is attached.
 My test suite doesn't run correctly. I'm debugging it now.


ISTM I found this bug. ginVacuumPostingTreeLeaf re-encodes only some
segments. Others are not even re-palloc'd; they are moved left
in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that
with memcpy; the proper solution is memmove. Depending on the platform,
using memcpy there can lead to page corruption. Another solution is to
re-palloc the segments in ginVacuumPostingTreeLeaf.
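
To illustrate the point with a small standalone C demo (this is generic C,
not the gin code itself): memcpy on overlapping ranges is undefined
behaviour, while memmove is guaranteed to behave as if it copied through a
temporary buffer.

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        char        buf[] = "AABBCCDD";

        /*
         * Shift the "BBCCDD" part two bytes to the left.  Source and
         * destination overlap, so memcpy() would be undefined behaviour
         * (and really can corrupt the data, depending on the platform);
         * memmove() handles the overlap correctly.
         */
        memmove(buf, buf + 2, 6);
        buf[6] = '\0';
        printf("%s\n", buf);    /* prints BBCCDD */
        return 0;
    }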

--
With best regards,
Alexander Korotkov.


gin-memmove-fix.patch
Description: Binary data



Re: [HACKERS] GIN improvements part 1: additional information

2014-01-24 Thread Heikki Linnakangas

On 01/24/2014 10:03 AM, Alexander Korotkov wrote:

ISTM I found this bug. ginVacuumPostingTreeLeaf re-encodes only some
segments. Others are not even re-palloc'd; they are moved left
in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that
with memcpy; the proper solution is memmove. Depending on the platform,
using memcpy there can lead to page corruption. Another solution is to
re-palloc the segments in ginVacuumPostingTreeLeaf.


Good catch. Thanks, committed, changing memcpy to memmove. Will have to 
switch to pallocing everything in the future, if we make leafRepackItems 
smarter, so that it doesn't rewrite all the segments after the first 
modified one.


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-24 Thread Alexander Korotkov
On Fri, Jan 24, 2014 at 12:50 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/24/2014 10:03 AM, Alexander Korotkov wrote:

 ISTM I found this bug. ginVacuumPostingTreeLeaf re-encodes only some
 segments. Others are not even re-palloc'd; they are moved left
 in dataPlaceToPageLeafRecompress by memcpy. But it's incorrect to do that
 with memcpy; the proper solution is memmove. Depending on the platform,
 using memcpy there can lead to page corruption. Another solution is to
 re-palloc the segments in ginVacuumPostingTreeLeaf.


 Good catch. Thanks, committed, changing memcpy to memmove. Will have to
 switch to pallocing everything in the future, if we make leafRepackItems
 smarter, so that it doesn't rewrite all the segments after the first
 modified one.


OK. What about the previous fix to the assertion?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-24 Thread Heikki Linnakangas

On 01/24/2014 10:53 AM, Alexander Korotkov wrote:

OK. What about the previous fix to the assertion?


Ah right, fixed that too now.

- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-24 Thread Alexander Korotkov
On Fri, Jan 24, 2014 at 1:19 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 01/24/2014 10:53 AM, Alexander Korotkov wrote:

 OK. What about the previous fix to the assertion?


 Ah right, fixed that too now.


Good, now my test suite passes. The results are as follows.

Time of operations
         event         |      period
-----------------------+------------------
 index_build           | 00:01:45.709546
 index_build_recovery  | 00:00:09
 index_update          | 00:05:45.732214
 index_update_recovery | 00:01:17
 search_new            | 00:24:02.191049
 search_updated        | 00:26:54.122852
(6 rows)

Index sizes
     label     |   size
---------------+-----------
 new           | 387547136
 after_updates | 610877440
(2 rows)

Before compression, the results were the following.

Time of operations
         event         |      period
-----------------------+------------------
 index_build           | 00:01:51.116722
 index_build_recovery  | 00:00:08
 index_update          | 00:05:12.124758
 index_update_recovery | 00:01:43
 search_new            | 00:23:44.281424
 search_updated        | 00:25:41.96251
(6 rows)

Index sizes
     label     |    size
---------------+------------
 new           |  884514816
 after_updates | 1595252736
(2 rows)

So we can see a small regression. However, I'm not sure it's very
significant.
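
Roughly, from the numbers above: compression makes the index about 2.3x
smaller when freshly built (884514816 -> 387547136 bytes) and about 2.6x
smaller after updates (1595252736 -> 610877440 bytes), at the cost of
index_update taking about 11% longer (05:12 -> 05:46) and the searches being
about 1-5% slower.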

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-22 Thread Heikki Linnakangas

On 01/22/2014 09:25 AM, Alexander Korotkov wrote:

On Wed, Jan 22, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



On 01/21/2014 11:35 AM, Heikki Linnakangas wrote:


Oh, I see what's going on. I had assumed that there cannot be duplicate
insertions into the posting tree, but that's dead wrong. The fast
insertion mechanism depends on a duplicate insertion to do nothing.


Ok, this turned out to be a much bigger change than I thought...

In principle, it's not difficult to eliminate duplicates. However, to
detect a duplicate, you have to check if the item is present in the
uncompressed items array, or in the compressed lists. That requires
decoding the segment where it should be.

But if we decode the segment, what's the purpose of the uncompressed items
array? Its purpose was to speed up insertions, by buffering them so that we
don't need to decode and re-encode the page/segment on every inserted item.
But if we're paying the price of decoding it anyway, we might as well
include the new item and re-encode the segment. The uncompressed area saves
some effort in WAL-logging, as the record of inserting an entry into the
uncompressed area is much smaller than that of re-encoding part of the
page, but if that really is a concern, we could track more carefully which
parts of the page are modified, and only WAL-log the required parts. And
hopefully, the fast-update lists buffer inserts so that you insert many
items at a time to the posting tree, and the price of re-encoding is only
paid once.

So, now that I think about this once more, I don't think the uncompressed
area in every leaf page is a good idea.


I didn't get it: why not insert duplicate TIDs into the uncompressed area and
eliminate them during re-encoding? It would be somewhat more work during
updates, more work during scans, and more WAL records. But is it really
significant for real-life workloads?


Hmm, so you would merrily insert duplicate TIDs into the uncompressed 
area, and weed them out when reading or recompressing the page? I had 
not thought of that. Yeah, it might be a good trade-off, duplicates are 
not expected to happen very often.



I refactored the way the recompression routine works again. It is now more

clearly a multi-step process. First, the existing page is disassembled
into an in-memory struct, which is a linked list of all the segments.
In-memory each segment can be represented as an array of item pointers, or
in the compressed format. In the next phase, all the new items are added to
the segments where they belong. This naturally can lead to overly large
segments; in the third phase, the items are redistributed among the
segments, and compressed back to the compressed format. Finally, the
finished segments are written back to the page, or pages if it had to be
split.

The same subroutines to disassemble and recompress are also used in vacuum.

Attached is what I've got now. This is again quite heavily-changed from
the previous version - sorry for repeatedly rewriting this. I'll continue
testing and re-reviewing this tomorrow.



Let's clarify where we are. We had a well-debugged and tested version with
hard-wired varbyte encoding. Now we're reimplementing and debugging a
segmented version of varbyte encoding in the hope that one day we can easily
replace it with something better that meets our restrictions (but we haven't
found it yet). Is that right?


The segmentation was a sensible change on code-readability grounds 
alone. Yes, it made it easier to experiment with different encodings, 
and will make it easier to replace the encoding in the future, but that 
wasn't the only reason for doing it. It's nice to have the 
encoding/decoding stuff in ginpostinglists.c, so that the rest of the 
code just passes around opaque GinPostingList structs (previously known 
as PostingListSegment).


One thing I have been pondering, though: Instead of storing the posting 
lists one after each other on the leaf page, it might be better to store 
them as Items on the page, with normal ItemIds pointing to each. So the 
page layout would be more standard, and you could use 
PageAddItem/PageIndexMultiDelete to add/remove posting lists to page. 
The immediate advantage of that would be that it would make it possible 
to do a binary search of the segments, to quickly locate the right 
segment where a tuple belongs to. That might not be significant in 
practice - linearly scanning 32 items is not slow either. And it would 
add some overhead, one line pointer per segment, 4 * 32 = 128 bytes per 
page with the current segment size of 256 bytes. But then again, it 
might be a good idea just because it would make the pages look more like 
any other page, which is generally a good thing.
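
A rough sketch of what that would look like with the standard page API; this
is illustrative only, not code from the patch, and the exact segment type and
size macro are assumptions:

    /* add one posting-list segment as a regular item on the leaf page */
    OffsetNumber off;

    off = PageAddItem(page, (Item) segment, SizeOfGinPostingList(segment),
                      InvalidOffsetNumber, false, false);
    if (off == InvalidOffsetNumber)
        elog(ERROR, "failed to add posting list segment to GIN leaf page");

The line pointer array would then let a scan binary-search the segments and
let vacuum remove them with PageIndexMultiDelete.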


- Heikki




GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)

2014-01-22 Thread Heikki Linnakangas

On 01/22/2014 03:39 AM, Tomas Vondra wrote:

What annoys me a bit is the huge size difference between the index
updated incrementally (by a sequence of INSERT commands), and the index
rebuilt from scratch using VACUUM FULL. It's a bit better with the patch
(2288 vs. 2035 MB), but is there a chance to improve this?


Hmm. What seems to be happening is that pending item list pages that the 
fast update mechanism uses are not getting recycled. When enough list 
pages are filled up, they are flushed into the main index and the list 
pages are marked as deleted. But they are not recorded in the FSM, so 
they won't be recycled until the index is vacuumed. Almost all of the 
difference can be attributed to deleted pages left behind like that.


So this isn't actually related to the packed postinglists patch at all. 
It just makes the bloat more obvious, because it makes the actual size 
of the index, excluding deleted pages, smaller. But it can be 
observed on git master as well:


I created a simple test table and index like this:

create table foo (intarr int[]);
create index i_foo on foo using gin(intarr) with (fastupdate=on);

I filled the table like this:

insert into foo select array[-1] from generate_series(1, 1000) g;

postgres=# \dt+
                      List of relations
 Schema | Name | Type  | Owner  |  Size  | Description
--------+------+-------+--------+--------+-------------
 public | foo  | table | heikki | 575 MB |
(1 row)

postgres=# \di+
                          List of relations
 Schema | Name  | Type  | Owner  | Table |  Size  | Description
--------+-------+-------+--------+-------+--------+-------------
 public | i_foo | index | heikki | foo   | 251 MB |
(1 row)

I wrote a little utility that scans all pages in a gin index, and prints 
out the flags indicating what kind of a page it is. The distribution 
looks like this:


     19 DATA
   7420 DATA LEAF
  24701 DELETED
      1 LEAF
      1 META

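
For reference, a similar page-type breakdown can nowadays be obtained without
a custom utility, using the pageinspect extension (its GIN functions only
exist in PostgreSQL 10 and later, so this is an approximation of the utility
used above, not what was actually run):

    create extension if not exists pageinspect;

    select flags, count(*)
    from generate_series(1, pg_relation_size('i_foo') / 8192 - 1) as blkno,
         lateral gin_page_opaque_info(get_raw_page('i_foo', blkno::int))
    group by flags
    order by count(*) desc;   -- block 0, the metapage, is skipped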
I think we need to add the deleted pages to the FSM more aggressively.

I tried simply adding calls to RecordFreeIndexPage, after the list pages 
have been marked as deleted, but unfortunately that didn't help. The 
problem is that the FSM is organized into a three-level tree, and 
RecordFreeIndexPage only updates the bottom level. The upper levels are 
not updated until the FSM is vacuumed, so the pages are still not 
visible to GetFreeIndexPage calls until the next vacuum. The simplest fix 
would be to add a call to IndexFreeSpaceMapVacuum after flushing the 
pending list, per attached patch. I'm slightly worried about the 
performance impact of the IndexFreeSpaceMapVacuum() call. It scans the 
whole FSM of the index, which isn't exactly free. So perhaps we should 
teach RecordFreeIndexPage to update the upper levels of the FSM in a 
retail fashion instead.


- Heikki
*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***
*** 21,26 
--- 21,27 
  #include "access/gin_private.h"
  #include "commands/vacuum.h"
  #include "miscadmin.h"
+ #include "storage/indexfsm.h"
  #include "utils/memutils.h"
  #include "utils/rel.h"
  
***
*** 434,440  ginHeapTupleFastInsert(GinState *ginstate, GinTupleCollector *collector)
--- 435,453 
  	END_CRIT_SECTION();
  
  	if (needCleanup)
+ 	{
  		ginInsertCleanup(ginstate, false, NULL);
+ 
+ 		/*
+ 		 * Vacuum the FSM to make the deleted list pages available for re-use.
+ 		 *
+ 		 * ginInsertCleanup marked them as free in the FSM by calling
+ 		 * RecordFreeIndexPage, but that only updates the bottom FSM level.
+ 		 * Since GetFreeIndexPage scans from the top, they won't be visible
+ 		 * for reuse until the upper levels have been updated.
+ 		 */
+ 		IndexFreeSpaceMapVacuum(index);
+ 	}
  }
  
  /*
***
*** 599,608  shiftList(Relation index, Buffer metabuffer, BlockNumber newHead,
  			}
  		}
  
  		for (i = 0; i < data.ndeleted; i++)
  			UnlockReleaseBuffer(buffers[i]);
! 
! 		END_CRIT_SECTION();
  	} while (blknoToDelete != newHead);
  
  	return false;
--- 612,625 
  			}
  		}
  
+ 		END_CRIT_SECTION();
+ 
  		for (i = 0; i  data.ndeleted; i++)
+ 		{
+ 			BlockNumber blkno = BufferGetBlockNumber(buffers[i]);
  			UnlockReleaseBuffer(buffers[i]);
! 			RecordFreeIndexPage(index, blkno);
! 		}
  	} while (blknoToDelete != newHead);
  
  	return false;



Re: [HACKERS] GIN improvements part 1: additional information

2014-01-22 Thread Alexander Korotkov
On Wed, Jan 22, 2014 at 12:30 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/22/2014 09:25 AM, Alexander Korotkov wrote:

 On Wed, Jan 22, 2014 at 1:21 AM, Heikki Linnakangas 
 hlinnakan...@vmware.com

 wrote:


  On 01/21/2014 11:35 AM, Heikki Linnakangas wrote:

  Oh, I see what's going on. I had assumed that there cannot be duplicate
 insertions into the posting tree, but that's dead wrong. The fast
 insertion mechanism depends on a duplicate insertion to do nothing.


 Ok, this turned out to be a much bigger change than I thought...

 In principle, it's not difficult to eliminate duplicates. However, to
 detect a duplicate, you have to check if the item is present in the
 uncompressed items array, or in the compressed lists. That requires
 decoding the segment where it should be.

 But if we decode the segment, what's the purpose of the uncompressed
 items
 array? Its purpose was to speed up insertions, by buffering them so that
 we
 don't need to decode and re-encode the page/segment on every inserted
 item.
 But if we're paying the price of decoding it anyway, we might as well
 include the new item and re-encode the segment. The uncompressed area
 saves
 some effort in WAL-logging, as the record of inserting an entry into the
 uncompressed area is much smaller than that of re-encoding part of the
 page, but if that really is a concern, we could track more carefully
 which
 parts of the page are modified, and only WAL-log the required parts. And
 hopefully, the fast-update lists buffer inserts so that you insert many
 items at a time to the posting tree, and the price of re-encoding is only
 paid once.

 So, now that I think about this once more, I don't think the uncompressed
 area in every leaf page is a good idea.


  I didn't get it: why not insert duplicate TIDs into the uncompressed area
 and eliminate them during re-encoding? It would be somewhat more work during
 updates, more work during scans, and more WAL records. But is it really
 significant for real-life workloads?


 Hmm, so you would merrily insert duplicate TIDs into the uncompressed
 area, and weed them out when reading or recompressing the page? I had not
 thought of that. Yeah, it might be a good trade-off, duplicates are not
 expected to happen very often.


  I refactored the way the recompression routine works again. It is now more

 clearly a multi-step process. First, the existing page is disassembled
 into an in-memory struct, which is a linked list of all the segments.
 In-memory each segment can be represented as an array of item pointers,
 or
 in the compressed format. In the next phase, all the new items are added
 to
 the segments where they belong. This naturally can lead to overly large
 segments; in the third phase, the items are redistributed among the
 segments, and compressed back to the compressed format. Finally, the
 finished segments are written back to the page, or pages if it had to be
 split.

 The same subroutines to disassemble and recompress are also used in
 vacuum.

 Attached is what I've got now. This is again quite heavily-changed from
 the previous version - sorry for repeatedly rewriting this. I'll continue
 testing and re-reviewing this tomorrow.



  Let's clarify where we are. We had a well-debugged and tested version with
 hard-wired varbyte encoding. Now we're reimplementing and debugging a
 segmented version of varbyte encoding in the hope that one day we can easily
 replace it with something better that meets our restrictions (but we haven't
 found it yet). Is that right?


 The segmentation was a sensible change on code-readability grounds alone.
 Yes, it made it easier to experiment with different encodings, and will
 make it easier to replace the encoding in the future, but that wasn't the
 only reason for doing it. It's nice to have the encoding/decoding stuff in
 ginpostinglists.c, so that the rest of the code just passes around opaque
 GinPostingList structs (previously known as PostingListSegment).

 One thing I have been pondering, though: Instead of storing the posting
 lists one after each other on the leaf page, it might be better to store
 them as Items on the page, with normal ItemIds pointing to each. So the
 page layout would be more standard, and you could use 
 PageAddItem/PageIndexMultiDelete
 to add/remove posting lists to page. The immediate advantage of that would
 be that it would make it possible to do a binary search of the segments, to
 quickly locate the right segment where a tuple belongs to. That might not
 be significant in practice - linearly scanning 32 items is not slow either.
 And it would add some overhead, one line pointer per segment, 4 * 32 = 128
 bytes per page with the current segment size of 256 bytes. But then again,
 it might be a good idea just because it would make the pages look more like
 any other page, which is generally a good thing.


We already spent a lot of time on compression. Now we need to figure out
the result we want to see. I spent quite 

Re: GIN pending list pages not recycled promptly (was Re: [HACKERS] GIN improvements part 1: additional information)

2014-01-22 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 I wrote a little utility that scans all pages in a gin index, and
 prints out the flags indicating what kind of a page it is. The
 distribution looks like this:
 
      19 DATA
    7420 DATA LEAF
   24701 DELETED
       1 LEAF
       1 META

Hah.

 I think we need to add the deleted pages to the FSM more aggressively.

 I tried simply adding calls to RecordFreeIndexPage, after the list
 pages have been marked as deleted, but unfortunately that didn't
 help. The problem is that the FSM is organized into a three-level
 tree, and RecordFreeIndexPage only updates the bottom level.

Interesting.  I think the idea of having an option for RecordFreeIndexPage
to update upper levels makes sense (no need to force it for other
users.)

Some time ago I proposed an index-only cleanup for vacuum.  That would
help GIN get this kind of treatment (vacuuming its FSM and processing
the pending list) separately from vacuuming the index.  It's probably
too late for 9.4 though.

One other thing worth considering in this area is that making the
pending list size depend on work_mem appears to have been a really bad
idea.  I know one case where the server is really large and seems to run
mostly OLAP type stuff with occasional updates, so they globally set
work_mem=2GB; they have GIN indexes for text search, and the result is
horrible performance 90% of the time, then a vacuum cleans the pending
list and it is blazing fast until the pending list starts getting big
again.  Now you can argue that setting work_mem to that value is a bad
idea, but as it turns out, in this case other than the GIN pending list
it seems to work fine.

Not related to the patch at hand, but I thought I would put it out for
consideration, 'cause I'm not gonna start a new thread about it.
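
One possible mitigation in the meantime, sketched here rather than taken from
this thread (the index and role names are hypothetical): keep the pending
list from tracking the huge global work_mem, either by disabling fast update
on the affected index or by lowering work_mem only for the sessions doing the
updates.

    alter index some_gin_idx set (fastupdate = off);

    alter role bulk_updater set work_mem = '64MB';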

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-22 Thread Heikki Linnakangas

On 01/22/2014 02:17 PM, Alexander Korotkov wrote:

We already spent a lot of time on compression. Now we need to figure out
the result we want to see. I spent quite a long time debugging the varbyte
encoding without segments. Finally, it passed very many tests without any
problems. Now it is just a piece of junk. I'm afraid that we will have to
reimplement everything from scratch another two or three times because the
code doesn't look perfect. For sure, that's incompatible with getting
something into 9.4.


That's a bit harsh :-). But yes, I understand what you're saying. It's 
quite common for large patches like this to be rewritten several times 
before being committed; you just don't know what works best until you've 
tried many designs.



Remember we also have the subsequent fast-scan patch, which is much needed
for hstore and other applications.
I propose making the final decisions now and concentrating our efforts on
producing a committable patch with those decisions. And not changing those
decisions even if somebody has an idea for improving code readability 100
times and potential extensibility 1000 times.
I propose the following decisions:
1) Leave the uncompressed area, and allow duplicates in it.
2) Don't introduce Items on the page.


Well, I got the insertions to work now without the uncompressed area, so 
in the spirit of Just Getting This Damn Patch Committed, I'm going to go 
ahead with that. We can add the uncompressed area later if performance 
testing shows it to be necessary. And I agree about the Items on page 
idea - we can come back to that too still in 9.4 timeframe if necessary, 
but probably not.


So, committed. It's the same design as in the most recent patch I 
posted, but with a bunch of bugs fixed, and cleaned up all over. I'm 
going to move to the fast scan patch now, but please do test and review 
the committed version to see if I missed something.


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-22 Thread Alexander Korotkov
On Wed, Jan 22, 2014 at 9:28 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 01/22/2014 02:17 PM, Alexander Korotkov wrote:

 We already spent a lot of time on compression. Now we need to figure out
 the result we want to see. I spent quite a long time debugging the varbyte
 encoding without segments. Finally, it passed very many tests without any
 problems. Now it is just a piece of junk. I'm afraid that we will have to
 reimplement everything from scratch another two or three times because the
 code doesn't look perfect. For sure, that's incompatible with getting
 something into 9.4.


 That's a bit harsh :-). But yes, I understand what you're saying. It's
 quite common for large patches like this to be rewritten several times
 before being committed; you just don't know what works best until you've
 tried many designs.


  Remember we also have the subsequent fast-scan patch, which is much needed
 for hstore and other applications.
 I propose making the final decisions now and concentrating our efforts on
 producing a committable patch with those decisions. And not changing those
 decisions even if somebody has an idea for improving code readability 100
 times and potential extensibility 1000 times.
 I propose the following decisions:
 1) Leave the uncompressed area, and allow duplicates in it.
 2) Don't introduce Items on the page.


 Well, I got the insertions to work now without the uncompressed area, so
 in the spirit of Just Getting This Damn Patch Committed, I'm going to go
 ahead with that. We can add the uncompressed area later if performance
 testing shows it to be necessary. And I agree about the Items on page idea
 - we can come back to that too still in 9.4 timeframe if necessary, but
 probably not.

 So, committed. It's the same design as in the most recent patch I posted,
 but with a bunch of bugs fixed, and cleaned up all over. I'm going to move
 to the fast scan patch now, but please do test and review the committed
 version to see if I missed something.


Great! Thanks a lot!
The assertion in dataPlaceToPageLeafRecompress is too strong. A page can
contain GinDataLeafMaxContentSize bytes. Patch is attached.
My test suite doesn't run correctly. I'm debugging it now.
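
Presumably the fix just relaxes the bound from strictly-less-than to
less-than-or-equal, along these lines (the variable name is illustrative, not
the actual code):

    /* a data leaf's content area may be completely full */
    Assert(size <= GinDataLeafMaxContentSize);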

--
With best regards,
Alexander Korotkov.


gin-assert-fix.patch
Description: Binary data



Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Heikki Linnakangas

On 01/21/2014 04:02 AM, Tomas Vondra wrote:

On 20.1.2014 19:30, Heikki Linnakangas wrote:


Attached is a yet another version, with more bugs fixed and more
comments added and updated. I would appreciate some heavy-testing of
this patch now. If you could re-run the tests you've been using,
that could be great. I've tested the WAL replay by replicating GIN
operations over streaming replication. That doesn't guarantee it's
correct, but it's a good smoke test.


I gave it a try - the OOM error seems to be gone, but now I get this

PANIC:  cannot insert duplicate items to GIN index page

This only happens when building the index incrementally (i.e. using a
sequence of INSERT statements into a table with GIN index). When I
create a new index on a table (already containing the same dataset) it
works just fine.

Also, I tried to reproduce the issue by running a simple plpgsql loop
(instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
 r tsvector;
BEGIN
 FOR r IN SELECT body_tsvector FROM data_table LOOP
 INSERT INTO idx_table (body_tsvector) VALUES (r);
 END LOOP;
END$$;

where data_table is the table with imported data (the same data I
mentioned in the post about OOM errors), and index_table is an empty
table with a GIN index. And indeed it fails, but only if I run the block
in multiple sessions in parallel.


Oh, I see what's going on. I had assumed that there cannot be duplicate 
insertions into the posting tree, but that's dead wrong. The fast 
insertion mechanism depends on a duplicate insertion to do nothing.


Will fix, thanks for the testing!

- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Alexander Korotkov
On Mon, Jan 20, 2014 at 10:30 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/17/2014 08:49 PM, Alexander Korotkov wrote:

 On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

  On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

  Seems to be fixed in the attached version of the patch.
 Another improvement in this version: only the left-most modified segment and
 the segments after it are re-encoded. The left part of the page is left
 untouched.


 I'm looking into this now. A few quick notes:

 * Even when you don't re-encode the whole page, you still WAL-log the
 whole page. While correct, it'd be a pretty obvious optimization to only
 WAL-log the modified part.


 Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the
 appropriate place for the data, but didn't notice that I still wrote the
 whole page to the xlog record.


 Yeah. I think ginRedoInsertData was too cute to be bug-free. I added an
 explicit unmodifiedsize field to the WAL record, so that ginRedoInsertData
 doesn't need to calculate it.


  * When new items are appended to the end of the page so that only the last
 existing compressed segment is re-encoded, and the page has to be split,
 the items aren't divided 50/50 on the pages. The loop that moves segments
 destined for the left page to the right won't move any existing,
 untouched,
 segments.


 I think this loop will move one segment. However, it's too small.


 Implemented this.

 I noticed that the gin vacuum redo routine is dead code, except for the
 data-leaf page handling, because we never remove entries or internal nodes
 (page deletion is a separate wal record type). And the data-leaf case is
 functionally equivalent to heap newpage records. I removed the dead code
 and made it more clear that it resembles heap newpage.

 Attached is a yet another version, with more bugs fixed and more comments
 added and updated. I would appreciate some heavy-testing of this patch now.
 If you could re-run the tests you've been using, that could be great. I've
 tested the WAL replay by replicating GIN operations over streaming
 replication. That doesn't guarantee it's correct, but it's a good smoke
 test.


I tried my test suite but it hangs on an index scan in an infinite loop. I
re-tried it on my laptop with -O0. I found it to crash on update and vacuum
in some random places, like:
Assert(GinPageIsData(page)); in xlogVacuumPage
Assert(ndecoded == totalpacked); in ginCompressPostingList
Trying to debug it.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Alexander Korotkov
On Tue, Jan 21, 2014 at 4:28 PM, Alexander Korotkov aekorot...@gmail.com wrote:

 I noticed that the gin vacuum redo routine is dead code, except for the
 data-leaf page handling, because we never remove entries or internal nodes
 (page deletion is a separate wal record type). And the data-leaf case is
 functionally equivalent to heap newpage records. I removed the dead code
 and made it more clear that it resembles heap newpage.

 Attached is a yet another version, with more bugs fixed and more comments
 added and updated. I would appreciate some heavy-testing of this patch now.
 If you could re-run the tests you've been using, that could be great. I've
 tested the WAL replay by replicating GIN operations over streaming
 replication. That doesn't guarantee it's correct, but it's a good smoke
 test.


 I tried my test suite but it hangs on an index scan in an infinite loop. I
 re-tried it on my laptop with -O0. I found it to crash on update and vacuum
 in some random places, like:
 Assert(GinPageIsData(page)); in xlogVacuumPage
 Assert(ndecoded == totalpacked); in ginCompressPostingList
 Trying to debug it.


Another question is about dataPlaceToPageLeaf:

while ((Pointer) seg < segend)
{
    if (ginCompareItemPointers(minNewItem, seg->first) < 0)
        break;

Shouldn't we adjust seg to the previous segment? If minNewItem is less than
seg->first, we should insert it into the previous segment.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Heikki Linnakangas

On 01/21/2014 11:35 AM, Heikki Linnakangas wrote:

On 01/21/2014 04:02 AM, Tomas Vondra wrote:

On 20.1.2014 19:30, Heikki Linnakangas wrote:


Attached is a yet another version, with more bugs fixed and more
comments added and updated. I would appreciate some heavy-testing of
this patch now. If you could re-run the tests you've been using,
that could be great. I've tested the WAL replay by replicating GIN
operations over streaming replication. That doesn't guarantee it's
correct, but it's a good smoke test.


I gave it a try - the OOM error seems to be gone, but now I get this

 PANIC:  cannot insert duplicate items to GIN index page

This only happens when building the index incrementally (i.e. using a
sequence of INSERT statements into a table with GIN index). When I
create a new index on a table (already containing the same dataset) it
works just fine.

Also, I tried to reproduce the issue by running a simple plpgsql loop
(instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
  r tsvector;
BEGIN
  FOR r IN SELECT body_tsvector FROM data_table LOOP
  INSERT INTO idx_table (body_tsvector) VALUES (r);
  END LOOP;
END$$;

where data_table is the table with imported data (the same data I
mentioned in the post about OOM errors), and index_table is an empty
table with a GIN index. And indeed it fails, but only if I run the block
in multiple sessions in parallel.


Oh, I see what's going on. I had assumed that there cannot be duplicate
insertions into the posting tree, but that's dead wrong. The fast
insertion mechanism depends on a duplicate insertion to do nothing.


Ok, this turned out to be a much bigger change than I thought...

In principle, it's not difficult to eliminate duplicates. However, to 
detect a duplicate, you have to check if the item is present in the 
uncompressed items array, or in the compressed lists. That requires 
decoding the segment where it should be.


But if we decode the segment, what's the purpose of the uncompressed 
items array? Its purpose was to speed up insertions, by buffering them 
so that we don't need to decode and re-encode the page/segment on every 
inserted item. But if we're paying the price of decoding it anyway, we 
might as well include the new item and re-encode the segment. The 
uncompressed area saves some effort in WAL-logging, as the record of 
inserting an entry into the uncompressed area is much smaller than that 
of re-encoding part of the page, but if that really is a concern, we 
could track more carefully which parts of the page are modified, and only 
WAL-log the required parts. And hopefully, the fast-update lists buffer 
inserts so that you insert many items at a time to the posting tree, and 
the price of re-encoding is only paid once.


So, now that I think about this once more, I don't think the 
uncompressed area in every leaf page is a good idea.


I refactored the way the recompression routine works again. It is now 
more clearly a multi-step process. First, the existing page is 
disassembled into an in-memory struct, which is a linked list of all 
the segments. In-memory each segment can be represented as an array of 
item pointers, or in the compressed format. In the next phase, all the 
new items are added to the segments where they belong. This naturally 
can lead to overly large segments; in the third phase, the items are 
redistributed among the segments, and compressed back to the compressed 
format. Finally, the finished segments are written back to the page, or 
pages if it had to be split.


The same subroutines to disassemble and recompress are also used in vacuum.

Attached is what I've got now. This is again quite heavily-changed from 
the previous version - sorry for repeatedly rewriting this. I'll 
continue testing and re-reviewing this tomorrow.


- Heikki


gin-packed-postinglists-20140121.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Tomas Vondra
On 21.1.2014 22:21, Heikki Linnakangas wrote:
 On 01/21/2014 11:35 AM, Heikki Linnakangas wrote:
 On 01/21/2014 04:02 AM, Tomas Vondra wrote:
 On 20.1.2014 19:30, Heikki Linnakangas wrote:

 Attached is a yet another version, with more bugs fixed and more
 comments added and updated. I would appreciate some heavy-testing of
 this patch now. If you could re-run the tests you've been using,
 that could be great. I've tested the WAL replay by replicating GIN
 operations over streaming replication. That doesn't guarantee it's
 correct, but it's a good smoke test.

 I gave it a try - the OOM error seems to be gone, but now I get this

  PANIC:  cannot insert duplicate items to GIN index page

 This only happens when building the index incrementally (i.e. using a
 sequence of INSERT statements into a table with GIN index). When I
 create a new index on a table (already containing the same dataset) it
 works just fine.

 Also, I tried to reproduce the issue by running a simple plpgsql loop
 (instead of a complex python script):

 DO LANGUAGE plpgsql $$
 DECLARE
   r tsvector;
 BEGIN
   FOR r IN SELECT body_tsvector FROM data_table LOOP
   INSERT INTO idx_table (body_tsvector) VALUES (r);
   END LOOP;
 END$$;

 where data_table is the table with imported data (the same data I
 mentioned in the post about OOM errors), and index_table is an empty
 table with a GIN index. And indeed it fails, but only if I run the block
 in multiple sessions in parallel.

 Oh, I see what's going on. I had assumed that there cannot be duplicate
 insertions into the posting tree, but that's dead wrong. The fast
 insertion mechanism depends on a duplicate insertion to do nothing.
 
 Ok, this turned out to be a much bigger change than I thought...
 
 In principle, it's not difficult to eliminate duplicates. However, to
 detect a duplicate, you have to check if the item is present in the
 uncompressed items array, or in the compressed lists. That requires
 decoding the segment where it should be.
 
 But if we decode the segment, what's the purpose of the uncompressed
 items array? Its purpose was to speed up insertions, by buffering them
 so that we don't need to decode and re-encode the page/segment on every
 inserted item. But if we're paying the price of decoding it anyway, we
 might as well include the new item and re-encode the segment. The
 uncompressed area saves some effort in WAL-logging, as the record of
 inserting an entry into the uncompressed area is much smaller than that
 of re-encoding part of the page, but if that really is a concern, we
 could track more carefully which parts of the page are modified, and only
 WAL-log the required parts. And hopefully, the fast-update lists buffer
 inserts so that you insert many items at a time to the posting tree, and
 the price of re-encoding is only paid once.
 
 So, now that I think about this once more, I don't think the
 uncompressed area in every leaf page is a good idea.
 
 I refactored the way the recompression routine works again. It is now
 more clearly a multi-step process. First, the existing page is
 disassembled into an in-memory struct, which is a linked list of all
 the segments. In-memory each segment can be represented as an array of
 item pointers, or in the compressed format. In the next phase, all the
 new items are added to the segments where they belong. This naturally
 can lead to overly large segments; in the third phase, the items are
 redistributed among the segments, and compressed back to the compressed
 format. Finally, the finished segments are written back to the page, or
 pages if it had to be split.
 
 The same subroutines to disassemble and recompress are also used in vacuum.
 
 Attached is what I've got now. This is again quite heavily-changed from
 the previous version - sorry for repeatedly rewriting this. I'll
 continue testing and re-reviewing this tomorrow.

I've repeated the tests, and it actually finishes OK with this patch.
The incremental load works OK, does not fail because of OOM or PANIC
errors, or any other issues. Subsequent querying seems to work fine too.

The sizes before / after applying the patch look like this:

  object  | raw size (old) | raw size (new) |   diff
 ---------+----------------+----------------+--------
  table   |           6093 |           6093 |  00.0%
  index A |           2288 |           2035 | -11.0%
  index B |             96 |             96 |  00.0%

and after running VACUUM FULL

  object  | vacuumed (old) | vacuumed (new) |   diff
 ---------+----------------+----------------+--------
  table   |           6416 |           6416 |  00.0%
  index A |            678 |            415 | -38.8%
  index B |             42 |             20 | -52.4%

The results are slightly different from my previous runs - for
example, in my post from 2013/10/12 I posted this (after vacuuming)

   |   HEAD   |  patched |

subject 

Re: [HACKERS] GIN improvements part 1: additional information

2014-01-21 Thread Alexander Korotkov
On Wed, Jan 22, 2014 at 1:21 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 01/21/2014 11:35 AM, Heikki Linnakangas wrote:

 On 01/21/2014 04:02 AM, Tomas Vondra wrote:

 On 20.1.2014 19:30, Heikki Linnakangas wrote:


 Attached is a yet another version, with more bugs fixed and more
 comments added and updated. I would appreciate some heavy-testing of
 this patch now. If you could re-run the tests you've been using,
 that could be great. I've tested the WAL replay by replicating GIN
 operations over streaming replication. That doesn't guarantee it's
 correct, but it's a good smoke test.


  I gave it a try - the OOM error seems to be gone, but now I get this

  PANIC:  cannot insert duplicate items to GIN index page

 This only happens when building the index incrementally (i.e. using a
 sequence of INSERT statements into a table with GIN index). When I
 create a new index on a table (already containing the same dataset) it
 works just fine.

 Also, I tried to reproduce the issue by running a simple plpgsql loop
 (instead of a complex python script):

 DO LANGUAGE plpgsql $$
 DECLARE
   r tsvector;
 BEGIN
   FOR r IN SELECT body_tsvector FROM data_table LOOP
   INSERT INTO idx_table (body_tsvector) VALUES (r);
   END LOOP;
 END$$;

 where data_table is the table with imported data (the same data I
 mentioned in the post about OOM errors), and index_table is an empty
 table with a GIN index. And indeed it fails, but only if I run the block
 in multiple sessions in parallel.


 Oh, I see what's going on. I had assumed that there cannot be duplicate
 insertions into the posting tree, but that's dead wrong. The fast
 insertion mechanism depends on a duplicate insertion to do nothing.


 Ok, this turned out to be a much bigger change than I thought...

 In principle, it's not difficult to eliminate duplicates. However, to
 detect a duplicate, you have to check if the item is present in the
 uncompressed items array, or in the compressed lists. That requires
 decoding the segment where it should be.

 But if we decode the segment, what's the purpose of the uncompressed items
 array? Its purpose was to speed up insertions, by buffering them so that we
 don't need to decode and re-encode the page/segment on every inserted item.
 But if we're paying the price of decoding it anyway, we might as well
 include the new item and re-encode the segment. The uncompressed area saves
 some effort in WAL-logging, as the record of inserting an entry into the
 uncompressed area is much smaller than that of re-encoding part of the
 page, but if that really is a concern, we could track more carefully which
 parts of the page are modified, and only WAL-log the required parts. And
 hopefully, the fast-update lists buffer inserts so that you insert many
 items at a time to the posting tree, and the price of re-encoding is only
 paid once.

 So, now that I think about this once more, I don't think the uncompressed
 area in every leaf page is a good idea.


I didn't get it: why not insert duplicate TIDs into the uncompressed area and
eliminate them during re-encoding? It would be somewhat more work during
updates, more work during scans, and more WAL records. But is it really
significant for real-life workloads?

I refactored the way the recompression routine works again. It is now more
 clearly a multi-step process. First, the existing page is disassembled
 into an in-memory struct, which is a linked list of all the segments.
 In-memory each segment can be represented as an array of item pointers, or
 in the compressed format. In the next phase, all the new items are added to
 the segments where they belong. This naturally can lead to overly large
 segments; in the third phase, the items are redistributed among the
 segments, and compressed back to the compressed format. Finally, the
 finished segments are written back to the page, or pages if it had to be
 split.

 The same subroutines to disassemble and recompress are also used in vacuum.

 Attached is what I've got now. This is again quite heavily-changed from
 the previous version - sorry for repeatedly rewriting this. I'll continue
 testing and re-reviewing this tomorrow.


Let's clarify where we are. We had a well-debugged and tested version with
hard-wired varbyte encoding. Now we're reimplementing and debugging a
segmented version of varbyte encoding in the hope that one day we can easily
replace it with something better that meets our restrictions (but we haven't
found it yet). Is that right?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-20 Thread Heikki Linnakangas

On 01/17/2014 08:49 PM, Alexander Korotkov wrote:

On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


On 01/17/2014 01:05 PM, Alexander Korotkov wrote:


Seems to be fixed in the attached version of the patch.
Another improvement in this version: only the left-most modified segment and
the segments after it are re-encoded. The left part of the page is left
untouched.


I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the
whole page. While correct, it'd be a pretty obvious optimization to only
WAL-log the modified part.


Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the
appropriate place for the data, but didn't notice that I still wrote the
whole page to the xlog record.


Yeah. I think ginRedoInsertData was too cute to be bug-free. I added an 
explicit unmodifiedsize field to the WAL record, so that 
ginRedoInsertData doesn't need to calculate it.
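
The idea in sketch form (the struct and field names here are illustrative,
not the actual ginxlog.h definitions): the record carries how many leading
bytes of the leaf's data area were left untouched, so redo only overwrites
the tail.

    #include <stdint.h>

    typedef struct
    {
        uint16_t    unmodifiedsize; /* leading bytes of the data area kept as-is */
        uint16_t    totalsize;      /* size of the data area after the insert */
        /* (totalsize - unmodifiedsize) bytes of replacement data follow */
    } GinXlogDataLeafSketch;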



* When new items are appended to the end of the page so that only the last
existing compressed segment is re-encoded, and the page has to be split,
the items aren't divided 50/50 on the pages. The loop that moves segments
destined for the left page to the right won't move any existing, untouched,
segments.


I think this loop will move one segment. However, it's too small.


Implemented this.

I noticed that the gin vacuum redo routine is dead code, except for the 
data-leaf page handling, because we never remove entries or internal 
nodes (page deletion is a separate wal record type). And the data-leaf 
case is functionally equivalent to heap newpage records. I removed the 
dead code and made it more clear that it resembles heap newpage.


Attached is a yet another version, with more bugs fixed and more 
comments added and updated. I would appreciate some heavy-testing of 
this patch now. If you could re-run the tests you've been using, that 
could be great. I've tested the WAL replay by replicating GIN operations 
over streaming replication. That doesn't guarantee it's correct, but 
it's a good smoke test.


- Heikki


gin-packed-postinglists-20140120.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2014-01-20 Thread Tomas Vondra
On 20.1.2014 19:30, Heikki Linnakangas wrote:
 
 Attached is a yet another version, with more bugs fixed and more 
 comments added and updated. I would appreciate some heavy-testing of 
 this patch now. If you could re-run the tests you've been using,
 that could be great. I've tested the WAL replay by replicating GIN
 operations over streaming replication. That doesn't guarantee it's
 correct, but it's a good smoke test.

I gave it a try - the OOM error seems to be gone, but now I get this

   PANIC:  cannot insert duplicate items to GIN index page

This only happens when building the index incrementally (i.e. using a
sequence of INSERT statements into a table with GIN index). When I
create a new index on a table (already containing the same dataset) it
works just fine.

Also, I tried to reproduce the issue by running a simple plpgsql loop
(instead of a complex python script):

DO LANGUAGE plpgsql $$
DECLARE
r tsvector;
BEGIN
FOR r IN SELECT body_tsvector FROM data_table LOOP
INSERT INTO idx_table (body_tsvector) VALUES (r);
END LOOP;
END$$;

where data_table is the table with imported data (the same data I
mentioned in the post about OOM errors), and index_table is an empty
table with a GIN index. And indeed it fails, but only if I run the block
in multiple sessions in parallel.

regards
Tomas




Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Wed, Jan 15, 2014 at 10:47 AM, Alexander Korotkov
aekorot...@gmail.com wrote:

 On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 14.1.2014 00:38, Tomas Vondra wrote:
  On 13.1.2014 18:07, Alexander Korotkov wrote:
  On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz wrote:
 
  On 8.1.2014 22:58, Alexander Korotkov wrote:
   Thanks for reporting. Fixed version is attached.
 
  I've tried to rerun the 'archie' benchmark with the current patch,
 and
  once again I got
 
 PANIC:  could not split GIN page, didn't fit
 
  I reran it with '--enable-cassert' and with that I got
 
   TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1], items[i]) < 0),
   File: "gindatapage.c", Line: 149)
  LOG:  server process (PID 5364) was terminated by signal 6: Aborted
  DETAIL:  Failed process was running: INSERT INTO messages ...
 
  so the assert in GinDataLeafPageGetUncompressed fails for some
 reason.
 
  I can easily reproduce it, but my knowledge in this area is rather
  limited so I'm not entirely sure what to look for.
 
 
  I've fixed this bug and many other bugs. Now the patch passes the test
  suite that I've used earlier. The results are as follows:
 
  OK, it seems the bug is gone. However now there's a memory leak
  somewhere. I'm loading pgsql mailing list archives (~600k messages)
  using this script
 
 https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
 
  And after loading about 1/5 of the data, all the memory gets filled by
  the pgsql backends (loading the data in parallel) and the DB gets killed
  by the OOM killer.

 I've spent a fair amount of time trying to locate the memory leak, but
 so far no luck. I'm not sufficiently familiar with the GIN code.

 I can however demonstrate that it's there, and I have rather simple test
 case to reproduce it - basically just a CREATE INDEX on a table with ~1M
 email message bodies (in a tsvector column). The data is available here
 (360MB compressed, 1GB raw):

http://www.fuzzy.cz/tmp/message-b.data.gz

 Simply create a single-column table, load data and create the index

CREATE TABLE test ( body_tsvector TSVECTOR );
COPY test FROM '/tmp/message-b.data';
CREATE INDEX test_idx ON test USING gin ( body_tsvector );

 I'm running this on a machine with 8GB of RAM, with these settings

shared_buffers=1GB
maintenance_work_mem=1GB

 According to top, CREATE INDEX from the current HEAD never consumes more
 than ~25% of RAM:

 PID USER  PR  NIVIRTRESSHR  %CPU %MEM  COMMAND
   32091 tomas 20   0 2026032 1,817g 1,040g  56,2 23,8  postgres

 which is about right, as (shared_buffers + maintenance_work_mem) is
 about 1/4 of RAM.

 With the v5 patch version applied, the CREATE INDEX process eventually
 goes crazy and allocates almost all the available memory (but somesimes
 finishes, mostly by pure luck). This is what I was able to get from top

 PID USER  PR  NIVIRTRESSHR S  %CPU %MEM  COMMAND
   14090 tomas 20   0 7913820 6,962g 955036 D   4,3 91,1  postgres

 while the system was still reasonably responsive.


 Thanks a lot for your help! I believe problem is that each decompressed
 item pointers array is palloc'd but not freed. I hope to fix it today.


Seems to be fixed in the attached version of the patch.
Another improvement in this version: only the left-most affected segment and
the segments after it are re-encoded. The left part of the page is left
untouched.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte6.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Heikki Linnakangas

On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

Seems to be fixed in the attached version of patch.
Another improvement in this version: only left-most segments and further
are re-encoded. Left part of page are left untouched.


I'm looking into this now. A few quick notes:

* Even when you don't re-encode the whole page, you still WAL-log the 
whole page. While correct, it'd be a pretty obvious optimization to only 
WAL-log the modified part.


* When new items are appended to the end of the page so that only the 
last existing compressed segment is re-encoded, and the page has to be 
split, the items aren't divided 50/50 on the pages. The loop that moves 
segments destined for the left page to the right won't move any 
existing, untouched, segments.



!   /*
!* Didn't fit uncompressed. We'll have to encode them. Check if both
!* new items and uncompressed items can be placed starting from last
!* segment of page. Then re-encode only last segment of page.
!*/
!   minNewItem = newItems[0];
!   if (nolduncompressed == 0 &&
!   ginCompareItemPointers(olduncompressed[0], minNewItem) < 0)
!   minNewItem = olduncompressed[0];


That looks wrong. If I'm understanding it right, it's trying to do 
minNewItem = Min(newItems[0], olduncompressed[0]). The test should be 
nolduncompressed > 0 && ...


No need to send a new patch, I'll just fix those while reviewing...

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-17 Thread Alexander Korotkov
On Fri, Jan 17, 2014 at 10:38 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/17/2014 01:05 PM, Alexander Korotkov wrote:

 Seems to be fixed in the attached version of patch.
 Another improvement in this version: only left-most segments and further
 are re-encoded. Left part of page are left untouched.


 I'm looking into this now. A few quick notes:

 * Even when you don't re-encode the whole page, you still WAL-log the
 whole page. While correct, it'd be a pretty obvious optimization to only
 WAL-log the modified part.


Oh, I overlooked it. I wrote code in ginRedoInsertData which finds the
appropriate place for the data, but didn't notice that I still write the
whole page to the xlog record.


 * When new items are appended to the end of the page so that only the last
 existing compressed segment is re-encoded, and the page has to be split,
 the items aren't divided 50/50 on the pages. The loop that moves segments
 destined for the left page to the right won't move any existing, untouched,
 segments.


I think this loop will move one segment. However, that's too little for an
even split.


  !   /*
 !* Didn't fit uncompressed. We'll have to encode them. Check if
 both
 !* new items and uncompressed items can be placed starting from
 last
 !* segment of page. Then re-encode only last segment of page.
 !*/
 !   minNewItem = newItems[0];
 !   if (nolduncompressed == 0 &&
 !   ginCompareItemPointers(olduncompressed[0], minNewItem) < 0)
 !   minNewItem = olduncompressed[0];


 That looks wrong. If I'm understanding it right, it's trying to do
 minNewItem = Min(newItems[0], olduncompressed[0]). The test should be
 nolduncompressed > 0 && ...


Yes, definitely a bug.


 No need to send a new patch, I'll just fix those while reviewing...


Ok, thanks!

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-14 Thread Heikki Linnakangas

On 01/13/2014 07:07 PM, Alexander Korotkov wrote:

I've fixed this bug and many other bug. Now patch passes test suite that
I've used earlier. The results are so:

Operations time:
  event | period
---+-
  index_build   | 00:01:47.53915
  index_build_recovery  | 00:00:04
  index_update  | 00:05:24.388163
  index_update_recovery | 00:00:53
  search_new| 00:24:02.289384
  search_updated| 00:27:09.193343
(6 rows)

Index sizes:
  label |   size
---+---
  new   | 384761856
  after_updates | 667942912
(2 rows)

Also, I made following changes in algorithms:

- Now, there is a limit to number of uncompressed TIDs in the page.
After reaching this limit, they are encoded independent on if they can fit
page. That seems to me more desirable behaviour and somehow it accelerates
search speed. Before this change times were following:

  event | period
---+-
  index_build   | 00:01:51.467888
  index_build_recovery  | 00:00:04
  index_update  | 00:05:03.315155
  index_update_recovery | 00:00:51
  search_new| 00:24:43.194882
  search_updated| 00:28:36.316784
(6 rows)


Hmm, that's strange. Any idea why that's happening? One theory is that 
when you re-encode the pages more aggressively, there are fewer pages 
with a mix of packed and unpacked items. Mixed pages are somewhat slower 
to scan than fully packed or fully unpacked pages, because 
GinDataLeafPageGetItems() has to merge the packed and unpacked items 
into a single list. But I wouldn't expect that effect to be large enough 
to explain the results you got.
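
For illustration, the merge that GinDataLeafPageGetItems() has to do on such
mixed pages is just a two-way merge of two sorted item-pointer arrays. A
minimal sketch, assuming the backend's ItemPointerData and a
ginCompareItemPointers(a, b) that returns a negative value when a precedes b
(the function name below is made up):

#include "postgres.h"
#include "access/gin_private.h"		/* ginCompareItemPointers */
#include "storage/itemptr.h"

/* Sketch only: merge the decoded (packed) and uncompressed item arrays. */
static int
mergeSortedItemPointers(ItemPointerData *dst,
						ItemPointerData *a, int na,
						ItemPointerData *b, int nb)
{
	int		i = 0, j = 0, n = 0;

	while (i < na && j < nb)
	{
		if (ginCompareItemPointers(&a[i], &b[j]) < 0)
			dst[n++] = a[i++];
		else
			dst[n++] = b[j++];
	}
	while (i < na)
		dst[n++] = a[i++];
	while (j < nb)
		dst[n++] = b[j++];
	return n;
}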



- Page are not fully re-encoded if it's enough to re-encode just last
segment.


Great! We should also take advantage of that in the WAL record that's 
written; no point in WAL-logging all the segments, if we know that only 
last one was modified.	


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-14 Thread Alexander Korotkov
On Tue, Jan 14, 2014 at 12:34 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 01/13/2014 07:07 PM, Alexander Korotkov wrote:

 I've fixed this bug and many other bug. Now patch passes test suite that
 I've used earlier. The results are so:

 Operations time:
   event | period
 ---+-
   index_build   | 00:01:47.53915
   index_build_recovery  | 00:00:04
   index_update  | 00:05:24.388163
   index_update_recovery | 00:00:53
   search_new| 00:24:02.289384
   search_updated| 00:27:09.193343
 (6 rows)

 Index sizes:
   label |   size
 ---+---
   new   | 384761856
   after_updates | 667942912
 (2 rows)

 Also, I made following changes in algorithms:

 - Now, there is a limit to number of uncompressed TIDs in the page.

 After reaching this limit, they are encoded independent on if they
 can fit
 page. That seems to me more desirable behaviour and somehow it
 accelerates
 search speed. Before this change times were following:

   event | period
 ---+-
   index_build   | 00:01:51.467888
   index_build_recovery  | 00:00:04
   index_update  | 00:05:03.315155
   index_update_recovery | 00:00:51
   search_new| 00:24:43.194882
   search_updated| 00:28:36.316784
 (6 rows)


 Hmm, that's strange. Any idea why that's happening? One theory is that
 when you re-encode the pages more aggressively, there are fewer pages with
 a mix of packed and unpacked items. Mixed pages are somewhat slower to scan
 than fully packed or fully unpacked pages, because
 GinDataLeafPageGetItems() has to merge the packed and unpacked items into a
 single list. But I wouldn't expect that effect to be large enough to
 explain the results you got.


Probably, it's because of less work in ginMergeItemPointers.

 - Page are not fully re-encoded if it's enough to re-encode just last
 segment.


 Great! We should also take advantage of that in the WAL record that's
 written; no point in WAL-logging all the segments, if we know that only
 last one was modified.


Already.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-14 Thread Tomas Vondra
On 14.1.2014 00:38, Tomas Vondra wrote:
 On 13.1.2014 18:07, Alexander Korotkov wrote:
 On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz
 mailto:t...@fuzzy.cz wrote:

 On 8.1.2014 22:58, Alexander Korotkov wrote:
  Thanks for reporting. Fixed version is attached.

 I've tried to rerun the 'archie' benchmark with the current patch, and
 once again I got

PANIC:  could not split GIN page, didn't fit

 I reran it with '--enable-cassert' and with that I got

 TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
items[i]) < 0), File: gindatapage.c, Line: 149)
 LOG:  server process (PID 5364) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: INSERT INTO messages ...

 so the assert in GinDataLeafPageGetUncompressed fails for some reason.

 I can easily reproduce it, but my knowledge in this area is rather
 limited so I'm not entirely sure what to look for.


 I've fixed this bug and many other bug. Now patch passes test suite that
 I've used earlier. The results are so:
 
 OK, it seems the bug is gone. However now there's a memory leak
 somewhere. I'm loading pgsql mailing list archives (~600k messages)
 using this script
 
https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
 
 And after loading about 1/5 of the data, all the memory gets filled by
 the pgsql backends (loading the data in parallel) and the DB gets killed
 by the OOM killer.

I've spent a fair amount of time trying to locate the memory leak, but
so far no luck. I'm not sufficiently familiar with the GIN code.

I can however demonstrate that it's there, and I have rather simple test
case to reproduce it - basically just a CREATE INDEX on a table with ~1M
email message bodies (in a tsvector column). The data is available here
(360MB compressed, 1GB raw):

   http://www.fuzzy.cz/tmp/message-b.data.gz

Simply create a single-column table, load data and create the index

   CREATE TABLE test ( body_tsvector TSVECTOR );
   COPY test FROM '/tmp/message-b.data';
   CREATE INDEX test_idx ON test USING gin ( body_tsvector );

I'm running this on a machine with 8GB of RAM, with these settings

   shared_buffers=1GB
   maintenance_work_mem=1GB

According to top, CREATE INDEX from the current HEAD never consumes more
than ~25% of RAM:

PID USER  PR  NIVIRTRESSHR  %CPU %MEM  COMMAND
  32091 tomas 20   0 2026032 1,817g 1,040g  56,2 23,8  postgres

which is about right, as (shared_buffers + maintenance_work_mem) is
about 1/4 of RAM.

With the v5 patch version applied, the CREATE INDEX process eventually
goes crazy and allocates almost all the available memory (but sometimes
finishes, mostly by pure luck). This is what I was able to get from top

PID USER  PR  NIVIRTRESSHR S  %CPU %MEM  COMMAND
  14090 tomas 20   0 7913820 6,962g 955036 D   4,3 91,1  postgres

while the system was still reasonably responsive.

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-14 Thread Alexander Korotkov
On Wed, Jan 15, 2014 at 5:17 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 14.1.2014 00:38, Tomas Vondra wrote:
  On 13.1.2014 18:07, Alexander Korotkov wrote:
  On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz
  mailto:t...@fuzzy.cz wrote:
 
  On 8.1.2014 22:58, Alexander Korotkov wrote:
   Thanks for reporting. Fixed version is attached.
 
  I've tried to rerun the 'archie' benchmark with the current patch,
 and
  once again I got
 
 PANIC:  could not split GIN page, didn't fit
 
  I reran it with '--enable-cassert' and with that I got
 
  TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
 items[i]) < 0), File: gindatapage.c, Line:
 149)
  LOG:  server process (PID 5364) was terminated by signal 6: Aborted
  DETAIL:  Failed process was running: INSERT INTO messages ...
 
  so the assert in GinDataLeafPageGetUncompressed fails for some
 reason.
 
  I can easily reproduce it, but my knowledge in this area is rather
  limited so I'm not entirely sure what to look for.
 
 
  I've fixed this bug and many other bug. Now patch passes test suite that
  I've used earlier. The results are so:
 
  OK, it seems the bug is gone. However now there's a memory leak
  somewhere. I'm loading pgsql mailing list archives (~600k messages)
  using this script
 
 https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py
 
  And after loading about 1/5 of the data, all the memory gets filled by
  the pgsql backends (loading the data in parallel) and the DB gets killed
  by the OOM killer.

 I've spent a fair amount of time trying to locate the memory leak, but
 so far no luck. I'm not sufficiently familiar with the GIN code.

 I can however demonstrate that it's there, and I have rather simple test
 case to reproduce it - basically just a CREATE INDEX on a table with ~1M
 email message bodies (in a tsvector column). The data is available here
 (360MB compressed, 1GB raw):

http://www.fuzzy.cz/tmp/message-b.data.gz

 Simply create a single-column table, load data and create the index

CREATE TABLE test ( body_tsvector TSVECTOR );
COPY test FROM '/tmp/message-b.data';
 CREATE INDEX test_idx ON test USING gin ( body_tsvector );

 I'm running this on a machine with 8GB of RAM, with these settings

shared_buffers=1GB
maintenance_work_mem=1GB

 According to top, CREATE INDEX from the current HEAD never consumes more
 than ~25% of RAM:

 PID USER  PR  NIVIRTRESSHR  %CPU %MEM  COMMAND
   32091 tomas 20   0 2026032 1,817g 1,040g  56,2 23,8  postgres

 which is about right, as (shared_buffers + maintenance_work_mem) is
 about 1/4 of RAM.

 With the v5 patch version applied, the CREATE INDEX process eventually
 goes crazy and allocates almost all the available memory (but somesimes
 finishes, mostly by pure luck). This is what I was able to get from top

 PID USER  PR  NIVIRTRESSHR S  %CPU %MEM  COMMAND
   14090 tomas 20   0 7913820 6,962g 955036 D   4,3 91,1  postgres

 while the system was still reasonably responsive.


Thanks a lot for your help! I believe the problem is that each decompressed
item pointer array is palloc'd but not freed. I hope to fix it today.
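
For readers less familiar with backend memory management, the shape of such a
fix is either an explicit pfree() of each decoded array, or doing the
per-segment decoding in a short-lived memory context that is reset on every
iteration. A rough sketch of the latter pattern (illustrative only, not the
actual patch code; the function and the palloc'd stand-in are made up):

#include "postgres.h"
#include "utils/memutils.h"

static void
processSegments(int nsegments)
{
	MemoryContext tmpCtx;
	int			i;

	tmpCtx = AllocSetContextCreate(CurrentMemoryContext,
								   "segment decode",
								   ALLOCSET_DEFAULT_MINSIZE,
								   ALLOCSET_DEFAULT_INITSIZE,
								   ALLOCSET_DEFAULT_MAXSIZE);

	for (i = 0; i < nsegments; i++)
	{
		MemoryContext oldCtx = MemoryContextSwitchTo(tmpCtx);
		char	   *decoded = palloc(BLCKSZ);	/* stand-in for the decoded
												 * item-pointer array */

		/* ... scan 'decoded' ... */
		(void) decoded;
		MemoryContextSwitchTo(oldCtx);
		MemoryContextReset(tmpCtx);		/* frees this iteration's pallocs */
	}
	MemoryContextDelete(tmpCtx);
}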

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-13 Thread Alexander Korotkov
On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 8.1.2014 22:58, Alexander Korotkov wrote:
  Thanks for reporting. Fixed version is attached.

 I've tried to rerun the 'archie' benchmark with the current patch, and
 once again I got

PANIC:  could not split GIN page, didn't fit

 I reran it with '--enable-cassert' and with that I got

 TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
 items[i]) < 0), File: gindatapage.c, Line: 149)
 LOG:  server process (PID 5364) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: INSERT INTO messages ...

 so the assert in GinDataLeafPageGetUncompressed fails for some reason.

 I can easily reproduce it, but my knowledge in this area is rather
 limited so I'm not entirely sure what to look for.


I've fixed this bug and many other bugs. Now the patch passes the test suite
that I've used earlier. The results are as follows:

Operations time:
 event | period
---+-
 index_build   | 00:01:47.53915
 index_build_recovery  | 00:00:04
 index_update  | 00:05:24.388163
 index_update_recovery | 00:00:53
 search_new| 00:24:02.289384
 search_updated| 00:27:09.193343
(6 rows)

Index sizes:
 label |   size
---+---
 new   | 384761856
 after_updates | 667942912
(2 rows)

Also, I made following changes in algorithms:

   - Now, there is a limit to the number of uncompressed TIDs in the page.
   After reaching this limit, they are encoded regardless of whether they
   would fit on the page. That seems to me more desirable behaviour and
   somehow it accelerates search speed. Before this change the times were
   as follows:

 event | period
---+-
 index_build   | 00:01:51.467888
 index_build_recovery  | 00:00:04
 index_update  | 00:05:03.315155
 index_update_recovery | 00:00:51
 search_new| 00:24:43.194882
 search_updated| 00:28:36.316784
(6 rows)

   - Pages are not fully re-encoded if it's enough to re-encode just the
   last segment.


README is updated.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte5.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-13 Thread Tomas Vondra
On 13.1.2014 18:07, Alexander Korotkov wrote:
 On Sat, Jan 11, 2014 at 6:15 AM, Tomas Vondra t...@fuzzy.cz
 mailto:t...@fuzzy.cz wrote:
 
 On 8.1.2014 22:58, Alexander Korotkov wrote:
  Thanks for reporting. Fixed version is attached.
 
 I've tried to rerun the 'archie' benchmark with the current patch, and
 once again I got
 
PANIC:  could not split GIN page, didn't fit
 
 I reran it with '--enable-cassert' and with that I got
 
 TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
items[i]) < 0), File: gindatapage.c, Line: 149)
 LOG:  server process (PID 5364) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: INSERT INTO messages ...
 
 so the assert in GinDataLeafPageGetUncompressed fails for some reason.
 
 I can easily reproduce it, but my knowledge in this area is rather
 limited so I'm not entirely sure what to look for.
 
 
 I've fixed this bug and many other bug. Now patch passes test suite that
 I've used earlier. The results are so:

OK, it seems the bug is gone. However now there's a memory leak
somewhere. I'm loading pgsql mailing list archives (~600k messages)
using this script

   https://bitbucket.org/tvondra/archie/src/1bbeb920/bin/load.py

And after loading about 1/5 of the data, all the memory gets filled by
the pgsql backends (loading the data in parallel) and the DB gets killed
by the OOM killer.

Tomas



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-13 Thread Etsuro Fujita
Peter Eisentraut wrote:
 On 12/10/13, 2:44 PM, Alexander Korotkov wrote:
  However, patch didn't apply to head. Corrected version is attached.

 Update the pgstattuple regression test results.

The latest version of the patch still doesn't pass the test.

I'll look at the patch in further detail.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-10 Thread Tomas Vondra
On 8.1.2014 22:58, Alexander Korotkov wrote:
 Thanks for reporting. Fixed version is attached.

I've tried to rerun the 'archie' benchmark with the current patch, and
once again I got

   PANIC:  could not split GIN page, didn't fit

I reran it with '--enable-cassert' and with that I got

TRAP: FailedAssertion(!(ginCompareItemPointers(items[i - 1],
   items[i]) < 0), File: gindatapage.c, Line: 149)
LOG:  server process (PID 5364) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: INSERT INTO messages ...

so the assert in GinDataLeafPageGetUncompressed fails for some reason.

I can easily reproduce it, but my knowledge in this area is rather
limited so I'm not entirely sure what to look for.
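
For context, the failing assertion only checks that the item pointers on a
leaf page are strictly ascending; ginCompareItemPointers is essentially a
lexicographic comparison of (block number, offset number), roughly like this
sketch (the function name is made up, the macros are the standard ones from
storage/itemptr.h):

#include "postgres.h"
#include "storage/itemptr.h"

static int
compareItemPointers(ItemPointer a, ItemPointer b)
{
	BlockNumber blka = ItemPointerGetBlockNumber(a);
	BlockNumber blkb = ItemPointerGetBlockNumber(b);

	if (blka != blkb)
		return (blka < blkb) ? -1 : 1;
	if (ItemPointerGetOffsetNumber(a) != ItemPointerGetOffsetNumber(b))
		return (ItemPointerGetOffsetNumber(a) <
				ItemPointerGetOffsetNumber(b)) ? -1 : 1;
	return 0;
}

So a failure here means two items ended up on the page out of order (or
duplicated) during the insert.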

regards
Tomas


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-08 Thread Alexander Korotkov
On Mon, Jan 6, 2014 at 12:35 PM, Amit Langote amitlangot...@gmail.comwrote:

 On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 
  Yet another version. The encoding/decoding code is now quite isolated in
  ginpostinglist.c, so it's easy to experiment with different encodings.
 This
  patch uses varbyte encoding again.
 
  I got a bit carried away, experimented with a bunch of different
 encodings.
  I tried rice encoding, rice encoding with block and offset number delta
  stored separately, the simple9 variant, and varbyte encoding.
 
  The compressed size obviously depends a lot on the distribution of the
  items, but in the test set I used, the differences between different
  encodings were quite small.
 
  One fatal problem with many encodings is VACUUM. If a page is completely
  full and you remove one item, the result must still fit. In other words,
  removing an item must never enlarge the space needed. Otherwise we have
 to
  be able to split on vacuum, which adds a lot of code, and also makes it
  possible for VACUUM to fail if there is no disk space left. That's
  unpleasant if you're trying to run VACUUM to release disk space. (gin
 fast
  updates already has that problem BTW, but let's not make it worse)
 
  I believe that eliminates all encodings in the Simple family, as well as
  PForDelta, and surprisingly also Rice encoding. For example, if you have
  three items in consecutive offsets, the differences between them are
 encoded
  as 11 in rice encoding. If you remove the middle item, the encoding for
 the
  next item becomes 010, which takes more space than the original.
 
  AFAICS varbyte encoding is safe from that. (a formal proof would be nice
  though).
 
  So, I'm happy to go with varbyte encoding now, indeed I don't think we
 have
  much choice unless someone can come up with an alternative that's
  VACUUM-safe. I have to put this patch aside for a while now, I spent a
 lot
  more time on these encoding experiments than I intended. If you could
 take a
  look at this latest version, spend some time reviewing it and cleaning up
  any obsolete comments, and re-run the performance tests you did earlier,
  that would be great. One thing I'm slightly worried about is the
 overhead of
  merging the compressed and uncompressed posting lists in a scan. This
 patch
  will be in good shape for the final commitfest, or even before that.
 


 I just tried out the patch gin-packed-postinglists-varbyte2.patch
 (which looks like the latest one in this thread) as follows:

 1) Applied patch to the HEAD (on commit
 94b899b829657332bda856ac3f06153d09077bd1)
 2) Created a test table and index

 create table test (a text);
 copy test from '/usr/share/dict/words';
 create index test_trgm_idx on test using gin (a gin_trgm_ops);

 3) Got the following error on a wildcard query:

 postgres=# explain (buffers, analyze) select count(*) from test where
 a like '%tio%';
 ERROR:  lock 9447 is not held
 STATEMENT:  explain (buffers, analyze) select count(*) from test where
 a like '%tio%';
 ERROR:  lock 9447 is not held


Thanks for reporting. Fixed version is attached.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-varbyte3.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2014-01-06 Thread Amit Langote
On Sat, Dec 21, 2013 at 4:36 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

 Yet another version. The encoding/decoding code is now quite isolated in
 ginpostinglist.c, so it's easy to experiment with different encodings. This
 patch uses varbyte encoding again.

 I got a bit carried away, experimented with a bunch of different encodings.
 I tried rice encoding, rice encoding with block and offset number delta
 stored separately, the simple9 variant, and varbyte encoding.

 The compressed size obviously depends a lot on the distribution of the
 items, but in the test set I used, the differences between different
 encodings were quite small.

 One fatal problem with many encodings is VACUUM. If a page is completely
 full and you remove one item, the result must still fit. In other words,
 removing an item must never enlarge the space needed. Otherwise we have to
 be able to split on vacuum, which adds a lot of code, and also makes it
 possible for VACUUM to fail if there is no disk space left. That's
 unpleasant if you're trying to run VACUUM to release disk space. (gin fast
 updates already has that problem BTW, but let's not make it worse)

 I believe that eliminates all encodings in the Simple family, as well as
 PForDelta, and surprisingly also Rice encoding. For example, if you have
 three items in consecutive offsets, the differences between them are encoded
 as 11 in rice encoding. If you remove the middle item, the encoding for the
 next item becomes 010, which takes more space than the original.

 AFAICS varbyte encoding is safe from that. (a formal proof would be nice
 though).

 So, I'm happy to go with varbyte encoding now, indeed I don't think we have
 much choice unless someone can come up with an alternative that's
 VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
 more time on these encoding experiments than I intended. If you could take a
 look at this latest version, spend some time reviewing it and cleaning up
 any obsolete comments, and re-run the performance tests you did earlier,
 that would be great. One thing I'm slightly worried about is the overhead of
 merging the compressed and uncompressed posting lists in a scan. This patch
 will be in good shape for the final commitfest, or even before that.



I just tried out the patch gin-packed-postinglists-varbyte2.patch
(which looks like the latest one in this thread) as follows:

1) Applied patch to the HEAD (on commit
94b899b829657332bda856ac3f06153d09077bd1)
2) Created a test table and index

create table test (a text);
copy test from '/usr/share/dict/words';
create index test_trgm_idx on test using gin (a gin_trgm_ops);

3) Got the following error on a wildcard query:

postgres=# explain (buffers, analyze) select count(*) from test where
a like '%tio%';
ERROR:  lock 9447 is not held
STATEMENT:  explain (buffers, analyze) select count(*) from test where
a like '%tio%';
ERROR:  lock 9447 is not held

Following is the bt:

#0  LWLockRelease (lockid=9447) at lwlock.c:747
#1  0x0074a356 in LockBuffer (buffer=4638, mode=0) at bufmgr.c:2760
#2  0x00749d6e in UnlockReleaseBuffer (buffer=4638) at bufmgr.c:2551
#3  0x00478bcc in entryGetNextItem (ginstate=0x2380100,
entry=0x23832a8) at ginget.c:555
#4  0x00479346 in entryGetItem (ginstate=0x2380100,
entry=0x23832a8) at ginget.c:695
#5  0x0047987e in scanGetItem (scan=0x237fee8,
advancePast=0x7fffa1a46b80, item=0x7fffa1a46b80,
recheck=0x7fffa1a46b7f \001) at ginget.c:925
#6  0x0047ae3f in gingetbitmap (fcinfo=0x7fffa1a46be0) at ginget.c:1540
#7  0x008a9486 in FunctionCall2Coll (flinfo=0x236f558,
collation=0, arg1=37224168, arg2=37236160) at fmgr.c:1323
#8  0x004bd091 in index_getbitmap (scan=0x237fee8,
bitmap=0x2382dc0) at indexam.c:649
#9  0x0064827c in MultiExecBitmapIndexScan (node=0x237fce0) at
nodeBitmapIndexscan.c:89
#10 0x006313b6 in MultiExecProcNode (node=0x237fce0) at
execProcnode.c:550
#11 0x0064713a in BitmapHeapNext (node=0x237e998) at
nodeBitmapHeapscan.c:104
#12 0x0063c348 in ExecScanFetch (node=0x237e998,
accessMtd=0x6470ac BitmapHeapNext, recheckMtd=0x647cbc
BitmapHeapRecheck) at execScan.c:82
#13 0x0063c3bc in ExecScan (node=0x237e998, accessMtd=0x6470ac
BitmapHeapNext, recheckMtd=0x647cbc BitmapHeapRecheck) at
execScan.c:132
#14 0x00647d3a in ExecBitmapHeapScan (node=0x237e998) at
nodeBitmapHeapscan.c:436
#15 0x00631121 in ExecProcNode (node=0x237e998) at execProcnode.c:414
#16 0x00644992 in agg_retrieve_direct (aggstate=0x237e200) at
nodeAgg.c:1073
#17 0x006448cc in ExecAgg (node=0x237e200) at nodeAgg.c:1026
#18 0x00631247 in ExecProcNode (node=0x237e200) at execProcnode.c:476
#19 0x0062ef2a in ExecutePlan (estate=0x237e0e8,
planstate=0x237e200, operation=CMD_SELECT, sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0xd0c360) at
execMain.c:1474
#20 0x0062cfeb 

Re: [HACKERS] GIN improvements part 1: additional information

2013-12-24 Thread Peter Eisentraut
On 12/10/13, 2:44 PM, Alexander Korotkov wrote:
 However, patch didn't apply to head. Corrected version is attached.

Update the pgstattuple regression test results.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Heikki Linnakangas

On 12/19/2013 10:44 AM, Heikki Linnakangas wrote:

On 12/19/2013 08:37 AM, Oleg Bartunov wrote:

Guys,

before digging deep into the art of comp/decomp world I'd like to know
if you familiar with results of
http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and
some newer research ?


Yeah, I saw that paper.


Do we agree in what we really want ? Basically,
there are three main features: size, compression, decompression speed
- we should take two :)


According to that Zhang et al paper you linked, the Vbyte method
actually performs the worst on all of those measures. The other
algorithms are quite similar in terms of size (PForDelta being the most
efficient), while PForDelta is significantly faster to compress/decompress.

Just by looking at those numbers, PForDelta looks like a clear winner.
However, it operates on much bigger batches than the other algorithms; I
haven't looked at it in detail, but Zhang et al used 128 integer
batches, and they say that 32 integers is the minimum batch size. If we
want to use it for the inline posting lists stored in entry tuples, that
would be quite wasteful if there are only a few item pointers on the tuple.

Also, in the tests I've run, the compression/decompression speed is not
a significant factor in total performance, with either varbyte encoding
or Simple9-like encoding I hacked together.


One disadvantage of Simple9 and other encodings that operate in batches 
is that removing a value from the middle can increase the number of 
bytes required for the remaining values. That's a problem during vacuum; 
it's possible that after vacuuming away one item pointer, the remaining 
items no longer fit on the page.
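
To make the batch problem concrete, the textbook Simple-9 layout packs each
32-bit word as a 4-bit selector plus 28 data bits of equal-width values (the
handcrafted variant used in the experiment differs, so treat this purely as an
illustration):

/* Textbook Simple-9 modes: values per word and bits per value. */
static const struct
{
	int			nitems;
	int			bits;
} simple9_modes[] = {
	{28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}
};

For example, 14 deltas of 2 fit in one word (14 x 2 bits); if vacuum removes
one of those items, two neighbouring deltas are summed to 4, which needs 3
bits, so at most 9 values fit per word and the tail spills into an extra word.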


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Heikki Linnakangas

On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:

On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas
hlinnakan...@vmware.com

wrote:



On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

  When values are packed into small groups, we have to either insert

inefficiently encoded value or re-encode whole right part of values.


It would probably be simplest to store newly inserted items
uncompressed,
in a separate area in the page. For example, grow the list of
uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.


I hacked together an implementation of a variant of Simple9, to see how
it performs. Insertions are handled per the above scheme.


Here's an updated version of that, using the page layout without
item-indexes that I described in the other post. This is much less buggy
than that earlier crude version I posted - and unfortunately it doesn't
compress as well. The earlier version lost some items :-(.

Nevertheless, I think this page layout and code formatting is better,
even if we switch the encoding back to the varbyte encoding in the end.


Yet another version. The encoding/decoding code is now quite isolated in 
ginpostinglist.c, so it's easy to experiment with different encodings. 
This patch uses varbyte encoding again.


I got a bit carried away, experimented with a bunch of different 
encodings. I tried rice encoding, rice encoding with block and offset 
number delta stored separately, the simple9 variant, and varbyte encoding.


The compressed size obviously depends a lot on the distribution of the 
items, but in the test set I used, the differences between different 
encodings were quite small.


One fatal problem with many encodings is VACUUM. If a page is completely 
full and you remove one item, the result must still fit. In other words, 
removing an item must never enlarge the space needed. Otherwise we have 
to be able to split on vacuum, which adds a lot of code, and also makes 
it possible for VACUUM to fail if there is no disk space left. That's 
unpleasant if you're trying to run VACUUM to release disk space. (gin 
fast updates already has that problem BTW, but let's not make it worse)


I believe that eliminates all encodings in the Simple family, as well as 
PForDelta, and surprisingly also Rice encoding. For example, if you have 
three items in consecutive offsets, the differences between them are 
encoded as 11 in rice encoding. If you remove the middle item, the 
encoding for the next item becomes 010, which takes more space than the 
original.


AFAICS varbyte encoding is safe from that. (a formal proof would be nice 
though).


So, I'm happy to go with varbyte encoding now, indeed I don't think we 
have much choice unless someone can come up with an alternative that's 
VACUUM-safe. I have to put this patch aside for a while now, I spent a 
lot more time on these encoding experiments than I intended. If you 
could take a look at this latest version, spend some time reviewing it 
and cleaning up any obsolete comments, and re-run the performance tests 
you did earlier, that would be great. One thing I'm slightly worried 
about is the overhead of merging the compressed and uncompressed posting 
lists in a scan. This patch will be in good shape for the final 
commitfest, or even before that.


- Heikki


gin-packed-postinglists-varbyte2.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 I believe that eliminates all encodings in the Simple family, as
 well as PForDelta, and surprisingly also Rice encoding. For example,
 if you have three items in consecutive offsets, the differences
 between them are encoded as 11 in rice encoding. If you remove the
 middle item, the encoding for the next item becomes 010, which takes
 more space than the original.

I don't understand this.  If you have three consecutive entries, and the
differences between them are 11, you need to store two 11s.  But if you
have two items, you only need to store 010 once.  So the difference is
larger, but since you need to store only one of them then overall it's
still shorter than the original.  No?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alexander Korotkov
On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera
alvhe...@2ndquadrant.comwrote:

 Heikki Linnakangas escribió:

  I believe that eliminates all encodings in the Simple family, as
  well as PForDelta, and surprisingly also Rice encoding. For example,
  if you have three items in consecutive offsets, the differences
  between them are encoded as 11 in rice encoding. If you remove the
  middle item, the encoding for the next item becomes 010, which takes
  more space than the original.

 I don't understand this.  If you have three consecutive entries, and the
 differences between them are 11, you need to store two 11s.  But if you
 have two items, you only need to store 010 once.  So the difference is
 larger, but since you need to store only one of them then overall it's
 still shorter than the original.  No?


I believe Heikki means that both differences together are encoded as 11; each
one is 1.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alvaro Herrera
Alexander Korotkov wrote:
 On Fri, Dec 20, 2013 at 11:43 PM, Alvaro Herrera
 alvhe...@2ndquadrant.comwrote:
 
  Heikki Linnakangas escribió:
 
   I believe that eliminates all encodings in the Simple family, as
   well as PForDelta, and surprisingly also Rice encoding. For example,
   if you have three items in consecutive offsets, the differences
   between them are encoded as 11 in rice encoding. If you remove the
   middle item, the encoding for the next item becomes 010, which takes
   more space than the original.
 
  I don't understand this.  If you have three consecutive entries, and the
  differences between them are 11, you need to store two 11s.  But if you
  have two items, you only need to store 010 once.  So the difference is
  larger, but since you need to store only one of them then overall it's
  still shorter than the original.  No?
 
 I believe Heikki mean both differences are encoded as 11, each one is 1.

Oh, that sucks (or it's great, depending on perspective).

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-20 Thread Alexander Korotkov
On Fri, Dec 20, 2013 at 11:36 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 12/19/2013 03:33 PM, Heikki Linnakangas wrote:

 On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:

 On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

 On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas
 hlinnakan...@vmware.com

 wrote:


  On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

   When values are packed into small groups, we have to either insert

 inefficiently encoded value or re-encode whole right part of values.


 It would probably be simplest to store newly inserted items
 uncompressed,
 in a separate area in the page. For example, grow the list of
 uncompressed
 items downwards from pg_upper, and the compressed items upwards from
 pg_lower. When the page fills up, re-encode the whole page.


 I hacked together an implementation of a variant of Simple9, to see how
 it performs. Insertions are handled per the above scheme.


 Here's an updated version of that, using the page layout without
 item-indexes that I described in the other post. This is much less buggy
 than that earlier crude version I posted - and unfortunately it doesn't
 compress as well. The earlier version lost some items :-(.

 Nevertheless, I think this page layout and code formatting is better,
 even if we switch the encoding back to the varbyte encoding in the end.


 Yet another version. The encoding/decoding code is now quite isolated in
 ginpostinglist.c, so it's easy to experiment with different encodings. This
 patch uses varbyte encoding again.

 I got a bit carried away, experimented with a bunch of different
 encodings. I tried rice encoding, rice encoding with block and offset
 number delta stored separately, the simple9 variant, and varbyte encoding.

 The compressed size obviously depends a lot on the distribution of the
 items, but in the test set I used, the differences between different
 encodings were quite small.

 One fatal problem with many encodings is VACUUM. If a page is completely
 full and you remove one item, the result must still fit. In other words,
 removing an item must never enlarge the space needed. Otherwise we have to
 be able to split on vacuum, which adds a lot of code, and also makes it
 possible for VACUUM to fail if there is no disk space left. That's
 unpleasant if you're trying to run VACUUM to release disk space. (gin fast
 updates already has that problem BTW, but let's not make it worse)

 I believe that eliminates all encodings in the Simple family, as well as
 PForDelta, and surprisingly also Rice encoding. For example, if you have
 three items in consecutive offsets, the differences between them are
 encoded as 11 in rice encoding. If you remove the middle item, the encoding
 for the next item becomes 010, which takes more space than the original.

 AFAICS varbyte encoding is safe from that. (a formal proof would be nice
 though).


The formal proof goes like this. Removing a number is actually the
replacement of two numbers with their sum. We have to prove that the varbyte
encoding of the sum can't be longer than the combined varbyte encoding of the
summands. The highest set bit of the sum is at most one position higher than
that of the larger summand, so the varbyte encoding of the sum is at most one
byte longer than the varbyte encoding of the larger summand. And the lesser
summand takes at least one byte.
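
The same argument in executable form, as a standalone sketch of plain
7-bits-per-byte varbyte lengths (not the patch's actual encoder): since
len(x) = ceil(bits(x)/7) and the sum has at most one more significant bit
than the larger summand, len(a+b) <= len(max(a,b)) + 1 <= len(a) + len(b).

#include <assert.h>
#include <stdint.h>

/* Bytes needed by a plain 7-bits-per-byte varbyte encoding (sketch). */
static int
varbyte_len(uint64_t v)
{
	int			len = 1;

	while (v >= 0x80)
	{
		v >>= 7;
		len++;
	}
	return len;
}

/* Vacuum-safety: replacing two adjacent deltas by their sum never grows. */
static void
check_vacuum_safe(uint64_t a, uint64_t b)
{
	assert(varbyte_len(a + b) <= varbyte_len(a) + varbyte_len(b));
}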


 So, I'm happy to go with varbyte encoding now, indeed I don't think we
 have much choice unless someone can come up with an alternative that's
 VACUUM-safe. I have to put this patch aside for a while now, I spent a lot
 more time on these encoding experiments than I intended. If you could take
 a look at this latest version, spend some time reviewing it and cleaning up
 any obsolete comments, and re-run the performance tests you did earlier,
 that would be great. One thing I'm slightly worried about is the overhead
 of merging the compressed and uncompressed posting lists in a scan. This
 patch will be in good shape for the final commitfest, or even before that.


OK. I'll run tests for scans on a mix of compressed and uncompressed posting
lists. However, I don't expect that to be slower than either purely compressed
or purely uncompressed posting lists. The overhead of compressing the
uncompressed posting lists is evident, but it would also be nice to measure it.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-19 Thread Heikki Linnakangas

On 12/19/2013 08:37 AM, Oleg Bartunov wrote:

Guys,

before digging deep into the art of comp/decomp world I'd like to know
if you familiar with results of
http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and
some newer research ?


Yeah, I saw that paper.


Do we agree in what we really want ? Basically,
there are three main features: size, compression, decompression speed
- we should take two :)


According to that Zhang et al paper you linked, the Vbyte method 
actually performs the worst on all of those measures. The other 
algorithms are quite similar in terms of size (PForDelta being the most 
efficient), while PForDelta is significantly faster to compress/decompress.


Just by looking at those numbers, PForDelta looks like a clear winner. 
However, it operates on much bigger batches than the other algorithms; I 
haven't looked at it in detail, but Zhang et al used 128 integer 
batches, and they say that 32 integers is the minimum batch size. If we 
want to use it for the inline posting lists stored in entry tuples, that 
would be quite wasteful if there are only a few item pointers on the tuple.


Also, in the tests I've run, the compression/decompression speed is not 
a significant factor in total performance, with either varbyte encoding 
or Simple9-like encoding I hacked together.


Actually, now that I think about this a bit more, maybe we should go 
with Rice encoding after all? It's the most efficient in terms of size, 
and I believe it would be fast enough.
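
For reference, in the textbook formulation Rice coding with parameter k spends
(v >> k) + 1 bits on the unary quotient and k bits on the remainder, which is
why it is so compact when the deltas cluster around 2^k. A standalone sketch
of that bit cost (textbook scheme only; the variants experimented with in this
thread, e.g. encoding delta-1 or splitting block and offset deltas, will
differ slightly):

#include <stdint.h>

/* Bit cost of one non-negative value v in textbook Rice coding (sketch). */
static int
rice_len_bits(uint64_t v, int k)
{
	return (int) (v >> k) + 1 + k;	/* unary quotient, terminator, remainder */
}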



Should we design sort of plugin, which could support independent
storage on disk, so users can apply different techniques, depending on
data.

What I want to say is that we certainly can play with this very
challenged task, but we have limited time  before 9.4 and we should
think in positive direction.


Once we have the code in place to deal with one encoding, it's easy to 
switch the implementation. Making it user-configurable or pluggable 
would be overkill IMHO.


What I'm saying is that we should make sure we get the page format right 
(in particular, I strongly feel we should use the self-contained 
PostingListSegment struct instead of item-indees that I mentioned in the 
other post), with the implementation details hidden in the functions in 
ginpostinglist.c. Then we can easily experiment with different algorithms.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-19 Thread Heikki Linnakangas

On 12/17/2013 12:49 AM, Heikki Linnakangas wrote:

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas
hlinnakan...@vmware.com

wrote:



On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

  When values are packed into small groups, we have to either insert

inefficiently encoded value or re-encode whole right part of values.


It would probably be simplest to store newly inserted items
uncompressed,
in a separate area in the page. For example, grow the list of
uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.


I hacked together an implementation of a variant of Simple9, to see how
it performs. Insertions are handled per the above scheme.


Here's an updated version of that, using the page layout without 
item-indexes that I described in the other post. This is much less buggy 
than that earlier crude version I posted - and unfortunately it doesn't 
compress as well. The earlier version lost some items :-(.


Nevertheless, I think this page layout and code formatting is better, 
even if we switch the encoding back to the varbyte encoding in the end.


I haven't tested WAL replay or VACUUM with this version yet, so those 
are likely broken.


- Heikki


gin-packed-postinglists-simple8-segments-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-18 Thread Alexander Korotkov
On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

 On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com

 wrote:


  On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

   When values are packed into small groups, we have to either insert

 inefficiently encoded value or re-encode whole right part of values.


 It would probably be simplest to store newly inserted items uncompressed,
 in a separate area in the page. For example, grow the list of
 uncompressed
 items downwards from pg_upper, and the compressed items upwards from
 pg_lower. When the page fills up, re-encode the whole page.


 I hacked together an implementation of a variant of Simple9, to see how it
 performs. Insertions are handled per the above scheme.

 In a limited pg_trgm test case I've been using a lot for this, this
 reduces the index size about 20%, compared to varbyte encoding. It might be
 possible to squeeze it a bit more, I handcrafted the selectors in the
 encoding algorithm to suite our needs, but I don't actually have a good
 idea of how to choose them optimally. Also, the encoding can encode 0
 values, but we never need to do that, so you could take advantage of that
 to pack items tighter.

 Compression and decompression speed seems to be about the same.

 Patch attached if you want to play with it. WAL replay is still broken,
 and there are probably bugs.


  Good idea. But:
 1) We'll still need item indexes in the end of page for fast scan.


 Sure.


  2) Storage would be easily extendable to hold additional information as
 well.
 Better compression shouldn't block more serious improvements.


 I'm not sure I agree with that. For all the cases where you don't care
 about additional information - which covers all existing users for example
 - reducing disk size is pretty important. How are you planning to store the
 additional information, and how does using another encoding gets in the way
 of that?


I planned to store the additional-information datums between the
varbyte-encoded TIDs. I expected it would be hard to do with PFOR.
However, I don't see significant problems in your implementation of Simple9
encoding. I'm going to dig deeper into your version of the patch.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-18 Thread Heikki Linnakangas

On 12/18/2013 01:45 PM, Alexander Korotkov wrote:

On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
  2) Storage would be easily extendable to hold additional information as

well.
Better compression shouldn't block more serious improvements.



I'm not sure I agree with that. For all the cases where you don't care
about additional information - which covers all existing users for example
- reducing disk size is pretty important. How are you planning to store the
additional information, and how does using another encoding gets in the way
of that?


I was planned to store additional information datums between
varbyte-encoded tids. I was expected it would be hard to do with PFOR.
However, I don't see significant problems in your implementation of Simple9
encoding. I'm going to dig deeper in your version of patch.


Ok, thanks.

I had another idea about the page format this morning. Instead of having 
the item-indexes at the end of the page, it would be more flexible to 
store a bunch of self-contained posting list segments on the page. So 
I propose that we get rid of the item-indexes, and instead store a bunch 
of independent posting lists on the page:


typedef struct
{
ItemPointerData first;   /* first item in this segment (unpacked) */
uint16  nwords;  /* number of words that follow */
uint64  words[1];/* var length */
} PostingListSegment;

Each segment can be encoded and decoded independently. When searching 
for a particular item (like on insertion), you skip over segments where 
'first' < the item you're searching for.
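
A minimal sketch of that walk over the proposed layout (only the struct above
is from the proposal; the helper name and the SizeOfSegment macro below are
made up, and ginCompareItemPointers is assumed to be the backend's usual
comparator):

#define SizeOfSegment(seg) \
	(offsetof(PostingListSegment, words) + (seg)->nwords * sizeof(uint64))

static PostingListSegment *
findSegment(Pointer segStart, Pointer segEnd, ItemPointerData item)
{
	PostingListSegment *seg = (PostingListSegment *) segStart;
	PostingListSegment *found = seg;

	while ((Pointer) seg < segEnd)
	{
		if (ginCompareItemPointers(&seg->first, &item) > 0)
			break;				/* this segment already starts past 'item' */
		found = seg;
		seg = (PostingListSegment *) ((Pointer) seg + SizeOfSegment(seg));
	}
	return found;				/* last segment whose 'first' <= 'item' */
}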


This format offers a lot more flexibility compared to the separate item 
indexes. First, we don't need to have another fixed sized area on the 
page, which simplifies the page format. Second, we can more easily 
re-encode only one segment on the page, on insertion or vacuum. The 
latter is particularly important with the Simple-9 encoding, which 
operates one word at a time rather than one item at a time; removing or 
inserting an item in the middle can require a complete re-encoding of 
everything that follows. Third, when a page is being inserted into and 
contains only uncompressed items, you don't waste any space for unused 
item indexes.


While we're at it, I think we should use the above struct in the inline 
posting lists stored directly in entry tuples. That wastes a few bytes 
compared to the current approach in the patch (more alignment, and 
'words' is redundant with the number of items stored on the tuple 
header), but it simplifies the functions handling these lists.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-18 Thread Oleg Bartunov
Guys,

before digging deep into the art of the comp/decomp world I'd like to know
whether you are familiar with the results of the
http://wwwconference.org/www2008/papers/pdf/p387-zhangA.pdf paper and
some newer research. Do we agree on what we really want? Basically,
there are three main features: size, compression, decompression speed
- we should take two :)

Should we design sort of plugin, which could support independent
storage on disk, so users can apply different techniques, depending on
data.

What I want to say is that we certainly can play with this very
challenging task, but we have limited time before 9.4 and we should
think in a positive direction.

Oleg

On Wed, Dec 18, 2013 at 6:50 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 12/18/2013 01:45 PM, Alexander Korotkov wrote:

 On Tue, Dec 17, 2013 at 2:49 AM, Heikki Linnakangas
 hlinnakan...@vmware.com

 wrote:


 On 12/17/2013 12:22 AM, Alexander Korotkov wrote:
   2) Storage would be easily extendable to hold additional information as

 well.
 Better compression shouldn't block more serious improvements.


 I'm not sure I agree with that. For all the cases where you don't care
 about additional information - which covers all existing users for
 example
 - reducing disk size is pretty important. How are you planning to store
 the
 additional information, and how does using another encoding gets in the
 way
 of that?


 I was planned to store additional information datums between
 varbyte-encoded tids. I was expected it would be hard to do with PFOR.
 However, I don't see significant problems in your implementation of
 Simple9
 encoding. I'm going to dig deeper in your version of patch.


 Ok, thanks.

 I had another idea about the page format this morning. Instead of having the
 item-indexes at the end of the page, it would be more flexible to store a
 bunch of self-contained posting list segments on the page. So I propose
 that we get rid of the item-indexes, and instead store a bunch of
 independent posting lists on the page:

 typedef struct
 {
 ItemPointerData first;   /* first item in this segment (unpacked) */
 uint16  nwords;  /* number of words that follow */
 uint64  words[1];/* var length */
 } PostingListSegment;

 Each segment can be encoded and decoded independently. When searching for a
 particular item (like on insertion), you skip over segments where 'first' <
 the item you're searching for.

 This format offers a lot more flexibility compared to the separate item
 indexes. First, we don't need to have another fixed sized area on the page,
 which simplifies the page format. Second, we can more easily re-encode only
 one segment on the page, on insertion or vacuum. The latter is particularly
 important with the Simple-9 encoding, which operates one word at a time
 rather than one item at a time; removing or inserting an item in the middle
 can require a complete re-encoding of everything that follows. Third, when a
 page is being inserted into and contains only uncompressed items, you don't
 waste any space for unused item indexes.

 While we're at it, I think we should use the above struct in the inline
 posting lists stored directly in entry tuples. That wastes a few bytes
 compared to the current approach in the patch (more alignment, and 'words'
 is redundant with the number of items stored on the tuple header), but it
 simplifies the functions handling these lists.


 - Heikki


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Heikki Linnakangas

On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

I've thought about different algorithms little more. General problem I see
is online update. We need it while it is typically not covered by
researches at all. We already have to invent small index in the end of
page. Different encoding methods adds more challenges. In general, methods
can be classified in two groups:
1) Values aren't aligned by bytes (gamma-codes, PFOR etc.)
2) Multiple values are packed together in small group (simple-9, simple-18)


Ok.


For the first group of methods when inserting in the middle of the page we
would have to do not byte-aligned shift of right part of values. I don't
know how expensive is this shift but I expect that it would be much slower
than memmove.


Agreed.


When values are packed into small groups, we have to either insert
inefficiently encoded value or re-encode whole right part of values.


It would probably be simplest to store newly inserted items 
uncompressed, in a separate area in the page. For example, grow the list 
of uncompressed items downwards from pg_upper, and the compressed items 
upwards from pg_lower. When the page fills up, re-encode the whole page.
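
As a rough sketch of that layout (not the patch's code; it ignores alignment,
the ordering of the uncompressed items, and WAL-logging, and only shows the
free-space check between pd_lower and pd_upper):

/* Requires postgres.h, storage/bufpage.h and storage/itemptr.h. */
static bool
addUncompressedItem(Page page, ItemPointerData item)
{
    PageHeader  phdr = (PageHeader) page;
    Size        freespace = phdr->pd_upper - phdr->pd_lower;

    if (freespace < sizeof(ItemPointerData))
        return false;           /* no room: re-encode the whole page instead */

    /* uncompressed items grow downwards from pd_upper... */
    phdr->pd_upper -= sizeof(ItemPointerData);
    memcpy((char *) page + phdr->pd_upper, &item, sizeof(ItemPointerData));

    /* ...while the compressed posting list grows upwards from pd_lower. */
    return true;
}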


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Alexander Korotkov
On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

 I've thought about different algorithms little more. General problem I see
 is online update. We need it while it is typically not covered by
 researches at all. We already have to invent small index in the end of
 page. Different encoding methods adds more challenges. In general, methods
 can be classified in two groups:
 1) Values aren't aligned by bytes (gamma-codes, PFOR etc.)
 2) Multiple values are packed together in small group (simple-9,
 simple-18)


 Ok.


  For the first group of methods when inserting in the middle of the page we
 would have to do not byte-aligned shift of right part of values. I don't
 know how expensive is this shift but I expect that it would be much slower
 than memmove.


 Agreed.


  When values are packed into small groups, we have to either insert
 inefficiently encoded value or re-encode whole right part of values.


 It would probably be simplest to store newly inserted items uncompressed,
 in a separate area in the page. For example, grow the list of uncompressed
 items downwards from pg_upper, and the compressed items upwards from
 pg_lower. When the page fills up, re-encode the whole page.


Good idea. But:
1) We'll still need item indexes at the end of the page for fast scan.
2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-16 Thread Heikki Linnakangas

On 12/17/2013 12:22 AM, Alexander Korotkov wrote:

On Mon, Dec 16, 2013 at 3:30 PM, Heikki Linnakangas hlinnakan...@vmware.com

wrote:



On 12/12/2013 06:44 PM, Alexander Korotkov wrote:

  When values are packed into small groups, we have to either insert
 inefficiently encoded value or re-encode whole right part of values.


It would probably be simplest to store newly inserted items uncompressed,
in a separate area in the page. For example, grow the list of uncompressed
items downwards from pg_upper, and the compressed items upwards from
pg_lower. When the page fills up, re-encode the whole page.


I hacked together an implementation of a variant of Simple9, to see how 
it performs. Insertions are handled per the above scheme.


In a limited pg_trgm test case I've been using a lot for this, this
reduces the index size by about 20% compared to varbyte encoding. It might
be possible to squeeze it a bit more; I handcrafted the selectors in
the encoding algorithm to suit our needs, but I don't actually have a
good idea of how to choose them optimally. Also, the encoding can encode
0 values, but we never need to do that, so you could take advantage of
that to pack items tighter.
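
For reference, here is a toy version of the general Simple-9 idea (the
classic 32-bit selector table, not the handcrafted 64-bit selectors used in
the attached patch); simple9_pack() is a name made up for this example, and
it uses PostgreSQL's uint32/bool typedefs:

/* Classic Simple-9: a 4-bit selector plus 28 data bits per 32-bit word. */
static const struct
{
    int         n;              /* how many values the mode packs */
    int         bits;           /* bits per value */
} simple9_modes[9] = {
    {28, 1}, {14, 2}, {9, 3}, {7, 4}, {5, 5}, {4, 7}, {3, 9}, {2, 14}, {1, 28}
};

/*
 * Pack a prefix of 'values' into one word using the densest mode that fits.
 * Returns the number of values consumed, or -1 if a value needs > 28 bits.
 */
static int
simple9_pack(const uint32 *values, int nvalues, uint32 *word)
{
    for (int s = 0; s < 9; s++)
    {
        int         n = simple9_modes[s].n;
        int         bits = simple9_modes[s].bits;
        bool        fits = true;

        if (n > nvalues)
            continue;           /* not enough input left for this mode */

        for (int i = 0; i < n; i++)
        {
            if (values[i] >= ((uint32) 1 << bits))
            {
                fits = false;
                break;
            }
        }
        if (!fits)
            continue;

        *word = (uint32) s << 28;
        for (int i = 0; i < n; i++)
            *word |= values[i] << (i * bits);
        return n;
    }
    return -1;
}

Decoding reverses this: read the selector from the top 4 bits and unpack the
corresponding number of fixed-width values, which is part of why word-aligned
schemes tend to decompress quickly.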


Compression and decompression speed seems to be about the same.

Patch attached if you want to play with it. WAL replay is still broken, 
and there are probably bugs.



Good idea. But:
1) We'll still need item indexes in the end of page for fast scan.


Sure.


2) Storage would be easily extendable to hold additional information as
well.
Better compression shouldn't block more serious improvements.


I'm not sure I agree with that. For all the cases where you don't care 
about additional information - which covers all existing users for 
example - reducing disk size is pretty important. How are you planning 
to store the additional information, and how does using another encoding 
get in the way of that?


- Heikki


gin-packed-postinglists-simple9-1.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-12 Thread Alexander Korotkov
On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 12/09/2013 11:34 AM, Alexander Korotkov wrote:

 On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
 hlinnakan...@vmware.comwrote:

  Even if we use varbyte encoding, I wonder if it would be better to treat
 block + offset number as a single 48-bit integer, rather than encode them
 separately. That would allow the delta of two items on the same page to
 be
 stored as a single byte, rather than two bytes. Naturally it would be a
 loss on other values, but would be nice to see some kind of an analysis
 on
 that. I suspect it might make the code simpler, too.


 Yeah, I had that idea, but I thought it's not a better option. Will try to
 do some analysis.


 The more I think about that, the more convinced I am that it's a good
 idea. I don't think it will ever compress worse than the current approach
 of treating block and offset numbers separately, and, although I haven't
 actually tested it, I doubt it's any slower. About the same amount of
 arithmetic is required in both versions.

 Attached is a version that does that. Plus some other minor cleanup.

 (we should still investigate using a completely different algorithm,
 though)


I've thought about different algorithms a little more. The general problem I
see is online update. We need it, while it is typically not covered by the
research at all. We already have to invent a small index at the end of the
page. Different encoding methods add more challenges. In general, the methods
can be classified into two groups:
1) Values aren't byte-aligned (gamma codes, PFOR, etc.)
2) Multiple values are packed together in small groups (simple-9, simple-18)
For the first group of methods, when inserting in the middle of the page we
would have to do a non-byte-aligned shift of the right part of the values. I
don't know how expensive this shift is, but I expect it would be much slower
than memmove.
When values are packed into small groups, we have to either insert an
inefficiently encoded value or re-encode the whole right part of the values.

The other option I see is to encode the bins between item indexes separately.
That still might be slower and would require much more complicated
maintenance. It also complicates further work on storing additional
information in GIN.

Any other thoughts?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-10 Thread Alexander Korotkov
On Tue, Dec 10, 2013 at 12:26 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 12/09/2013 11:34 AM, Alexander Korotkov wrote:

 On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
 hlinnakan...@vmware.comwrote:

  Even if we use varbyte encoding, I wonder if it would be better to treat
 block + offset number as a single 48-bit integer, rather than encode them
 separately. That would allow the delta of two items on the same page to
 be
 stored as a single byte, rather than two bytes. Naturally it would be a
 loss on other values, but would be nice to see some kind of an analysis
 on
 that. I suspect it might make the code simpler, too.


 Yeah, I had that idea, but I thought it's not a better option. Will try to
 do some analysis.


 The more I think about that, the more convinced I am that it's a good
 idea. I don't think it will ever compress worse than the current approach
 of treating block and offset numbers separately, and, although I haven't
 actually tested it, I doubt it's any slower. About the same amount of
 arithmetic is required in both versions.

 Attached is a version that does that. Plus some other minor cleanup.

 (we should still investigate using a completely different algorithm,
 though)


Yes, when I thought about that, I didn't realize that we can reserve fewer
than 16 bits for the offset number.
I reran my benchmark and got the following results:

 event | period
---+-
 index_build   | 00:01:46.39056
 index_build_recovery  | 00:00:05
 index_update  | 00:06:01.557708
 index_update_recovery | 00:01:23
 search_new| 00:24:05.600366
 search_updated| 00:25:29.520642
(6 rows)

 label  | blocks_mark
+-
 search_new |   847509920
 search_updated |   883789826
(2 rows)

 label |   size
---+---
  new   | 364560384
 after_updates | 642736128
(2 rows)

Speed is the same while the index size is smaller. In the previous format it was:

 label |   size
---+---
 new   | 419299328
 after_updates | 715915264
(2 rows)

Good optimization, thanks. I'll try other datasets but I expect similar
results.
However, the patch didn't apply to head. A corrected version is attached.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-20.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-09 Thread Heikki Linnakangas

On 12/08/2013 09:56 PM, Alexander Korotkov wrote:

On Fri, Nov 29, 2013 at 11:17 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


I'll continue reviewing next week..


Got dragged into other things and didn't make any progress on this last 
week. I'm trying again now.



Good. Thanks for debug and fixing bugs.
Can I do anything for this patch now?


I wonder if we're leaving some money on the table, by using varbyte 
encoding. Googling around, there are many other compression methods out 
there for compressing integer deltas that compress better, and/or 
decompress faster.


Even if we use varbyte encoding, I wonder if it would be better to treat 
block + offset number as a single 48-bit integer, rather than encode 
them separately. That would allow the delta of two items on the same 
page to be stored as a single byte, rather than two bytes. Naturally it 
would be a loss on other values, but would be nice to see some kind of 
an analysis on that. I suspect it might make the code simpler, too.
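
To make the combined-encoding idea concrete, here is a minimal sketch (an
illustration only, not the patch's actual format) of treating a heap TID as
one wide integer and varbyte-encoding the delta between consecutive TIDs,
7 data bits per byte with the high bit as a continuation flag:

/* Requires postgres.h and storage/itemptr.h; names are made up here. */
static uint64
itemptr_to_uint48(ItemPointer iptr)
{
    /* block number in the high 32 bits, offset number in the low 16 */
    return ((uint64) ItemPointerGetBlockNumber(iptr) << 16) |
           ItemPointerGetOffsetNumber(iptr);
}

/*
 * Encode 'delta' (difference of two 48-bit values) into 'buf'; returns the
 * number of bytes written (at most 7).  Two items on the same heap page
 * whose offsets differ by less than 128 produce a single-byte delta.
 */
static int
encode_varbyte(uint64 delta, unsigned char *buf)
{
    int         len = 0;

    do
    {
        unsigned char b = delta & 0x7F;

        delta >>= 7;
        if (delta != 0)
            b |= 0x80;          /* more bytes follow */
        buf[len++] = b;
    } while (delta != 0);

    return len;
}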


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-09 Thread Alexander Korotkov
On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
hlinnakan...@vmware.comwrote:

 On 12/08/2013 09:56 PM, Alexander Korotkov wrote:

 On Fri, Nov 29, 2013 at 11:17 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

  I'll continue reviewing next week..


 Got dragged into other things and didn't make any progress on this last
 week. I'm trying again now.


  Good. Thanks for debug and fixing bugs.
 Can I do anything for this patch now?


 I wonder if we're leaving some money on the table, by using varbyte
 encoding. Googling around, there are many other compression methods out
 there for compressing integer deltas that compress better, and/or
 decompress faster.


Ok, I'll try to google around. There are probably some better options.


 Even if we use varbyte encoding, I wonder if it would be better to treat
 block + offset number as a single 48-bit integer, rather than encode them
 separately. That would allow the delta of two items on the same page to be
 stored as a single byte, rather than two bytes. Naturally it would be a
 loss on other values, but would be nice to see some kind of an analysis on
 that. I suspect it might make the code simpler, too.


Yeah, I had that idea, but I didn't think it was a better option. I will try
to do some analysis.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-09 Thread Heikki Linnakangas

On 12/09/2013 11:34 AM, Alexander Korotkov wrote:

On Mon, Dec 9, 2013 at 1:18 PM, Heikki Linnakangas
hlinnakan...@vmware.comwrote:


Even if we use varbyte encoding, I wonder if it would be better to treat
block + offset number as a single 48-bit integer, rather than encode them
separately. That would allow the delta of two items on the same page to be
stored as a single byte, rather than two bytes. Naturally it would be a
loss on other values, but would be nice to see some kind of an analysis on
that. I suspect it might make the code simpler, too.


Yeah, I had that idea, but I thought it's not a better option. Will try to
do some analysis.


The more I think about that, the more convinced I am that it's a good 
idea. I don't think it will ever compress worse than the current 
approach of treating block and offset numbers separately, and, although 
I haven't actually tested it, I doubt it's any slower. About the same 
amount of arithmetic is required in both versions.


Attached is a version that does that. Plus some other minor cleanup.

(we should still investigate using a completely different algorithm, though)

- Heikki


gin-packed-postinglists-19.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-12-08 Thread Alexander Korotkov
On Fri, Nov 29, 2013 at 11:17 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 11/29/2013 11:41 AM, Heikki Linnakangas wrote:

 On 11/28/2013 09:19 AM, Alexander Korotkov wrote:

 On Wed, Nov 27, 2013 at 1:14 AM, Heikki Linnakangas
 hlinnakan...@vmware.com

 wrote:


  On 11/26/13 15:34, Alexander Korotkov wrote:

  What's your plans about GIN now? I tried to rebase packed posting lists
 with head. But I found that you've changed interface of placeToPage
 function. That's conflicts with packed posting lists, because
 dataPlaceToPageLeaf needs not only offset number to describe place to
 insert item pointer. Do you like to commit rework of handling GIN
 incomplete splits before?


 Yeah, I'm planning to get back to this patch after committing the
 incomplete splits patch. I think the refactoring of the WAL-logging
 that I
 did in that patch will simplify this patch, too. I'll take a look at
 Michael's latest comments on the incomplete splits patch tomorrow, so I
 should get back to this on Thursday or Friday.


 Should I try to rebase this patch now or you plan to do it yourself? Some
 changes like void *insertdata argument make me think you have some
 particular plan to rebase this patch, but I didn't get it exactly.


 Here's rebased version. I'll continue reviewing it now..


 Another update. Fixes a bunch of bugs. Mostly introduced by me, but a
 couple were present in your v16:

 * When allocating the entry-list buffer in a scan, it must be large
 enough for the max number of items that can fit on a compressed page,
 whether the current page is compressed or not. That's because the same
 buffer is reused on subsequent pages, which might be compressed.

 * When splitting a leaf page during index creation, missed the trick
 that's present in current master, to choose the split point so that left
 page is packed as full as possible. I put that back, it makes newly-built
 indexes somewhat smaller. (I wonder if we should leave some free space for
 future updates. But that's a separate patch, let's keep the current
 behavior in this patch)

 I'll continue reviewing next week..


Good. Thanks for debugging and fixing the bugs.
Can I do anything for this patch now?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-29 Thread Heikki Linnakangas

On 11/29/2013 11:41 AM, Heikki Linnakangas wrote:

On 11/28/2013 09:19 AM, Alexander Korotkov wrote:

On Wed, Nov 27, 2013 at 1:14 AM, Heikki Linnakangas
hlinnakan...@vmware.com

wrote:



On 11/26/13 15:34, Alexander Korotkov wrote:


What's your plans about GIN now? I tried to rebase packed posting lists
with head. But I found that you've changed interface of placeToPage
function. That's conflicts with packed posting lists, because
dataPlaceToPageLeaf needs not only offset number to describe place to
insert item pointer. Do you like to commit rework of handling GIN
incomplete splits before?


Yeah, I'm planning to get back to this patch after committing the
incomplete splits patch. I think the refactoring of the WAL-logging
that I
did in that patch will simplify this patch, too. I'll take a look at
Michael's latest comments on the incomplete splits patch tomorrow, so I
should get back to this on Thursday or Friday.


Should I try to rebase this patch now or you plan to do it yourself? Some
changes like void *insertdata argument make me think you have some
particular plan to rebase this patch, but I didn't get it exactly.


Here's a rebased version. I'll continue reviewing it now...


Another update. Fixes a bunch of bugs. Mostly introduced by me, but a 
couple were present in your v16:


* When allocating the entry-list buffer in a scan, it must be large 
enough for the max number of items that can fit on a compressed page, 
whether the current page is compressed or not. That's because the same 
buffer is reused on subsequent pages, which might be compressed.


* When splitting a leaf page during index creation, I missed the trick
that's present in current master, to choose the split point so that the left
page is packed as full as possible. I put that back; it makes
newly-built indexes somewhat smaller. (I wonder if we should leave some
free space for future updates. But that's a separate patch; let's keep
the current behavior in this patch.)


I'll continue reviewing next week..

- Heikki


gin-packed-postinglists-18.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-27 Thread Alexander Korotkov
On Wed, Nov 27, 2013 at 1:14 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 11/26/13 15:34, Alexander Korotkov wrote:

 What's your plans about GIN now? I tried to rebase packed posting lists
 with head. But I found that you've changed interface of placeToPage
 function. That's conflicts with packed posting lists, because
 dataPlaceToPageLeaf needs not only offset number to describe place to
 insert item pointer. Do you like to commit rework of handling GIN
 incomplete splits before?


 Yeah, I'm planning to get back to this patch after committing the
 incomplete splits patch. I think the refactoring of the WAL-logging that I
 did in that patch will simplify this patch, too. I'll take a look at
 Michael's latest comments on the incomplete splits patch tomorrow, so I
 should get back to this on Thursday or Friday.


Should I try to rebase this patch now, or do you plan to do it yourself? Some
changes, like the void *insertdata argument, make me think you have some
particular plan for rebasing this patch, but I didn't quite get it.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-26 Thread Alexander Korotkov
Hi!

On Wed, Nov 20, 2013 at 9:02 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 06.11.2013 17:36, Alvaro Herrera wrote:

 Just for my own illumination, can someone explain this bit?

 + If a posting list is too large to store in-line in a key entry, a
 posting tree
 + is created. A posting tree is a B-tree structure, where the ItemPointer
 is
 + used as the key. At the leaf-level, item pointers are stored
 compressed, in
 + varbyte encoding.

 I think the first ItemPointer mentioned (the key) refers to a TID
 pointing to the index, and item pointers stored compressed refers to
 the TIDs pointing to the heap (the data).  Is that correct?


 No, they both refer to TIDs pointing to the heap.


  I'm also interested in the FIXME explain varbyte encoding explanation
 currently missing, if somebody can write it down ...


 Alexander's latest version filled in that explanation (haven't read it
 myself yet)


off-list

What are your plans for GIN now? I tried to rebase packed posting lists
onto head, but I found that you've changed the interface of the placeToPage
function. That conflicts with packed posting lists, because
dataPlaceToPageLeaf needs more than an offset number to describe the place to
insert an item pointer. Would you like to commit the rework of handling GIN
incomplete splits first?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-26 Thread Alexander Korotkov
On Tue, Nov 26, 2013 at 5:34 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Wed, Nov 20, 2013 at 9:02 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 06.11.2013 17:36, Alvaro Herrera wrote:

 Just for my own illumination, can someone explain this bit?

 + If a posting list is too large to store in-line in a key entry, a
 posting tree
 + is created. A posting tree is a B-tree structure, where the
 ItemPointer is
 + used as the key. At the leaf-level, item pointers are stored
 compressed, in
 + varbyte encoding.

 I think the first ItemPointer mentioned (the key) refers to a TID
 pointing to the index, and item pointers stored compressed refers to
 the TIDs pointing to the heap (the data).  Is that correct?


 No, they both refer to TIDs pointing to the heap.


  I'm also interested in the FIXME explain varbyte encoding explanation
 currently missing, if somebody can write it down ...


 Alexander's latest version filled in that explanation (haven't read it
 myself yet)


 off-list


It appears it wasn't actually off-list, sorry :)


 What's your plans about GIN now? I tried to rebase packed posting lists
 with head. But I found that you've changed interface of placeToPage
 function. That's conflicts with packed posting lists, because
 dataPlaceToPageLeaf needs not only offset number to describe place to
 insert item pointer. Do you like to commit rework of handling GIN
 incomplete splits before?


--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-26 Thread Heikki Linnakangas

On 11/26/13 15:34, Alexander Korotkov wrote:

What's your plans about GIN now? I tried to rebase packed posting lists
with head. But I found that you've changed interface of placeToPage
function. That's conflicts with packed posting lists, because
dataPlaceToPageLeaf needs not only offset number to describe place to
insert item pointer. Do you like to commit rework of handling GIN
incomplete splits before?


Yeah, I'm planning to get back to this patch after committing the 
incomplete splits patch. I think the refactoring of the WAL-logging that 
I did in that patch will simplify this patch, too. I'll take a look at 
Michael's latest comments on the incomplete splits patch tomorrow, so I 
should get back to this on Thursday or Friday.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-26 Thread Peter Eisentraut
This patch needs to be rebased.
 
 





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-20 Thread Heikki Linnakangas

On 06.11.2013 17:36, Alvaro Herrera wrote:

Just for my own illumination, can someone explain this bit?

+ If a posting list is too large to store in-line in a key entry, a posting tree
+ is created. A posting tree is a B-tree structure, where the ItemPointer is
+ used as the key. At the leaf-level, item pointers are stored compressed, in
+ varbyte encoding.

I think the first ItemPointer mentioned (the key) refers to a TID
pointing to the index, and item pointers stored compressed refers to
the TIDs pointing to the heap (the data).  Is that correct?


No, they both refer to TIDs pointing to the heap.


I'm also interested in the FIXME explain varbyte encoding explanation
currently missing, if somebody can write it down ...


Alexander's latest version filled in that explanation (haven't read it 
myself yet)


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-15 Thread Peter Eisentraut
On 11/14/13, 6:00 AM, Alexander Korotkov wrote:
 Sorry, I posted buggy version of patch. Attached version is correct.

This patch crashes the hstore and pg_trgm regression tests.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-15 Thread Alexander Korotkov
On Fri, Nov 15, 2013 at 8:58 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 11/14/13, 6:00 AM, Alexander Korotkov wrote:
  Sorry, I posted buggy version of patch. Attached version is correct.

 This patch crashes the hstore the pg_trgm regression tests.


Which exact version did you try, 14 or 16? If it was 16, could you please
post a stack trace, because it doesn't crash for me.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-15 Thread Peter Eisentraut
On 11/15/13, 12:24 PM, Alexander Korotkov wrote:
 On Fri, Nov 15, 2013 at 8:58 PM, Peter Eisentraut pete...@gmx.net
 mailto:pete...@gmx.net wrote:
 
 On 11/14/13, 6:00 AM, Alexander Korotkov wrote:
  Sorry, I posted buggy version of patch. Attached version is correct.
 
 This patch crashes the hstore the pg_trgm regression tests.
 
 
 What exact version did you try 14 or 16? If it was 16, could you please
 post a stacktrace, because it doesn't crash for me.

The one that's the latest in the commitfest: 
http://www.postgresql.org/message-id/capphfdveq-jhe_2z9pvw22bp6h_o8xoaxcbjagf87gs4p4j...@mail.gmail.com

If you have a newer one, please add it there.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-15 Thread Alexander Korotkov
On Fri, Nov 15, 2013 at 9:56 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 11/15/13, 12:24 PM, Alexander Korotkov wrote:
  On Fri, Nov 15, 2013 at 8:58 PM, Peter Eisentraut pete...@gmx.net
  mailto:pete...@gmx.net wrote:
 
  On 11/14/13, 6:00 AM, Alexander Korotkov wrote:
   Sorry, I posted buggy version of patch. Attached version is
 correct.
 
  This patch crashes the hstore the pg_trgm regression tests.
 
 
  What exact version did you try 14 or 16? If it was 16, could you please
  post a stacktrace, because it doesn't crash for me.

 The one that's the latest in the commitfest:
 http://www.postgresql.org/message-id/capphfdveq-jhe_2z9pvw22bp6h_o8xoaxcbjagf87gs4p4j...@mail.gmail.com

 If you have a newer one, please add it there.


OK, I updated the information in the commitfest app.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas
hlinnakan...@vmware.comwrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
 aekorot...@gmail.comwrote:

  Attached version of patch is debugged. I would like to note that number
 of
 bugs was low and it wasn't very hard to debug. I've rerun tests on it.
 You
 can see that numbers are improved as the result of your refactoring.

   event | period
 ---+-
   index_build   | 00:01:45.416822
   index_build_recovery  | 00:00:06
   index_update  | 00:05:17.263606
   index_update_recovery | 00:01:22
   search_new| 00:24:07.706482
   search_updated| 00:26:25.364494
 (6 rows)

   label  | blocks_mark
 +-
   search_new |   847587636
   search_updated |   881210486
 (2 rows)

   label |   size
 ---+---
   new   | 419299328
   after_updates | 715915264
 (2 rows)

 Beside simple bugs, there was a subtle bug in incremental item indexes
 update. I've added some more comments including ASCII picture about how
 item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting tree
 leaf pages are not vacuumed at all; that's clearly not right.


Fixed. Now a separate function handles uncompressed posting lists and
compresses them if at least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing a
 page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


Now this situation is ereported before any change is made to the page.

To handle splitting a non-compressed page, it seems that you're relying on
 the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g you run out of disk space, there is no
 WAL record at all of the the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. maybe so, but at least
 there should be an explicit check for that.)


Good point. Yes, if we don't specially handle inserting item indexes, I see
no point in special handling for a single TID, which is much smaller. In
the attached patch dataCompressLeafPage just reserves space for a single TID.

Also, many changes in comments and README.

Unfortunately, I didn't understand what the FIXME in
ginVacuumEntryPage is about. So, it's not fixed :)

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-13.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
 aekorot...@gmail.comwrote:

  Attached version of patch is debugged. I would like to note that number
 of
 bugs was low and it wasn't very hard to debug. I've rerun tests on it.
 You
 can see that numbers are improved as the result of your refactoring.

   event | period
 ---+-
   index_build   | 00:01:45.416822
   index_build_recovery  | 00:00:06
   index_update  | 00:05:17.263606
   index_update_recovery | 00:01:22
   search_new| 00:24:07.706482
   search_updated| 00:26:25.364494
 (6 rows)

   label  | blocks_mark
 +-
   search_new |   847587636
   search_updated |   881210486
 (2 rows)

   label |   size
 ---+---
   new   | 419299328
   after_updates | 715915264
 (2 rows)

 Beside simple bugs, there was a subtle bug in incremental item indexes
 update. I've added some more comments including ASCII picture about how
 item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting tree
 leaf pages are not vacuumed at all; that's clearly not right.


 Fixed. Now separate function handles uncompressed posting lists and
 compress them if as least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing a
 page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


 Now this situation is ereported before any change in page.

 To handle splitting a non-compressed page, it seems that you're relying on
 the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g you run out of disk space, there is no
 WAL record at all of the the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. maybe so, but at least
 there should be an explicit check for that.)


 Good point. Yes, if we don't handle specially inserting item indexes, I
 see no point to do special handling for single TID which is much smaller.
 In the attached patch dataCompressLeafPage just reserves space for single
 TID.

 Also, many changes in comments and README.

 Unfortunally, I didn't understand what is FIXME about in
 ginVacuumEntryPage. So, it's not fixed :)


Sorry, I posted buggy version of patch. Attached version is correct.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-14.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-14 Thread Alexander Korotkov
On Thu, Nov 14, 2013 at 3:00 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Thu, Nov 14, 2013 at 2:17 PM, Alexander Korotkov 
 aekorot...@gmail.comwrote:

 On Tue, Nov 5, 2013 at 9:49 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 On 04.11.2013 23:44, Alexander Korotkov wrote:

 On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
 aekorot...@gmail.comwrote:

  Attached version of patch is debugged. I would like to note that
 number of
 bugs was low and it wasn't very hard to debug. I've rerun tests on it.
 You
 can see that numbers are improved as the result of your refactoring.

   event | period
 ---+-
   index_build   | 00:01:45.416822
   index_build_recovery  | 00:00:06
   index_update  | 00:05:17.263606
   index_update_recovery | 00:01:22
   search_new| 00:24:07.706482
   search_updated| 00:26:25.364494
 (6 rows)

   label  | blocks_mark
 +-
   search_new |   847587636
   search_updated |   881210486
 (2 rows)

   label |   size
 ---+---
   new   | 419299328
   after_updates | 715915264
 (2 rows)

 Beside simple bugs, there was a subtle bug in incremental item indexes
 update. I've added some more comments including ASCII picture about how
 item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


 Attached version of patch has support of old page format. Patch still
 needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.


 Thanks, just glanced at this quickly.

 If I'm reading the patch correctly, old-style non-compressed posting
 tree leaf pages are not vacuumed at all; that's clearly not right.


 Fixed. Now separate function handles uncompressed posting lists and
 compress them if as least one TID is deleted.


 I argued earlier that we don't need to handle the case that compressing
 a page makes it larger, so that the existing items don't fit on the page
 anymore. I'm having some second thoughts on that; I didn't take into
 account the fact that the mini-index in the new page format takes up some
 space. I think it's still highly unlikely that there isn't enough space on
 a 8k page, but it's not totally crazy with a non-standard small page size.
 So at least that situation needs to be checked for with an ereport(),
 rather than Assert.


 Now this situation is ereported before any change in page.

 To handle splitting a non-compressed page, it seems that you're relying
 on the fact that before splitting, we try to insert, which compresses the
 page. The problem with that is that it's not correctly WAL-logged. The
 split record that follows includes a full copy of both page halves, but if
 the split fails for some reason, e.g you run out of disk space, there is no
 WAL record at all of the the compression. I'd suggest doing the compression
 in the insert phase on a temporary copy of the page, leaving the original
 page untouched if there isn't enough space for the insertion to complete.
 (You could argue that this case can't happen because the compression must
 always create enough space to insert one more item. maybe so, but at least
 there should be an explicit check for that.)


 Good point. Yes, if we don't handle specially inserting item indexes, I
 see no point to do special handling for single TID which is much smaller.
 In the attached patch dataCompressLeafPage just reserves space for single
 TID.

 Also, many changes in comments and README.

 Unfortunally, I didn't understand what is FIXME about in
 ginVacuumEntryPage. So, it's not fixed :)


 Sorry, I posted buggy version of patch. Attached version is correct.


Another bug fix by report from Rod Taylor.

--
With best regards,
Alexander Korotkov.




gin-packed-postinglists-16.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-06 Thread Heikki Linnakangas

On 04.11.2013 23:44, Alexander Korotkov wrote:

Attached version of patch has support of old page format. Patch still needs
more documentation and probably refactoring, but I believe idea is clear
and it can be reviewed. In the patch I have to revert change of null
category placement for compatibility with old posting list format.


I committed some of the refactorings and cleanup included in this patch. 
Attached is a rebased version that applies over current head. I hope I 
didn't joggle your elbow..


- Heikki



gin-packed-postinglists-12-rebased.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-06 Thread Alvaro Herrera
Heikki Linnakangas escribió:
 On 04.11.2013 23:44, Alexander Korotkov wrote:
 Attached version of patch has support of old page format. Patch still needs
 more documentation and probably refactoring, but I believe idea is clear
 and it can be reviewed. In the patch I have to revert change of null
 category placement for compatibility with old posting list format.
 
 I committed some of the refactorings and cleanup included in this
 patch. Attached is a rebased version that applies over current head.
 I hope I didn't joggle your elbow..

Just for my own illumination, can someone explain this bit?

+ If a posting list is too large to store in-line in a key entry, a posting tree
+ is created. A posting tree is a B-tree structure, where the ItemPointer is
+ used as the key. At the leaf-level, item pointers are stored compressed, in
+ varbyte encoding.

I think the first ItemPointer mentioned (the key) refers to a TID
pointing to the index, and item pointers stored compressed refer to
the TIDs pointing to the heap (the data).  Is that correct?


I'm also interested in the FIXME explain varbyte encoding explanation
currently missing, if somebody can write it down ...

Thanks,

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-05 Thread Heikki Linnakangas

On 04.11.2013 23:44, Alexander Korotkov wrote:

On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
aekorot...@gmail.comwrote:


Attached version of patch is debugged. I would like to note that number of
bugs was low and it wasn't very hard to debug. I've rerun tests on it. You
can see that numbers are improved as the result of your refactoring.

  event | period
---+-
  index_build   | 00:01:45.416822
  index_build_recovery  | 00:00:06
  index_update  | 00:05:17.263606
  index_update_recovery | 00:01:22
  search_new| 00:24:07.706482
  search_updated| 00:26:25.364494
(6 rows)

  label  | blocks_mark
+-
  search_new |   847587636
  search_updated |   881210486
(2 rows)

  label |   size
---+---
  new   | 419299328
  after_updates | 715915264
(2 rows)

Beside simple bugs, there was a subtle bug in incremental item indexes
update. I've added some more comments including ASCII picture about how
item indexes are shifted.

Now, I'm trying to implement support of old page format. Then we can
decide which approach to use.



Attached version of patch has support of old page format. Patch still needs
more documentation and probably refactoring, but I believe idea is clear
and it can be reviewed. In the patch I have to revert change of null
category placement for compatibility with old posting list format.


Thanks, just glanced at this quickly.

If I'm reading the patch correctly, old-style non-compressed posting 
tree leaf pages are not vacuumed at all; that's clearly not right.


I argued earlier that we don't need to handle the case that compressing 
a page makes it larger, so that the existing items don't fit on the page 
anymore. I'm having some second thoughts on that; I didn't take into 
account the fact that the mini-index in the new page format takes up 
some space. I think it's still highly unlikely that there isn't enough 
space on a 8k page, but it's not totally crazy with a non-standard small 
page size. So at least that situation needs to be checked for with an 
ereport(), rather than Assert.


To handle splitting a non-compressed page, it seems that you're relying 
on the fact that before splitting, we try to insert, which compresses 
the page. The problem with that is that it's not correctly WAL-logged. 
The split record that follows includes a full copy of both page halves, 
but if the split fails for some reason, e.g you run out of disk space, 
there is no WAL record at all of the the compression. I'd suggest doing 
the compression in the insert phase on a temporary copy of the page, 
leaving the original page untouched if there isn't enough space for the 
insertion to complete. (You could argue that this case can't happen 
because the compression must always create enough space to insert one 
more item. maybe so, but at least there should be an explicit check for 
that.)
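
As a rough illustration of that suggestion (a sketch only; compressLeafPage()
and addItemToPage() are placeholders for the patch's real routines, and
buffer locking and the actual XLogInsert() call are omitted):

static bool
insertWithCompression(Page page, ItemPointerData newItem)
{
    Page        tmp = (Page) palloc(BLCKSZ);
    bool        ok;

    /* do all the work on a scratch copy first */
    memcpy(tmp, page, BLCKSZ);
    compressLeafPage(tmp);              /* placeholder */
    ok = addItemToPage(tmp, newItem);   /* placeholder */

    if (ok)
    {
        /* only now touch (and WAL-log) the real page, as one atomic change */
        START_CRIT_SECTION();
        memcpy(page, tmp, BLCKSZ);
        /* MarkBufferDirty() + XLogInsert() would go here */
        END_CRIT_SECTION();
    }

    pfree(tmp);
    return ok;                  /* false => leave the page untouched, split */
}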


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-11-04 Thread Alexander Korotkov
On Mon, Oct 21, 2013 at 11:12 PM, Alexander Korotkov
aekorot...@gmail.comwrote:

 Attached version of patch is debugged. I would like to note that number of
 bugs was low and it wasn't very hard to debug. I've rerun tests on it. You
 can see that numbers are improved as the result of your refactoring.

  event | period
 ---+-
  index_build   | 00:01:45.416822
  index_build_recovery  | 00:00:06
  index_update  | 00:05:17.263606
  index_update_recovery | 00:01:22
  search_new| 00:24:07.706482
  search_updated| 00:26:25.364494
 (6 rows)

  label  | blocks_mark
 +-
  search_new |   847587636
  search_updated |   881210486
 (2 rows)

  label |   size
 ---+---
  new   | 419299328
  after_updates | 715915264
 (2 rows)

 Beside simple bugs, there was a subtle bug in incremental item indexes
 update. I've added some more comments including ASCII picture about how
 item indexes are shifted.

 Now, I'm trying to implement support of old page format. Then we can
 decide which approach to use.


The attached version of the patch has support for the old page format. The
patch still needs more documentation and probably refactoring, but I believe
the idea is clear and it can be reviewed. In the patch I had to revert the
change of null category placement for compatibility with the old posting list
format.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-12.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-21 Thread Alexander Korotkov
On Thu, Oct 10, 2013 at 3:57 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 09.10.2013 02:04, Tomas Vondra wrote:

 On 8.10.2013 21:59, Heikki Linnakangas wrote:

 On 08.10.2013 17:47, Alexander Korotkov wrote:

 Hi, Tomas!

 On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz   wrote:

  I've attempted to rerun the benchmark tests I did a few weeks ago, but
I got repeated crashes when loading the data (into a table with
 tsvector+gin index).

 Right before a crash, theres this message in the log:

  PANIC:  not enough space in leaf page!


 Thanks for testing. Heikki's version of the patch doesn't work for me
 either, even on much simpler examples. I can try to get it working if he
 answers my question about the GinDataLeafPageGetPostingList* macros.


 The new macros in that patch version were quite botched. Here's a new
 attempt.


 Nope, still the same errors :-(

 PANIC:  not enough space in leaf page!
 LOG:  server process (PID 29722) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages


 I've continued hacking away at the patch, here's yet another version. I've
 done a lot of cleanup and refactoring to make the code more readable (I
 hope). I'm not sure what caused the panic you saw, but it's probably fixed
 now.  Let me know if not.

 Some notable changes since my last patch version:

 * I changed the ginbtree interface so that isEnoughSpace method is no
 more. Instead, placeToPage returns false without doing anything if the item
 doesn't fit. It was slightly simpler this way when working with the
 compressed posting lists, as placeToPage had to calculate tuple sizes
 anyway to decide how many items fit on the page.

 * I rewrote incrUpdateItemIndexes(). It now operates in two stages: 1.
 adjust the pageOffsets and 2. split the bin. I found it more readable this
 way.

 * I changed the WAL format of insert records so that there is a separate
 struct for data-leaf, data-internal, and entry pages. The information
 recorded for each of those was so different that it was confusing to cram
 them all into the same struct.


 I'm going to step back a bit from the details of the patch now. I think
 there's enough work for you, Alexander, until the next commitfest:

 * Try to make the code also work with the old page format, so that you
 don't need to REINDEX after pg_upgrade.

 * Documentation. The new compressed posting list format needs to be
 explained somewhere. At the top of ginpostinglist.c would be a good place.
 The README should be improved too.

 * Fix whatever I broke (sorry)

 Are we committed to going ahead with this patch in the 9.4 timeframe, in one
 form or another? If we are, then I'd like to start committing refactoring
 patches that just move code around in preparation for the patch, to make
 the review of the core of the patch easier. For example, merging the
 isEnoughSpace and placeToPage methods in the ginbtree interface. Without
 the patch, it's unnecessary code churn - it won't do any harm but it won't
 do any good either. I'm pretty confident that this patch will land in the
 9.4 timeframe, so barring objections I'll start committing such
 refactorings.


The attached version of the patch is debugged. I would like to note that the
number of bugs was low and it wasn't very hard to debug. I've rerun the tests
on it. You can see that the numbers are improved as a result of your
refactoring.

 event | period
---+-
 index_build   | 00:01:45.416822
 index_build_recovery  | 00:00:06
 index_update  | 00:05:17.263606
 index_update_recovery | 00:01:22
 search_new| 00:24:07.706482
 search_updated| 00:26:25.364494
(6 rows)

 label  | blocks_mark
+-
 search_new |   847587636
 search_updated |   881210486
(2 rows)

 label |   size
---+---
 new   | 419299328
 after_updates | 715915264
(2 rows)

Besides simple bugs, there was a subtle bug in the incremental item index
update. I've added some more comments, including an ASCII picture of how
item indexes are shifted.

Now I'm trying to implement support for the old page format. Then we can
decide which approach to use.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-11.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-20 Thread Bruce Momjian
On Sat, Oct 19, 2013 at 05:11:55PM -0400, Bruce Momjian wrote:
 I am in Moscow with Alexander and we were discussing GIN pg_upgrade
 issues.  One option we have already discussed was to take the old GIN
 index code and put it in a separate directory, and call the new GIN
 index something different, but that got hung up on how users were going
 to create indexes of the new type.
 
 One nice trick would be for the index creation code to check the global
 variable IsBinaryUpgrade that pg_upgrade sets.  If CREATE INDEX ... GIN
 finds IsBinaryUpgrade set, it should create an (empty) index of type
 gin-v1/old.  If it is not set, it should create a new gin index.  This
 will allow pg_upgrade to work, and allow REINDEX to create a new-type
 GIN index from an old one.
 
 We would need to append -v1 to the old index directory, system
 catalog, and function names.  We could then remove the old GIN index
 code in some later major release, and pg_upgrade will then mark the
 indexes as invalid and create a REINDEX script.
 
 This allows users to reindex their GIN indexes over time, but it doesn't
 lock us into keeping the gin-v1 code around forever.

Correction --- the above is not going to work because the backend isn't
going to know, during a binary upgrade, if CREATE INDEX ... GIN is
coming from a 9.3 or 9.4 database.  We are going to need pg_dump code to
do the mapping, and also pg_dump code to map gin_v1 back to gin once we
remove the gin_v1 code.  We will also need index creation code to map
gin_v1 to gin to properly handle REINDEX and CLUSTER, but only if not in
binary upgrade mode.

If it would help, I would be happy to write a simple patch for the above
items once I return from Europe in November.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-19 Thread Bruce Momjian
On Thu, Oct  3, 2013 at 05:24:49PM -0400, Bruce Momjian wrote:
 On Thu, Oct  3, 2013 at 02:48:20PM -0400, Robert Haas wrote:
  On Thu, Oct 3, 2013 at 2:43 PM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
   It seems we've all but decided that we'll require reindexing GIN indexes 
   in
   9.4.
  
  I thought the consensus in Ottawa was strongly against that.  I'm not
  aware that anyone has subsequently changed their position on the
  topic.  Bruce is right to point out that we've done such things before
  and can therefore do it again, but just because we have the technical
  means to do it doesn't make it good policy.
  
  That having been said, if we do decide to break it...
 
 Agreed.  I was stating only that this is easy for pg_upgrade.  One cool
 thing is that the upgrade completes, and the indexes are there, but
 just marked as invalid until the REINDEX.
 
 One other point Alexander made is that the new GIN indexes will be
 smaller so most people would want the new format in the new cluster
 anyway.

I am in Moscow with Alexander and we were discussing GIN pg_upgrade
issues.  One option we have already discussed was to take the old GIN
index code and put it in a separate directory, and call the new GIN
index something different, but that got hung up on how users were going
to create indexes of the new type.

One nice trick would be for the index creation code to check the global
variable IsBinaryUpgrade that pg_upgrade sets.  If CREATE INDEX ... GIN
finds IsBinaryUpgrade set, it should create an (empty) index of type
gin-v1/old.  If it is not set, it should create a new gin index.  This
will allow pg_upgrade to work, and allow REINDEX to create a new-type
GIN index from an old one.

We would need to append -v1 to the old index directory, system
catalog, and function names.  We could then remove the old GIN index
code in some later major release, and pg_upgrade will then mark the
indexes as invalid and create a REINDEX script.

This allows users to reindex their GIN indexes over time, but it doesn't
lock us into keeping the gin-v1 code around forever.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-12 Thread Alexander Korotkov
On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 10.10.2013 13:57, Heikki Linnakangas wrote:
  On 09.10.2013 02:04, Tomas Vondra wrote:
  On 8.10.2013 21:59, Heikki Linnakangas wrote:
  On 08.10.2013 17:47, Alexander Korotkov wrote:
  Hi, Tomas!
 
  On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz   wrote:
 
  I've attempted to rerun the benchmarks tests I did a few weeks ago,
  but
 I got repeated crashes when loading the data (into a table with
  tsvector+gin index).
 
  Right before a crash, theres this message in the log:
 
   PANIC:  not enough space in leaf page!
 
 
  Thanks for testing. Heikki's version of patch don't works for me too
 on
  even much more simplier examples. I can try to get it working if he
  answer
  my question about GinDataLeafPageGetPostingList* macros.
 
  The new macros in that patch version were quite botched. Here's a new
  attempt.
 
  Nope, still the same errors :-(
 
  PANIC:  not enough space in leaf page!
  LOG:  server process (PID 29722) was terminated by signal 6: Aborted
  DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages
 
  I've continued hacking away at the patch, here's yet another version.
  I've done a lot of cleanup and refactoring to make the code more
  readable (I hope). I'm not sure what caused the panic you saw, but it's
  probably fixed now.  Let me know if not.

 Yup, this version fixed the issues. I haven't been able to do any
 benchmarks yet, all I have is some basic stats

                |   HEAD   |  patched
 ==
 load duration  |  1084 s  |   1086 s
 subject index  |   96 MB  |     96 MB
 body index     | 2349 MB  |   2051 MB

 So there's virtually no difference in speed (which is expected, AFAIK)
 and the large index on full message bodies is significantly smaller.


Yes, there should be no significant difference in speed. But the difference in
index sizes seems too small. Could you share the database dump somewhere?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-12 Thread Tomas Vondra
On 12.10.2013 12:11, Alexander Korotkov wrote:
 On Sat, Oct 12, 2013 at 1:55 AM, Tomas Vondra t...@fuzzy.cz
 mailto:t...@fuzzy.cz wrote:
 
 Yup, this version fixed the issues. I haven't been able to do any
 benchmarks yet, all I have is some basic stats
 
|   HEAD   |  patched
 ==
 load duration  |  1084 s  |   1086 s
 subject index  |   96 MB  | 96 MB
 body index | 2349 MB  |   2051 MB
 
 So there's virtually no difference in speed (which is expected, AFAIK)
 and the large index on full message bodies is significantly smaller.
 
 
 Yes, it should be no significant difference in speed. But difference in
 index sizes seems to be too small. Could you share database dump somewhere?

Turns out that if I do VACUUM FULL after loading the data (a sequence of
INSERT commands), the index sizes drop significantly.

               |   HEAD   |  patched
==
subject index  |   42 MB  |    15 MB
body index     |  624 MB  |   328 MB

So there's a significant improvement, as expected. I'm wondering if the
bloat is expected too? Is that the consequence of incremental index
updates vs. rebuilding the whole index at once during VACUUM FULL?

Tomas




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-11 Thread Tomas Vondra
On 10.10.2013 13:57, Heikki Linnakangas wrote:
 On 09.10.2013 02:04, Tomas Vondra wrote:
 On 8.10.2013 21:59, Heikki Linnakangas wrote:
 On 08.10.2013 17:47, Alexander Korotkov wrote:
 Hi, Tomas!

 On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz   wrote:

 I've attempted to rerun the benchmarks tests I did a few weeks ago,
 but
I got repeated crashes when loading the data (into a table with
 tsvector+gin index).

 Right before a crash, theres this message in the log:

  PANIC:  not enough space in leaf page!


 Thanks for testing. Heikki's version of patch don't works for me too on
 even much more simplier examples. I can try to get it working if he
 answer
 my question about GinDataLeafPageGetPostingList* macros.

 The new macros in that patch version were quite botched. Here's a new
 attempt.

 Nope, still the same errors :-(

 PANIC:  not enough space in leaf page!
 LOG:  server process (PID 29722) was terminated by signal 6: Aborted
 DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages
 
 I've continued hacking away at the patch, here's yet another version.
 I've done a lot of cleanup and refactoring to make the code more
 readable (I hope). I'm not sure what caused the panic you saw, but it's
 probably fixed now.  Let me know if not.

Yup, this version fixed the issues. I haven't been able to do any
benchmarks yet, all I have is some basic stats

               |   HEAD   |  patched
==
load duration  |  1084 s  |   1086 s
subject index  |   96 MB  |     96 MB
body index     | 2349 MB  |   2051 MB

So there's virtually no difference in speed (which is expected, AFAIK)
and the large index on full message bodies is significantly smaller.

regards
Tomas






Re: [HACKERS] GIN improvements part 1: additional information

2013-10-10 Thread Heikki Linnakangas

On 09.10.2013 02:04, Tomas Vondra wrote:

On 8.10.2013 21:59, Heikki Linnakangas wrote:

On 08.10.2013 17:47, Alexander Korotkov wrote:

Hi, Tomas!

On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz   wrote:


I've attempted to rerun the benchmarks tests I did a few weeks ago, but
   I got repeated crashes when loading the data (into a table with
tsvector+gin index).

Right before a crash, theres this message in the log:

 PANIC:  not enough space in leaf page!



Thanks for testing. Heikki's version of patch don't works for me too on
even much more simplier examples. I can try to get it working if he
answer
my question about GinDataLeafPageGetPostingList* macros.


The new macros in that patch version were quite botched. Here's a new
attempt.


Nope, still the same errors :-(

PANIC:  not enough space in leaf page!
LOG:  server process (PID 29722) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages


I've continued hacking away at the patch, here's yet another version. 
I've done a lot of cleanup and refactoring to make the code more 
readable (I hope). I'm not sure what caused the panic you saw, but it's 
probably fixed now.  Let me know if not.


Some notable changes since my last patch version:

* I changed the ginbtree interface so that isEnoughSpace method is no 
more. Instead, placeToPage returns false without doing anything if the 
item doesn't fit. It was slightly simpler this way when working with the 
compressed posting lists, as placeToPage had to calculate tuple sizes 
anyway to decide how many items fit on the page.


* I rewrote incrUpdateItemIndexes(). It now operates in two stages: 1. 
adjust the pageOffsets and 2. split the bin. I found it more readable 
this way.


* I changed the WAL format of insert records so that there is a separate 
struct for data-leaf, data-internal, and entry pages. The information 
recorded for each of those was so different that it was confusing to 
cram them all into the same struct.
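
Roughly, the first change above (dropping isEnoughSpace) amounts to a
contract like the one sketched here; the real GinBtreeData struct has many
more callbacks and the patch's exact signature may differ:

#include "postgres.h"
#include "storage/buf.h"
#include "storage/off.h"

/* Sketch of the revised insertion callback; illustrative names only. */
typedef struct GinBtreeSketch
{
    /*
     * Try to place the new item on the page.  Return false, leaving the
     * page untouched, if it does not fit; the caller then splits the page.
     * Since this has to compute the tuple sizes anyway, a separate
     * isEnoughSpace() callback is no longer needed.
     */
    bool        (*placeToPage) (struct GinBtreeSketch *btree, Buffer buf,
                                OffsetNumber off, void *insertdata);

    /* ... the remaining search and insert callbacks are unchanged ... */
} GinBtreeSketch;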



I'm going to step back a bit from the details of the patch now. I think 
there's enough work for you, Alexander, until the next commitfest:


* Try to make the code also work with the old page format, so that you 
don't need to REINDEX after pg_upgrade.


* Documentation. The new compressed posting list format needs to be 
explained somewhere. At the top of ginpostinglist.c would be a good 
place. The README should be improved too.


* Fix whatever I broke (sorry)

Are we committed to going ahead with this patch in the 9.4 timeframe, in one 
form or another? If we are, then I'd like to start committing 
refactoring patches that just move code around in preparation for the 
patch, to make the review of the core of the patch easier. For example, 
merging the isEnoughSpace and placeToPage methods in the ginbtree 
interface. Without the patch, it's unnecessary code churn - it won't do 
any harm but it won't do any good either. I'm pretty confident that this 
patch will land in the 9.4 timeframe, so barring objections I'll start 
committing such refactorings.


- Heikki


gin-packed-postinglists-10-heikki.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2013-10-10 Thread Peter Eisentraut
You linked to this email from the commitfest entry, but there is no
patch here.  You probably meant a different email.  Check please.


On Tue, 2013-10-08 at 21:48 +0300, Heikki Linnakangas wrote:
 On 04.10.2013 14:13, Alexander Korotkov wrote:
  On Fri, Oct 4, 2013 at 12:28 PM, Heikki Linnakangashlinnakan...@vmware.com
  wrote:
 
  In the attached patch, I in fact already did that for data leaf pages, but
  didn't change the format of non-leaf pages yet. If we want to support
  pg_upgrade, we might want to refrain from changing the non-leaf format.
 
  In GinDataLeafPageGetPostingList* you use sizeof(ItemPointerData) without
  MAXALIGN. Is it an error or you especially use 2 extra bytes on leaf page?
 
 I didn't even think of it. Now that I do think of it, I don't see a 
 reason to use MAXALIGN there. PostingItems only require 2-byte 
 alignment. It's a bit fragile and underdocumented though. It probably 
 would be good to have a struct to represent that layout. Something like:
 
 struct
 {
ItemPointerData rightBound;
PostingItem postingItems[1]; /* variable length array */
 } PostingItemPageContent;
 
 And use that struct in the macros.
 
 Then again, we do currently use MAXALIGN there, so if we want to avoid 
 changing the on-disk format, we have to keep it...
 
 - Heikki
 
 







Re: [HACKERS] GIN improvements part 1: additional information

2013-10-08 Thread Alexander Korotkov
Hi, Tomas!

On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondra t...@fuzzy.cz wrote:

 I've attempted to rerun the benchmarks tests I did a few weeks ago, but
  I got repeated crashes when loading the data (into a table with
 tsvector+gin index).

 Right before a crash, theres this message in the log:

PANIC:  not enough space in leaf page!


Thanks for testing. Heikki's version of the patch doesn't work for me either,
even on much simpler examples. I can try to get it working once he answers
my question about the GinDataLeafPageGetPostingList* macros.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-08 Thread Heikki Linnakangas

On 04.10.2013 14:13, Alexander Korotkov wrote:

On Fri, Oct 4, 2013 at 12:28 PM, Heikki Linnakangashlinnakan...@vmware.com

wrote:



In the attached patch, I in fact already did that for data leaf pages, but
didn't change the format of non-leaf pages yet. If we want to support
pg_upgrade, we might want to refrain from changing the non-leaf format.


In GinDataLeafPageGetPostingList* you use sizeof(ItemPointerData) without
MAXALIGN. Is it an error or you especially use 2 extra bytes on leaf page?


I didn't even think of it. Now that I do think of it, I don't see a 
reason to use MAXALIGN there. PostingItems only require 2-byte 
alignment. It's a bit fragile and underdocumented though. It probably 
would be good to have a struct to represent that layout. Something like:


typedef struct
{
  ItemPointerData rightBound;
  PostingItem postingItems[1]; /* variable length array */
} PostingItemPageContent;

And use that struct in the macros.
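
Purely as an illustration (the macro names here are made up, and the
content area is assumed to begin at PageGetContents()), the macros could
then read roughly:

/* Illustration only: accessors built on the struct sketched above. */
#define DataPageContents(page) \
	((PostingItemPageContent *) PageGetContents(page))
#define DataPageGetRightBound(page) \
	(&DataPageContents(page)->rightBound)
#define DataPageGetPostingItem(page, i) \
	(&DataPageContents(page)->postingItems[(i) - 1])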

Then again, we do currently use MAXALIGN there, so if we want to avoid 
changing the on-disk format, we have to keep it...


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-08 Thread Heikki Linnakangas

On 08.10.2013 17:47, Alexander Korotkov wrote:

Hi, Tomas!

On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz  wrote:


I've attempted to rerun the benchmarks tests I did a few weeks ago, but
  I got repeated crashes when loading the data (into a table with
tsvector+gin index).

Right before a crash, theres this message in the log:

PANIC:  not enough space in leaf page!



Thanks for testing. Heikki's version of patch don't works for me too on
even much more simplier examples. I can try to get it working if he answer
my question about GinDataLeafPageGetPostingList* macros.


The new macros in that patch version were quite botched. Here's a new 
attempt.


- Heikki


gin-packed-postinglists-9-heikki.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2013-10-08 Thread Tomas Vondra
On 8.10.2013 21:59, Heikki Linnakangas wrote:
 On 08.10.2013 17:47, Alexander Korotkov wrote:
 Hi, Tomas!

 On Sun, Oct 6, 2013 at 3:58 AM, Tomas Vondrat...@fuzzy.cz  wrote:

 I've attempted to rerun the benchmarks tests I did a few weeks ago, but
   I got repeated crashes when loading the data (into a table with
 tsvector+gin index).

 Right before a crash, theres this message in the log:

 PANIC:  not enough space in leaf page!


 Thanks for testing. Heikki's version of patch don't works for me too on
 even much more simplier examples. I can try to get it working if he
 answer
 my question about GinDataLeafPageGetPostingList* macros.
 
 The new macros in that patch version were quite botched. Here's a new
 attempt.

Nope, still the same errors :-(

PANIC:  not enough space in leaf page!
LOG:  server process (PID 29722) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: autovacuum: ANALYZE public.messages

regards
Tomas




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-04 Thread Heikki Linnakangas
Aside from the pg_upgrade discussion, here's an updated version of the 
patch, rebased over master. It also contains some other misc refactoring 
I've done while reading through the patch. I haven't tested this much, I 
may well have also broken something, but I wanted to post an update 
before the weekend.


Thinking about the page format, I think we should start using the 
pd_lower/upper pointers in the data page format. For a non-leaf page, 
pd_upper would always point to the beginning of the special area, and 
pd_lower would indicate the end of PostingItems. For a leaf page, 
pd_lower would indicate the end of the compressed posting list, and 
pd_upper would point to the leaf-index at the end of the page. That 
matches the standard page layout in the sense that the space between 
pd_lower and pd_upper is free, although the data stored in the non-free 
areas would be quite different. That would allow us to mark full-page 
images with buffer_std, allowing the gap to be left out. I think that 
would be a more natural way to keep track of the used/unused space on 
the page, anyway, compared to the current maxoff/endoffset field in the 
special area.
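
A rough sketch of that bookkeeping for a leaf data page might look like the
following; the function and argument names are invented, and only the
standard page-header fields are real:

#include "postgres.h"
#include "storage/bufpage.h"

/* Sketch only: record the used areas of a GIN data leaf page. */
static void
data_leaf_set_bounds(Page page, Size postinglist_size, Size leafindex_size)
{
    PageHeader  phdr = (PageHeader) page;

    /* the compressed posting list grows from the start of the content area */
    phdr->pd_lower = (LocationIndex)
        (((char *) PageGetContents(page) + postinglist_size) - (char *) page);

    /* the leaf index sits at the end of the page, before the special area */
    phdr->pd_upper = (LocationIndex) (phdr->pd_special - leafindex_size);

    /*
     * Everything between pd_lower and pd_upper is now genuinely free, so
     * full-page images can be logged with buffer_std and omit the hole.
     */
}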


In the attached patch, I in fact already did that for data leaf pages, 
but didn't change the format of non-leaf pages yet. If we want to 
support pg_upgrade, we might want to refrain from changing the non-leaf 
format.


- Heikki


gin-packed-postinglists-8-heikki.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2013-10-04 Thread Alexander Korotkov
On Fri, Oct 4, 2013 at 12:28 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 Aside from the pg_upgrade discussion, here's an updated version of the
 patch, rebased over master. It also contains some other misc refactoring
 I've done while reading through the patch. I haven't tested this much, I
 may well have also broken something, but I wanted to post an update before
 the weekend.

 Thinking about the page format, I think we should start using the
 pd_lower/upper pointers in the data page format. For a non-leaf page,
 pd_upper would always point to the beginning of the special area, and
 pd_lower would indicate the end of PostingItems. For a leaf page, pd_lower
 would indicate the end of the compressed posting list, and pd_upper would
 point to the leaf-index at the end of the page. That matches the standard
 page layout in the sense that the space between pd_lower and pd_upper is
 free, although the data stored in the non-free areas would be quite
 different. That would allow us to mark full-page images with buffer_std,
 allowing the gap to be left out. I think that would be a more natural way
 to keep track of the used/unused space on the page, anyway, compared to the
 current maxoff/endoffset field in the special area.

 In the attached patch, I in fact already did that for data leaf pages, but
 didn't change the format of non-leaf pages yet. If we want to support
 pg_upgrade, we might want to refrain from changing the non-leaf format.


In GinDataLeafPageGetPostingList* you use sizeof(ItemPointerData) without
MAXALIGN. Is it an error, or do you deliberately use 2 extra bytes on the leaf page?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Heikki Linnakangas

On 23.09.2013 18:35, Bruce Momjian wrote:

On Sun, Sep 15, 2013 at 01:14:45PM +0400, Alexander Korotkov wrote:

On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangashlinnakan...@vmware.com
wrote:

 There's a few open questions:

 1. How are we going to handle pg_upgrade? It would be nice to be able to
 read the old page format, or convert on-the-fly. OTOH, if it gets too
 complicated, might not be worth it. The indexes are much smaller with the
 patch, so anyone using GIN probably wants to rebuild them anyway, sooner or
 later. Still, I'd like to give it a shot.


We have broken pg_upgrade index compatibility in the past.
Specifically, hash and GIN index binary format changed from PG 8.3 to
8.4.  I handled it by invalidating the indexes and providing a
post-upgrade script to REINDEX all the changed indexes.  The user
message is:

   Your installation contains hash and/or GIN indexes.  These indexes 
have
   different internal formats between your old and new clusters, so they
   must be reindexed with the REINDEX command.  The file:

   ...

   when executed by psql by the database superuser will recreate all 
invalid
indexes; until then, none of these indexes will be used.

It would be very easy to do this from a pg_upgrade perspective.
However, I know there has been complaints from others about making
pg_upgrade more restrictive.

In this specific case, even if you write code to read the old file
format, we might want to create the REINDEX script to allow _optional_
reindexing to shrink the index files.

If we do require the REINDEX, --check will clearly warn the user that
this will be required.


It seems we've all but decided that we'll require reindexing GIN indexes 
in 9.4. Let's take the opportunity to change some other annoyances with 
the current GIN on-disk format:


1. There's no explicit page id field in the opaque struct, like there 
is in other index types. This is for the benefit of debugging tools like 
pg_filedump. We've managed to tell GIN pages apart from other index 
types by the fact that the special size of GIN pages is 8 and it's not 
using all the high-order bits in the last byte on the page. But an 
explicit page id field would be nice, so let's add that.


2. I'd like to change the way incomplete splits are handled. 
Currently, WAL recovery keeps track of incomplete splits, and fixes any 
that remain at the end of recovery. That concept is slightly broken; 
it's not guaranteed that after you've split a leaf page, for example, 
you will succeed in inserting the downlink to its parent. You might e.g 
run out of disk space. To fix that, I'd like to add a flag to the page 
header to indicate if the split has been completed, ie. if the page's 
downlink has been inserted to the parent, and fix them lazily on the 
next insert. I did a similar change to GiST back in 9.1. (Strictly 
speaking this doesn't require changing the on-disk format, though.)


3. I noticed that the GIN b-trees, the main key entry tree and the 
posting trees, use a slightly different arrangement of the downlink than 
our regular nbtree code does. In nbtree, the downlink for a page is the 
*low* key of that page, ie. if the downlink is 10, all the items on that 
child page must be = 10. But in GIN, we store the *high* key in the 
downlink, ie. all the items on the child page must be = 10. That makes 
inserting new downlinks at a page split slightly more complicated. For 
example, when splitting a page containing keys between 1-10 into 1-5 and 
5-10, you need to insert a new downlink with key 10 for the new right 
page, and also update the existing downlink to 5. The nbtree code 
doesn't require updating existing entries.
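
For points 1 and 2 above, the shape of the change to the opaque struct might
be roughly the following; the new field, its value, and the flag bit are
illustrative names, not existing definitions, while the first three fields
are the current ones:

#include "postgres.h"
#include "storage/block.h"
#include "storage/off.h"

typedef struct GinPageOpaqueDataSketch
{
    BlockNumber rightlink;      /* next page if any */
    OffsetNumber maxoff;        /* number of PostingItems on non-leaf data pages */
    uint16      flags;          /* GIN_DATA, GIN_LEAF, GIN_DELETED, ... */
    uint16      gin_page_id;    /* always GIN_PAGE_ID, for pg_filedump etc. */
} GinPageOpaqueDataSketch;

#define GIN_PAGE_ID             0xFF81      /* arbitrary illustrative value */
#define GIN_INCOMPLETE_SPLIT    (1 << 6)    /* downlink not yet in the parent */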


Anything else?

- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Robert Haas
On Thu, Oct 3, 2013 at 2:43 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 It seems we've all but decided that we'll require reindexing GIN indexes in
 9.4.

I thought the consensus in Ottawa was strongly against that.  I'm not
aware that anyone has subsequently changed their position on the
topic.  Bruce is right to point out that we've done such things before
and can therefore do it again, but just because we have the technical
means to do it doesn't make it good policy.

That having been said, if we do decide to break it...

 Let's take the opportunity to change some other annoyances with the
 current GIN on-disk format:

...then fixing as much as possible in one go-round is clearly a good plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Alexander Korotkov
On Thu, Oct 3, 2013 at 10:48 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Oct 3, 2013 at 2:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  It seems we've all but decided that we'll require reindexing GIN indexes
 in
  9.4.

 I thought the consensus in Ottawa was strongly against that.  I'm not
 aware that anyone has subsequently changed their position on the
 topic.  Bruce is right to point out that we've done such things before
 and can therefore do it again, but just because we have the technical
 means to do it doesn't make it good policy.

 That having been said, if we do decide to break it...

  Let's take the opportunity to change some other annoyances with the
  current GIN on-disk format:

 ...then fixing as much as possible in one go-round is clearly a good plan.


Let's see what options we have. I see the following:
1) Drop support for the old GIN on-disk format. Users will have to reindex
after pg_upgrade.
2) Insert kludges into GIN to support both the old and new formats. But
kludges are kludges :) I don't see an elegant way to do it for now, because
the formats are very different.
3) Upgrade the GIN on-disk format in pg_upgrade. However, that would mean
rewriting almost the whole index. Is it much better than just reindexing?
4) Fork GIN2 and leave GIN as is. That would lead to a lot of duplicated code.
Any other options?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Heikki Linnakangas

On 03.10.2013 23:37, Alexander Korotkov wrote:

2) Insert kluges into GIN to support both old and new formats. So, kluges
are kluges :) I don't see elegant way to do it for now, because formats are
very different.


Hmm. All you need is some code to read the old format, and a function to 
convert a page to new format before updating. It doesn't seem *that* 
kludgey. It's a fair amount of work, for sure, but not insurmountable.
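
The entry point for that could be as small as the sketch below; the flag and
the conversion function are hypothetical names for the sake of illustration,
not anything that exists in the tree:

#include "postgres.h"
#include "access/gin_private.h"
#include "storage/bufmgr.h"

#define GIN_COMPRESSED  (1 << 7)    /* hypothetical "new format" page flag */

/* hypothetical converter, declared only to keep the sketch self-contained */
extern void convert_leaf_page_to_compressed(Buffer buffer);

/* Sketch only: convert an old-format leaf page before modifying it. */
static void
gin_maybe_convert_page(Buffer buffer)
{
    Page        page = BufferGetPage(buffer);

    if (GinPageIsLeaf(page) &&
        (GinPageGetOpaque(page)->flags & GIN_COMPRESSED) == 0)
    {
        /* re-encode the plain ItemPointer array as a compressed list */
        convert_leaf_page_to_compressed(buffer);
    }
}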


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Alexander Korotkov
On Fri, Oct 4, 2013 at 12:41 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 03.10.2013 23:37, Alexander Korotkov wrote:

 2) Insert kluges into GIN to support both old and new formats. So, kluges
 are kluges :) I don't see elegant way to do it for now, because formats
 are
 very different.


 Hmm. All you need is some code to read the old format, and a function to
 convert a page to new format before updating. It doesn't seem *that*
 kludgey. It's a fair amount of work, for sure, but not insurmountable.


My point was not so much about the amount of work as about the result.
ItemPointer compression reduces the occupied space in all normal cases. It's
not very realistic, but it could increase the space in the worst case. That
would lead to a page split after conversion. Are we going to support such a
case?

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Bruce Momjian
On Thu, Oct  3, 2013 at 02:48:20PM -0400, Robert Haas wrote:
 On Thu, Oct 3, 2013 at 2:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  It seems we've all but decided that we'll require reindexing GIN indexes in
  9.4.
 
 I thought the consensus in Ottawa was strongly against that.  I'm not
 aware that anyone has subsequently changed their position on the
 topic.  Bruce is right to point out that we've done such things before
 and can therefore do it again, but just because we have the technical
 means to do it doesn't make it good policy.
 
 That having been said, if we do decide to break it...

Agreed.  I was stating only that this is easy for pg_upgrade.  One cool
thing is that the upgrades completes, and the indexes are there, but
just marked as invalid until the REINDEX.

One other point Alexander made is that the new GIN indexes will be
smaller so most people would want the new format in the new cluster
anyway.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Alvaro Herrera
Bruce Momjian escribió:

 Agreed.  I was stating only that this is easy for pg_upgrade.  One cool
 thing is that the upgrades completes, and the indexes are there, but
 just marked as invalid until the REINDEX.
 
 One other point Alexander made is that the new GIN indexes will be
 smaller so most people would want the new format in the new cluster
 anyway.

But they're nonfunctional until after the reindex, which is bad for
people who want a quick upgrade and return to operational mode
immediately.  If you could just keep the old indexes around, in working
state, until they are REINDEX CONCURRENTLY'ed, that would be more
practical than just marking them invalid.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Heikki Linnakangas

On 03.10.2013 23:54, Alexander Korotkov wrote:

ItemPointers compression reduce occupied space in all normal cases. It's
not very realistic, but it could increase space in worst case. That would
lead to page split after conversion. Are we going to support such case?


Hmm, that's probably rare enough that the number of real-world indexes 
where that could happen is exactly 0. A compressed item requires 7 bytes 
in the worst case; that is, an offset > 127, and a distance to the previous 
item > 2^(4*7) = 268435456 blocks. With the default block size, that 
requires an index larger than 2TB. And that's just for one such item to 
appear - to actually cause a page to overflow, the page would need to be 
full of other items far enough apart from each other to take up 6 bytes 
each.
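
To make the arithmetic a bit more concrete, this is the generic varbyte
scheme involved; the patch encodes the offset and the block distance
separately, so this only illustrates why a single item can need several
bytes:

#include <stdint.h>

/*
 * Varbyte encoding: 7 payload bits per byte, high bit set while more
 * bytes follow.  A block distance above 2^28 already needs 5 bytes here.
 */
static int
encode_varbyte(uint64_t val, unsigned char *buf)
{
    int         len = 0;

    do
    {
        unsigned char b = val & 0x7F;

        val >>= 7;
        if (val != 0)
            b |= 0x80;          /* continuation bit */
        buf[len++] = b;
    } while (val != 0);

    return len;
}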


So I think if you can make the conversion work with the assumption that 
the compressed format always fits in the old space, and throw an error 
if it doesn't, that's good enough. (That's for the posting trees - the 
posting lists attached to entry tuples is a different story.)


Besides, if you convert the page when you insert into it, you might need 
to split it anyway. So it might not be very difficult to split if required.


IMHO the main argument for not bothering with pg_upgrade is that the 
gain from the patch is so great that you'll want to REINDEX after the 
upgrade anyway, to shrink the index. I really don't have an opinion on 
whether we should attempt reading the old format. On one hand, it would 
be really nice to not have that caveat when you pg_upgrade (oh, you have 
GIN indexes, you have to reindex..). On the other hand, supporting the 
old format is a fair amount of extra code to maintain.


- Heikki




Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Alexander Korotkov
On Fri, Oct 4, 2013 at 12:37 AM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Thu, Oct 3, 2013 at 10:48 PM, Robert Haas robertmh...@gmail.comwrote:

 On Thu, Oct 3, 2013 at 2:43 PM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
  It seems we've all but decided that we'll require reindexing GIN
 indexes in
  9.4.

 I thought the consensus in Ottawa was strongly against that.  I'm not
 aware that anyone has subsequently changed their position on the
 topic.  Bruce is right to point out that we've done such things before
 and can therefore do it again, but just because we have the technical
 means to do it doesn't make it good policy.

 That having been said, if we do decide to break it...

  Let's take the opportunity to change some other annoyances with the
  current GIN on-disk format:

 ...then fixing as much as possible in one go-round is clearly a good plan.


 Let's see what options we have at all. I see following:
 1) Drop support old GIN on-disk format. But users will have to reindex
 after pg_upgrade.
 2) Insert kluges into GIN to support both old and new formats. So, kluges
 are kluges :) I don't see elegant way to do it for now, because formats are
 very different.
 3) Upgrade GIN on-disk format in pg_upgrade. However, it would be
 rewriting almost whole index. Is it much better than just reindex?
 4) Fork GIN2, leave GIN as is. It would lead to much of duplicated code.
 Any other options?


I have come to the conclusion that I like option #4 more than option #2.
If we try to make the new GIN work with the old page format, we have to
maintain 3 use cases:
1) old GIN with the old page format (because of old releases)
2) new GIN with the old page format
3) new GIN with the new page format

If we create GIN2 we maintain only 2 use cases:
1) old GIN with the old page format
2) new GIN with the new page format
The code of the old GIN would be additional code in 9.4, but not additional
code to maintain, because we maintain exactly the same code in the old
releases anyway.

The problem I see is how to migrate users to GIN2. We can't expect them to
read the release notes, create GIN2 indexes and drop the GIN1 indexes. A lot
of users will still use GIN1, simply because they don't care :)
Ideally, any new GIN index would be a GIN2 index, and REINDEX would turn a
GIN1 index into GIN2.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] GIN improvements part 1: additional information

2013-10-03 Thread Bruce Momjian
On Fri, Oct  4, 2013 at 02:23:33AM +0400, Alexander Korotkov wrote:
 I came to idea that I like option #4 more than option #2.
 If we try to make new GIN work with old page formats we have to maintain 3 use
 cases:
 1) old GIN with old page format (because of old releases)
 2) new GIN with old page format
 3) new GIN with new page format
 
 If we create GIN2 we maintain only 2 use cases:
 1) old GIN with old page format
 2) new GIN with new page format
 The code of old GIN would be additional code in 9.4, but not additional code 
 we
 maintain. Because we anyway maintain exactly same in old releases.
 
 The problem I see is how to migrate users to GIN2. We can't expect they read
 release notes, create GIN2 indexes and drop GIN1 indexes. A lot of users will
 still use GIN1, because of they don't care :)
 Ideally any new GIN index should be GIN2 and reindex turns GIN1 into GIN2.

I am not sure I like the complexity of a GIN2, but we should give this
problem some serious thought as it will affect how we deal with other
on-disk index changes in the future.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] GIN improvements part 1: additional information

2013-09-26 Thread Alexander Korotkov
On Wed, Sep 25, 2013 at 5:22 PM, Peter Eisentraut pete...@gmx.net wrote:

 On 9/23/13 5:36 PM, Alexander Korotkov wrote:
  In the attached version of patch double finding of ItemPointer during
  insert is avoided. Overhead becomes lower as expected.

 Fails cpluspluscheck:

 ./src/include/access/gin_private.h: In function ‘char*
 ginDataPageLeafReadItemPointer(char*, ItemPointer)’:
 ./src/include/access/gin_private.h:797:2: warning: comparison between
 signed and unsigned integer expressions [-Wsign-compare]


Thanks. Fixed in the attached version of the patch.

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-7.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] GIN improvements part 1: additional information

2013-09-25 Thread Peter Eisentraut
On 9/23/13 5:36 PM, Alexander Korotkov wrote:
 In the attached version of patch double finding of ItemPointer during
 insert is avoided. Overhead becomes lower as expected.

Fails cpluspluscheck:

./src/include/access/gin_private.h: In function ‘char*
ginDataPageLeafReadItemPointer(char*, ItemPointer)’:
./src/include/access/gin_private.h:797:2: warning: comparison between
signed and unsigned integer expressions [-Wsign-compare]





Re: [HACKERS] GIN improvements part 1: additional information

2013-09-23 Thread Bruce Momjian
On Sun, Sep 15, 2013 at 01:14:45PM +0400, Alexander Korotkov wrote:
 On Sat, Jun 29, 2013 at 12:56 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:
 
 There's a few open questions:
 
 1. How are we going to handle pg_upgrade? It would be nice to be able to
 read the old page format, or convert on-the-fly. OTOH, if it gets too
 complicated, might not be worth it. The indexes are much smaller with the
 patch, so anyone using GIN probably wants to rebuild them anyway, sooner 
 or
 later. Still, I'd like to give it a shot.

We have broken pg_upgrade index compatibility in the past. 
Specifically, hash and GIN index binary format changed from PG 8.3 to
8.4.  I handled it by invalidating the indexes and providing a
post-upgrade script to REINDEX all the changed indexes.  The user
message is:

   Your installation contains hash and/or GIN indexes.  These indexes have
   different internal formats between your old and new clusters, so they
   must be reindexed with the REINDEX command.  The file:

   ...

   when executed by psql by the database superuser will recreate all invalid
   indexes; until then, none of these indexes will be used.

It would be very easy to do this from a pg_upgrade perspective. 
However, I know there has been complaints from others about making
pg_upgrade more restrictive.

In this specific case, even if you write code to read the old file
format, we might want to create the REINDEX script to allow _optional_
reindexing to shrink the index files.

If we do require the REINDEX, --check will clearly warn the user that
this will be required.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] GIN improvements part 1: additional information

2013-09-23 Thread Alexander Korotkov
On Mon, Sep 23, 2013 at 12:47 AM, Alexander Korotkov
aekorot...@gmail.comwrote:

 It's probably an option to select 64 entries instead of 32.
 There is still some regression in update speed. However, there is also
 room for improvement patch. It searches item index entries 2 times on
 insert: in dataLocateLeafItem and dataPlaceToPage. We can save full results
 of dataLocateLeafItem, but it require some rework of gin btree interface:
 store not only offset of item.


In the attached version of the patch, the double lookup of the ItemPointer
during insert is avoided. The overhead becomes lower, as expected.

 event                 |     master      |   16-entries    |   32-entries    |   64-entries    |  128-entries
-----------------------+-----------------+-----------------+-----------------+-----------------+-----------------
 index_build           | 00:01:50.042658 | 00:01:54.130873 | 00:01:59.37302  | 00:01:55.959693 | 00:01:58.126407
 index_build_recovery  | 00:00:19        | 00:00:06        | 00:00:06        | 00:00:06        | 00:00:06
 index_update          | 00:05:18.215707 | 00:05:38.40231  | 00:05:30.658786 | 00:05:27.664312 | 00:05:30.815876
 index_update_recovery | 00:01:48        | 00:01:53        | 00:01:50        | 00:01:44        | 00:01:46
 search_new            | 00:25:21.481699 | 00:23:20.324152 | 00:24:02.120438 | 00:22:50.989723 | 00:23:05.703824
 search_updated        | 00:25:57.622592 | 00:26:43.531979 | 00:26:08.003085 | 00:24:36.669028 | 00:26:09.175243

 label         |    size    | 16-entries | 32-entries | 64-entries | 128-entries
---------------+------------+------------+------------+------------+-------------
 new           |  884514816 |  417013760 |  421240832 |  430350336 |  450994176
 after_updates | 1595252736 |  711368704 |  719380480 |  735682560 |  774275072

--
With best regards,
Alexander Korotkov.


gin-packed-postinglists-6.patch.gz
Description: GNU Zip compressed data


