Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-17 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 [ spgist patch ]

I've applied this after a lot of revisions, some cosmetic (better
comments etc), some not so much (less bogus vacuum and WAL handling).

There are still a number of loose ends that need to be worked on:

* The question of whether to store nulls so SPGiST can support
full-index scans.  I'm not convinced this is worth doing, because a
full-index scan is likely to suck performance-wise in SPGiST anyway.
But if it's going to be done then IMO it needs to happen for 9.2,
because it will affect both on-disk data and the opclass API.

* Supporting index-only scans.  This I think is worth doing, and I
plan to go do it in the next few days.  However, this is going to
complicate any desire to support full-index scans, because right now
value reconstruction is done by inner_consistent and leaf_consistent,
so there's no way to do it unless there's a qual to check.

* Perhaps it'd be a good idea to move the loop over scankeys to inside
the opclass consistent methods, ie call them just once to check all the
scankeys.  Then we could meaningfully define zero scankeys as a full
index scan, and we would also get rid of redundant value reconstruction
work when there's more than one scankey.  (Though I'm not sure
multiple-scankey cases will occur often enough to make that really worth
worrying about.)

* SPGiST inserts XIDs into redirect tuples so that it can tell when it's
safe for VACUUM to delete them.  This is similar to something btree
does, and as I recall, that was a headache for hot standby.  So probably
SPGiST also needs some work to be safe for hot standby.

* The index free space management seems pretty questionable.  I can
understand the use of the small local cache of recently-used pages in
each backend, but I'm much less than convinced that it's a good idea
to share that across backends through the metapage.  It's too small
for that.  I think the code needs to exploit the FSM a lot more.  For
one thing, AFAICS only *completely empty* pages will ever get added to
FSM by VACUUM, which means that once the backend that initialized a
given page has dropped it from its (tiny) local cache, it's basically
impossible for that page to ever receive any additions except as splits
from its existing entries.  Perhaps that's intentional but it seems
likely to lead to poor space utilization.

* It bothers me a bit that the config function gets called on every
single operation.  Maybe the rd_amcache struct should be used to cache
the config function results (as well as the datatype lookup results).
On the other hand, I have no proof that that's a noticeable time sink.

* It occurs to me that it might be worth sorting the ScanStackEntry
list by block number after each inner-tuple visit, so as to improve
locality of access.  I don't have any evidence for that either,
though.

* I pulled out the spgstat() function because it seemed like something
that didn't belong in core.  If we're going to have it at all, it should
be in contrib/pgstattuple.  I'm willing to do the work to put it there,
but I wonder if anyone has any thoughts about whether to keep its
existing API (return a text value) or change it to look more like
pgstatindex(), ie return a set of columns.  I'd favor the latter if
I thought the set of columns was well-defined, but I'm not sure it is.
(As a reminder, I've attached the last state of the function before
I pulled it out.)

regards, tom lane



Datum
spgstat(PG_FUNCTION_ARGS)
{
text   *name = PG_GETARG_TEXT_P(0);
RangeVar   *relvar;
Relationindex;
BlockNumber blkno;
BlockNumber totalPages = 0,
innerPages = 0,
leafPages = 0,
emptyPages = 0,
deletedPages = 0;
double  usedSpace = 0.0;
charres[1024];
int bufferSize = -1;
int64   innerTuples = 0,
leafTuples = 0,
nAllTheSame = 0,
nLeafPlaceholder = 0,
nInnerPlaceholder = 0,
nLeafRedirect = 0,
nInnerRedirect = 0;

#define IS_INDEX(r) ((r)-rd_rel-relkind == RELKIND_INDEX)
#define IS_SPGIST(r) ((r)-rd_rel-relam == SPGIST_AM_OID)

relvar = makeRangeVarFromNameList(textToQualifiedNameList(name));
index = relation_openrv(relvar, AccessExclusiveLock);

if (!IS_INDEX(index) || !IS_SPGIST(index))
elog(ERROR, relation \%s\ is not an SPGiST index,
 RelationGetRelationName(index));

totalPages = RelationGetNumberOfBlocks(index);

for (blkno = SPGIST_HEAD_BLKNO; blkno  totalPages; blkno++)
{
Buffer  buffer;
Pagepage;
int pageFree;

buffer = ReadBuffer(index, blkno);
LockBuffer(buffer, BUFFER_LOCK_SHARE);

page = BufferGetPage(buffer);

if (PageIsNew(page) || SpGistPageIsDeleted(page))
{
deletedPages++;
UnlockReleaseBuffer(buffer);
 

Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-16 Thread Greg Smith
Since this was marked WIP and Tom has now kicked off two side 
discussions, what I've done is tagged references to each of them as 
comments to the main patch, then marked this as returned with feedback.  
Surely what I do in the CF app isn't going to influence what Tom wants 
to work on, so I'll leave this one to his watch now.  If it does turn 
out to be committed soon instead of re-submitted as I'm presuming right 
now, I can always go back and fix that.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-13 Thread Teodor Sigaev

I wrote:

... the leaf tuple datatype is hard-wired to be

After another day's worth of hacking, I now understand the reason for
the above: when an index is less than a page and an incoming value would
still fit on the root page, the incoming value is simply dumped into a
leaf tuple without ever calling any opclass-specific function at all.

Exactly.


To allow the leaf datatype to be different from the indexed column,
we'd need at least one more opclass support function, and it's not clear
that the potential gain is worth any extra complexity.

Agree, all opclasses which I could imagine for sp-gist use the same type.
Without clear example I don't like an idea to add one more support function and 
it could be easily added later as an optional support function as it's already 
done for distance function for GiST




However, I now have another question: what is the point of the
allTheSame mechanism?  It seems to add quite a great deal of complexity,
I thought about two options: separate code path in core to support 
a-lot-of-the-same-values with minimal support in support functions and move all 
logic about this case to support functions. Second option is demonstrated in 
k-d-tree implementation, where split axis is contained by each half-plane.
May be it is a simpler solution although it moves responsibility to opclass 
developers.




one thing, it's giving me fits while attempting to fix the limitation
on storing long indexed values.  There's no reason why a suffix tree
representation shouldn't work for long strings, but you have to be
willing to cap the length of any given inner tuple's prefix to something
I don't see clear interface for now: let we have an empty index and we need to 
insert a long string (more than even several page). So, it's needed to have 
support function to split input value to several ones. I supposed that sp-gist 
is already complex enough for first step to add support for this non very useful 
case.


Of course, for future we have a plans to add support of long values, NULLs/IS 
NULL, knn-search at least.



I'm also still wondering what your thoughts are on storing null values
versus full-index-scan capability.  I'm leaning towards getting rid of
the dead code, but if you have an idea how to remove the limitation,
maybe we should do that instead.


I didn't have a plan to support NULLs in first stage, because it's not clear for 
me how and where to store them. It seems to me that it should be fully separated 
from normal path, like a linked list of pages with only ItemPointer data 
(similar to leaf data pages in GIN)


I missed that planner will not create qual-free scan, because I thought it's 
still possible with NOT NULL columns. If not, this code could be 
removed/commented/ifdefed.



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-13 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 However, I now have another question: what is the point of the
 allTheSame mechanism?  It seems to add quite a great deal of complexity,

 I thought about two options: separate code path in core to support 
 a-lot-of-the-same-values with minimal support in support functions and move 
 all 
 logic about this case to support functions. Second option is demonstrated in 
 k-d-tree implementation, where split axis is contained by each half-plane.
 May be it is a simpler solution although it moves responsibility to opclass 
 developers.

I eventually realized that you have to have it to ensure that you can
split a leaf-tuple list across pages even when the opclass picksplit
function thinks the values are all the same.  What made no sense to me
was (a) having the property forcibly inherit to child inner tuples,
and (b) suppressing node labels in allTheSame tuples.  That could make
it impossible for the opclass to reconstruct values.  In my local copy
I've changed this behavior a bit and improved the documentation about
what opclasses have to do with the flag.

 one thing, it's giving me fits while attempting to fix the limitation
 on storing long indexed values.  There's no reason why a suffix tree
 representation shouldn't work for long strings, but you have to be
 willing to cap the length of any given inner tuple's prefix to something

 I don't see clear interface for now: let we have an empty index and we
 need to insert a long string (more than even several page). So, it's
 needed to have support function to split input value to several
 ones. I supposed that sp-gist is already complex enough for first step
 to add support for this non very useful case.

Well, I have it working well enough to satisfy me locally.  The
picksplit function can handle the prefixing perfectly well, as long as
it's not surprised by getting called on a single oversized leaf tuple.
(I changed the format of leaf tuples to let the size field be 30 bits,
so we can have an oversized leaf tuple in memory even if we can't store
it.  This doesn't cost anything space-wise because of alignment rules.)

 Of course, for future we have a plans to add support of long values,
 NULLs/IS NULL, knn-search at least.

I think if we're going to do nulls we can't wait; that will necessarily
change the on-disk representation, which is going to be a hard sell once
9.2 is out the door.  Neither leaf nor inner tuples have any good way to
deal with nulls at present.  Maybe if you invent a completely separate
representation for nulls it could be added on after the fact, but that
seems like an ugly answer.

 I missed that planner will not create qual-free scan, because I thought it's 
 still possible with NOT NULL columns. If not, this code could be 
 removed/commented/ifdefed.

Well, it won't do so because pg_am.amoptionalkey is not set.  But we
can't set that if we don't store NULLs.

Right at the moment, my local copy has completely broken handling of
WAL, because I've been focusing on the opclass interface and didn't
worry about WAL while I was whacking the picksplit code around.
I'll try to clean that up today and then post a new version of the
patch.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-10 Thread Tom Lane
I wrote:
 ... the leaf tuple datatype is hard-wired to be
 the same as the indexed column's type.  Why is that?  It seems to me
 to be both confusing and restrictive.  For instance, if you'd designed
 the suffix tree opclass just a little differently, it would be wanting
 to store char not text in leaf nodes.  Shouldn't we change this to
 allow the leaf data type to be specified explicitly?

After another day's worth of hacking, I now understand the reason for
the above: when an index is less than a page and an incoming value would
still fit on the root page, the incoming value is simply dumped into a
leaf tuple without ever calling any opclass-specific function at all.
To allow the leaf datatype to be different from the indexed column,
we'd need at least one more opclass support function, and it's not clear
that the potential gain is worth any extra complexity.

However, I now have another question: what is the point of the
allTheSame mechanism?  It seems to add quite a great deal of complexity,
without saving much of any space.  At least for the node key types used
so far, the null bitmap that's added to the node tuples eats just as
much space as storing a normal key would.  We could probably avoid that
by using custom tuple construction code instead of index_form_tuple,
but on the whole I think it'd be better to remove the concept.  For
one thing, it's giving me fits while attempting to fix the limitation
on storing long indexed values.  There's no reason why a suffix tree
representation shouldn't work for long strings, but you have to be
willing to cap the length of any given inner tuple's prefix to something
that will fit on a page --- and that breaks the badly-underdocumented
allTheSame logic in spgtextproc.c.  You can't choose to just drop the
node key (i.e., the next byte in the original string) unless there isn't
any because you reached the end of the string.  In general it's not
clear to me why it's sensible to drop a node key ever.

I'm also still wondering what your thoughts are on storing null values
versus full-index-scan capability.  I'm leaning towards getting rid of
the dead code, but if you have an idea how to remove the limitation,
maybe we should do that instead.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-09 Thread Tom Lane
Teodor Sigaev teo...@sigaev.ru writes:
 [ spgist patch ]

I've been working through this patch and fixing assorted things.
There are a couple of issues that require some discussion:

1. It took me awhile to realize it, but there are actually three
different datatypes that can be stored in an SPGist index: the prefix
value of an inner tuple, the key value for each node (downlink) in an
inner tuple, and the data value of a leaf tuple.  One reason this was
not obvious is that only two of these types can be configured by the
opclass config function; the leaf tuple datatype is hard-wired to be
the same as the indexed column's type.  Why is that?  It seems to me
to be both confusing and restrictive.  For instance, if you'd designed
the suffix tree opclass just a little differently, it would be wanting
to store char not text in leaf nodes.  Shouldn't we change this to
allow the leaf data type to be specified explicitly?

2. The code is a bit self-contradictory about whether the index is
complete or not.  The top-level ambuild and aminsert code rejects
NULLs (but there seems to be some provision for storing nulls in
leaf tuples --- I'm confused what that's used for, but apparently
not for actual data nulls).  If this is a designed, permanent limitation
then SPGist indexes can never be expected to represent every heap tuple,
which means there is no value in full-index scans.  However there is
code in spgWalk() to support a full-index scan.  It's dead code, because
pg_am.amoptionalkey is not set for spgist and so the planner will never
create a qual-free index scan plan.  Perhaps we should rip out the
full-index-scan code?  Or do you have plans to support indexing nulls
later?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-12-06 Thread Tom Lane
Oleg Bartunov o...@sai.msu.su writes:
 There is one annoying problem under MAC OS (Linux, FreeBSD have no problem), 
 we 
 just can't figure out how to find it, since we are not familiar with MAC OS - 
 it fails to restart after 'kill -9' backend, but only if sources were 
 compiled with -O2 option (no problem occured with -O0). Since the fail happens
 not every time, we use following script to reproduce the problem. We ask
 MAC OS guru to help us debugging this problem.

I don't think it's Mac-specific at all; it looks to me like garden
variety uninitialized data, specifically that there are paths through
doPickSplit that don't set xlrec.newPage.  The crash I'm seeing is

TRAP: FailedAssertion(!(offset = (((PageHeader) (page))-pd_lower = 
(__builtin_offsetof (PageHeaderData, pd_linp)) ? 0 : PageHeader) 
(page))-pd_lower - (__builtin_offsetof (PageHeaderData, pd_linp))) / 
sizeof(ItemIdData))) + 1), File: spgxlog.c, Line: 81)

#0  0x7fff883f982a in __kill ()
#1  0x7fff85bdda9c in abort ()
#2  0x000103165a71 in ExceptionalCondition (conditionName=value 
temporarily unavailable, due to optimizations, errorType=value temporarily 
unavailable, due to optimizations, fileName=value temporarily unavailable, 
due to optimizations, lineNumber=value temporarily unavailable, due to 
optimizations) at assert.c:57
#3  0x000102eeec73 in addOrReplaceTuple (page=0x74cc Address 0x74cc out of 
bounds, tuple=0x7faa1182d64c  , size=88, offset=70) at spgxlog.c:81
#4  0x000102eed4bc in spgRedoPickSplit [inlined] () at 
/Users/tgl/pgsql/src/backend/access/spgist/spgxlog.c:504
#5  0x000102eed4bc in spg_redo (record=0x7fff62a5ccf0) at spgxlog.c:803
#6  0x000102ec4f48 in StartupXLOG () at xlog.c:6534
#7  0x000103054378 in StartupProcessMain () at startup.c:220
#8  0x000102ef4449 in AuxiliaryProcessMain (argc=2, argv=0x7fff62a60030) at 
bootstrap.c:414

The xlog record it's working on is

(gdb) p *(spgxlogPickSplit*)(0x7fcb20826600 + 32)
$6 = {
  node = {
spcNode = 1663, 
dbNode = 41578, 
relNode = 204800
  }, 
  nTuples = 75, 
  nNodes = 4, 
  blknoSrc = 988, 
  nDelete = 74, 
  blknoInner = 929, 
  offnumInner = 70, 
  newPage = 1 '\001', 
  blknoParent = 929, 
  offnumParent = 13, 
  nodeI = 2, 
  stateSrc = {
attType_attlen = 16, 
fakeTupleSize = 32, 
isBuild = 1
  }
}

Since newPage is set, addOrReplaceTuple gets called on a freshly
initialized page, and not surprisingly complains that offset 70 is
way out of range.  Maybe there's something wrong with the replay
logic, but what I'm thinking is that newPage should not have been
true here, which means that doPickSplit failed to set it correctly,
which doesn't look at all improbable.  I added a memset at the
top of doPickSplit to force the whole struct to zeroes, and so far
haven't seen the crash again.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-10-06 Thread Oleg Bartunov

We are working on the hackers documentation, hope to submit it before my
himalaya track.

Oleg
On Sun, 2 Oct 2011, Tom Lane wrote:


Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

On 06.09.2011 20:34, Oleg Bartunov wrote:

Here is the latest spgist patch, which has all planned features as well as
all overhead, introduced by concurrency and recovery, so performance
measurement should be realistic now.



I'm ignoring the text suffix-tree part of this for now, because of the
issue with non-C locales that Alexander pointer out.


It seems to me that SP-GiST simply cannot work for full text comparisons
in non-C locales, because it's critically dependent on the assumption
that comparisons of strings are consistent with comparisons of prefixes
of those strings ... an assumption that's just plain false for most
non-C locales.

We can dodge that problem in the same way that we did in the btree
pattern_ops opclasses, namely implement the opclass only for the =
operator and the special operators ~~ etc.  I think I favor doing this
for the first round, because it's a simple matter of removing code
that's currently present in the patch.  Even with only = support
the opclass would be extremely useful.

Something we could consider later is a way to use the index for the
regular text comparison operators ( etc), but only when the operator
is using C collation.  This is not so much a matter for the index
implementation as it is about teaching the planner to optionally
consider collation when matching an operator call to the index.  It's
probably going to tie into the unfinished business of marking which
operators are collation sensitive and which are not.

In other news, I looked at the patch briefly, but I don't think I want
to review it fully without some documentation.  The absolute minimum
requirement IMO is documentation comparable to what we have for GIN,
ie a specification for the support methods and some indication of when
you'd use this index type in preference to others.  I'd be willing to
help copy-edit and SGML-ize such documentation, but I do not care to
reverse-engineer it from the code.

regards, tom lane



Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-10-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 06.09.2011 20:34, Oleg Bartunov wrote:
 Here is the latest spgist patch, which has all planned features as well as
 all overhead, introduced by concurrency and recovery, so performance
 measurement should be realistic now.

 I'm ignoring the text suffix-tree part of this for now, because of the 
 issue with non-C locales that Alexander pointer out.

It seems to me that SP-GiST simply cannot work for full text comparisons
in non-C locales, because it's critically dependent on the assumption
that comparisons of strings are consistent with comparisons of prefixes
of those strings ... an assumption that's just plain false for most
non-C locales.

We can dodge that problem in the same way that we did in the btree
pattern_ops opclasses, namely implement the opclass only for the =
operator and the special operators ~~ etc.  I think I favor doing this
for the first round, because it's a simple matter of removing code
that's currently present in the patch.  Even with only = support
the opclass would be extremely useful.

Something we could consider later is a way to use the index for the
regular text comparison operators ( etc), but only when the operator
is using C collation.  This is not so much a matter for the index
implementation as it is about teaching the planner to optionally
consider collation when matching an operator call to the index.  It's
probably going to tie into the unfinished business of marking which
operators are collation sensitive and which are not.

In other news, I looked at the patch briefly, but I don't think I want
to review it fully without some documentation.  The absolute minimum
requirement IMO is documentation comparable to what we have for GIN,
ie a specification for the support methods and some indication of when
you'd use this index type in preference to others.  I'd be willing to
help copy-edit and SGML-ize such documentation, but I do not care to
reverse-engineer it from the code.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-22 Thread Heikki Linnakangas

On 06.09.2011 20:34, Oleg Bartunov wrote:

Here is the latest spgist patch, which has all planned features as well as
all overhead, introduced by concurrency and recovery, so performance
measurement should be realistic now.


I'm ignoring the text suffix-tree part of this for now, because of the 
issue with non-C locales that Alexander pointer out.


Regarding the quadtree, have you compared the performance of that with 
Alexander's improved split algorithm? I ran some tests using the test 
harness I still had lying around from the fast GiST index build tests:


testname |  time   | accesses | indexsize
-+-+--+---
 points unordered auto   | 00:03:58.188866 |   378779 | 522 MB
 points ordered auto | 00:07:14.362355 |   177534 | 670 MB
 points unordered auto   | 00:02:59.130176 |46561 | 532 MB
 points ordered auto | 00:04:00.50756  |45066 | 662 MB
 points unordered spgist | 00:03:05.569259 |78871 | 394 MB
 points ordered spgist   | 00:01:46.06855  |   422104 | 417 MB
(8 rows)

These tests were with a table with 750 random points. In the 
ordered-tests, the table is sorted by x,y coordinates. 'time' is the 
time used to build the index on it, and 'accesses' is the total number 
of index blocks hit by a series of 1 bounding box queries, measured 
from pg_statio_user_indexes.idx_blks_hit + idx_blks_read.


The first two tests in the list are with a GiST index on unpatched 
PostgreSQL. The next six tests are with Alexander's double-sorting split 
patch. The last two tests are with an SP-GiST index.


It looks like the query performance with GiST using the double-sorting 
split is better than SP-GiST, although the SP-GiST index is somewhat 
smaller. The ordered case seems pathologically bad, is that some sort of 
a worst-case scenario for quadtrees?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-22 Thread Alexander Korotkov
On Thu, Sep 22, 2011 at 2:05 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 Regarding the quadtree, have you compared the performance of that with
 Alexander's improved split algorithm? I ran some tests using the test
 harness I still had lying around from the fast GiST index build tests:

testname |  time   | accesses | indexsize
 -+**-+--+-**--
  points unordered auto   | 00:03:58.188866 |   378779 | 522 MB
  points ordered auto | 00:07:14.362355 |   177534 | 670 MB
  points unordered auto   | 00:02:59.130176 |46561 | 532 MB
  points ordered auto | 00:04:00.50756  |45066 | 662 MB
  points unordered spgist | 00:03:05.569259 |78871 | 394 MB
  points ordered spgist   | 00:01:46.06855  |   422104 | 417 MB
 (8 rows)

I assume first two rows to be produced by new linear split
algorithm(current) and secound two rows by double sorting split algorithm(my
patch).


 These tests were with a table with 750 random points. In the
 ordered-tests, the table is sorted by x,y coordinates. 'time' is the time
 used to build the index on it, and 'accesses' is the total number of index
 blocks hit by a series of 1 bounding box queries, measured from
 pg_statio_user_indexes.idx_**blks_hit + idx_blks_read.

 The first two tests in the list are with a GiST index on unpatched
 PostgreSQL. The next six tests are with Alexander's double-sorting split
 patch. The last two tests are with an SP-GiST index.

 It looks like the query performance with GiST using the double-sorting
 split is better than SP-GiST, although the SP-GiST index is somewhat
 smaller. The ordered case seems pathologically bad, is that some sort of a
 worst-case scenario for quadtrees?

Comparison of search speed using number of page accesses is
quite comprehensive for various GiST indexes. But when we're
comparing  SP-GiST vs GiST we should take into accoung that they have
different CPU/IO ratio. GiST scans whole page which it accesses. SP-GiST can
scan only fraction of page because several nodes can be packed into single
page. Thereby it would be interesting to compare also CPU load GiST
vs. SP-GiST. Also, there is some hope to reduce number of page accesses in
SP-GiST by improving clustering algorithm.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-21 Thread Heikki Linnakangas

On 05.09.2011 09:39, Oleg Bartunov wrote:

I attached wrong patch in previous message, sorry ! Here is a last version.


One little detail caught my eye: In spgSplitNodeAction, you call 
SpGistGetBuffer() within a critical section. That should be avoided, 
SpGistGetBuffer() can easily fail if you e.g run out of disk space.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-06 Thread Oleg Bartunov

Here is the latest spgist patch, which has all planned features as well as
all overhead, introduced by concurrency and recovery, so performance 
measurement should be realistic now.


Oleg

On Mon, 5 Sep 2011, Oleg Bartunov wrote:


I attached wrong patch in previous message, sorry ! Here is a last version.

This is a new WIP of SP-GiST patch, which provides support for:

1. Concurrent vacuum
2. Vacuum logging
3. WAL replay

Oleg
On Thu, 1 Sep 2011, Oleg Bartunov wrote:

This is updates SP-GiST patch, which fixed one bug and replaced test to the 
locale independent one.


On Wed, 31 Aug 2011, Oleg Bartunov wrote:


Hi there,

attached is our WIP-patch for 9.2 development source tree, which provides
implementation of SP-GiST (prototype was presented at PGCon-2011, see
http://www.pgcon.org/2011/schedule/events/309.en.html and presentation
for details) as a core feature.  Main differences from prototype version:

1. Now it's part of pg core, not contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses clustering algorithm of nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurency and recovery (inserts are
logged, supports for deletes, and log replay will be added really
soon)

So, now code contains almost all possible overhead of production code
and we ask hackers to test performance on real data sets. We expect
the same performance for random data (since almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. Also, we appreciate your comments and suggestions about
API.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_patch-0.98.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-06 Thread Andreas Joseph Krogh
On 09/06/2011 07:34 PM, Oleg Bartunov wrote:
 Here is the latest spgist patch, which has all planned features as
 well as
 all overhead, introduced by concurrency and recovery, so performance
 measurement should be realistic now.

 Oleg

Sorry for not getting the might-be-obvious here, but will this patch
bring indexed substring-search to PG? So queries conceptually equal to
this will be possible to index: WHERE som_col @@
':substr1::substr2!substr3:' meaning contains substr1 AND ends with
substr2 OR starts with substr3?

-- 
Andreas Joseph Krogh andr...@officenet.no - mob: +47 909 56 963
Senior Software Developer / CTO - OfficeNet AS - http://www.officenet.no
Public key: http://home.officenet.no/~andreak/public_key.asc


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-06 Thread Alexander Korotkov
On Tue, Sep 6, 2011 at 10:21 PM, Andreas Joseph Krogh
andr...@officenet.nowrote:

 Sorry for not getting the might-be-obvious here, but will this patch
 bring indexed substring-search to PG? So queries conceptually equal to
 this will be possible to index: WHERE som_col @@
 ':substr1::substr2!substr3:' meaning contains substr1 AND ends with
 substr2 OR starts with substr3?

Such applications are potentionally possible, but aren't presented yet.
Currently only k-d-tree and suffix-tree are presented. Suffix-tree supports
prefix search like btree with *_pattern_ops. (Oleg will corect me if I'm
missing something)

For the queries you mentioned you may see LIKE acceleration in wildspeed
module (http://www.sai.msu.su/~megera/wiki/wildspeed) and pg_trgm of 9.1 (
http://www.postgresql.org/docs/9.1/static/pgtrgm.html).

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-05 Thread Oleg Bartunov
Attached is updated SP-GiST patch, which provides full logging support and 
fixed several bugs (Thanks ALexander Korotkov for help).


Regards,
 Oleg

On Thu, 1 Sep 2011, Oleg Bartunov wrote:

This is updates SP-GiST patch, which fixed one bug and replaced test to the 
locale independent one.


On Wed, 31 Aug 2011, Oleg Bartunov wrote:


Hi there,

attached is our WIP-patch for 9.2 development source tree, which provides
implementation of SP-GiST (prototype was presented at PGCon-2011, see
http://www.pgcon.org/2011/schedule/events/309.en.html and presentation
for details) as a core feature.  Main differences from prototype version:

1. Now it's part of pg core, not contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses clustering algorithm of nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurency and recovery (inserts are
logged, supports for deletes, and log replay will be added really
soon)

So, now code contains almost all possible overhead of production code
and we ask hackers to test performance on real data sets. We expect
the same performance for random data (since almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. Also, we appreciate your comments and suggestions about
API.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_patch-0.89.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-05 Thread Oleg Bartunov

I attached wrong patch in previous message, sorry ! Here is a last version.

This is a new WIP of SP-GiST patch, which provides support for:

1. Concurrent vacuum
2. Vacuum logging
3. WAL replay

Oleg
On Thu, 1 Sep 2011, Oleg Bartunov wrote:

This is updates SP-GiST patch, which fixed one bug and replaced test to the 
locale independent one.


On Wed, 31 Aug 2011, Oleg Bartunov wrote:


Hi there,

attached is our WIP-patch for 9.2 development source tree, which provides
implementation of SP-GiST (prototype was presented at PGCon-2011, see
http://www.pgcon.org/2011/schedule/events/309.en.html and presentation
for details) as a core feature.  Main differences from prototype version:

1. Now it's part of pg core, not contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses clustering algorithm of nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurency and recovery (inserts are
logged, supports for deletes, and log replay will be added really
soon)

So, now code contains almost all possible overhead of production code
and we ask hackers to test performance on real data sets. We expect
the same performance for random data (since almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. Also, we appreciate your comments and suggestions about
API.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_patch-0.92.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-01 Thread Oleg Bartunov
This is updates SP-GiST patch, which fixed one bug and 
replaced test to the locale independent one.


On Wed, 31 Aug 2011, Oleg Bartunov wrote:


Hi there,

attached is our WIP-patch for 9.2 development source tree, which provides
implementation of SP-GiST (prototype was presented at PGCon-2011, see
http://www.pgcon.org/2011/schedule/events/309.en.html and presentation
for details) as a core feature.  Main differences from prototype version:

1. Now it's part of pg core, not contrib module
2. It provides more operations for quadtree and suffix tree
3. It uses clustering algorithm of nodes on disk and has much better
utilization of disk space. Fillfactor is supported
4. Some corner cases were eliminated
5. It provides support for concurency and recovery (inserts are
logged, supports for deletes, and log replay will be added really
soon)

So, now code contains almost all possible overhead of production code
and we ask hackers to test performance on real data sets. We expect
the same performance for random data (since almost no overlaps) and
much better performance on real-life data, plus much better index
creation time. Also, we appreciate your comments and suggestions about
API.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

spgist_patch-0.85.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SP-GiST, Space-Partitioned GiST

2011-09-01 Thread Alexander Korotkov
Hi!

Ie expect some problems in support of comparison operators for text, because
locale string comparison can have unexpected behaviour.
Let's see the example. Create table with words and add extra leading space
to some of them.

test=# create table dict(id serial, word text);
NOTICE:  CREATE TABLE will create implicit sequence dict_id_seq for serial
column dict.id
CREATE TABLE
test=# \copy dict(word) from '/usr/share/dict/american-english';
test=# update dict set word = ' '||word where id%2=0;
UPDATE 49284

I use Ubuntu 11.04 with ru_RU.utf8 locale. So, comparison operators ignores
leading spaces.

test=# select * from dict where word between 'cart' and 'cary';
  id   |  word
---+
  3029 | Carter
  3031 | Cartesian
  3033 | Carthage's
  3035 | Cartier
  3037 | Cartwright
  3039 | Caruso
  3041 | Carver
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
  3030 |  Carter's
  3032 |  Carthage
  3034 |  Carthaginian
  3036 |  Cartier's
  3038 |  Cartwright's
  3040 |  Caruso's
  3042 |  Carver's
 28420 |  cart's
 28422 |  cartel
 28424 |  cartels
 28426 |  cartilage's
 28428 |  cartilaginous
 28430 |  cartographer
 28432 |  cartographers
 28434 |  cartography's
 28436 |  carton's
 28438 |  cartoon
 28440 |  cartooned
 28442 |  cartoonist
 28444 |  cartoonists
 28446 |  cartridge
 28448 |  cartridges
 28450 |  cartwheel
 28452 |  cartwheeled
 28454 |  cartwheels
 28456 |  carved
 28458 |  carver's
 28460 |  carves
 28462 |  carving's
(59 rows)

But if I create spgist index query result differs.

test=# create index dict_idx on dict using spgist (word);
CREATE INDEX
test=# select * from dict where word between 'cart' and 'cary';
  id   |  word
---+
 28419 | cart
 28421 | carted
 28423 | cartel's
 28425 | cartilage
 28427 | cartilages
 28429 | carting
 28431 | cartographer's
 28433 | cartography
 28435 | carton
 28437 | cartons
 28439 | cartoon's
 28441 | cartooning
 28443 | cartoonist's
 28445 | cartoons
 28447 | cartridge's
 28449 | carts
 28451 | cartwheel's
 28453 | cartwheeling
 28455 | carve
 28457 | carver
 28459 | carvers
 28461 | carving
 28463 | carvings
(23 rows)

--
With best regards,
Alexander Korotkov.