Re: [HACKERS] Fractal tree indexing

2016-11-16 Thread Andrew Borodin
Hi hackers!

Here is prototype of procrastinating GiST. Or lazy GiST. Or buffered GiST.
Indeed, there is nothing fractal about it.

The concept is leaf tuples stall on internal pages. From time to time they
are pushed down in batches. This code is far from perfect, I've coded this
only for the research purposes.

I have tested code with insertion of random 3D cubes. On 1 million of
insertions classic GiST is 8% faster, on 3M performance is equal, on 30M
lazy GiST if 12% faster.
I've tested it with SSD disk, may be with HDD difference will be more
significant.
Test scenario is here https://github.com/x4m/postgres_g/blob/bufdev/test.sql

In theory, this is cache friendly implementation (but not cache oblivious)
and it must be faster even for small datasets without huge disk work. But
it has there extra cost of coping batch of tuples to palloc`d array,
couln't avoid that.

This is just a proof-of-concept for performance measrures:
1. make check fails for two tests
2. WAL is broken
3. code is a mess in some places

I'm not sure 12% of performance worth it. I'll appreciate any ideas on what
to do next.

Best regards, Andrey Borodin.

2013-02-13 22:54 GMT+05:00 Simon Riggs :

> On 13 February 2013 16:48, Heikki Linnakangas 
> wrote:
> > On 13.02.2013 18:20, Tom Lane wrote:
> >>
> >> Heikki Linnakangas  writes:
> >>>
> >>> The basic idea of a fractal tree index is to attach a buffer to every
> >>> non-leaf page. On insertion, instead of descending all the way down to
> >>> the correct leaf page, the new tuple is put on the buffer at the root
> >>> page. When that buffer fills up, all the tuples in the buffer are
> >>> cascaded down to the buffers on the next level pages. And recursively,
> >>> whenever a buffer fills up at any level, it's flushed to the next
> level.
> >>
> >>
> >> [ scratches head... ]  What's "fractal" about that?  Or is that just a
> >> content-free marketing name for this technique?
> >
> >
> > I'd call it out as a marketing name. I guess it's fractal in the sense
> that
> > all levels of the tree can hold "leaf tuples" in the buffers; the
> structure
> > looks the same no matter how deep you zoom, like a fractal.. But
> "Buffered"
> > would be more appropriate IMO.
>
> I hope for their sake there is more to it than that. It's hard to see
> how buffering can be patented.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index b8aa9bc..29770d2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -265,6 +265,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
boolis_rootsplit;
int npage;
 
+
is_rootsplit = (blkno == GIST_ROOT_BLKNO);
 
/*
@@ -524,6 +525,8 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
gistfillbuffer(page, itup, ntup, InvalidOffsetNumber);
}
 
+   GistPageGetOpaque(page)->nlazy=1; //DO NOT FORGET: remark pages 
after split
+
MarkBufferDirty(buffer);
 
if (BufferIsValid(leftchildbuf))
@@ -589,6 +592,7 @@ gistplacetopage(Relation rel, Size freespace, GISTSTATE 
*giststate,
  * this routine assumes it is invoked in a short-lived memory context,
  * so it does not bother releasing palloc'd allocations.
  */
+
 void
 gistdoinsert(Relation r, IndexTuple itup, Size freespace, GISTSTATE *giststate)
 {
@@ -597,7 +601,6 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, 
GISTSTATE *giststate)
GISTInsertStack firststack;
GISTInsertStack *stack;
GISTInsertState state;
-   boolxlocked = false;
 
memset(, 0, sizeof(GISTInsertState));
state.freespace = freespace;
@@ -610,6 +613,21 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, 
GISTSTATE *giststate)
firststack.downlinkoffnum = InvalidOffsetNumber;
state.stack = stack = 
 
+
+   gistdoinsertloop(r,itup,freespace, giststate, stack, state, 0);
+}
+
+void
+gistdoinsertloop(Relation r, IndexTuple itup, Size freespace, GISTSTATE 
*giststate,GISTInsertStack *givenstack, GISTInsertState state, GISTInsertStack 
*nolazy)
+{
+   ItemId  iid;
+   IndexTuple  idxtuple;
+
+   boolxlocked = false;
+   GISTInsertStack *stack = givenstack;
+
+
+
/*
 * Walk down along the path of smallest penalty, updating the parent
 * pointers with the key we're inserting as we go. If we crash in the
@@ -685,9 +703,86 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace, 
GISTSTATE *giststate)
GISTInsertStack *item;
OffsetNumber downlinkoffnum;
 
+
+
+   

[HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
Hi all,

Just a curiosity I couldnt control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how TokuDB engine for MySQL is really working nicely with big data.

I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients( if it is
not happening already someplace which I have missed).

Regards,

Atri

--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 11:01, Atri Sharma wrote:

Hi all,

Just a curiosity I couldnt control. I was recently reading about
Fractal tree indexing
(http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
how TokuDB engine for MySQL is really working nicely with big data.


Hmm, sounds very similar to the GiST buffering build work Alexander 
Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST, 
and the buffers are permanent, rather than used only during index build. 
It's also somewhat similar to the fast insert mechanism in GIN, except 
that the gin fast insert buffer is just a single buffer, rather than a 
buffer at each node.



I was wondering, do we have support for fractal tree indexing? I mean,
it really does seem to help manage big data, so we could think of
supporting it in some form for our large data set clients( if it is
not happening already someplace which I have missed).


There are no fractal trees in PostgreSQL today. Patches are welcome ;-).

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
On Wed, Feb 13, 2013 at 3:08 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 13.02.2013 11:01, Atri Sharma wrote:

 Hi all,

 Just a curiosity I couldnt control. I was recently reading about
 Fractal tree indexing
 (http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
 how TokuDB engine for MySQL is really working nicely with big data.


 Hmm, sounds very similar to the GiST buffering build work Alexander Korotkov
 did for 9.2. Only the buffers are for B-trees rather than GiST, and the
 buffers are permanent, rather than used only during index build. It's also
 somewhat similar to the fast insert mechanism in GIN, except that the gin
 fast insert buffer is just a single buffer, rather than a buffer at each
 node.


 I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
 supporting it in some form for our large data set clients( if it is
 not happening already someplace which I have missed).


 There are no fractal trees in PostgreSQL today. Patches are welcome ;-).

 - Heikki

 Hi Heikki,

Yeah,it is pretty close to GisT, but as you said, it still works on BTree.

On the other hand, one thing I really liked about Fractal trees is
that it attempts to address the problems with BTrees. I feel fractal
trees can provide us with a new way altogether to handle new data,
rather than building on top of BTrees.

I would love to chip in, but would require lots of help :)

Do you think building a new index in postgres with fractal trees as
the basis would serve the purpose? or is there something else we
should think of?

Atri

-- 
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 1:38 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 13.02.2013 11:01, Atri Sharma wrote:

 Hi all,

 Just a curiosity I couldnt control. I was recently reading about
 Fractal tree indexing
 (http://www.tokutek.com/2012/**12/fractal-tree-indexing-**overview/http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/)
 and
 how TokuDB engine for MySQL is really working nicely with big data.


 Hmm, sounds very similar to the GiST buffering build work Alexander
 Korotkov did for 9.2. Only the buffers are for B-trees rather than GiST,
 and the buffers are permanent, rather than used only during index build.
 It's also somewhat similar to the fast insert mechanism in GIN, except that
 the gin fast insert buffer is just a single buffer, rather than a buffer at
 each node.


  I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
 supporting it in some form for our large data set clients( if it is
 not happening already someplace which I have missed).


 There are no fractal trees in PostgreSQL today. Patches are welcome ;-).


I remember we have already discussed fractal trees privately. Short
conclusions are so:
1) Fractal tree indexes are patented. It is distributed as commercial
extension to MySQL. So we can't include it into PostgreSQL core.
2) Tokutek can't provide full-fledged fractal tree indexes as PostgreSQL
extension because lack of WAL extensibility.
We could think about WAL extensibility which would help other applications
as well.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
Wed:


 I remember we have already discussed fractal trees privately. Short
 conclusions are so:
 1) Fractal tree indexes are patented. It is distributed as commercial
 extension to MySQL. So we can't include it into PostgreSQL core.
 2) Tokutek can't provide full-fledged fractal tree indexes as PostgreSQL
 extension because lack of WAL extensibility.
 We could think about WAL extensibility which would help other applications
 as well.


Sounds nice. WAL extensibility can help.

Atri



--
Regards,

Atri
l'appren


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Greg Stark
On Wed, Feb 13, 2013 at 10:19 AM, Atri Sharma atri.j...@gmail.com wrote:
 2) Tokutek can't provide full-fledged fractal tree indexes as PostgreSQL
 extension because lack of WAL extensibility.
 We could think about WAL extensibility which would help other applications
 as well.


 Sounds nice. WAL extensibility can help.

The problem with WAL extensibility is that extensions can come and go
and change over time. If the database can't interpret some WAL record
or misinterprets it because a module is missing or changed since that
record was written then you could lose your whole database. I think a
fundamental part of extensibility is isolating the effects of the
extensions from the rest of the system so that problem would have to
be tackled. Perhaps making each file owned by a single resource
manager and having the database be able to deal with individual files
being corrupted. But that doesn't deal with all record types and there
are situations where you really want to have part of a file contain
data managed by another resource manager.

Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 18:21, Greg Stark st...@mit.edu wrote
 
 
 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.
 
 
How do we handle the case you mentioned, maybe a module that has been removed 
since a record was made? Is the solution that we encapsulate WAL from those 
kind of changes, and keep the WAL records same for everyone,irrespective 
whether they use an external module or not(I inferred this from Heikki's 
idea,or am I missing something here?)

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:

 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.


It will, for sure, works well when atomic page changes are enough for us.
However, some operations, for example, page splits, contain changes in
multiple pages. Replaying changes in only some of pages is not fair. Now,
it's hard for me to imagine how to generalize it into generic WAL record
type.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 15:31, Alexander Korotkov wrote:

On Wed, Feb 13, 2013 at 4:51 PM, Greg Starkst...@mit.edu  wrote:


Heikki was talking about a generic WAL record type that would just
store a binary delta between the version of the block when it was
locked and when it was unlocked. That would handle any extension
cleanly as far as data modification goes as long as the extension was
working through our buffer manager. It seems like an attractive idea
to me.


It will, for sure, works well when atomic page changes are enough for us.
However, some operations, for example, page splits, contain changes in
multiple pages. Replaying changes in only some of pages is not fair. Now,
it's hard for me to imagine how to generalize it into generic WAL record
type.


You could have a generic WAL record that applies changes to multiple 
pages atomically.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 19:05, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 15:31, Alexander Korotkov wrote:
 On Wed, Feb 13, 2013 at 4:51 PM, Greg Starkst...@mit.edu  wrote:
 
 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.
 
 It will, for sure, works well when atomic page changes are enough for us.
 However, some operations, for example, page splits, contain changes in
 multiple pages. Replaying changes in only some of pages is not fair. Now,
 it's hard for me to imagine how to generalize it into generic WAL record
 type.
 
 You could have a generic WAL record that applies changes to multiple pages 
 atomically.
 
 

Sounds extremely interesting and fun.How would we go about implementing it?

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Simon Riggs
On 13 February 2013 13:35, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 You could have a generic WAL record that applies changes to multiple pages
 atomically.

I think its a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
.
 
 I think its a good idea, the best idea even, but we still have no idea
 what the requirements are without a clear case for an external index.
 It could easily turn out that we invent a plausible API that's not
 actually of use because of requirements for locking. Whoever wants
 that can do the legwork.
 
 IIRC each of the new index types has required some changes to the
 generic APIs, which makes sense.
 
 

Does that mean we can add support for fractal tree indexes(or some thing on 
similar lines) in the regular way by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use case for 
generic WAL record(I am kind of obsessed with the idea of seeing fractal 
indexes being supported in Postgres).

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Andrew Dunstan


On 02/13/2013 09:13 AM, Atri Sharma wrote:


Sent from my iPad

On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
.

I think its a good idea, the best idea even, but we still have no idea
what the requirements are without a clear case for an external index.
It could easily turn out that we invent a plausible API that's not
actually of use because of requirements for locking. Whoever wants
that can do the legwork.

IIRC each of the new index types has required some changes to the
generic APIs, which makes sense.



Does that mean we can add support for fractal tree indexes(or some thing on 
similar lines) in the regular way by changing the generic APIs?

IMO, we could design the fractal tree index and use it as the use case for 
generic WAL record(I am kind of obsessed with the idea of seeing fractal 
indexes being supported in Postgres).




If they are patented as Alexander says upthread, then surely the idea is 
dead in the water.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma
 
 
 
 If they are patented as Alexander says upthread, then surely the idea is dead 
 in the water.
 
 
True, I think so too.

But,the generic WAL seems an awesome idea and I would love to help.

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Craig Ringer
On 02/13/2013 10:43 PM, Andrew Dunstan wrote:

 On 02/13/2013 09:13 AM, Atri Sharma wrote:

 Sent from my iPad

 On 13-Feb-2013, at 19:31, Simon Riggs si...@2ndquadrant.com wrote:
 .
 I think its a good idea, the best idea even, but we still have no idea
 what the requirements are without a clear case for an external index.
 It could easily turn out that we invent a plausible API that's not
 actually of use because of requirements for locking. Whoever wants
 that can do the legwork.

 IIRC each of the new index types has required some changes to the
 generic APIs, which makes sense.


 Does that mean we can add support for fractal tree indexes(or some
 thing on similar lines) in the regular way by changing the generic APIs?

 IMO, we could design the fractal tree index and use it as the use
 case for generic WAL record(I am kind of obsessed with the idea of
 seeing fractal indexes being supported in Postgres).


 If they are patented as Alexander says upthread, then surely the idea
 is dead in the water.
Isn't practically everything patented, with varying degrees of validity
and patent defensibility? Particularly the trick of renewing expired
patents by submitting tiny variations.

I realise that this general situation is different to knowing about a
specific patent applying to a specific proposed technique, particularly
regarding the USA's insane triple-damages-for-knowing-about-it thing,
and that a patent can well and truly block the adoption of a technique
into Pg core. It might not prevent its implementation as an out-of-tree
extension though, even if that requires some enhancements to core APIs
to make it possible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Tom Lane
Atri Sharma atri.j...@gmail.com writes:
 Do you think building a new index in postgres with fractal trees as
 the basis would serve the purpose? or is there something else we
 should think of?

First explain why you couldn't build it as an opclass for gist or
spgist ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 20:30, Tom Lane t...@sss.pgh.pa.us wrote:

 
 First explain why you couldn't build it as an opclass for gist or
 spgist ...
 
 

That needs thinking about a bit.I was confused about the current indexes 
because they all build on BTrees.But, building an opclass with GiST should be a 
good solution.

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 17:49, Atri Sharma wrote:

On 13-Feb-2013, at 20:30, Tom Lanet...@sss.pgh.pa.us  wrote:


First explain why you couldn't build it as an opclass for gist or
spgist ...


That needs thinking about a bit.I was confused about the current indexes 
because they all build on BTrees.But, building an opclass with GiST should be a 
good solution.


That makes no sense. I don't see any way to implement this in an 
opclass, and it wouldn't make sense to re-implement this for every 
opclass anyway.


The basic idea of a fractal tree index is to attach a buffer to every 
non-leaf page. On insertion, instead of descending all the way down to 
the correct leaf page, the new tuple is put on the buffer at the root 
page. When that buffer fills up, all the tuples in the buffer are 
cascaded down to the buffers on the next level pages. And recursively, 
whenever a buffer fills up at any level, it's flushed to the next level. 
This speeds up insertions, as you don't need to fetch and update the 
right leaf page on every insert; the lower-level pages are updated in 
batch as a buffer fills up.


As I said earlier, this is very similar to the way the GiST buffering 
build algorithm works. It could be applied to any tree-structured access 
method, including b-tree, GiST and SP-GiST.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad
.
 
 That makes no sense. I don't see any way to implement this in an opclass, and 
 it wouldn't make sense to re-implement this for every opclass anyway.
 
 The basic idea of a fractal tree index is to attach a buffer to every 
 non-leaf page. On insertion, instead of descending all the way down to the 
 correct leaf page, the new tuple is put on the buffer at the root page. When 
 that buffer fills up, all the tuples in the buffer are cascaded down to the 
 buffers on the next level pages. And recursively, whenever a buffer fills up 
 at any level, it's flushed to the next level. This speeds up insertions, as 
 you don't need to fetch and update the right leaf page on every insert; the 
 lower-level pages are updated in batch as a buffer fills up.
 
 As I said earlier, this is very similar to the way the GiST buffering build 
 algorithm works. It could be applied to any tree-structured access method, 
 including b-tree, GiST and SP-GiST.
 

Can we implement it in a generic manner then? I mean,irrespective of the tree 
it is being applied to,be it BTree,gist or spgist?

Another thing,in case of a large tree which is split over multiple pages, how 
do we reduce the cost of I/o to fetch and rewrite all those pages?

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every 
 non-leaf page. On insertion, instead of descending all the way down to 
 the correct leaf page, the new tuple is put on the buffer at the root 
 page. When that buffer fills up, all the tuples in the buffer are 
 cascaded down to the buffers on the next level pages. And recursively, 
 whenever a buffer fills up at any level, it's flushed to the next level. 

[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Andrew Dunstan


On 02/13/2013 11:20 AM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?





And if that's all it is then I have some doubt about its patentability. 
For one thing I'd be mildly surprised if there weren't prior art. But of 
course, IANAL :-)


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 18:20, Tom Lane wrote:

Heikki Linnakangashlinnakan...@vmware.com  writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.


[ scratches head... ]  What's fractal about that?  Or is that just a
content-free marketing name for this technique?


I'd call it out as a marketing name. I guess it's fractal in the sense 
that all levels of the tree can hold leaf tuples in the buffers; the 
structure looks the same no matter how deep you zoom, like a fractal.. 
But Buffered would be more appropriate IMO.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Heikki Linnakangas

On 13.02.2013 18:43, Andrew Dunstan wrote:


On 02/13/2013 11:20 AM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

The basic idea of a fractal tree index is to attach a buffer to every
non-leaf page. On insertion, instead of descending all the way down to
the correct leaf page, the new tuple is put on the buffer at the root
page. When that buffer fills up, all the tuples in the buffer are
cascaded down to the buffers on the next level pages. And recursively,
whenever a buffer fills up at any level, it's flushed to the next level.

[ scratches head... ] What's fractal about that? Or is that just a
content-free marketing name for this technique?


And if that's all it is then I have some doubt about its patentability.
For one thing I'd be mildly surprised if there weren't prior art. But of
course, IANAL :-)


Agreed, but IANAL either. The papers the GiST buffering build algorithm 
was based pre-dates Tokutek's fractal indexes, for starters. Of course, 
all I know about fractal indexes is what I've read on some presentation 
slides on the 'net, so I might be missing something.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma

Yeah,it is just a fancy name for something that has nothing to do with 
fractals.I guess everything suave these days is fractal!

That said,the buffered concept itself looks really cool and should help us in 
large data sets.I am eager to get off the mark with it.

Will we be building the index from scratch? And how would we go about it?

We will need to be careful about the buffer size per node, in order to ensure 
that the pushing of tuples from nodes in large data sets does not become a 
substantial overhead.

Atri
Sent from my iPad

On 13-Feb-2013, at 22:21, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 18:43, Andrew Dunstan wrote:
 
 On 02/13/2013 11:20 AM, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.
 [ scratches head... ] What's fractal about that? Or is that just a
 content-free marketing name for this technique?
 
 And if that's all it is then I have some doubt about its patentability.
 For one thing I'd be mildly surprised if there weren't prior art. But of
 course, IANAL :-)
 
 Agreed, but IANAL either. The papers the GiST buffering build algorithm was 
 based pre-dates Tokutek's fractal indexes, for starters. Of course, all I 
 know about fractal indexes is what I've read on some presentation slides on 
 the 'net, so I might be missing something.
 
 - Heikki


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Atri Sharma


Sent from my iPad

On 13-Feb-2013, at 22:21, Heikki Linnakangas hlinnakan...@vmware.com wrote:

 On 13.02.2013 18:43, Andrew Dunstan wrote:
 
 On 02/13/2013 11:20 AM, Tom Lane wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic  fractal indexes is what I've read on some presentation slides 
 on the 'net, so I might be missing something.
 

I think that buffering can be applied to BTree, R Tree and GisT in more or less 
the same manner.

Is there a way we can abstract the buffering part out of them all?

Atri

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Simon Riggs
On 13 February 2013 16:48, Heikki Linnakangas hlinnakan...@vmware.com wrote:
 On 13.02.2013 18:20, Tom Lane wrote:

 Heikki Linnakangashlinnakan...@vmware.com  writes:

 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.


 [ scratches head... ]  What's fractal about that?  Or is that just a
 content-free marketing name for this technique?


 I'd call it out as a marketing name. I guess it's fractal in the sense that
 all levels of the tree can hold leaf tuples in the buffers; the structure
 looks the same no matter how deep you zoom, like a fractal.. But Buffered
 would be more appropriate IMO.

I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Joshua D. Drake


On 02/13/2013 09:54 AM, Simon Riggs wrote:


I'd call it out as a marketing name. I guess it's fractal in the sense that
all levels of the tree can hold leaf tuples in the buffers; the structure
looks the same no matter how deep you zoom, like a fractal.. But Buffered
would be more appropriate IMO.


I hope for their sake there is more to it than that. It's hard to see
how buffering can be patented.


Talk to Apple about that. It only needs to be worded correctly.

JD







--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Josh Berkus
On 02/13/2013 01:01 AM, Atri Sharma wrote:
 Hi all,
 
 Just a curiosity I couldnt control. I was recently reading about
 Fractal tree indexing
 (http://www.tokutek.com/2012/12/fractal-tree-indexing-overview/) and
 how TokuDB engine for MySQL is really working nicely with big data.
 
 I was wondering, do we have support for fractal tree indexing? I mean,
 it really does seem to help manage big data, so we could think of
 supporting it in some form for our large data set clients( if it is
 not happening already someplace which I have missed).

Sadly, fractal trees are an invention of Tokutek and are heavily and
publically patented.  That's why I haven't pursued the idea, and indeed
why Tokutek hasn't developed a PostgreSQL plug-in.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Daniel Farina
On Wed, Feb 13, 2013 at 8:20 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas hlinnakan...@vmware.com writes:
 The basic idea of a fractal tree index is to attach a buffer to every
 non-leaf page. On insertion, instead of descending all the way down to
 the correct leaf page, the new tuple is put on the buffer at the root
 page. When that buffer fills up, all the tuples in the buffer are
 cascaded down to the buffers on the next level pages. And recursively,
 whenever a buffer fills up at any level, it's flushed to the next level.

 [ scratches head... ]  What's fractal about that?  Or is that just a
 content-free marketing name for this technique?

The name in the literature is Cache Oblivious Lookahead Array, aka
COLA.  The authors also are founders of TokuTek, and seemed to have
take pains to ring-fence mentions of the algorithm with reference to
its patent.

Well, at least nobody can blame them for submarine patent action.

-- 
fdr


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fractal tree indexing

2013-02-13 Thread Greg Stark
On Wed, Feb 13, 2013 at 1:31 PM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Wed, Feb 13, 2013 at 4:51 PM, Greg Stark st...@mit.edu wrote:

 Heikki was talking about a generic WAL record type that would just
 store a binary delta between the version of the block when it was
 locked and when it was unlocked. That would handle any extension
 cleanly as far as data modification goes as long as the extension was
 working through our buffer manager. It seems like an attractive idea
 to me.


 It will, for sure, works well when atomic page changes are enough for us.
 However, some operations, for example, page splits, contain changes in
 multiple pages. Replaying changes in only some of pages is not fair. Now,
 it's hard for me to imagine how to generalize it into generic WAL record
 type.

I think multiple-page operations where you have all the pages locked
at the same time, like tuple updates for example, are fairly simple.
The existing WAL format can handle at most three such buffers in a
single record so we can just have a fixed size buffer large enough to
hold three buffers and perform the diff on the three when the
extension says it has completed an atomic update. The simplest API
which I suspect would suffice for virtually all users would be to tie
this to buffer locking and unlocking.

The types of records which this would not suffice for would be

a) things that need extra book-keeping during recovery in case some
later record was not recorded. We have only a few such records -- page
splits as you say -- and hopefully extensions wouldn't need them
because frankly they're pretty scary.

b) records that are needed not just to maintain consistency of the
data in the database but to provide some other behaviour -- for
instance the WAL records for locks that 2PC needs or the snapshot
records that hot standby needs. But then these types of records might
be more amenable to an extensible WAL format. Since the loss of them
wouldn't leave the database corrupted, just prevent that feature from
operating correctly.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers