Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Antonin Houska
On 11/29/2013 01:13 AM, Andreas Karlsson wrote:

 When doing partial matching the code needs to be able to return the union
 of all TIDs in all the matching posting trees in TID order (to be able
 to do AND and OR operations with multiple search keys later). It does
 this by traversing the matching posting trees one after another and
 collecting all their TIDs in a TIDBitmap which is later iterated over.

I think it's not a plain union. My understanding is that - to evaluate a
single key (typically an array) - you first need to get all the TID streams
for that key (i.e. one posting list/tree per element of the key array)
and then iterate all these streams in parallel and 'merge' them using
the consistent() function. That's how I understand ginget.c:keyGetItem().
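A minimal sketch of that parallel merge, written for illustration and not taken from ginget.c: each entry of the key contributes one TID-ordered stream, the streams are advanced together, and at each smallest TID a check[] array records which entries contain it before a consistent() callback decides whether the key matches. Tid, Stream, MAX_ENTRIES, key_get_item() and consistent_all() are hypothetical simplifications; the real keyGetItem() and the opclass consistent() function take more arguments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 8   /* hypothetical upper bound on entries per key */

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

static int tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* One sorted TID stream (posting list/tree) per entry of the search key. */
typedef struct { const Tid *items; size_t n; size_t pos; } Stream;

/* consistent(): given which entries contain the current TID, does the key match? */
typedef bool (*ConsistentFn)(const bool *check, int nentries);

/* Example consistent(): every entry must be present (AND semantics). */
static bool consistent_all(const bool *check, int nentries)
{
    for (int i = 0; i < nentries; i++)
        if (!check[i])
            return false;
    return true;
}

/* Advance all streams in parallel; return the next TID accepted by fn. */
static bool key_get_item(Stream *s, int nentries, ConsistentFn fn, Tid *out)
{
    assert(nentries <= MAX_ENTRIES);
    for (;;)
    {
        bool have = false;
        Tid  min = {0, 0};

        /* Find the smallest current TID across all non-exhausted streams. */
        for (int i = 0; i < nentries; i++)
            if (s[i].pos < s[i].n &&
                (!have || tid_cmp(s[i].items[s[i].pos], min) < 0))
            {
                min = s[i].items[s[i].pos];
                have = true;
            }
        if (!have)
            return false;       /* every stream is exhausted */

        /* Mark the entries that contain min and step past it. */
        bool check[MAX_ENTRIES];
        for (int i = 0; i < nentries; i++)
        {
            check[i] = s[i].pos < s[i].n &&
                       tid_cmp(s[i].items[s[i].pos], min) == 0;
            if (check[i])
                s[i].pos++;
        }
        if (fn(check, nentries))
        {
            *out = min;
            return true;
        }
    }
}
```

With consistent_all() (an AND, roughly what array @> needs) only TIDs present in every stream are returned, and the streams are consumed one TID at a time, which is the property amgettuple relies on.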

So the problem with partial matching is (IMO) that there can be too many TID
streams to merge - many more than the number of elements of the key array.

// Antonin Houska (Tony)



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Andreas Karlsson

On 11/29/2013 09:54 AM, Antonin Houska wrote:

 On 11/29/2013 01:13 AM, Andreas Karlsson wrote:

 When doing partial matching the code needs to be able to return the union
 of all TIDs in all the matching posting trees in TID order (to be able
 to do AND and OR operations with multiple search keys later). It does
 this by traversing the matching posting trees one after another and
 collecting all their TIDs in a TIDBitmap which is later iterated over.

 I think it's not a plain union. My understanding is that - to evaluate a
 single key (typically an array) - you first need to get all the TID streams
 for that key (i.e. one posting list/tree per element of the key array)
 and then iterate all these streams in parallel and 'merge' them using
 the consistent() function. That's how I understand ginget.c:keyGetItem().


For partial matches the merging is done in two steps: first a simple
union of all the streams per key, and then a merge of those union
streams using the consistent() function.


It is the first step that can be lossy.


 So the problem with partial matching is (IMO) that there can be too many TID
 streams to merge - many more than the number of elements of the key array.


Agreed.

--
Andreas Karlsson




Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Antonin Houska
On 11/29/2013 01:57 PM, Andreas Karlsson wrote:
 On 11/29/2013 09:54 AM, Antonin Houska wrote:
 On 11/29/2013 01:13 AM, Andreas Karlsson wrote:

 When doing partial matching the code needs to be able to return the union
 of all TIDs in all the matching posting trees in TID order (to be able
 to do AND and OR operations with multiple search keys later). It does
 this by traversing the matching posting trees one after another and
 collecting all their TIDs in a TIDBitmap which is later iterated over.

 I think it's not a plain union. My understanding is that - to evaluate a
 single key (typically an array) - you first need to get all the TID streams
 for that key (i.e. one posting list/tree per element of the key array)
 and then iterate all these streams in parallel and 'merge' them using
 the consistent() function. That's how I understand ginget.c:keyGetItem().

 For partial matches the merging is done in two steps: first a simple
 union of all the streams per key, and then a merge of those union
 streams using the consistent() function.

Yes, shortly after I sent my previous mail I realized that your union
probably referred to what collectMatchBitmap() does.

// Tony





Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Tom Lane
Andreas Karlsson andr...@proxel.se writes:
 I decided to look into how much work implementing the todo item about
 supporting amgettuple in GIN would be, since exclusion constraints on
 GIN would be neat. Robert Haas suggested a solution[1], but to implement
 it we also need to look into why the commit message says that it did
 not work with partial matches anyway.
 ...
 This TIDBitmap becomes lossy if too many TIDs are added to it, and
 this case is what broke amgettuple for partial matches.

Right, see
http://www.postgresql.org/message-id/49ac300f.1050...@enterprisedb.com

Note that fixing the potential lossiness in scanning is not the only
roadblock to re-enabling amgettuple.  Fast updates also pose problems:
http://www.postgresql.org/message-id/4974b002.3040...@sigaev.ru

Half of that is basically the same lossiness problem, but the other
half is that we're relying on the bitmap to suppress duplicate reports
of the same TID.  It's fairly hard to see how you'd avoid that without
creating other problems.

Note that Robert's proposed solution is no solution, because it just
puts you right back in the bind of needing guaranteed non-lossy
storage of a TID set that might be too big to fit in memory.
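A toy sketch of the duplicate problem, assuming (hypothetically) that the same TID can reach the scan from two places, for instance once from the pending list and once from the main index: to report each TID only once, a tuple-at-a-time scan must remember every TID it has already returned, and that set must stay exact. SeenSet, SEEN_CAPACITY and report_tid() are illustrative names; a real version would use a hash table, but its memory would still grow with the number of distinct TIDs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEEN_CAPACITY 4     /* hypothetical stand-in for a memory limit */

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Every TID reported so far; kept exact so "already returned?" has a definite answer. */
typedef struct { Tid tids[SEEN_CAPACITY]; size_t n; } SeenSet;

typedef enum { REPORT_NEW, REPORT_DUPLICATE, REPORT_OUT_OF_MEMORY } ReportResult;

/* Decide whether a candidate TID (from either source) should be returned. */
static ReportResult report_tid(SeenSet *seen, Tid t)
{
    assert(seen->n <= SEEN_CAPACITY);
    for (size_t i = 0; i < seen->n; i++)
        if (seen->tids[i].block == t.block && seen->tids[i].offset == t.offset)
            return REPORT_DUPLICATE;
    if (seen->n == SEEN_CAPACITY)
        return REPORT_OUT_OF_MEMORY;    /* going lossy would lose track of what was returned */
    seen->tids[seen->n++] = t;
    return REPORT_NEW;
}
```

The linear search only keeps the sketch short; the point is the REPORT_OUT_OF_MEMORY case, where a bitmap scan could fall back to lossy storage but a tuple-at-a-time scan has no safe fallback.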

regards, tom lane




Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Heikki Linnakangas

On 11/29/2013 07:13 PM, Tom Lane wrote:

 Andreas Karlsson andr...@proxel.se writes:

 I decided to look into how much work implementing the todo item about
 supporting amgettuple in GIN would be, since exclusion constraints on
 GIN would be neat. Robert Haas suggested a solution[1], but to implement
 it we also need to look into why the commit message says that it did
 not work with partial matches anyway.
 ...
 This TIDBitmap becomes lossy if too many TIDs are added to it, and
 this case is what broke amgettuple for partial matches.

 Right, see
 http://www.postgresql.org/message-id/49AC300F.1050903@enterprisedb.com

 Note that fixing the potential lossiness in scanning is not the only
 roadblock to re-enabling amgettuple.  Fast updates also pose problems:
 http://www.postgresql.org/message-id/4974B002.3040202@sigaev.ru

 Half of that is basically the same lossiness problem, but the other
 half is that we're relying on the bitmap to suppress duplicate reports
 of the same TID.  It's fairly hard to see how you'd avoid that without
 creating other problems.

 Note that Robert's proposed solution is no solution, because it just
 puts you right back in the bind of needing guaranteed non-lossy
 storage of a TID set that might be too big to fit in memory.


You can always call amgetbitmap, and return the tuples from the bitmap 
one by one. For a lossy result, re-check all tuples on the page. IOW, do 
a bitmap index + heap scan. You could do that within indexam.c, and 
present the familiar index_getnext() interface for callers. Or you could 
modify the exclusion constraint code to do that if amgettuple is not
available.
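A minimal sketch of that conversion, assuming page-level results shaped loosely like PostgreSQL's TBMIterateResult (a block number, ntuples == -1 for a lossy page, a recheck flag and an offset array): exact pages yield their listed offsets, and lossy pages yield every line pointer on the page with recheck set, so the caller must re-evaluate the condition against the heap tuple. PageResult, TupleCursor, next_tuple() and the max_offset field (which a real implementation would have to read from the heap page) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OFFSETS 64      /* hypothetical per-page limit for this sketch */

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* One page of bitmap output; loosely modeled on TBMIterateResult. */
typedef struct
{
    uint32_t blockno;
    int      ntuples;       /* -1: lossy, any tuple on the page may match */
    bool     recheck;       /* exact page whose matches still need rechecking */
    uint16_t max_offset;    /* line pointers on the heap page (assumed known) */
    uint16_t offsets[MAX_OFFSETS];
} PageResult;

typedef struct
{
    const PageResult *pages;
    size_t npages;
    size_t page;            /* current page */
    int    idx;             /* next position within the current page */
} TupleCursor;

/* amgettuple-style: return one candidate TID per call. */
static bool next_tuple(TupleCursor *c, Tid *out, bool *recheck)
{
    while (c->page < c->npages)
    {
        const PageResult *p = &c->pages[c->page];
        bool lossy = p->ntuples < 0;
        int  limit = lossy ? p->max_offset : p->ntuples;

        assert(lossy || p->ntuples <= MAX_OFFSETS);
        if (c->idx < limit)
        {
            out->block = p->blockno;
            /* Offsets are 1-based; a lossy page yields 1..max_offset. */
            out->offset = lossy ? (uint16_t) (c->idx + 1) : p->offsets[c->idx];
            *recheck = lossy || p->recheck;
            c->idx++;
            return true;
        }
        c->page++;
        c->idx = 0;
    }
    return false;
}
```

A real version would also have to skip unused line pointers on lossy pages and perform the usual visibility checks; the sketch only shows how page-level bitmap output can be flattened into the one-TID-at-a-time shape that index_getnext() callers expect.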


- Heikki




Re: [HACKERS] Todo item: Support amgettuple() in GIN

2013-11-29 Thread Andreas Karlsson

On 11/29/2013 06:13 PM, Tom Lane wrote:

 Note that Robert's proposed solution is no solution, because it just
 puts you right back in the bind of needing guaranteed non-lossy
 storage of a TID set that might be too big to fit in memory.


The solution would work if we could guarantee that a TIDBitmap built from
the fast update pending list will always fit in memory. That does not
sound like a good assumption to me.


--
Andreas Karlsson




[HACKERS] Todo item: Support amgettuple() in GIN

2013-11-28 Thread Andreas Karlsson

Hi,

I decided to look into how much work implementing the todo item about
supporting amgettuple in GIN would be, since exclusion constraints on
GIN would be neat. Robert Haas suggested a solution[1], but to implement
it we also need to look into why the commit message says that it did
not work with partial matches anyway.


So I looked into that first, and here is my explanation for why it is
broken for partial matches. I am sending this to the mailing list to
check whether I am correct and, if so, to update the todo list with this
new information.


= Explanation

When doing normal matching the code simply traverses the matching 
posting tree in TID order.
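A minimal sketch of why this case fits amgettuple, assuming (hypothetically) that a posting tree can be read as a TID-ordered array: a cursor hands out one TID per call, in order, with no buffering. Tid, PostingCursor and posting_next() are illustrative names, not GIN's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

static bool tid_less(Tid a, Tid b)
{
    return a.block < b.block || (a.block == b.block && a.offset < b.offset);
}

/* A single matching posting tree, flattened to a TID-ordered array. */
typedef struct { const Tid *items; size_t n; size_t pos; } PostingCursor;

/* Return the next TID, one per call, as an amgettuple-style scan would. */
static bool posting_next(PostingCursor *c, Tid *out)
{
    if (c->pos >= c->n)
        return false;
    /* The tree is already in TID order, so no sorting or buffering is needed. */
    assert(c->pos == 0 || tid_less(c->items[c->pos - 1], c->items[c->pos]));
    *out = c->items[c->pos++];
    return true;
}
```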


When doing partial matching the code needs to be able to return the union
of all TIDs in all the matching posting trees in TID order (to be able
to do AND and OR operations with multiple search keys later). It does
this by traversing the matching posting trees one after another and
collecting all their TIDs in a TIDBitmap which is later iterated over.
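A toy model of that approach (not PostgreSQL's tidbitmap.c, which keeps a hash table of page entries): TIDs from every matching posting tree are OR-ed into a per-block bit array in whatever order they arrive, and iterating the bitmap afterwards yields their union in TID order with duplicates collapsed. NBLOCKS, MAX_OFF, TidBitmap, bitmap_add() and bitmap_next() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 16   /* hypothetical: heap blocks covered by the toy bitmap */
#define MAX_OFF 64   /* hypothetical: at most 64 tuples per block */

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

/* Bit (offset - 1) of bits[block] is set when that TID is in the set. */
typedef struct { uint64_t bits[NBLOCKS]; } TidBitmap;

static void bitmap_add(TidBitmap *bm, Tid t)
{
    assert(t.block < NBLOCKS && t.offset >= 1 && t.offset <= MAX_OFF);
    bm->bits[t.block] |= UINT64_C(1) << (t.offset - 1);
}

/* Iterator position: next block and bit to examine. */
typedef struct { const TidBitmap *bm; uint32_t block; unsigned bit; } BitmapIter;

/* Return the next TID of the union, in TID order. */
static bool bitmap_next(BitmapIter *it, Tid *out)
{
    for (; it->block < NBLOCKS; it->block++, it->bit = 0)
    {
        for (; it->bit < MAX_OFF; it->bit++)
        {
            if (it->bm->bits[it->block] & (UINT64_C(1) << it->bit))
            {
                out->block = it->block;
                out->offset = (uint16_t) (it->bit + 1);
                it->bit++;      /* resume after this TID next time */
                return true;
            }
        }
    }
    return false;
}
```

The catch is memory: the whole union is materialized before the first TID is returned, which is why the real bitmap has to be allowed to degrade when it grows too large.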


This TIDBitmap becomes lossy if too many TIDs are added to it, and
this case is what broke amgettuple for partial matches.
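A toy model of how lossiness loses information. The real tidbitmap.c converts exact page entries into lossy chunk entries under a work_mem-based budget, with its own policy for choosing which pages to convert; here, once a hypothetical budget of exact TIDs is used up, a block is simply remembered as "some tuple here may match". A bitmap heap scan copes by rechecking every tuple on such a page, but an amgettuple caller expects exact TIDs. TidBitmap, maxexact, bitmap_add() and bitmap_lookup() are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 16   /* hypothetical: heap blocks covered by the toy bitmap */
#define MAX_OFF 64   /* hypothetical: at most 64 tuples per block */

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

typedef struct
{
    uint64_t bits[NBLOCKS];   /* exact offsets per block */
    bool     lossy[NBLOCKS];  /* block kept only as "something on this page matched" */
    int      nexact;          /* exact TIDs currently stored */
    int      maxexact;        /* hypothetical budget standing in for work_mem */
} TidBitmap;

typedef enum { TID_ABSENT, TID_EXACT, TID_LOSSY } TidMembership;

static int popcount64(uint64_t x)
{
    int n = 0;
    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

static void bitmap_add(TidBitmap *bm, Tid t)
{
    assert(t.block < NBLOCKS && t.offset >= 1 && t.offset <= MAX_OFF);
    uint64_t bit = UINT64_C(1) << (t.offset - 1);

    if (bm->lossy[t.block] || (bm->bits[t.block] & bit))
        return;                 /* already covered */
    if (bm->nexact >= bm->maxexact)
    {
        /* Over budget: forget this block's offsets and keep only the block. */
        bm->nexact -= popcount64(bm->bits[t.block]);
        bm->bits[t.block] = 0;
        bm->lossy[t.block] = true;
        return;
    }
    bm->bits[t.block] |= bit;
    bm->nexact++;
}

static TidMembership bitmap_lookup(const TidBitmap *bm, Tid t)
{
    assert(t.block < NBLOCKS && t.offset >= 1 && t.offset <= MAX_OFF);
    if (bm->lossy[t.block])
        return TID_LOSSY;       /* maybe: every tuple on the page needs a recheck */
    if (bm->bits[t.block] & (UINT64_C(1) << (t.offset - 1)))
        return TID_EXACT;
    return TID_ABSENT;
}
```

With a budget of two, adding (1,1), (1,2) and then (2,5) makes block 2 lossy, so (2,5) and the non-matching (2,6) become indistinguishable; that is exactly the answer a tuple-at-a-time scan cannot return.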


To fix this, it seems to me that either gingettuple would need to
rescan lossy pages, or a version of collectMatchBitmap() would need to
be written that merges the matched posting trees in another way,
probably by iterating over all of them at the same time.
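A minimal sketch of the second option, assuming each matching posting tree can be read as a strictly increasing TID stream: a binary min-heap keyed on each stream's current TID yields the union in TID order, one TID per call and without duplicates, using memory proportional to the number of streams rather than the number of TIDs. Each output costs O(log k) comparisons for k streams, so a very large number of matching trees still hurts. The heap is one possible choice, not what collectMatchBitmap() does; Tid, Stream, UnionIter, union_init() and union_next() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a heap TID: block number and 1-based offset. */
typedef struct { uint32_t block; uint16_t offset; } Tid;

static int tid_cmp(Tid a, Tid b)
{
    if (a.block != b.block)
        return a.block < b.block ? -1 : 1;
    if (a.offset != b.offset)
        return a.offset < b.offset ? -1 : 1;
    return 0;
}

/* One matching posting tree, read as a strictly increasing TID stream. */
typedef struct { const Tid *items; size_t n; size_t pos; } Stream;

typedef struct
{
    Stream **heap;      /* min-heap of non-exhausted streams, keyed by current TID */
    size_t   size;
    bool     have_last;
    Tid      last;      /* last TID returned, used to skip duplicates */
} UnionIter;

static Tid head(const Stream *s)
{
    assert(s->pos < s->n);
    return s->items[s->pos];
}

static void sift_down(Stream **h, size_t size, size_t i)
{
    for (;;)
    {
        size_t l = 2 * i + 1, r = l + 1, m = i;

        if (l < size && tid_cmp(head(h[l]), head(h[m])) < 0)
            m = l;
        if (r < size && tid_cmp(head(h[r]), head(h[m])) < 0)
            m = r;
        if (m == i)
            return;
        Stream *tmp = h[i];
        h[i] = h[m];
        h[m] = tmp;
        i = m;
    }
}

/* heap must have room for nstreams pointers. */
static void union_init(UnionIter *u, Stream *streams, size_t nstreams, Stream **heap)
{
    u->heap = heap;
    u->size = 0;
    u->have_last = false;
    for (size_t i = 0; i < nstreams; i++)
        if (streams[i].pos < streams[i].n)
            heap[u->size++] = &streams[i];
    for (size_t i = u->size / 2; i-- > 0;)
        sift_down(heap, u->size, i);
}

/* Return the next TID of the union, in TID order, each TID once. */
static bool union_next(UnionIter *u, Tid *out)
{
    while (u->size > 0)
    {
        Stream *s = u->heap[0];
        Tid     t = head(s);

        if (++s->pos == s->n)
            u->heap[0] = u->heap[--u->size];    /* stream exhausted: drop it */
        sift_down(u->heap, u->size, 0);
        if (u->have_last && tid_cmp(t, u->last) == 0)
            continue;           /* same TID seen in another posting tree */
        u->have_last = true;
        u->last = t;
        *out = t;
        return true;
    }
    return false;
}
```

Because equal TIDs from different trees come out of the heap consecutively, remembering only the last returned TID is enough to suppress duplicates.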


Does this diagnosis sound correct?

= Footnotes

1. http://www.postgresql.org/message-id/ca+tgmobzhfrjnyz-fyw5kdtrurk0hjwp0vtp5fgzle6evsw...@mail.gmail.com


--
Andreas Karlsson

