Re: [HACKERS] Todo item: Support amgettuple() in GIN
On 11/29/2013 01:13 AM, Andreas Karlsson wrote:
> When doing partial matching the code needs to be able to return the union of all TIDs in all the matching posting trees in TID order (to be able to do AND and OR operations with multiple search keys later). It does this by traversing them one posting tree after another and collecting them all in a TIDBitmap which is later iterated over.

I think it's not a plain union. My understanding is that - to evaluate a single key (typically an array) - you first need to get all the TID streams for that key (i.e. one posting list/tree per element of the key array) and then iterate over all these streams in parallel and 'merge' them using the consistent() function. That's how I understand ginget.c:keyGetItem().

So the problem of partial match is (IMO) that there can be too many TID streams to merge - many more than the number of elements of the key array.

// Antonin Houska (Tony)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Todo item: Support amgettuple() in GIN
On 11/29/2013 09:54 AM, Antonin Houska wrote:
> On 11/29/2013 01:13 AM, Andreas Karlsson wrote:
>> When doing partial matching the code needs to be able to return the union of all TIDs in all the matching posting trees in TID order (to be able to do AND and OR operations with multiple search keys later). It does this by traversing them one posting tree after another and collecting them all in a TIDBitmap which is later iterated over.
>
> I think it's not a plain union. My understanding is that - to evaluate a single key (typically an array) - you first need to get all the TID streams for that key (i.e. one posting list/tree per element of the key array) and then iterate over all these streams in parallel and 'merge' them using the consistent() function. That's how I understand ginget.c:keyGetItem().

For partial matches the merging is done in two steps: first a simple union of all the streams per key, and then a merge of those union streams using the consistent() function. It is the first step that can be lossy.

> So the problem of partial match is (IMO) that there can be too many TID streams to merge - many more than the number of elements of the key array.

Agreed.

--
Andreas Karlsson
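To make the two steps concrete, here is a rough Python sketch. It is not PostgreSQL code: the names `union_streams` and `merge_keys` are made up for illustration, TIDs are modelled as `(block, offset)` tuples, and `consistent` is reduced to a function over one boolean per key. Step 1 is written as a streaming k-way merge only for brevity; per the thread, the real code collects the TIDs into a TIDBitmap instead, which is where the lossiness comes in.

```python
import heapq
from typing import Callable, Iterable, Iterator, List, Tuple

TID = Tuple[int, int]  # (block number, offset), compared lexicographically


def union_streams(streams: Iterable[Iterable[TID]]) -> Iterator[TID]:
    """Step 1: union several TID-ordered streams into one TID-ordered stream."""
    last = None
    for tid in heapq.merge(*streams):
        if tid != last:  # drop duplicates that appear in several streams
            yield tid
            last = tid


def merge_keys(key_streams: List[Iterable[TID]],
               consistent: Callable[[List[bool]], bool]) -> Iterator[TID]:
    """Step 2: walk the per-key streams in parallel; consistent() decides."""
    iters = [iter(s) for s in key_streams]
    heads = [next(it, None) for it in iters]
    while any(h is not None for h in heads):
        current = min(h for h in heads if h is not None)
        present = [h == current for h in heads]  # which keys contain this TID
        if consistent(present):
            yield current
        for i, hit in enumerate(present):
            if hit:  # advance only the streams that were sitting on this TID
                heads[i] = next(iters[i], None)


# Key 0 is a partial match that expanded to three posting trees; key 1 is exact.
key0 = union_streams([[(1, 1), (2, 5)], [(1, 3), (2, 5)], [(4, 2)]])
key1 = [(1, 3), (2, 5), (7, 1)]

both = list(merge_keys([key0, key1], all))  # AND of the two keys
print(both)
```

Passing `any` instead of `all` gives the OR of the keys. The point of the sketch is only that step 2 needs each per-key stream to arrive in exact TID order, which a lossy result from step 1 cannot provide.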
Re: [HACKERS] Todo item: Support amgettuple() in GIN
On 11/29/2013 01:57 PM, Andreas Karlsson wrote:
> On 11/29/2013 09:54 AM, Antonin Houska wrote:
>> On 11/29/2013 01:13 AM, Andreas Karlsson wrote:
>>> When doing partial matching the code needs to be able to return the union of all TIDs in all the matching posting trees in TID order (to be able to do AND and OR operations with multiple search keys later). It does this by traversing them one posting tree after another and collecting them all in a TIDBitmap which is later iterated over.
>>
>> I think it's not a plain union. My understanding is that - to evaluate a single key (typically an array) - you first need to get all the TID streams for that key (i.e. one posting list/tree per element of the key array) and then iterate over all these streams in parallel and 'merge' them using the consistent() function. That's how I understand ginget.c:keyGetItem().
>
> For partial matches the merging is done in two steps: first a simple union of all the streams per key, and then a merge of those union streams using the consistent() function.

Yes, shortly after I sent my previous mail I realized that your union probably referred to what collectMatchBitmap() does.

// Tony
Re: [HACKERS] Todo item: Support amgettuple() in GIN
Andreas Karlsson andr...@proxel.se writes:
> I decided to look into how much work implementing the todo item about supporting amgettuple in GIN would be, since exclusion constraints on GIN would be neat. Robert Haas suggested a solution[1], but to fix it we also need to look into why the commit message mentions that it did not work anyway with partial matches.
> ...
> This TIDBitmap becomes lossy if too many TIDs are added to it, and this case is what broke amgettuple for partial matches.

Right, see
http://www.postgresql.org/message-id/49ac300f.1050...@enterprisedb.com

Note that fixing the potential lossiness in scanning is not the only roadblock to re-enabling amgettuple. Fast updates also pose problems:
http://www.postgresql.org/message-id/4974b002.3040...@sigaev.ru

Half of that is basically the same lossiness problem, but the other half is that we're relying on the bitmap to suppress duplicate reports of the same TID. It's fairly hard to see how you'd avoid that without creating other problems.

Note that Robert's proposed solution is no solution, because it just puts you right back in the bind of needing guaranteed non-lossy storage of a TID set that might be too big to fit in memory.

regards, tom lane
Re: [HACKERS] Todo item: Support amgettuple() in GIN
On 11/29/2013 07:13 PM, Tom Lane wrote:
> Andreas Karlsson andr...@proxel.se writes:
>> I decided to look into how much work implementing the todo item about supporting amgettuple in GIN would be, since exclusion constraints on GIN would be neat. Robert Haas suggested a solution[1], but to fix it we also need to look into why the commit message mentions that it did not work anyway with partial matches.
>> ...
>> This TIDBitmap becomes lossy if too many TIDs are added to it, and this case is what broke amgettuple for partial matches.
>
> Right, see
> http://www.postgresql.org/message-id/49AC300F.1050903@enterprisedb.com
>
> Note that fixing the potential lossiness in scanning is not the only roadblock to re-enabling amgettuple. Fast updates also pose problems:
> http://www.postgresql.org/message-id/4974B002.3040202@sigaev.ru
>
> Half of that is basically the same lossiness problem, but the other half is that we're relying on the bitmap to suppress duplicate reports of the same TID. It's fairly hard to see how you'd avoid that without creating other problems.
>
> Note that Robert's proposed solution is no solution, because it just puts you right back in the bind of needing guaranteed non-lossy storage of a TID set that might be too big to fit in memory.

You can always call amgetbitmap, and return the tuples from the bitmap one by one. For a lossy result, re-check all tuples on the page. IOW, do a bitmap index + heap scan.

You could do that within indexam.c, and present the familiar index_getnext() interface for callers. Or you could modify the exclusion constraint code to do that if amgettuple is not available.

- Heikki
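A toy illustration of the "bitmap, then recheck lossy pages" idea, in Python rather than C. Everything here is an assumption for the sake of the sketch: the bitmap is a plain dict mapping a heap block to a list of offsets, with `None` standing for a lossy page, the heap is a dict of dicts, and `tuples_from_bitmap` and `matches` are hypothetical names. It also simplifies by trusting exact entries without a recheck.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

TID = Tuple[int, int]  # (heap block, offset)

# Hypothetical bitmap shape: block -> offsets, or None when the page is lossy.
Bitmap = Dict[int, Optional[List[int]]]


def tuples_from_bitmap(bitmap: Bitmap,
                       heap: Dict[int, Dict[int, str]],
                       matches: Callable[[str], bool]) -> Iterator[TID]:
    """Yield matching TIDs one at a time, rechecking rows on lossy pages."""
    for block in sorted(bitmap):
        offsets = bitmap[block]
        if offsets is None:
            # Lossy page: only the block is known, so test every row on it.
            for off in sorted(heap[block]):
                if matches(heap[block][off]):
                    yield (block, off)
        else:
            # Exact page: the bitmap already names the matching offsets.
            for off in sorted(offsets):
                yield (block, off)


heap = {
    1: {1: "apple", 2: "banana"},
    2: {1: "apricot", 2: "cherry", 3: "avocado"},
}
bitmap = {1: [1], 2: None}  # block 2 went lossy

found = list(tuples_from_bitmap(bitmap, heap, lambda row: row.startswith("a")))
print(found)
```

The caller sees a one-tuple-at-a-time interface even though the index only produced a bitmap, which is the shape of the suggestion above; the cost is that every row on a lossy page has to be fetched and tested.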
Re: [HACKERS] Todo item: Support amgettuple() in GIN
On 11/29/2013 06:13 PM, Tom Lane wrote:
> Note that Robert's proposed solution is no solution, because it just puts you right back in the bind of needing guaranteed non-lossy storage of a TID set that might be too big to fit in memory.

The solution would work if we could guarantee that a TIDBitmap based on the fast update pending list will always fit in memory. That does not sound like a good assumption to me.

--
Andreas Karlsson
[HACKERS] Todo item: Support amgettuple() in GIN
Hi,

I decided to look into how much work implementing the todo item about supporting amgettuple in GIN would be, since exclusion constraints on GIN would be neat. Robert Haas suggested a solution[1], but to fix it we also need to look into why the commit message mentions that it did not work anyway with partial matches. So I looked into that first, and here is my explanation for why it is broken for partial matches. I am sending this to the mailing list to check whether I am correct and, if so, to update the todo list with this new information.

= Explanation

When doing normal matching the code simply traverses the matching posting tree in TID order. When doing partial matching the code needs to be able to return the union of all TIDs in all the matching posting trees in TID order (to be able to do AND and OR operations with multiple search keys later). It does this by traversing them one posting tree after another and collecting them all in a TIDBitmap which is later iterated over. This TIDBitmap becomes lossy if too many TIDs are added to it, and this case is what broke amgettuple for partial matches.

To fix this it seems to me that either lossy pages would need to be rescanned by gingettuple, or a version of collectMatchBitmap needs to be written which merges the matched posting trees in another way, probably by iterating over all of them at the same time.

Does this diagnosis sound correct?

= Footnotes

1. http://www.postgresql.org/message-id/ca+tgmobzhfrjnyz-fyw5kdtrurk0hjwp0vtp5fgzle6evsw...@mail.gmail.com

--
Andreas Karlsson
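As a rough picture of what "becomes lossy" means in the explanation above, here is a toy Python stand-in. It is only a sketch under loud assumptions: `ToyTidBitmap` is a hypothetical class, the budget is a simple count of exactly tracked TIDs (the real TIDBitmap is bounded by memory, not by a TID count), and the choice of which page to degrade is arbitrary.

```python
from typing import Dict, List, Optional, Set, Tuple

TID = Tuple[int, int]  # (block number, offset)


class ToyTidBitmap:
    """Toy TID bitmap with a cap on how many TIDs it tracks exactly."""

    def __init__(self, max_exact_tids: int) -> None:
        self.max_exact_tids = max_exact_tids
        self.exact: Dict[int, Set[int]] = {}  # block -> known offsets
        self.lossy: Set[int] = set()          # blocks whose offsets were dropped

    def add(self, tid: TID) -> None:
        block, offset = tid
        if block in self.lossy:
            return  # the whole page is already marked, nothing more to record
        self.exact.setdefault(block, set()).add(offset)
        while sum(len(s) for s in self.exact.values()) > self.max_exact_tids:
            # Over budget: forget the offsets of the fullest page (toy policy).
            victim = max(self.exact, key=lambda b: len(self.exact[b]))
            del self.exact[victim]
            self.lossy.add(victim)

    def pages(self) -> List[Tuple[int, Optional[List[int]]]]:
        """Return (block, offsets) in block order; offsets is None if lossy."""
        blocks = sorted(set(self.exact) | self.lossy)
        return [(b, None if b in self.lossy else sorted(self.exact[b]))
                for b in blocks]


bm = ToyTidBitmap(max_exact_tids=3)
for tid in [(1, 1), (1, 2), (2, 1), (1, 3)]:
    bm.add(tid)
print(bm.pages())
```

Once block 1 is marked lossy, the iteration can no longer say which offsets on that page matched, so a scan that must return exact TIDs in order has nothing to hand back for it without revisiting the page.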