In the last week:
   1) I tried to find out what's wrong with my last patch: reducing the 
contention on SerializableFinishedListLock. 
   2) I used a skip list to replace the linked list to speedup conflict 
tracking. But there are still some bugs resulting in segment fault. 
       Here is the code: 
https://github.com/liumx10/postgresql/commit/c469e14e27c31edba5caff001647440040f53c84
       I will figure it out in the next days.


Here is the details.
1) Reducing the contention on SerializableFinishedListLock
In the last email, I reported that reducing the contention on FinishedListLock 
didn't increase the performance. The system became even 
slower in some cases. I thought the key point was not the problem of my codes. 
The original Pseudo code looks like this:
     
     lockAcquire(SerializableFinishedListLock)
     for  predlock in this Sxact:
            lockAcquire(patitionlock);
           // do something;
            unlock(paritionlock);
     releaseAcquire(SerializableFinishedListLock)


SerializableFinishedListLock likes a coarse-grained lock, while partition locks 
like fine-grained lock. My patch essentially remove the outer lock.
But unfortunately, there are many other places requiring partition locks. We 
have only 16 partition locks in all. 
So even without SerializableFinishedListLock, contentions will happen on 
partition locks. The performance wouldn't be improved. 


Why the performance became worse in some cases? Context switch may be one of 
the reason. 
It was easier to fail on getting locks when using fine-grained locks,
which would hang the current process and result in a context switch.
For example, if it failed to get partition locks 5 times in a for loop, there 
would be 5 context switch at least.
I used vmstat to count the context switch. In the benchmark where my patch had 
a worse performance, there were more context switch happened than using the 
original code. 
I also modify the source code to count where requiring partition locks failed. 
In the original version, only a few (no more than 1%) locking failure came from 
the pseudo code above. 
But in my patch, more than 50% failure came from this part. 


2) Skip list
To make it simple, I didn't develop an independent data structure for skip 
list. Instead, I used two linked list to simulate skip list. 
Most transactions have no more than 50 conflicts, so two level skip list is 
enough now. 
I'll make a clean patch after I figure out bugs. 


--

Sincerely


Mengxing Liu






Reply via email to