Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-29 Thread Mengxing Liu
Thanks, I've updated the proposal. Just one issue:
I agree that we can make the skip list a general data structure. But can we use
the fixed-level skip list as a Plan B, or as a quick attempt before the general
data structure? Because I am not familiar with shared memory structures and the
tricks used in them, I cannot estimate how much time that would take.


> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-28 00:16:11 (Tuesday)
> To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate 
> O(N^2) scaling from rw-conflict tracking in serializable transactions
> 
> On Sat, Mar 25, 2017 at 9:24 PM, Mengxing Liu
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> 
> > I've updated the proposal according to your comments.
> > But I was wondering whether you could review it again to make sure
> > I've made everything clear?
> 
> Additional comments added.
> 
> I'm afraid a few new issues came to mind reading it again.  (Nothing
> serious; just some points that could benefit from a little
> elaboration.)
> 
> --
> Kevin Grittner
> 
> 


--
Mengxing Liu


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-25 Thread Mengxing Liu
Thanks for your time. I've updated the proposal according to your comments.
But I was wondering whether you could review it again to make sure
I've made everything clear?
You can read my replies for reference.

> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-25 04:53:36 (Saturday)
> To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate 
> O(N^2) scaling from rw-conflict tracking in serializable transactions
> 
> On Wed, Mar 22, 2017 at 2:24 AM, Mengxing Liu
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> 
> > I've finished a draft proposal for "Eliminate O(N^2) scaling from
> > rw-conflict tracking in serializable transactions".
> 
> I've attached some comments to the document; let me know if they
> don't show up for you or you need clarification.
> 
> Overall, if the comments are addressed, I think it is great.
> 
> --
> Kevin Grittner


--
Mengxing Liu


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-15 Thread Mengxing Liu



> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-15 23:20:07 (Wednesday)
> To: DEV_OPS <dev...@ww-it.cn>
> Cc: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Mar 14, 2017 at 3:45 PM, Kevin Grittner <kgri...@gmail.com> wrote:
> >> On 3/14/17 17:34, Mengxing Liu wrote:
> >>> There are several alternative benchmarks. Tony suggested that we
> >>> should use TPC-E and TPC-DS.
> >
> > More benchmarks is better, all other things being equal.  Keep in
> > mind that good benchmarking practice with PostgreSQL generally
> > requires a lot of setup time (so that we're starting from the exact
> > same conditions for every run), a lot of run time (so that the
> > effects of vacuuming, bloat, and page splitting all comes into play,
> > like it would in the real world), and a lot of repetitions of each
> > run (to account for variation).  In particular, on a NUMA machine it
> > is not at all unusual to see bifurcated
> 
> Sorry I didn't finish that sentence.
> 
> On a NUMA machine it is not at all unusual to see bifurcated results
> -- with each run coming in very close to one number or a second
> number, often at about a 50/50 rate, with no numbers falling
> anywhere else.  This seems to be based on where the processes and
> memory allocations happen to land.
> 

Do you mean that on a NUMA machine there usually exist two distinct performance
results?
Just two, neither three nor four?

Anyway, I think I should first get familiar with PostgreSQL and the tools you
recommended to me.
Then I will try to get a better understanding of the problem, to make our
discussion more efficient.

Recently, I have been busy preparing for a presentation at ASPLOS'17 and other lab
work.
But I promise I can finish all the preparation work before May. Here is my working
plan:
At first, I will compile and install PostgreSQL by myself and try the profiling
tools (perf or oprofile).
Then I will run one or two benchmarks using different configurations, where I may
need your help to ensure that my tests are close to the practical situation.
 
PS: Disabling node interleaving in the BIOS means that each CPU uses its own
memory controller when accessing local memory, reducing hops.
Conversely, "enabled" means UMA. Therefore, I think Tony is right: we
should disable this setting.
I got the information from here:
http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Regards.

--
Mengxing Liu


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-22 Thread Mengxing Liu
Hi, Kevin.

I've finished a draft proposal for "Eliminate O(N^2) scaling from rw-conflict
tracking in serializable transactions".  You can find it on the GSoC website or
via the link below.
https://docs.google.com/document/d/17TAs3EJIokwPU7UTUmnlVY3ElB-VHViyX1zkQJmrD1A/edit?usp=sharing

I was wondering if you have time to review the proposal and give me some
comments?

> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-17 21:57:18 (Friday)
> To: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Cc: 
> Subject: [HACKERS] Guidelines for GSoC student proposals
> 
> I've found various sources that give hints about what a student
> proposal should look like, but nothing I could just give as a link,
> so I pulled together what I could find, tempered by my own ideas and
> opinions.  I suggest that we send the below, or something like it to
> each student who expresses interest in making a proposal, or who
> submits a proposal that doesn't meet the below guidelines.  Thoughts
> or suggestions for changes before we do?  Remember, time is short,
> so this cannot be a 200 message bike-shedding debate -- we just need
> to provide some sort of guidance to students in a timely way, with
> the timeline being:
> 
>   February 27 - March 20
> Potential student participants discuss application ideas with
> mentoring organizations
>   March 20 16:00 UTC
> Student application period opens
>   April 3 16:00 UTC
> Student application deadline
> 
> Each GSoC student proposal should be a PDF file of 6 to 8 pages.  In
> the end, Google will publish these documents on a web page, so the
> student should make each proposal something which they will be happy
> to have future potential employers review.
> 
> Some ideas for desirable content:
> 
>   - A resume or CV of the student, including any prior GSoC work
>   - Their reasons for wanting to participate
>   - What else they have planned for the summer, and what their time
> commitment to the GSoC work will be
>   - A clear statement that there will be no intellectual property
> problems with the work they will be doing -- that the PostgreSQL
> community will be able to use their work without encumbrances
> (e.g., there should be no agreements related to prior or
> ongoing work which might assign the rights to the work they do
> to someone else)
>   - A description of what they will do, and how
>   - Milestones with dates
>   - What they consider to be the test that they have successfully
> completed the project
> 
> Note that a student proposal is supposed to be far more detailed
> than the ideas for projects provided by the organization -- those
> are intended to be ideas for what the student might write up as a
> proposal, not ready-to-go proposal documents.
> 
> 


--
Mengxing Liu


Re: [HACKERS] Guidelines for GSoC student proposals / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-30 Thread Mengxing Liu

> > I agree that we can make skip list a general data structure.  But
> > can we use the fixed-level skip list as a Plan B? Or a quick attempt
> > before the general data structure ?
> > Because I am not familiar with shared memory structure and tricks
> > used in it, and I cannot estimate how much time it would take.
> 
> It's not really too bad for fixed allocation shared memory, and I
> can help with that.  If I thought it would save much I could see
> doing a prototype without generalization, but you would still have
> most of the same shared memory issues, since the structure *must*
> live in shared memory.
> 

Thank you. If there is no other problem, I will submit the proposal. 
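
For reference, the fixed-allocation pattern Kevin mentions looks roughly like
this in PostgreSQL (a minimal sketch; the structure and names are illustrative,
not from any actual patch):

    #include "storage/shmem.h"

    typedef struct RWConflictSkipListShmem
    {
        int     maxEntries;     /* shared memory cannot grow at runtime,
                                 * so the node pool must be sized up front */
        /* fixed-size node pool would follow here */
    } RWConflictSkipListShmem;

    static RWConflictSkipListShmem *skipListShmem;

    void
    SkipListShmemInit(void)
    {
        bool    found;

        skipListShmem = (RWConflictSkipListShmem *)
            ShmemInitStruct("RWConflict skip list",
                            sizeof(RWConflictSkipListShmem),
                            &found);
        if (!found)
            skipListShmem->maxEntries = 0;  /* first backend initializes */
    }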

--
Mengxing Liu


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-14 Thread Mengxing Liu

I am sending this email to Tony too, because he promised to help me with testing
and benchmarking.

> 
> >> The worst problems have been
> >> seen with 32 or more cores on 4 or more sockets with a large number
> >> of active connections.  I don't know whether you have access to a
> >> machine capable of putting this kind of stress on it (perhaps at
> >> your university?), but if not, the community has access to various
> >> resources we should be able to schedule time on.
> >
> > There is a NUMA machine ( 120 cores, 8 sockets) in my lab.
> 
> Fantastic!  Can you say a bit more about the architecture and OS?
> 

Intel(R) Xeon(R) CPUs at 2.3GHz, with 1 TB of physical DRAM and a 1.5 TB SSD,
running Ubuntu 14.04, kernel 3.19.
I guess NUMA is disabled in the BIOS, but I will check that.
However, there is only one NIC, so could the network become the bottleneck if we
use too many cores?

> > I think it's enough to put this kind of stress.
> 
> The researchers who saw this bottleneck reported that performance
> started to dip at 16 cores and the problem was very noticeable at 32
> cores.  A stress test with 120 cores on 8 sockets will be great!
> 
> I think perhaps the first milestone on the project should be to
> develop a set of benchmarks we want to compare to at the end.  That
> would need to include a stress test that clearly shows the problem
> we're trying to solve, along with some cases involving 1 or two
> connections -- just to make sure we don't harm performance for
> low-contention situations.
> 

Thanks for your advice! It's really helpful for my proposal.

There are several alternative benchmarks. Tony suggested that we should use
TPC-E and TPC-DS.
Personally, I am more familiar with TPC-C. And pgbench seems to have only a
TPC-B-like built-in benchmark.
Still, I think we can easily find implementations of these benchmarks for
PostgreSQL.
The paper you recommended to me used a special benchmark the authors defined
themselves, but it is quite simple and easy to implement.

For me, the challenge is profiling the execution. Are there any tools in
PostgreSQL to analyze where the CPU cycles are consumed?
Or do I have to instrument and time it myself?


Regards.

--
Mengxing Liu


[HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-11 Thread Mengxing Liu



> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-12 04:24:29 (Sunday)
> To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in 
> serializable transactions
> 
> On Fri, Mar 10, 2017 at 9:12 PM, Mengxing Liu
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> 
> > My name is Mengxing Liu. I am interested in the project "Eliminate
> > O(N^2) scaling from rw-conflict tracking in serializable
> > transactions”. After discussing with Kevin off-list, I think it's
> > time to post discussion here. I am afraid that there are two things
> > that I need your help. Thank you very much.
> 
> Welcome to the -hackers list!  This is a key part of how development
> happens in the community.  Don't be shy about posting questions and
> ideas, but also expect fairly blunt discussion to occur at times;
> don't let that put you off.
> 

Thanks, Kevin. I will keep that in mind. 

> >>> So the task is clear. We can use a tree-like or hash-like data
> >>> structure to speed up this function.
> >>
> >> Right -- especially with a large number of connections holding a
> >> large number of conflicts.  In one paper with high concurrency, they
> >> found over 50% of the CPU time for PostgreSQL was going to these
> >> functions (including functions called by them).  This seems to me to
> >> be due to the O(N^2) (or possibly worse) performance from the number
> >> of connections.
> >
> > Anyone knows the title of this paper? I want to reproduce its
> > workloads.
> 
> I seem to remember there being a couple other papers or talks, but
> this is probably the most informative:
> 
> http://sydney.edu.au/engineering/it/research/tr/tr693.pdf
> 

Thanks, it's exactly what I need.

> >> Remember, I think most of the work here is going to be in
> >> benchmarking.  We not only need to show improvements in simple test
> >> cases using readily available tools like pgbench, but think about
> >> what types of cases might be worst for the approach taken and show
> >> that it still does well -- or at least not horribly.  It can be OK
> >> to have some slight regression in an unusual case if the common
> >> cases improve a lot, but any large regression needs to be addressed
> >> before the patch can be accepted.  There are some community members
> >> who are truly diabolical in their ability to devise "worst case"
> >> tests, and we don't want to be blind-sided by a bad result from one
> >> of them late in the process.
> >>
> >
> > Are there any documents or links introducing how to test and
> > benchmark PostgreSQL? I may need some time to learn about it.
> 
> There is pgbench:
> 
> https://www.postgresql.org/docs/devel/static/pgbench.html
> 
> A read-only load and a read/write mix should both be tested,
> probably with a few different scales and client counts to force the
> bottleneck to be in different places.  The worst problems have been
> seen with 32 or more cores on 4 or more sockets with a large number
> of active connections.  I don't know whether you have access to a
> machine capable of putting this kind of stress on it (perhaps at
> your university?), but if not, the community has access to various
> resources we should be able to schedule time on.
> 

There is a NUMA machine (120 cores, 8 sockets) in my lab. I think it's enough
to put on this kind of stress.
Anyway, I will ask for help if necessary.

> It may pay for you to search through the archives of the last year
> or two to look for other benchmarks and see what people have
> previously done.
> 

I will try.



--
Mengxing Liu


Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-10 Thread Mengxing Liu
Hi George, 

I am Mengxing Liu. Happy to meet someone with the same idea : )

I have been concentrating on this for a long time: reading papers, reading source
code, and discussing details with Mr. Grittner.  So I really understand your
passion for it. But I definitely don't want all these efforts to be in vain. So,
though it may be a little ruthless, would you mind considering a switch to the
other one?


> -Original Message-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-03-10 22:57:03 (Friday)
> To: "George Papadrosou" <gpapadro...@gmail.com>, "Mengxing Liu"
> <liu-m...@mails.tsinghua.edu.cn>
> Cc: "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling from
> rw-conflict tracking in serializable transactions
> 
> [including Mengxing Liu in response, for reasons that should become
> obvious below...]
> 
> Hi George,
> 
> On Thu, Mar 9, 2017 at 6:49 PM, George Papadrosou <gpapadro...@gmail.com> 
> wrote:
> 
> > my name is George Papadrosou, this is my first semester as
> > graduate student at Georgia Tech and would like to submit a
> > proposal to Google Summer of Code, for the project "Eliminate
> > O(N^2) scaling from rw-conflict tracking in serializable
> > transactions”.
> 
> I was recently contacted off-list by Mengxing Liu, who has been
> looking at the same project, and said he was planning to submit a
> GSoC proposal.  Rather than have one of you sit this out, do either
> of you feel comfortable taking a different project instead?  Since
> you've both been looking at the serializable code and supporting
> documents, perhaps one of you could change to the other suggested
> Serializable project?
> 
> > I am going to prepare a draft proposal for this project and share
> > it with you soon. The project’s description is pretty clear, do
> > you think it should be more strictly defined in the proposal?
> 
> At a minimum, the proposal should include list of milestones you
> expect to reach along the way, and a timeline indicating when you
> expect to reach them.  Some description of the benchmarks you intend
> to run would be also be very good.
> 
> > Until then, I would like to familiarize myself a bit with the
> > codebase and fix some bug/todo. I didn’t find many [E] marked
> > tasks in the todo list so the task I was thinking is "\s without
> > arguments (display history) fails with libedit, doesn't use pager
> > either - psql \s not working on OSX”. However, it works on my OSX
> > El Capitan laptop with Postgres 9.4.4. Would you suggest some
> > other starter task?
> 
> There is a CommitFest in progress; reviewing patches is a good way
> to become involved and familiar with the community processes.
> 
> https://wiki.postgresql.org/wiki/CommitFest
> 
> --
> Kevin Grittner


--
Mengxing Liu


[HACKERS] [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-03-10 Thread Mengxing Liu
Hi, all 

My name is Mengxing Liu. I am interested in the project "Eliminate O(N^2)
scaling from rw-conflict tracking in serializable transactions”. After
discussing with Kevin off-list, I think it's time to post the discussion here.
There are two things I need your help with. Thank you very much.

> 
> > So the task is clear. We can use a tree-like or hash-like data
> > structure to speed up this function.
> 
> Right -- especially with a large number of connections holding a
> large number of conflicts.  In one paper with high concurrency, they
> found over 50% of the CPU time for PostgreSQL was going to these
> functions (including functions called by them).  This seems to me to
> be due to the O(N^2) (or possibly worse) performance from the number
> of connections.

Does anyone know the title of this paper? I want to reproduce its workloads.
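
For readers following the thread: the function under discussion is a linear
scan of the reader's out-conflict list. Lightly simplified from predicate.c
(the real function also has a few early-exit checks), it looks like this:

    /*
     * With N concurrent transactions a conflict list can hold O(N)
     * entries, so each check is O(N), and checking across all
     * connections is where the O(N^2) behavior comes from.
     */
    static bool
    RWConflictExists(const SERIALIZABLEXACT *reader,
                     const SERIALIZABLEXACT *writer)
    {
        RWConflict  conflict;

        conflict = (RWConflict)
            SHMQueueNext(&reader->outConflicts,
                         &reader->outConflicts,
                         offsetof(RWConflictData, outLink));
        while (conflict)
        {
            if (conflict->sxactIn == writer)
                return true;
            conflict = (RWConflict)
                SHMQueueNext(&reader->outConflicts,
                             &conflict->outLink,
                             offsetof(RWConflictData, outLink));
        }
        return false;
    }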

>
> Remember, I think most of the work here is going to be in
> benchmarking.  We not only need to show improvements in simple test
> cases using readily available tools like pgbench, but think about
> what types of cases might be worst for the approach taken and show
> that it still does well -- or at least not horribly.  It can be OK
> to have some slight regression in an unusual case if the common
> cases improve a lot, but any large regression needs to be addressed
> before the patch can be accepted.  There are some community members
> who are truly diabolical in their ability to devise "worst case"
> tests, and we don't want to be blind-sided by a bad result from one
> of them late in the process.
> 

Are there any documents or links introducing how to test and benchmark 
PostgreSQL? I may need some time to learn about it.

Thanks.

--
Mengxing Liu


[HACKERS] [GSOC 17][Performance report] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-31 Thread Mengxing Liu
Hi, all. I wrote a performance report to summarize what I've done so far.
https://docs.google.com/document/d/16A6rfJnQSTkd0SHK-2XzDiLj7aZ5nC189jGPcfVdhMQ/edit?usp=sharing



Three patches are attached. I used the benchmark below to test the performance.
https://github.com/liumx10/pg-bench
It requires golang (>= 1.6) if you want to reproduce the results.


NOTE:
1. The reason why the hash table and skip list didn't improve the performance is
that maintaining the conflict list became slower even though conflict tracking
was faster.
So far, the skip list is the most promising way, but the code is a little tricky.

BTW, if there is a case in which inserting a conflict element is rare but
conflict checking is common,
my patch may be better.

2. Reducing contention on the global lock performed better in this evaluation,
but two weeks ago, when I used another machine, it performed worse.
It's hard to explain why...

You can reply directly if you have any questions or comments.

--
Sincerely

Mengxing Liu


HTAB-for-conflict-tracking.patch
Description: Binary data


skip-list-for-conflict-tracking.patch
Description: Binary data


reduce-contention-on-FinishedListLock.patch
Description: Binary data



[HACKERS] Re: [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-14 Thread Mengxing Liu
In the last week, I tried these two ideas.

> -Original Messages-
> From: "Alvaro Herrera" <alvhe...@2ndquadrant.com>
> Sent Time: 2017-08-08 01:51:52 (Tuesday)
> * I wonder if a completely different approach to the finished xact
>   maintenance problem would be helpful.  For instance, in
>   ReleasePredicateLocks we call ClearOldPredicateLocks()
>   unconditionally, and there we grab the SerializableFinishedListLock
>   lock unconditionally for the whole duration of that routine, but
>   perhaps it would be better to skip the whole ClearOld stuff if the
>   SerializableFinishedListLock is not available?  Find out some way to
>   ensure that the cleanup is always called later on, but maybe skipping
>   it once in a while improves overall performance.
> 

I implemented the idea this way: use LWLockConditionalAcquire instead of
LWLockAcquire.
If the lock is held by another process, just return. That's OK because the
routine is used to clear old predicate locks,
and it doesn't matter who does the job.
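
A minimal sketch of this variant (assuming only the standard LWLock API; the
actual patch differs in detail):

    static void
    ClearOldPredicateLocks(void)
    {
        /*
         * If another backend already holds the lock, skip the cleanup;
         * the work is idempotent, so a later caller will do the job.
         */
        if (!LWLockConditionalAcquire(SerializableFinishedListLock,
                                      LW_EXCLUSIVE))
            return;

        /* ... scan the finished-transaction list and release old entries ... */

        LWLockRelease(SerializableFinishedListLock);
    }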

But we both ignored one thing: this idea doesn't speed up the releasing
operation itself; it just keeps some processes from
waiting for the lock. If the function ClearOldPredicateLocks is the bottleneck,
skipping it doesn't help anymore.
I attached the result of the evaluation. This idea (the conditional lock) has
the worst performance.

>
> * the whole predicate.c stuff is written using SHM_QUEUE.  I suppose
>   SHM_QUEUE works just fine, but predicate.c was being written at about
>   the same time (or a bit earlier) than the newer ilist.c interface was
>   being created, which I think had more optimization work thrown in.
>   Maybe it would be good for predicate.c to ditch use of SHM_QUEUE and
>   use ilist.c interfaces instead?  We could even think about being less
>   strict about holding exclusive lock on SerializableFinished for the
>   duration of ClearOldPredicateLocks, i.e. use only a share lock and
>   only exchange for exclusive if a list modification is needed.
> 

I used the doubly linked list from ilist.c, but it didn't improve the performance.
I ran a micro-benchmark to compare the performance of SHM_QUEUE and ilist.c
and didn't find any difference, so the result is reasonable.

But there are voices in favor of getting rid of SHM_QUEUE, because it does not
make sense to have two identical implementations. What's your opinion?
 
> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sincerely

Mengxing Liu


[HACKERS] [GSOC][weekly report 6] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-16 Thread Mengxing Liu
In the last week:
   1) I tried to find out what's wrong with my last patch: reducing the
contention on SerializableFinishedListLock.
   2) I used a skip list to replace the linked list to speed up conflict
tracking. But there are still some bugs resulting in segmentation faults.
   Here is the code:
https://github.com/liumx10/postgresql/commit/c469e14e27c31edba5caff001647440040f53c84
   I will figure them out in the next few days.


Here are the details.
1) Reducing the contention on SerializableFinishedListLock
In the last email, I reported that reducing the contention on FinishedListLock
didn't increase the performance; the system became even
slower in some cases. I thought the key point was not a problem in my code.
The original pseudo-code looks like this:

  lockAcquire(SerializableFinishedListLock)
  for predlock in this Sxact:
      lockAcquire(partitionLock)
      // do something
      lockRelease(partitionLock)
  lockRelease(SerializableFinishedListLock)


SerializableFinishedListLock acts like a coarse-grained lock, while the partition
locks are fine-grained. My patch essentially removes the outer lock.
But unfortunately, there are many other places requiring partition locks, and we
have only 16 partition locks in total.
So even without SerializableFinishedListLock, contention happens on the
partition locks, and the performance isn't improved.
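
In the same pseudo-code style, the patched version essentially becomes:

  for predlock in this Sxact:
      lockAcquire(partitionLock)
      // do something
      lockRelease(partitionLock)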


Why did the performance become worse in some cases? Context switches may be one
of the reasons.
It was easier to fail to get a lock when using fine-grained locks,
which would block the current process and result in a context switch.
For example, if it failed to get partition locks 5 times in a for loop, there
would be at least 5 context switches.
I used vmstat to count the context switches. In the benchmark where my patch had
worse performance, more context switches happened than with the
original code.
I also modified the source code to count where acquiring partition locks failed.
In the original version, only a few (no more than 1%) of the locking failures
came from the pseudo-code above.
But with my patch, more than 50% of the failures came from this part.


2) Skip list
To keep it simple, I didn't develop an independent data structure for the skip
list. Instead, I used two linked lists to simulate a skip list.
Most transactions have no more than 50 conflicts, so a two-level skip list is
enough for now.
I'll make a clean patch after I figure out the bugs.
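
For illustration, a two-level skip list built from two linked lists can be
searched like this (a sketch only, with made-up names, not the actual patch;
the bottom level links every node in key order, and the top level links a
subset as an express lane, where every top-level node is also a bottom-level
node):

    typedef struct ConflictNode
    {
        int     key;                    /* e.g. the conflicting xact's id */
        struct ConflictNode *next;      /* bottom-level successor */
        struct ConflictNode *nextTop;   /* top-level successor, or NULL */
    } ConflictNode;

    static bool
    ConflictExists(ConflictNode *topHead, ConflictNode *bottomHead, int key)
    {
        ConflictNode *start = bottomHead;
        ConflictNode *t;

        /* Express lane: advance to the last top node with a smaller key. */
        for (t = topHead; t != NULL && t->key < key; t = t->nextTop)
            start = t;              /* t is also on the bottom level */

        /* Finish the search on the bottom level. */
        for (; start != NULL && start->key <= key; start = start->next)
        {
            if (start->key == key)
                return true;
        }
        return false;
    }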


--
Sincerely

Mengxing Liu


[HACKERS] [GSOC][weekly report 5] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-09 Thread Mengxing Liu
Sorry for reporting late. Our lab's machines were down for several days.


As I reported in the last email,
https://www.postgresql.org/message-id/5b6b452.16851.15cf1ec010e.coremail.liu-m...@mails.tsinghua.edu.cn
I tried to decrease the contention on SerializableFinishedListLock.
It actually works. But I don't know why my optimization results in another
problem: more contention on PREDICATELOCK_MANAGER_LWLOCK.

Here is the result from the statistics collector. There were 36 connections in
all, so most of them are waiting for locks.
(The benchmark is ssibench, but other benchmarks show the same result.)


Original:
SerializableXactHashLock, 0.36
predicate_lock_manager, 6.00
wal_insert, 0.07
SerializableFinishedListLock, 16.50


After optimization: 
SerializableXactHashLock, 0.35
predicate_lock_manager, 11.53
buffer_content, 0.12
SerializableFinishedListLock, 11.47


So in the end, the performance did not change noticeably.
Even when I enlarged the number of predicate lock partitions
(NUM_PREDICATELOCK_PARTITIONS) from 16 to 1024,
there was still heavy contention.

This bothered me for several days. The patch is attached.
To focus on this issue, I removed the other parts of the modification.
Do you have any advice on this?



--
Sincerely

Mengxing Liu


finishedlock.patch
Description: Binary data



[HACKERS] [GSOC][weekly report 7] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-23 Thread Mengxing Liu
In the last week, I focused on:
1) Creating an independent skip list data structure and related interfaces.
Now it has only two levels, so that I don't have to modify too much existing
code, but it can easily be transformed into a data structure with any number of
levels if necessary. Unfortunately, its performance is not good. In some
cases, it's only half that of the original version. This reminded me that even
though conflict tracking didn't consume much CPU time, it was on the critical
path and wrapped in a pair of lock acquire and release calls. Slower conflict
tracking results in more lock contention, which makes the
performance drop quickly.
2) Using some tricks to improve its performance.
For example, I found that if the length of the conflict list is smaller than 10,
the original linked list is faster
than the skip list. So I used them in a hybrid way: if there are more than 10
conflicts in total, use the skip list; otherwise use the linked list.
Now, the performance is approximately equal to the original version in
different benchmarks,
but I haven't found a case in which the new version is much faster.


The patch is attached. 


So far, I have tried: 1) using a hash table for conflict tracking,
2) reducing the global lock contention, and
3) using a skip list for conflict tracking.
But none of them improved the performance, so I'm a little confused now
about what to do next.
Could you please give me some suggestions?


--
Sincerely

Mengxing Liu


skip-list-for-conflict-tracking.patch
Description: Binary data



[HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-26 Thread Mengxing Liu
Hi, all. There was a very strange phenomenon I couldn't explain, so I was
wondering if you could help me.

I was trying to replace the linked list with a skip list in the serializable
transaction object for faster conflict tracking, but the performance was bad.
So I used the instruction "rdtsc" to compare the speed of my skip list and the
original linked list. The skip list was about 1.5x faster.


The interesting thing is that if I added the instruction "rdtsc" at the end of
the function "RWConflictExists",
the performance of the whole system increased by up to 3 times!
Here is the result.


| benchmarks        | without rdtsc | with rdtsc |
| simple read/write | 4.91          | 14.16      |
| ssibench          | 9.72          | 10.24      |
| tpcb              | 26.45         | 26.38      |

(The simple read/write benchmark has the largest number of conflicts.)


The patch is attached. The only difference between the two columns is a single
line of code:
__asm__ __volatile__ ("rdtsc");
But I don't know why this instruction influences the performance so much!
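
For reference, the cycle counting was done roughly like this (a minimal sketch;
x86 only, requires <stdint.h>, and the names are illustrative):

    static inline uint64_t
    read_tsc(void)
    {
        uint32_t lo, hi;

        /* rdtsc returns the 64-bit time-stamp counter in EDX:EAX */
        __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
        return ((uint64_t) hi << 32) | lo;
    }

    /* usage:
     *     uint64_t start = read_tsc();
     *     (void) RWConflictExists(reader, writer);
     *     elapsed_cycles += read_tsc() - start;
     */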


BTW, after adding the "rdtsc" instruction, the performance is up to about 10%
better than the original version.
That means the skip list can work!

Looking forward to your advice.

--
Sincerely

Mengxing Liu


skip-list-for-conflict-tracking.patch
Description: Binary data



[HACKERS] [GSOC][weekly report 4] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-28 Thread Mengxing Liu
In the last week:

I added a tpcb benchmark and refactored the test code. It is like a framework
now; we can add other benchmarks easily if necessary.
https://github.com/liumx10/pg-bench
I analyzed the code acquiring SerializableFinishedListLock and came up with a new
proposal to improve it. My proposal is as follows.
As I reported in previous emails, lots of backends were blocked by
SerializableFinishedListLock under heavy contention.
There are only three functions acquiring this lock:

SummarizeOldestCommittedSxact: after transforming the conflict information into
SLRU format, we need to release the SerializableXact object. The lock protects
the release operation and removing the object from the finished list.
ReleasePredicateLocks: after releasing the predicate locks of the current
transaction: if the transaction is rolled back, we need to release the
SerializableXact object; if the transaction is committed, we should insert it
into the finished list. The lock protects the release or insert operation.
ClearOldPredicateLocks: look through the finished-transaction list to find
whether any transaction object can be released. Thus the lock protects looking
through the list, releasing transaction objects, and removing objects from the
list.

As we can see, the lock protects two things: 1) the finished-transaction list,
and 2) releasing serializable transaction objects.
Using a single global lock to protect both causes a lot of contention.
So I decoupled the lock's duties into two parts: protecting the finished
transaction list, and
protecting the release of a single serializable transaction object.


The SerializableFinishedListLock is kept to protect the finished
transaction list;
thus the functions SummarizeOldestCommittedSxact and ReleasePredicateLocks are
unchanged.
In ClearOldPredicateLocks, I first scan the list and pick out the
transactions that can be released,
but I do not release these objects directly. Only after releasing the
SerializableFinishedListLock do I invoke
ReleaseOneSerializableXact to release them.
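
An illustrative sketch of this two-pass scheme (not the actual patch;
MAX_BATCH and the scan details are made up):

    static void
    ClearOldPredicateLocks(void)
    {
        SERIALIZABLEXACT *releasable[MAX_BATCH];
        int         n = 0;

        LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE);
        /*
         * Pass 1: unlink releasable transactions from the finished list,
         * but do not free them while holding the global lock.
         */
        /* ... scan the finished-transaction list, fill releasable[] ... */
        LWLockRelease(SerializableFinishedListLock);

        /*
         * Pass 2: free them outside the global lock; the locks taken
         * inside ReleaseOneSerializableXact() protect its own work.
         */
        for (int i = 0; i < n; i++)
            ReleaseOneSerializableXact(releasable[i], false, false);
    }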


At first, I wanted to use a partition lock or spinlock in each SerializableXact
object to protect ReleaseOneSerializableXact.
But I found it is not necessary to add new locks: in
ReleaseOneSerializableXact, it first releases all predicate locks,
which is protected by SerializablePredicateLockListLock; then it releases all
conflicts, which is protected by SerializableXactHashLock.
So I didn't change the function ReleaseOneSerializableXact.


I have implemented this idea, but unfortunately it didn't improve the
performance or reduce the contention on SerializableFinishedListLock.
I'll try to find out why in the next few days.
But I'm really looking forward to your advice on my proposal. We can't be too
careful when modifying code related to locks.


--
Sincerely

Mengxing Liu


Re: [HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-30 Thread Mengxing Liu
Thanks for your reply. 


Actually, the result without "rdtsc" is reasonable. I used perf to analyze
the performance and found that
even though the function tracking conflicts (RWConflictExists) was faster, the
function inserting conflicts (SetRWConflict)
was much slower. Following your suggestion, I found there were many more
wait events on predicate_lock_manager.
That means the slower SetRWConflict resulted in more lock conflicts.
So I took some effort to make it faster over the last few days.

Why was the code with "rdtsc" much faster? I think that may have been caused by
some mistake.
When I changed to another machine to run the code, this phenomenon didn't happen
anymore.
-Original Messages-
From: "Robert Haas" <robertmh...@gmail.com>
Sent Time: 2017-07-29 02:46:47 (Saturday)
To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
Cc: "Alvaro Herrera" <alvhe...@2ndquadrant.com>, kgrittn <kgri...@gmail.com>, 
"pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: [HACKERS] [GSOC] Eliminate O(N^2) scaling from rw-conflict 
tracking in serializable transactions


On Wed, Jul 26, 2017 at 11:41 AM, Mengxing Liu <liu-m...@mails.tsinghua.edu.cn> 
wrote:

Hi, all. There was a very strange phenomenon I couldn't explain. So I was 
wondering if you can help me.


I was trying to replace the linked list with a skip list in serializable 
transaction object for faster conflict tracking. But the performance is bad.
So I used the instruction "rdtsc" to compare the speed of my skip list and the 
original linked list. The skip list was about 1.5x faster.


The interesting thing is that if I added the instruction "rdtsc" at the end of
the function "RWConflictExists",
the performance of the whole system was increased by at most 3 times!
Here is the result.


| benchmarks | without rdtsc  | with rdtsc |
| simple read/write | 4.91 | 14.16 |
| ssibench | 9.72 | 10.24 |
| tpcb | 26.45 | 26.38 |


( The simple read/write benchmark has the most number of conflicts. )


The patch is attached. All the difference of the two columns is with/without a 
simple line of code:
__asm__ __volatile__ ("rdtsc"); 
But I don't know why this instruction will influence the performance so much!


Lock contention is really expensive, so a slight delay that is just long enough 
to prevent the contention from happening can sometimes improve performance.  
This example is surprisingly dramatic, though.  Of course, we can't commit it 
this way -- it will break on non-x86.


I would suggest that you gather information on what wait events are occurring 
in the "without rdtsc" case.  Like this:



$ script
$ psql
psql=> select wait_event from pg_stat_activity;
psql=> \watch 0.5
...run test in another window...
^C
\q
^D
...use awk or perl or something to count up the wait events and see where the
contention is happening...


--

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sincerely

Mengxing Liu


[HACKERS] [GSOC][weekly report 8] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-07-30 Thread Mengxing Liu
In the last week, I focused on tuning the performance of the skip list and fixed
several bugs.
1. As only out-conflicts are checked in RWConflictExists, I removed all
modifications concerned with in-conflicts.
2. If the conflict list is short, I insert an item just as into an ordered
linked list, that is,
comparing with the items in the list one by one to find the right position.
Using the skip list is slow when the list is short.
3. I stopped using the abstract skip list interface; I was afraid it would add
a little overhead. But the results showed
it had no influence. I will roll this back if necessary.


Sadly, the performance is not better than the original version, but it's the
highest among all my earlier attempts.
It is like the hash table: tracking conflicts is faster but inserting conflicts
is slower.
In a quick test, the CPU cycles consumed by the two functions
RWConflictExists/SetRWConflict are as below.
|             | RWConflictExists | SetRWConflict |
| linked list | 1.39%            | 0.14%         |
| skip list   | 1.15%            | 0.35%         |


Following Alvaro's suggestions, I'll give a detailed performance report
tomorrow.


--
Sincerely

Mengxing Liu


skip-list-for-conflict-tracking.patch
Description: Binary data



[HACKERS] Re: [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-08 Thread Mengxing Liu
> From: "Alvaro Herrera" <alvhe...@2ndquadrant.com>

> * I wonder why did you settle on a skip list as the best optimization
>   path for this.  Did you consider other data structures?  (We don't
>   seem to have much that would be usable in shared memory, I fear.)
>
 
There are three typical alternative data structures: 1) an unordered linked
list, with O(N) search and O(1) insert/remove;
2) tree-like data structures, with O(log N) insert/search/remove (a skip list
is essentially a tree in this respect);
3) a hash table, with O(1) insert/search, but with a much bigger constant
than a linked list.
Any data structure can be classified into one of the categories above.

In the PG serializable module, the number of conflicts is much smaller than we
expected, which means N is small.
So we should be very careful about the constants in front of the Big O notation;
sometimes the O(N) linked list is faster than the O(1) hash table.
For example, when N is smaller than 5, HTAB is slower than SHM_QUEUE when
searching for an element.
And in all cases, the remove operation of the hash table is slower than that of
the linked list,
which results in a more serious problem: more contention on
SerializableFinishedListLock.

The skip list is better than HTAB because its remove operation has O(1)
complexity. I don't think any other tree-like data structure would do better.

Shared memory is also the reason why I chose the hash table and skip list at
the beginning:
it's quite difficult to develop a totally new data structure in shared memory,
while there is already well-tested hash table and linked list code.
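
To illustrate the removal point: with an intrusive doubly linked list such as
SHM_QUEUE, removing a node whose address you already hold needs no search at
all (a generic sketch):

    typedef struct Node
    {
        struct Node *prev;
        struct Node *next;
    } Node;

    /* O(1) unlink: given the node itself, no traversal is needed. */
    static void
    node_unlink(Node *node)
    {
        node->prev->next = node->next;
        node->next->prev = node->prev;
    }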

> * I wonder if a completely different approach to the finished xact
>   maintenance problem would be helpful.  For instance, in
>   ReleasePredicateLocks we call ClearOldPredicateLocks()
>   unconditionally, and there we grab the SerializableFinishedListLock
>   lock unconditionally for the whole duration of that routine, but
>   perhaps it would be better to skip the whole ClearOld stuff if the
>   SerializableFinishedListLock is not available?  Find out some way to
>   ensure that the cleanup is always called later on, but maybe skipping
>   it once in a while improves overall performance.
> 

Yes, that sounds nice. Actually, for any particular backend worker, it's OK to
skip ClearOldPredicateLocks,
because other workers will do the cleanup later. I'll try it.

> * the whole predicate.c stuff is written using SHM_QUEUE.  I suppose
>   SHM_QUEUE works just fine, but predicate.c was being written at about
>   the same time (or a bit earlier) than the newer ilist.c interface was
>   being created, which I think had more optimization work thrown in.
>   Maybe it would be good for predicate.c to ditch use of SHM_QUEUE and
>   use ilist.c interfaces instead?  We could even think about being less
>   strict about holding exclusive lock on SerializableFinished for the
>   duration of ClearOldPredicateLocks, i.e. use only a share lock and
>   only exchange for exclusive if a list modification is needed.
> 

I read the code in ilist.c, but I didn't see much difference from SHM_QUEUE.
Anyway, I'll run a small test to compare the performance of these two data
structures.

> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sincerely

Mengxing Liu


[HACKERS] [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-07 Thread Mengxing Liu
In the last week, I focused on:

1) Continuing the optimization of the skip list. Here is the new logic:
at first, I use an unordered linked list for all inserting/searching
operations.
When the number of conflicts exceeds a threshold, I transform the linked
list into an ordered skip list,
and then all inserting/searching operations are done through the skip list.
This way, for transactions with only a few conflicts, the inserting overhead
is O(1).
The patch is attached.
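
An illustrative sketch of this adaptive scheme (the names and threshold are
made up, not taken from the actual patch):

    #define PROMOTE_THRESHOLD 10

    static void
    ConflictListInsert(ConflictList *list, int key)
    {
        if (!list->isSkipList)
        {
            LinkedListAppend(list, key);        /* O(1) while small */
            if (list->nentries > PROMOTE_THRESHOLD)
            {
                SortEntries(list);              /* order once, at promotion */
                BuildSkipLevels(list);
                list->isSkipList = true;
            }
        }
        else
            SkipListInsert(list, key);          /* O(log N) afterwards */
    }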


It helped a little, but just a little. In the end, the skip list has the same
performance as the linked list.


2) Studying the performance of the reduced contention on FinishedListLock.
I found it is beneficial when the number of conflicts is medium. It is easy
to understand that the optimization
has no influence when conflicts are too rare; but when there are too many
conflicts, the partition locks
become the new bottleneck and the performance can't be improved. A chart is
attached. This optimization
achieves a 14% speedup at most.


Do you think this optimization can be accepted?


--
Sincerely

Mengxing Liu


skip-list-for-conflict-tracking.patch
Description: Binary data



[HACKERS] [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-05-09 Thread Mengxing Liu
Hi, Alvaro and Kevin. I'm Mengxing.

This is a “synchronization” email to tell you what I've done and my next plan.
I'm looking forward to your advice.

According to my proposal, I want to prepare the experimental environment during
the community bonding period.
As this is the first time I have discussed this with Alvaro, I will introduce
the environment again.
My lab has a Lenovo System x3950 X6 machine.
https://www.lenovo.com/images/products/system-x/pdfs/datasheets/x3950_x6_ds.pdf
More specifically, there are 8 sockets, each with 15 cores of Intel(R) Xeon(R)
CPU E7-8870 v2 @ 2.30GHz.
Thus we have 120 cores in total. The storage is a 1 TB SSD with a SAS interface
and 500 MB/s write bandwidth.
The OS is Ubuntu 14.04.

I've compiled and installed PostgreSQL and have tested it using pgbench. I tuned
the database parameters according to
http://www.dbdude.com/postgresql/postgresqlperformance.php#BESTPRACTICES
For pgbench, I set the scale factor (-s) to 512. Other configurations are
default values.

Because we have so many cores, the SSD becomes the bottleneck. In my test,
pgbench can scale to 36 connections (18 KTPS throughput). CPU utilization is
below 30%.
Therefore:
1. Is there anything wrong with my tuning parameters? For example, should I turn
off "fsync", since we don't care about recovery here?
2. Can we use just two sockets (30 physical cores) to run the database server?
Then the CPU can become the bottleneck, so that it exposes the problem we are
trying to solve.
BTW, here are Stephen's questions.

>  What method of communication will be used among the mentors and with
> Mengxing.

What method do you prefer?

>  Frequency of status updates (must be at least once a week and more often is
> encouraged).

How about reporting my status once a week?

>  What steps will be taken next during the community bonding period.

As I wrote in the proposal, I want to set up the environment and read the
related source code. Do you have any suggestions for me?




--
Mengxing Liu


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-07 Thread Mengxing Liu


> From: "Kevin Grittner" <kgri...@gmail.com>
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> 
> > "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> > cpu user time should be about 12*4 = 48.
> > There seems to be no process blocked by IO.
> >
> > procs ---memory-- ---swap-- -io -system-- 
> > --cpu-
> >  r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id 
> > wa st
> > 28  0  0 981177024 315036 7084376000 0 900  1  
> > 0 99  0  0
> > 21  1  0 981178176 315036 7084378400 0 0 25482 329020 
> > 12  3 85  0  0
> > 18  1  0 981179200 315036 7084379200 0 0 26569 323596 
> > 12  3 85  0  0
> > 17  0  0 981175424 315036 7084380800 0 0 25374 322992 
> > 12  4 85  0  0
> > 12  0  0 981174208 315036 7084382400 0 0 24775 321577 
> > 12  3 85  0  0
> >  8  0  0 981179328 315036 7084533600 0 0 13115 199020  
> > 6  2 92  0  0
> > 13  0  0 981179200 315036 7084579200 0 0 22893 301373 
> > 11  3 87  0  0
> > 11  0  0 981179712 315036 7084580800 0 0 26933 325728 
> > 12  4 84  0  0
> > 30  0  0 981178304 315036 7084582400 0 0 23691 315821 
> > 11  4 85  0  0
> > 12  1  0 981177600 315036 7084583200 0 0 29485 320166 
> > 12  4 84  0  0
> > 32  0  0 981180032 315036 7084584800 0 0 25946 316724 
> > 12  4 84  0  0
> > 21  0  0 981176384 315036 7084586400 0 0 24227 321938 
> > 12  4 84  0  0
> > 21  0  0 981178880 315036 7084588000 0 0 25174 326943 
> > 13  4 83  0  0
> 
> This machine has 120 cores?  Is hyperthreading enabled?  If so, what
> you are showing might represent a total saturation of the 30 cores.
> Context switches of about 300,000 per second is pretty high.  I can't
> think of when I've seen that except when there is high spinlock
> contention.
> 

Yes, and hyper-threading is disabled.

> Just to put the above in context, how did you limit the test to 30
> cores?  How many connections were open during the test?
> 

I used numactl to limit the test to the first two sockets (15 cores in each
socket),
and there are 90 concurrent connections.

> > The flame graph is attached. I use 'perf' to generate the flame graph. Only 
> > the CPUs running PG server are profiled.
> > I'm not familiar with other part of PG. Can you find anything unusual in 
> > the graph?
> 
> Two SSI functions stand out:
> 10.86% PredicateLockTuple
>  3.51% CheckForSerializableConflictIn
> 
> In both cases, most of that seems to go to lightweight locking.  Since
> you said this is a CPU graph, that again suggests spinlock contention
> issues.
> 
> -- 

Yes. Are there any other kinds of locks besides spinlocks? I'm reading about the
locks in PG now. If all the locks were spinlocks, the CPU should be used 100%.
But now only 50% of the CPU is used,
so I'm afraid there is extra time spent waiting on mutexes or semaphores.
These SSI functions will cost more time than reported, because perf doesn't
record the time spent sleeping and waiting for locks.
CheckForSerializableConflictIn takes 10% of the running time (refer to my last
email).

--
Mengxing Liu


Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-08 Thread Mengxing Liu
Thank you very much! I followed your advice and here is the result.

SerializableXactHashLock 73
predicate_lock_manager 605
WALWriteLock 3
SerializableFinishedListLock 665

There are more than 90 events each time.
SerializableXactHashLock and SerializableFinishedListLock are both used in SSI.
I think that's why PG is so slow in a high-contention environment.


> -Original Messages-
> From: "Robert Haas" <robertmh...@gmail.com>
> Sent Time: 2017-06-08 01:30:58 (Thursday)
> To: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Cc: kgrittn <kgri...@gmail.com>, "Alvaro Herrera" <alvhe...@2ndquadrant.com>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> On Tue, Jun 6, 2017 at 12:16 PM, Mengxing Liu
> <liu-m...@mails.tsinghua.edu.cn> wrote:
> > I think disk I/O is not the bottleneck in our experiment, but the global 
> > lock is.
> 
> A handy way to figure this kind of thing out is to run a query like
> this repeatedly during the benchmark:
> 
> SELECT wait_event_type, wait_event FROM pg_stat_activity;
> 
> I often do this by using psql's \watch command, often \watch 0.5 to
> run it every half-second.  I save all the results collected during the
> benchmark using 'script' and then analyze them to see which wait
> events are most frequent.  If your theory is right, you ought to see
> that SerializableXactHashLock occurs as a wait event very frequently.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 


--
Mengxing Liu


Re: [HACKERS] [GSOC][weekly report 3] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-21 Thread Mengxing Liu

> From: "Heikki Linnakangas" <hlinn...@iki.fi>
>
> Hmm. The hash table ought to speed up the RWConflictExists() function 
> right? Where in the flame graph is RWConflictExists()? If it only 
> accounts for a small amount of the overall runtime, even drastic speedup 
> there won't make much difference to the total.
>

Yes. It is much smaller than we expected. I did a small test for HTAB and
the linked list:
when the set size is smaller than 10, the linked list is as quick as HTAB. Here
is the result
(time in microseconds for 10 million searches):

setSize:  5, LIST:  423569, HTAB: 523681
setSize: 10, LIST:  540579, HTAB: 430111
setSize: 15, LIST:  752879, HTAB: 429519
setSize: 20, LIST:  940792, HTAB: 431388
setSize: 25, LIST: 1163760, HTAB: 446043
setSize: 30, LIST: 1352990, HTAB: 429057
setSize: 35, LIST: 1524097, HTAB: 431547
setSize: 40, LIST: 1714976, HTAB: 429023

As we can see, the hash table only wins in a fairly extreme situation.
Even with 100 concurrent connections, the average length of the conflict
list is about 10, so the linked list is not the bottleneck.

> 
> Nope, there is no such function, I'm afraid.
> 

Oh, that's really bad news.

> > In a previous email, I reported that many backends wait for the lock 
> > “SerializableFinishedListLock”;
> > If we don't implement functions like “ReleaseOneSerializableXact” quickly, 
> > they will be the bottleneck.
> 
> Yeah, that's probably what's causing the decrease in throughput you're 
> seeing.
> 

A new piece of evidence: I used "SELECT wait_event_type, wait_event FROM
pg_stat_activity;" and summed by event type to analyze the wait events.

The result for the original version:
SerializableXactHashLock 27
predicate_lock_manager 512
WALWriteLock 3
SerializableFinishedListLock 402

The result for the new version:
SerializableXactHashLock 38
predicate_lock_manager 473
WALWriteLock 3
SerializableFinishedListLock 1068

Obviously, more backends are blocked by SerializableFinishedListLock.

>
> You might need to also keep the linked lists, and use the hash table to 
> only look up particular items in the linked list faster.
> 
> I was surprised to see that you're creating a lot of smallish hash 
> tables, three for each PredXact entry. I would've expected all the 
> PredXact entries to share the same hash tables, i.e. have only three 
> hash tables in total. That would be more flexible in allocating 
> resources among entries. (It won't help with the quick-release, though)
>

Yes, it would look more elegant and require less memory (the hash table objects
themselves also consume memory). But purely in terms of performance it would be
less efficient than my patch, because it has to maintain the linked lists in
addition to the hash tables. In other words, it does more work than my patch.

Another point is that removing the linked lists opens up more opportunities to
reduce lock contention; otherwise we still need a global lock to protect the
linked lists.
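
For what it's worth, my understanding of the single-shared-table idea is
roughly the sketch below. This is hedged: the struct and function names are
illustrative, not from any actual patch; ShmemInitHash and HASH_BLOBS are the
usual shared-memory dynahash API.

#include "postgres.h"
#include "storage/predicate_internals.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

/* One entry per rw-conflict edge, keyed by the pair of transactions. */
typedef struct RWConflictKey
{
    SERIALIZABLEXACT *sxactOut;     /* the reader, which has the conflict out */
    SERIALIZABLEXACT *sxactIn;      /* the writer, which has the conflict in */
} RWConflictKey;

static HTAB *RWConflictHash;

static void
InitRWConflictHash(long max_conflicts)
{
    HASHCTL     info;

    memset(&info, 0, sizeof(info));
    info.keysize = sizeof(RWConflictKey);
    info.entrysize = sizeof(RWConflictKey);     /* key-only entries */

    /* HASH_BLOBS: hash the raw key bytes, i.e. the two pointers. */
    RWConflictHash = ShmemInitHash("RWConflict hash",
                                   max_conflicts, max_conflicts,
                                   &info,
                                   HASH_ELEM | HASH_BLOBS);
}

The catch, as noted above, is releasing all the edges of one transaction
quickly: with a single shared table we either keep the per-xact linked lists
anyway or scan the whole table, which is exactly the extra work I want to avoid.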

--
Sincerely


Mengxing Liu












Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-02 Thread Mengxing Liu
Hi,  Alvaro and Kevin.

> Anyway, this is just my analysis.
> So I want to hack PG and count the sizes of the transactions' conflict lists.
> That would be more accurate.

Over the last week, I hacked PG to add an additional thread that counts the
RWConflict list lengths, and tuned the benchmark to generate more conflicts.
But the results are still not good.

> 
> > 
> > Yeah, you need a workload that generates a longer conflict list -- if
> > you can make the tool generate a conflict list with a configurable
> > length, that's even better (say, 100 conflicts vs. 1000 conflicts).
> > Then we can see how the conflict list processing scales.
> > 
> 
> Yes, I tried to increase the read set to make more conflicts.
> However, the abort ratio also increases, and the CPU cycles consumed by
> conflict tracking are still less than 1%.
> According to the design of PG, a transaction will be aborted if there is a
> rw-antidependency.
> In this case, a transaction with a longer conflict list is more likely to
> be aborted.
> That means the conflict list doesn't get many chances to grow too long.
> 

To solve this problem, I use just two kinds of transactions: read-only
transactions and update-only transactions.
In this case, no transaction can have an in-RWConflict and an out-RWConflict
at the same time, so transactions are not aborted by conflict checking.

Specifically, the benchmark is as follows:
The table has 10K rows.
Read-only transactions read 1K rows, and update-only transactions update 20
random rows of the table.

In this benchmark, about 91% of the conflict lists are shorter than 10, 6% are
between 10 and 20 entries long, and only 2% are longer than 20. The CPU
utilization of CheckForSerializableConflictOut/In is 0.71%/0.69%.

I tried increasing the write set. As a result, the conflict lists become
longer, but the total CPU utilization drops (to about 50%), so CPU is not the
bottleneck anymore. I'm not familiar with the other parts of PG. Is it caused
by locking? Is there any chance to get rid of this problem?

BTW, I find email not very convenient, especially when I have a problem and
want to discuss it with you.
Would you mind scheduling a meeting every week over Skype, or another instant
messaging tool you like?
It would not take too much of your time. Maybe one hour is enough.


Sincerely.
--
Mengxing Liu












Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-03 Thread Mengxing Liu


> -Original Messages-
> From: "Kevin Grittner" <kgri...@gmail.com>
> Sent Time: 2017-06-03 01:44:16 (Saturday)
> To: "Alvaro Herrera" <alvhe...@2ndquadrant.com>
> Cc: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: Re: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> > Mengxing Liu wrote:
> 
> >> The CPU utilization of CheckForSerializableConflictOut/In is
> >> 0.71%/0.69%.
> 
> How many cores were on the system used for this test?  The paper
> specifically said that they didn't see performance degradation on
> the PostgreSQL implementation until 32 concurrent connections with
> 24 or more cores.  The goal here is to fix *scaling* problems.  Be
> sure you are testing at an appropriate scale.  (The more sockets,
> cores, and clients, the better, I think.)
> 
> 

I used 15 cores for the server and 50 clients.
I also tried 30 cores, but the CPU utilization was only about 45%~70%.
How can we tell where the problem is? Is it disk I/O or a lock?

> On Fri, Jun 2, 2017 at 10:08 AM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
> 
> > Kevin mentioned during PGCon that there's a paper by some group in
> > Sydney that developed a benchmark on which this scalability
> > problem showed up very prominently.
> 
> https://pdfs.semanticscholar.org/6c4a/e427e6f392b7dec782b7a60516f0283af1f5.pdf
> 
> "[...] we see a much better scalability of pgSSI than the
> corresponding implementations on InnoDB. Still, the overhead of
> pgSSI versus standard SI increases significantly with higher MPL
> than one would normally expect, reaching around 50% with MPL 128."
> 

Actually, I implemented the benchmark described in the paper first; I reported
the results in a previous email.
The problem is that a transaction with a longer conflict list is more likely
to be aborted, so the conflict lists cannot grow too long.
I modified the benchmark to use only update-only transactions and read-only
transactions to get rid of this problem. That way, a dangerous structure can
never be generated.
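
To spell out the reasoning: the dangerous structure that SSI aborts on is a
pivot transaction with both an incoming and an outgoing rw-antidependency,

    Tin ----rw----> Tpivot ----rw----> Tout

where each edge points from a reader to the writer that overwrote its read.
Read-only transactions can only acquire out-conflicts and update-only
transactions can only acquire in-conflicts, so no transaction can ever become
the pivot.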

> "Our profiling showed that PostgreSQL spend 2.3% of the overall
> runtime in traversing these list, plus 10% of its runtime waiting on
> the corresponding kernel mutexes."
> 

Does "traversing these list" means the function "RWConflictExists"? 
And "10% waiting on the corresponding kernel mutexes" means the lock in the 
function CheckForSerializableIn/out ?

> If you cannot duplicate their results, you might want to contact the
> authors for more details on their testing methodology.
> 

With 30 cores for the server and 90 clients, RWConflictExists consumes 1.0% of
CPU cycles, and CheckForSerializableConflictIn/Out takes 5% in total.
But the total CPU utilization of PG is only about 50%, so the results seem to
match.
If we can solve this problem, we can use this benchmark for future work.
 

Sincerely

--
Mengxing Liu












[HACKERS] [GSOC] [Weekly report 2] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-14 Thread Mengxing Liu
Hi, all. In the last week, I replaced the linked lists with hash tables in
SerializableXact.
I only replaced inConflicts and outConflicts. The other linked lists, such as
possibleUnsafeConflicts, I will convert after the rest works well.

There are still some bugs: the abort rate is much higher than before (up from
11.3% to 72%). I'll figure out what's wrong in the next few days.

My design is as follows:

For the hash tables, the key is a SerializableXact pointer and the value is an
RWConflictData object.
The hash code is generated from the SerializableXact pointer.
So, given a SerializableXact, we can quickly check whether it conflicts with
another SerializableXact.

Every SerializableXact has two tables: inConflictsTab and outConflictsTab.
They are allocated and initialized when the PredXactList is created (in the
function InitPredicateLocks).
When a SerializableXact object is released, it releases all of its RWConflict
objects, so the hash tables are empty again and can be reused directly.

NOTE: I store RWConflictData in the hash tables directly, instead of RWConflict
objects allocated from RWConflictPool.
After I remove the other linked lists, RWConflictPool can be omitted.

My code is here:
https://github.com/liumx10/postgresql/commit/3fd9a7488de5ae19ce2ce19eae5f303153a079ff
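
In outline, the lookup side becomes something like the following (a hedged
sketch with simplified logic, not the exact code from the commit above):

/*
 * Membership test against the per-xact hash table.  The key is the raw
 * SERIALIZABLEXACT pointer; HASH_BLOBS hashes its bytes.  Locking is
 * unchanged from the linked-list version.
 */
static bool
RWConflictExists(const SERIALIZABLEXACT *reader,
                 const SERIALIZABLEXACT *writer)
{
    bool        found;

    /* A transaction never conflicts with itself. */
    if (reader == writer)
        return false;

    /* Probe the reader's out-conflict table for the writer's pointer. */
    (void) hash_search(reader->outConflictsTab, &writer,
                       HASH_FIND, &found);
    return found;
}

This replaces the O(list length) scan with a single expected-O(1) probe.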

There are 3 main modifications in summary:
1) Initializing the hash tables. Related function: InitPredicateLocks

2) Setting, releasing, and checking RWConflicts. Related functions:
RWConflictExists, SetRWConflict, ReleaseRWConflict

3) Functions that scan all RWConflicts of a SerializableXact.
Related functions: ReleasePredicateLocks, ReleaseOneSerializableXact,
CheckForSerializableConflictOut, CheckForSerializableConflictIn,
OnConflict_CheckForSerializationFailure, PreCommit_CheckForSerializationFailure




Sincerely
--
Mengxing Liu








Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-06-06 Thread Mengxing Liu
Hi, Kevin and Alvaro.

I think disk I/O is not the bottleneck in our experiment; the global lock is.

For disk I/O, there are two pieces of evidence:
1) The total throughput is not more than 10 Ktps, and only half of those are
update transactions. An update transaction modifies 20 tuples; each tuple's
size is about 100B.
So the data written to disk should be less than 10MB per second. Even taking
the write-ahead log into consideration (just doubling it), the data should be
less than 20MB/s (the arithmetic is spelled out below).
I replaced the ramdisk with an SSD, and "iostat" shows the same result, while
our SSD's sequential write speed is more than 700MB/s.
2) I changed the isolation level from "serializable" to "read committed". As
the isolation requirement becomes looser, throughput increases. But in that
case the CPU utilization is nearly 100% (it's about 50% in serializable mode).
Therefore, disk I/O is not the bottleneck.
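
Spelled out, the estimate in point 1 is:

   5,000 update tx/s * 20 tuples * 100 B  ~  10 MB/s of table writes
   * 2 to account for WAL                 ~  20 MB/s

which is a small fraction of the SSD's 700+ MB/s sequential write bandwidth.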

For the lock:
I read the source code in predicate.c and found that many functions take a
global lock: SerializableXactHashLock. So only one process can work in those
code paths at any time!
As the problem of the CPU not being fully used appeared right after I changed
the isolation level to "serializable", this global lock should be the
bottleneck.
Unfortunately, "perf" seems unable to record time spent waiting for locks, so
I did it by hand. Specifically, I call "gettimeofday" just before acquiring
the locks and just after releasing them.
In this way, I found that CheckForSerializableConflictIn/Out take more than
10% of the running time, which is far more than what perf reported in the
last email.
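
For example, a minimal wrapper that isolates just the time spent waiting for
an LWLock would look like the sketch below (my actual instrumentation simply
bracketed the whole acquire/release span; the names here are illustrative):

#include "postgres.h"

#include <sys/time.h>

#include "storage/lwlock.h"

/* Per-backend accumulator; dump it at backend exit or on demand. */
static long lock_wait_us = 0;

static void
timed_lwlock_acquire(LWLock *lock, LWLockMode mode)
{
    struct timeval  tv0, tv1;

    gettimeofday(&tv0, NULL);
    LWLockAcquire(lock, mode);
    gettimeofday(&tv1, NULL);
    lock_wait_us += (tv1.tv_sec - tv0.tv_sec) * 1000000L
                  + (tv1.tv_usec - tv0.tv_usec);
}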

If my analysis is right, it sounds like good news, as we have found where the
problem is.
Kevin has mentioned that the lock is used to protect the linked lists. So I
want to replace the linked lists with hash tables in the next step. After
that I will try to remove this lock carefully.
But in this way, our goal has changed: O(N^2) tracking is not the bottleneck;
the global lock is.

By the way, using "gettimeofday" to profile is really ugly.
"perf lock" can only record kernel mutexes, and requires kernel support, so I
didn't use it.
Do you have any good ideas for profiling time spent waiting on locks?


> -Original Messages-
> From: "Mengxing Liu" <liu-m...@mails.tsinghua.edu.cn>
> Sent Time: 2017-06-05 00:27:51 (Monday)
> To: "Kevin Grittner" <kgri...@gmail.com>
> Cc: "Alvaro Herrera" <alvhe...@2ndquadrant.com>, 
> "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
> Subject: Re: Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from 
> rw-conflict tracking in serializable transactions
> 
> 
> 
> 
> > -Original Messages-
> > From: "Kevin Grittner" <kgri...@gmail.com>
> 
> > > I tried 30 cores. But the CPU utilization is about 45%~70%.
> > > How can we distinguish  where the problem is? Is disk I/O or Lock?
> > 
> > A simple way is to run `vmstat 1` for a bit during the test.  Can
> > you post a portion of the output of that here?  If you can configure
> > the WAL directory to a separate mount point (e.g., use the --waldir
> > option of initdb), a snippet of `iostat 1` output would be even
> > better.
> 
> "vmstat 1" output is as follow. Because I used only 30 cores (1/4 of all),  
> cpu user time should be about 12*4 = 48. 
> There seems to be no process blocked by IO. 
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
> 28  0  0 981177024 315036 7084376000 0 900  1  0 99  0  0
> 21  1  0 981178176 315036 7084378400 0 0 25482 329020 12  3 85  0  0
> 18  1  0 981179200 315036 7084379200 0 0 26569 323596 12  3 85  0  0
> 17  0  0 981175424 315036 7084380800 0 0 25374 322992 12  4 85  0  0
> 12  0  0 981174208 315036 7084382400 0 0 24775 321577 12  3 85  0  0
>  8  0  0 981179328 315036 7084533600 0 0 13115 199020  6  2 92  0  0
> 13  0  0 981179200 315036 7084579200 0 0 22893 301373 11  3 87  0  0
> 11  0  0 981179712 315036 7084580800 0 0 26933 325728 12  4 84  0  0
> 30  0  0 981178304 315036 7084582400 0 0 23691 315821 11  4 85  0  0
> 12  1  0 981177600 315036 7084583200 0 0 29485 320166 12  4 84  0  0
> 32  0  0 981180032 315036 7084584800 0 0 25946 316724 12  4 84  0  0
> 21  0  0 981176384 315036 7084586400 0 0 24227 321938 12  4 84  0  0
>