RE: [computer-go] MPI vs Thread-safe
In the MPI runs we use an 8-core node, so the playouts per node are higher. I don't ponder, since the program isn't scaling anyway. The number of nodes with high visits is smaller, and I only send nodes that changed since the last send. I do progressive unpruning, so most children have zero visits. There are at most 30 active children in a node. I don't broadcast. I use a reduce algorithm, which doesn't save a lot with four nodes, but saves something. The actual bandwidth is quite small. Also, the Microsoft cluster has Infiniband.

David

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of Brian Sheppard
> Sent: Friday, October 30, 2009 11:50 AM
> To: computer-go@computer-go.org
> Subject: [computer-go] MPI vs Thread-safe
>
> Back of envelope calculation: MFG processes 5K nodes/sec/core * 4 cores per process = 20K nodes/sec/process. Four processes makes 80K nodes/sec. If you think for 30 seconds (pondering + move time) then you are at 2.4 million nodes. Figure about 25,000 nodes having 100 visits or more. UCT data is roughly 2500 bytes per node (2 counters of 4 bytes for 300 legal moves). If you have 4 nodes then bandwidth is 25,000 * 4 * 2500 = 250 megabytes. That is how much data the master has to broadcast for a complete refresh. And the data has to be sent to the master for aggregation, but that is a smaller number because most processes will only modify a small number of nodes within the refresh cycle.
>
> Now, there are all kinds of reductions to that number. For instance, MPI supports process groups and there are network-layer tricks that can (in theory) supply all listeners in a single message. And nodes don't have to be updated if they are not changed, and you can compress the counters, and so on.
>
> I'm just saying that it looks like a lot of data. It can't be as much as I just calculated, because Gigabit Ethernet would saturate at something less than 100 megabytes/sec. But how much is it?
>
> > Does Mogo share RAVE values as well over MPI?
>
> I would think so, because RAVE is critical to MoGo's move ordering policies.
>
> > It might be the MFGO bias.
>
> This doesn't add up to me. If the problem were in something so basic then the serial program wouldn't play more strongly beyond 4 times as much clock time.
>
> > It might be due to a bug in my MPI implementation, or any number of other possible bugs.
>
> Bugs in non-MPI areas would also kill off improvement in the serial program beyond 4x. So if you aren't observing that then you can look elsewhere.
>
> But of course it is possible for problems to exist only in scaling.
>
> For example, suppose that a node is created in process A and gets 100 trials. It would then have its UCT data passed to other processes, but other processes have not necessarily created the node, nor done progressive widening and so on. Such a node exists in a state that could not be created in a serial program, which would cause a loss in move ordering. This is a bug in parallelization, though not in MPI specifically.
>
> And you are quite right that bugs can manifest as a limit to scalability.
>
> The difficulty of debugging MPI processes is perhaps the biggest complaint about that model.

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
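David's "only send nodes that changed since the last send" scheme can be sketched in a few lines. This is a hypothetical illustration — `SharedTree`, its fields, and the threshold are invented for the example, not taken from Many Faces' actual code:

```python
# Hypothetical sketch of delta synchronization: track which UCT nodes
# changed since the last send and transmit only the well-visited ones.

class SharedTree:
    def __init__(self):
        self.stats = {}     # node_id -> (wins, visits)
        self.dirty = set()  # node ids touched since the last send

    def update(self, node_id, win):
        wins, visits = self.stats.get(node_id, (0, 0))
        self.stats[node_id] = (wins + int(win), visits + 1)
        self.dirty.add(node_id)

    def delta(self, min_visits=100):
        # Only nodes that changed since the last send AND have enough
        # visits to be worth sharing; under-threshold nodes stay pending.
        msg = {n: self.stats[n] for n in self.dirty
               if self.stats[n][1] >= min_visits}
        self.dirty.difference_update(msg)
        return msg
```

Each sync cycle then sends `delta()` instead of the whole tree, which is why the actual bandwidth stays small.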
Re: [computer-go] MPI vs Thread-safe
On Fri, Oct 30, 2009 at 10:50:05AM -0600, Brian Sheppard wrote:
> > > Parallelization *cannot* provide super-linear speed-up.
> >
> > I don't see that at all.
>
> This is standard computer science stuff, true of all parallel programs and not just Go players. No parallel program can be better than N times a serial version.
>
> The result follows from a simulation argument. Suppose that you had a parallel process that performed better than N times a serial program. Construct a new serial program that simulates the parallel process. There is a contradiction.

Of course, if I'm claiming that running M playouts in N threads is better than running N*M playouts, I must agree that running N separate groups of M playouts in sequence and letting them vote must yield the same result - but thank you for this perspective, I hadn't thought about it like this. Clearly, there is a lot of room for experimentation and measurement here...

--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
[computer-go] MPI vs Thread-safe
Back of envelope calculation: MFG processes 5K nodes/sec/core * 4 cores per process = 20K nodes/sec/process. Four processes makes 80K nodes/sec. If you think for 30 seconds (pondering + move time) then you are at 2.4 million nodes. Figure about 25,000 nodes having 100 visits or more. UCT data is roughly 2500 bytes per node (2 counters of 4 bytes for 300 legal moves). If you have 4 nodes then bandwidth is 25,000 * 4 * 2500 = 250 megabytes. That is how much data the master has to broadcast for a complete refresh. And the data has to be sent to the master for aggregation, but that is a smaller number because most processes will only modify a small number of nodes within the refresh cycle.

Now, there are all kinds of reductions to that number. For instance, MPI supports process groups and there are network-layer tricks that can (in theory) supply all listeners in a single message. And nodes don't have to be updated if they are not changed, and you can compress the counters, and so on.

I'm just saying that it looks like a lot of data. It can't be as much as I just calculated, because Gigabit Ethernet would saturate at something less than 100 megabytes/sec. But how much is it?

> Does Mogo share RAVE values as well over MPI?

I would think so, because RAVE is critical to MoGo's move ordering policies.

> It might be the MFGO bias.

This doesn't add up to me. If the problem were in something so basic then the serial program wouldn't play more strongly beyond 4 times as much clock time.

> It might be due to a bug in my MPI implementation, or any number of other possible bugs.

Bugs in non-MPI areas would also kill off improvement in the serial program beyond 4x. So if you aren't observing that then you can look elsewhere.

But of course it is possible for problems to exist only in scaling.

For example, suppose that a node is created in process A and gets 100 trials. It would then have its UCT data passed to other processes, but other processes have not necessarily created the node, nor done progressive widening and so on. Such a node exists in a state that could not be created in a serial program, which would cause a loss in move ordering. This is a bug in parallelization, though not in MPI specifically.

And you are quite right that bugs can manifest as a limit to scalability.

The difficulty of debugging MPI processes is perhaps the biggest complaint about that model.
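The back-of-envelope numbers above can be checked with a few lines of arithmetic (figures copied from the message; the 2500 bytes/node is the message's rounding of 300 moves * 2 counters * 4 bytes = 2400):

```python
# Reproduce the back-of-envelope bandwidth estimate from the message above.
nodes_per_sec_per_core = 5_000
cores_per_process = 4
processes = 4
think_time_sec = 30          # pondering + move time

playout_rate = nodes_per_sec_per_core * cores_per_process * processes
total_nodes = playout_rate * think_time_sec             # nodes visited per think

hot_nodes = 25_000           # nodes with 100+ visits (rough figure)
bytes_per_node = 2_500       # ~300 moves * 2 counters * 4 bytes, rounded up

refresh_bytes = hot_nodes * processes * bytes_per_node  # full-refresh broadcast
print(playout_rate, total_nodes, refresh_bytes)          # 80000 2400000 250000000
```

So a complete refresh is on the order of 250 MB, which is indeed more than Gigabit Ethernet can move in a second.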
Re: [computer-go] New list member
Welcome Aldric,

Not a frequent poster myself, but here are two resources that you may find useful.

1) An extensive library of articles related to computer go is collected at http://www.citeulike.org/group/5884/library
This list provides a wealth of articles tracing back many years and used to be very current. More recently, not everybody has been sending links to their papers, which is a pity, as they are sometimes hard to find elsewhere. For machine-learning papers in particular, Markus Enzenberger's NeuroGo papers are highly recommended and a good starting point. But there are others as well, including several Ph.D. theses. Those are nice because they are (always) much longer than papers, (sometimes) more instructive, and contain long reference lists.

2) Erik van der Werf did excellent work for his thesis on machine-learning algorithms for particular functions. His website is http://erikvanderwerf.tengen.nl/ and contains links to his papers.

Good luck,
René

2009/10/29 Aldric Giacomoni
> Hi everyone,
>
> I've been following the list for about a week and a half, and thought I ought to introduce myself. I don't know if this much activity is normal on the list, but I'm glad to see there is so much to read :)
>
> My name is Aldric - just in case you hadn't guessed. I am 27 years old, and I live on the east coast of the US. I can be found online (godiscussions.com, KGS, Twitter, and many other non-go-related places) as Trevoke. I've been playing go since 2006, and have reached the rank of 6k on KGS so far. I graduated college in 2004 with a major in Computer Science and Mathematics. I've been studying isshinryu karate since 2004 and am preparing for my 2d test.
>
> I've always had an interest in "Artificial Intelligence" (I recently found some files on my machine dealing with machine learning and dating back to 2002), but never pushed it. I figured, oh, it's too complex, I'll study that someday. For various reasons, I've decided to do some graduate studies (a Doctorate) "in AI". You should now be able to take a good guess at why I joined this list.
>
> I currently have zero knowledge of artificial intelligence, besides the few papers about MCTS, and the papers around Crazy Stone and such, by Rémi Coulom, that I read in the past few days, following Olivier's message to this list. I'm waiting for a first order of books from Amazon.com to get my feet wet. I know one thing, which I am also aware is still vague: "I want to help solve go". I realize this may involve some pattern recognition, knowledge representation, and a few more topics, but I know this is where I want to go with it.
>
> Are there any resources which you could recommend to someone who would like to learn? Any pitfalls you can recommend I avoid (or stumble into)? As a warning, I am an avid reader and a rather obstinate individual when I have decided to study/learn something.
>
> Oh, and finally, besides Olivier's suggestion to apply for a Doctorate with his team, do you know of anybody in the world who may consider taking someone like me (fluent in French, English, Italian, no ties, can travel, etc.) in their team, to do that kind of work? And, if so, the repetition of the earlier question: what do you think I need to know before I can study with them?
>
> Thanks in advance :-)
>
> --Aldric
[computer-go] MPI vs Thread-safe
> > Parallelization *cannot* provide super-linear speed-up.
>
> I don't see that at all.

This is standard computer science stuff, true of all parallel programs and not just Go players. No parallel program can be better than N times a serial version.

The result follows from a simulation argument. Suppose that you had a parallel process that performed better than N times a serial program. Construct a new serial program that simulates the parallel process. There is a contradiction.

Technically, this argument only establishes the fact up to a multiplicative constant. But in the case of parallel Go players, I cannot accept that simulating a parallel process using serial code would result in a slowdown. (If anything, serialization would be faster.)
RE: [SPAM] Re: [computer-go] First ever win of a computer against a pro 9P as black (game of Go, 9x9).
Many Faces caches both local tactical results (ladders etc.) and life-and-death reading results. For tactics it records a "shadow". For life and death it saves the whole tree. The tactical caches are still active in the UCT-MC code, since reading tactics is part of move generation. The life-and-death search was a separate pass in the traditional program, so it is not used by UCT-MC (yet).

David

> This sounds a lot like a description of GNU Go's persistent reading cache, which calculates a "reading shadow" for all its readings. Has something similar been tried for other programs?
>
> --
> Seo Sanghyeon
RE: [computer-go] MPI vs Thread-safe
I share all UCT nodes with more than N visits, where N is currently 100, but performance doesn't seem very sensitive to N. Does Mogo share RAVE values as well over MPI?

I agree that low scaling is a problem, and I don't understand why.

It might be the MFGO bias. With low numbers of playouts the MFGO bias boosts the strength, since it provides a lot of go knowledge. With high playouts the brittleness and missing pieces of the MFGO knowledge are exposed, so the strength plateaus. Perhaps with more nodes the strength would start increasing again, but we didn't test it.

It might be due to a bug in my MPI implementation, or any number of other possible bugs. If a bug causes it to lose a small number of games no matter how many playouts are made, it would look like a lack of scaling.

My focus this year has been making the single-node version as strong as possible, since that's what I'm selling, so I haven't spent much effort on fixing this MPI scaling problem (and I don't have a lot of time available on a cluster with more than 4 nodes).

David

> -----Original Message-----
> From: computer-go-boun...@computer-go.org [mailto:computer-go-boun...@computer-go.org] On Behalf Of Brian Sheppard
> Sent: Friday, October 30, 2009 7:49 AM
> To: computer-go@computer-go.org
> Subject: [computer-go] MPI vs Thread-safe
>
> > I only share UCT wins and visits, and the MPI version only scales well to 4 nodes.
>
> The scalability limit seems very low.
>
> Just curious: what is the policy for deciding what to synchronize? I recall that MoGo shared only the root node.
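The receiving side of such a scheme just sums the remote win/visit deltas into the local counters, which is what an MPI reduce over the shared nodes amounts to. A hypothetical sketch (the dict-of-tuples representation is invented for illustration, not taken from any program):

```python
# Merge UCT statistics received from another process: remote (wins, visits)
# deltas are summed into the local counters, node by node.

def merge_remote(local, remote_delta):
    """local, remote_delta: dict node_id -> (wins, visits)."""
    for node_id, (w, v) in remote_delta.items():
        lw, lv = local.get(node_id, (0, 0))
        local[node_id] = (lw + w, lv + v)
    return local
```

Because addition is associative, the same merge can be applied pairwise up a reduction tree rather than funneling everything through one master.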
Re: [computer-go] MPI vs Thread-safe
On Fri, Oct 30, 2009 at 07:53:15AM -0600, Brian Sheppard wrote:
> > I personally just use root parallelization in Pachi
>
> I think this answers my question; each core in Pachi independently explores a tree, and the master thread merges the data. This is even though you have shared memory on your machine.

Yes.

> You also have possibilities for largely lockless thread safety. For instance, the Intel architecture has atomic memory access instructions that allow lockless data safety. Remi Coulom published a paper on this subject.
>
> I am not even sure that "leaf" parallelization is really ruled out. For example, if GPU-based implementation works, then leaf parallelization must be reconsidered.

Fuego is probably the prime example of well-working leaf parallelization, using atomic memory operations and virtual losses; see the TR for details and measurements.

> > similar to what you probably mean by MPI, though without resyncs
>
> MPI is "message passing interface", an industry-standard API for supporting high performance computing.
>
> It is used for sharing data among multiple processes (that is, no shared memory). I recall that MoGo published that their massively scalable strategy is based on this approach.

Yes, I know what MPI is, I just wasn't sure what "_the_ MPI method" would necessarily be.

> > confirming the paper's finding that the play improvement is larger than multiplying number of sequential playouts appropriately.
>
> Well, this is another reason why I doubt the results from the Mango paper. Parallelization *cannot* provide super-linear speed-up. The existence of super-linear speed-up proves that the underlying single-threaded program is flawed.

I don't see that at all. It seems quite logical to me that the results could be better when threads vote on the resulting move instead of spending appropriately more in a single search. I think the explanation that this compensates better for various (short-term?) biases popping up within the UCT tree is logical.

It's some time since I last measured the multi-threading gains, and I have made quite a lot of tweaks to the UCT exploration since; maybe I should re-measure. It's just that measuring this takes a lot of CPU resources, thus is a pain. :-)

--
Petr "Pasky" Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth
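The "threads vote on the resulting move" idea discussed above reads roughly like this sketch, where a seeded random choice stands in for a real MCTS run (all names are illustrative, not Pachi's actual code):

```python
# Root parallelization by voting: several independent searches run on the
# same position; their root visit counts are summed and the most-visited
# move is played.
import random
from collections import Counter

def fake_search(position, playouts, rng):
    """Stand-in for one independent MCTS run: visit counts per root move."""
    counts = Counter()
    for _ in range(playouts):
        counts[rng.choice(position)] += 1
    return counts

def root_parallel_vote(position, n_workers, playouts_per_worker, seed=0):
    total = Counter()
    for i in range(n_workers):
        rng = random.Random(seed + i)       # each worker explores independently
        total += fake_search(position, playouts_per_worker, rng)
    return total.most_common(1)[0][0]       # move with the most combined visits
```

The point of contention in the thread is whether this vote can ever beat one search with `n_workers * playouts_per_worker` playouts; the sketch only shows the mechanism, not the answer.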
[computer-go] Reservations about Parallelization Strategy
While re-reading the parallelization papers, I tried to formulate why I thought that they couldn't be right. The issue with Mango reporting a super-linear speed-up was an obvious red flag, but that doesn't mean that their conclusions were wrong. It just means that Mango's exploration policy needs tweaking.

I have three objections.

First, root parallelization does not fully exploit the RAM attached to each compute node. If I have N compute nodes then I would like to search a tree that is N times as large. But in root parallelization I cannot actually do that, because lower layers of the tree are duplicated across compute processes.

Second, root parallelization does not share information about how to explore away from the root. Tactics necessary to refute variations must be discovered across all N processes. This seems wasteful of CPU resources.

Third, it seems wasteful of network bandwidth. In particular, the root node is shared because of a side effect: it makes N processes independently agree on how to distribute attention across root moves. If you count up the bytes required to represent a node, and how many bytes are required to specify a distribution of attention, you will find that the node is much, much larger.

These reservations might not be significant. For example, most of a tree is stored near the leaves, so the cost of duplicating nodes near the root might be small. And the network bandwidth might not be a bottleneck.

But I do believe that the second objection is a killer. Theory says that it is just as hard to choose replies to our moves as it is to choose our moves. So a strategy that splits only at the root cannot be optimal.

One obvious alternative to root parallelization is "deep parallelization", in which we share the root node and children of already-shared nodes that have at least N trials. Deep parallelization should asymptotically solve the problem of sharing deep tactics. But this approach worsens the drawbacks from my first and third objections.
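The "deep parallelization" rule proposed above — share the root, plus any child of an already-shared node once it has at least N trials — determines a shared frontier that can be computed with a simple traversal. A hypothetical sketch (the tree representation is invented for the example):

```python
# Compute the set of shared nodes under deep parallelization: the root is
# always shared, and a child becomes shared only when its parent is shared
# AND it has at least min_trials visits.
# tree: dict node_id -> (visits, [child node ids])

def shared_nodes(tree, root, min_trials=100):
    shared = {root}
    frontier = [root]
    while frontier:
        node = frontier.pop()
        _, children = tree[node]
        for child in children:
            visits, _ = tree[child]
            if visits >= min_trials and child not in shared:
                shared.add(child)
                frontier.append(child)
    return shared
```

Note that a well-visited node below an unshared parent stays unshared — sharing deepens only along paths that have already earned enough trials, which is what makes the scheme asymptotically cover deep tactics.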
[computer-go] MPI vs Thread-safe
> I only share UCT wins and visits, and the MPI version only scales well to 4 nodes.

The scalability limit seems very low.

Just curious: what is the policy for deciding what to synchronize? I recall that MoGo shared only the root node.
Re: [computer-go] MPI vs Thread-safe
Sent from my iPhone

On Oct 30, 2009, at 9:53 AM, "Brian Sheppard" wrote:

> > confirming the paper's finding that the play improvement is larger than multiplying number of sequential playouts appropriately.
>
> Well, this is another reason why I doubt the results from the Mango paper. Parallelization *cannot* provide super-linear speed-up. The existence of super-linear speed-up proves that the underlying single-threaded program is flawed.

I had the same issue with the paper. It told me that near the root, Mango settles on a subtree too quickly. Root parallelism was the only method that led to a bushy root, and was therefore the only successful strategy.
[computer-go] MPI vs Thread-safe
> I personally just use root parallelization in Pachi

I think this answers my question; each core in Pachi independently explores a tree, and the master thread merges the data. This is even though you have shared memory on your machine.

> Have you read the Parallel Monte-Carlo Tree Search paper?

Yes, both the Mango team's work and Bruno Bouzy's pioneering work.

> It sums up the possibilities nicely.

Well, I have doubts. I am not saying anything negative regarding their work, which I am confident is an accurate representation of the experimental data. But there are many possibilities for parallelization that are not covered by those papers.

For instance, the dichotomy between "global" and "local" mutexes is artificial. You can take any number N of physical locks and multiplex locks for tree nodes onto those N. This provides a range of intermediate algorithms that do not have as much contention as "global", and not as much data as "local".

You also have possibilities for largely lockless thread safety. For instance, the Intel architecture has atomic memory access instructions that allow lockless data safety. Remi Coulom published a paper on this subject.

I am not even sure that "leaf" parallelization is really ruled out. For example, if GPU-based implementation works, then leaf parallelization must be reconsidered.

> similar to what you probably mean by MPI, though without resyncs

MPI is "message passing interface", an industry-standard API for supporting high performance computing.

It is used for sharing data among multiple processes (that is, no shared memory). I recall that MoGo published that their massively scalable strategy is based on this approach.

> confirming the paper's finding that the play improvement is larger than multiplying number of sequential playouts appropriately.

Well, this is another reason why I doubt the results from the Mango paper. Parallelization *cannot* provide super-linear speed-up. The existence of super-linear speed-up proves that the underlying single-threaded program is flawed.
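The lock-multiplexing idea above — N physical locks shared among all tree nodes, as a middle ground between one global mutex and a mutex per node — can be sketched like this (names are illustrative; a real engine would do this in C with a striped lock array):

```python
# Lock multiplexing: keep a fixed pool of N locks and map each tree node
# onto one of them by hashing. Two nodes contend only when their hashes
# collide, so contention shrinks as the pool grows, while memory stays O(N).
import threading

class LockPool:
    def __init__(self, n_locks=64):
        self.locks = [threading.Lock() for _ in range(n_locks)]

    def lock_for(self, node_id):
        return self.locks[hash(node_id) % len(self.locks)]

pool = LockPool(8)
counts = {}  # stand-in for per-node UCT statistics

def visit(node_id):
    # Guard the node's statistics with its multiplexed lock.
    with pool.lock_for(node_id):
        counts[node_id] = counts.get(node_id, 0) + 1
```

Setting `n_locks=1` recovers the "global" scheme and a very large pool approximates "local", which is exactly the intermediate range the message describes.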
Re: [SPAM] Re: [computer-go] First ever win of a computer against a pro 9P as black (game of Go, 9x9).
2009/10/30 terry mcintyre:
> This may be useful in computer Go. One of the reasons human pros do well is that they compute certain sub-problems once, and don't repeat the effort until something important changes. They know in an instant that certain positions are live or dead or seki; they know when a move (reducing a liberty, for example) disturbs that result. This could probably be emulated with theorem-proving ability. Presently, search algorithms have to rediscover these results many times over; this is (in my opinion) why computer programs get significantly weaker when starved for time; they cannot think deeply enough to solve problems which may be solved in an eyeblink by a pro.

This sounds a lot like a description of GNU Go's persistent reading cache, which calculates a "reading shadow" for all its readings. Has something similar been tried for other programs?

--
Seo Sanghyeon
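A persistent reading cache with a "shadow", in the spirit of the GNU Go mechanism mentioned above, could look roughly like this. The class, its methods, and the keys are hypothetical illustrations, not GNU Go's actual data structures:

```python
# Persistent reading cache: each cached result remembers the set of board
# points its reading examined (its "shadow"). A new move invalidates only
# the entries whose shadow it touches, so unrelated results survive.

class ReadingCache:
    def __init__(self):
        self.entries = {}  # key -> (result, frozenset of shadow points)

    def store(self, key, result, shadow):
        self.entries[key] = (result, frozenset(shadow))

    def lookup(self, key):
        entry = self.entries.get(key)
        return entry[0] if entry else None

    def move_played(self, point):
        # Drop every cached reading whose shadow contains the new move.
        self.entries = {k: (r, s) for k, (r, s) in self.entries.items()
                        if point not in s}
```

This is why a move far from a settled ladder leaves the cached ladder result intact, while a move on any stone the reading touched forces a re-read.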
Re: [SPAM] Re: [SPAM] Re: [computer-go] Re: First ever win of a computer against a pro 9P as black (game of Go, 9x9).
Thanks a lot for this information. This is really very interesting and not widely known. Maybe chess is less closed than I would have believed :-)

Olivier

> Arno Nickel played three games with Hydra over a few months in 2005. He won 2.5-0.5
>
> http://en.wikipedia.org/wiki/Arno_Nickel
>
> I doubt much has changed since 2005. Also note that, while certainly among top players, Arno Nickel was never a correspondence chess champion.
>
> --
> Seo Sanghyeon

--
=
Olivier Teytaud (TAO-inria) olivier.teyt...@inria.fr
Tel (33)169154231 / Fax (33)169156586
Equipe TAO (Inria-Futurs), LRI, UMR 8623 (CNRS - Universite Paris-Sud),
bat 490 Universite Paris-Sud 91405 Orsay Cedex France
(one of the 56.5 % of french who did not vote for Sarkozy in 2007)
Re: [SPAM] Re: [computer-go] Re: First ever win of a computer against a pro 9P as black (game of Go, 9x9).
2009/10/30 Olivier Teytaud:
> > I think in correspondence chess humans still hold against computers
> >
> > Petri
>
> Are there sometimes games organized like that? This is really impressive to me.

Arno Nickel played three games with Hydra over a few months in 2005. He won 2.5-0.5.

http://en.wikipedia.org/wiki/Arno_Nickel

I doubt much has changed since 2005. Also note that, while certainly among top players, Arno Nickel was never a correspondence chess champion.

--
Seo Sanghyeon
Re: [SPAM] Re: [computer-go] Re: First ever win of a computer against a pro 9P as black (game of Go, 9x9).
I can't recall any official challenges. I do remember some such statement in some other challenge, but failed to google it up.

Human-computer chess challenges are not likely to happen anymore. What would be the point for the human? Hydra could probably beat anyone. And as processors get faster, any of the top 10 programs on top commercial hardware can beat just about anyone. Why would anyone sponsor such an event?

Correspondence chess has also suffered, as everyone has access to computers and it's pretty hard to prevent cheating.

Petri

2009/10/30 Olivier Teytaud
> > > I think in correspondence chess humans still hold against computers
> > >
> > > Petri
>
> Are there sometimes games organized like that? This is really impressive to me.
>
> (maybe MCTS might win against alpha-beta in chess with huge time settings :-) )
Re: [SPAM] Re: [computer-go] Re: First ever win of a computer against a pro 9P as black (game of Go, 9x9).
> I think in correspondence chess humans still hold against computers
>
> Petri

Are there sometimes games organized like that? This is really impressive to me.

(maybe MCTS might win against alpha-beta in chess with huge time settings :-) )