[computer-go] Rating variability on CGOS
About two weeks ago I took Pebbles offline for an extensive overhaul of its board representation. At that time Valkyria 3.3.4 had a 9x9 CGOS rating of roughly 2475. When I looked today, I saw Valkyria 3.3.4 rated at roughly 2334, so I wondered what was going on.

I found a contributing factor: Valkyria has massively different results against Pachi than against Pebbles. It happens that Pachi started playing a day or two after Pebbles went offline. Pebbles and Pachi are both rated around 2200, but Valkyria shreds Pebbles a lot more often than Pachi:

Pachi:   185 / 273 = 67.8%
Pebbles: 429 / 503 = 85.3%

There are a lot of lessons here...
___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/
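To put the two win rates quoted above on the rating scale, here is a minimal sketch that converts each one into an implied rating gap. It assumes the standard logistic Elo curve with a 400-point scale, which is an assumption about how CGOS ratings behave and is not stated in the post; the function name `elo_gap` is made up for illustration.

```python
import math


def elo_gap(win_rate: float) -> float:
    """Rating difference implied by a win rate under the logistic Elo model.

    Assumption: the usual 400-point scale, i.e. expected score
    1 / (1 + 10 ** (-gap / 400)).
    """
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


# (wins, games) for Valkyria against each opponent, as quoted in the post.
results = {"Pachi": (185, 273), "Pebbles": (429, 503)}

implied = {}
for opponent, (wins, games) in results.items():
    rate = wins / games
    implied[opponent] = elo_gap(rate)
    print(f"{opponent}: {rate:.1%} wins, implied gap {implied[opponent]:.0f} Elo")
```

Under that assumption the two implied gaps come out roughly 175 points apart, even though both opponents are rated around 2200.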
Re: [computer-go] Rating variability on CGOS
On Thu, Oct 08, 2009 at 01:48:05PM -0600, Brian Sheppard wrote:
> Pebbles and Pachi are both rated around 2200, but Valkyria shreds Pebbles a lot more often than Pachi:
> Pachi: 185 / 273 = 67.8%
> Pebbles: 429 / 503 = 85.3%
> There are a lot of lessons here...

So, I'm curious how well Pachi will do against Pebbles. ;-) I'm hoping that I'm nearing another major improvement in strength soon, so the current version may not stay on CGOS much longer.

(Curiously, the two identical Pachi instances were, even after many games, very far apart in rating (about 100 ELO); somehow one of the instances was winning against the other in about 80% of games. It corrected itself after a few more days without me doing anything, though. Stochastic environments are funny.)

--
Petr Pasky Baudis
A lot of people have my books on their bookshelves. That's the problem, they need to read them. -- Don Knuth
Re: [computer-go] Rating variability on CGOS
When I get more time to work on Valkyria again maybe I should look closely at the games against Pachi...

-Magnus

--
Magnus Persson
Berlin, Germany
Re: [computer-go] Rating variability on CGOS
One must be very careful about proclaiming wild transitivity issues. I'm not saying it's not an issue, there is some going on with every program on CGOS, but with less than 500 games between any two players you are going to get error margins of +/- 30-50 ELO or something like that.

And CGOS ratings are always going to have high error margins because they are incrementally rated. You should go by the bayeselo rating to get something more stable, and you really need to have over 1000 games to begin trusting the ratings within 20 or 30 ELO.

- Don
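As a rough check on the error margins Don mentions, the sketch below estimates the +/- Elo margin of a head-to-head result from the number of games. It rests on assumptions that are not in his post: independent games with no draws, a normal approximation to the binomial, the logistic Elo curve with a 400-point scale, and a 95% interval (z = 1.96). The function name `elo_margin` is hypothetical.

```python
import math


def elo_margin(games: int, win_rate: float = 0.5, z: float = 1.96) -> float:
    """Approximate +/- Elo margin for a win rate measured over `games` games.

    Assumptions: independent games, normal approximation to the binomial,
    and the logistic Elo curve, linearised around `win_rate`.
    """
    # Standard error of the observed win rate.
    std_err = math.sqrt(win_rate * (1.0 - win_rate) / games)
    # Slope of the Elo curve: d(gap)/d(win_rate) = 400 / (ln 10 * p * (1 - p)).
    slope = 400.0 / (math.log(10) * win_rate * (1.0 - win_rate))
    return z * std_err * slope


for n in (250, 500, 1000):
    even = elo_margin(n)
    lopsided = elo_margin(n, win_rate=0.79)
    print(f"{n:5d} games: +/- {even:.0f} Elo at 50%, +/- {lopsided:.0f} Elo at 79%")
```

Under these assumptions, 500 games between evenly matched programs give a margin of about 30 Elo and 1000 games give a little over 20, which is in the same range as the figures above. The margin widens as the match-up becomes more lopsided.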
[computer-go] Re: Rating variability on CGOS
In any rating scheme, who you play can be as important as how well you play. This is especially true for small groups. Suddenly adding or dropping a strong player will certainly cause all the other players' ratings to shift.
[computer-go] Rating variability on CGOS
> One must be very careful about proclaiming wild transitivity issues. I'm not saying it's not an issue, there is some going on with every program on CGOS, but with less than 500 games between any two players you are going to get error margins of +/- 30-50 ELO or something like that.

Actually we are certain that significant differences are being observed. If we pool the Pachi and Pebbles data, then the null hypothesis is that Valkyria defeats each program 79% of the time. The observed results differ from that by at least 3.5 standard deviations.

Note that we are talking about 150 rating points.
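One way to arrive at numbers like these is sketched below; it is not necessarily the calculation Brian ran. It pools the two results into a single win rate, treats that as the null hypothesis for both opponents, and measures each observed win count in binomial standard deviations. The helper name `z_score` is hypothetical, and the sketch assumes independent games.

```python
import math

# (wins, games) for Valkyria against each opponent, from the original post.
results = {"Pachi": (185, 273), "Pebbles": (429, 503)}

# Null hypothesis: one common win rate against both opponents.
total_wins = sum(wins for wins, _ in results.values())
total_games = sum(games for _, games in results.values())
pooled = total_wins / total_games


def z_score(wins: int, games: int, p: float) -> float:
    """Deviation of `wins` from the binomial mean, in standard deviations."""
    expected = games * p
    std_dev = math.sqrt(games * p * (1.0 - p))
    return (wins - expected) / std_dev


z = {name: z_score(wins, games, pooled) for name, (wins, games) in results.items()}

print(f"pooled win rate: {pooled:.1%}")
for name, value in z.items():
    print(f"{name}: {value:+.1f} standard deviations from the pooled rate")
```

With the unrounded pooled rate the deviations come out at roughly 3.4 standard deviations above expectation for Pebbles and about 4.6 below for Pachi, close to the figure quoted. Because the pooled rate is estimated from the same games, a two-proportion test would be the more careful version of this check.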
RE: [computer-go] Rating variability on CGOS
Many Faces also had more trouble against Pachi than you would expect from its rating. Perhaps Pachi is generally stronger, but throws away some percentage of games (even against weak players) because of some bug.

David