[computer-go] Rating variability on CGOS

2009-10-08 Thread Brian Sheppard
About two weeks ago I took Pebbles offline for an extensive overhaul of its
board representation. At that time Valkyria 3.3.4 had a 9x9 CGOS rating of
roughly 2475.

When I looked today, I saw Valkyria 3.3.4 rated at roughly 2334, so I
wondered what was going on.

I found a contributing factor: Valkyria has massively different results
against Pachi than against Pebbles. It happens that Pachi started playing a
day or two after Pebbles went offline.

Pebbles and Pachi are both rated around 2200, but Valkyria shreds Pebbles a
lot more often than Pachi:

Pachi:   185 / 273 = 67.8%
Pebbles: 429 / 503 = 85.3%

There are a lot of lessons here...
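
For scale, the two ratings and the two head-to-head records can be put on the same footing. The following is a minimal sketch, assuming the standard logistic Elo formula (400-point scale) applies to CGOS ratings and that games have no draws; the function names are illustrative and not part of any CGOS tool.

```python
import math

def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def implied_elo_gap(wins, games):
    """Rating gap implied by a head-to-head record (needs 0 < wins < games)."""
    return 400.0 * math.log10(wins / (games - wins))

# Assumption: both opponents sit at exactly 2200 ("around 2200" in the post).
OPPONENT_RATING = 2200

# Win rate the Elo model predicts for Valkyria at each of its two ratings.
for valkyria_rating in (2475, 2334):
    p = expected_score(valkyria_rating, OPPONENT_RATING)
    print(f"Valkyria at {valkyria_rating}: expected win rate {p:.1%}")

# Rating gap that each observed record would correspond to.
for name, wins, games in (("Pachi", 185, 273), ("Pebbles", 429, 503)):
    gap = implied_elo_gap(wins, games)
    print(f"{name}: {wins}/{games} = {wins / games:.1%}, implied gap {gap:+.0f} Elo")
```

The first loop shows what win rate each rating would predict against a 2200 opponent; the second inverts the same formula to show what rating gap each observed record implies.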

___
computer-go mailing list
computer-go@computer-go.org
http://www.computer-go.org/mailman/listinfo/computer-go/


Re: [computer-go] Rating variability on CGOS

2009-10-08 Thread Petr Baudis
On Thu, Oct 08, 2009 at 01:48:05PM -0600, Brian Sheppard wrote:
> About two weeks ago I took Pebbles offline for an extensive overhaul of its
> board representation. At that time Valkyria 3.3.4 had a 9x9 CGOS rating of
> roughly 2475.
>
> When I looked today, I saw Valkyria 3.3.4 rated at roughly 2334, so I
> wondered what was going on.
>
> I found a contributing factor: Valkyria has massively different results
> against Pachi than against Pebbles. It happens that Pachi started playing a
> day or two after Pebbles went offline.
>
> Pebbles and Pachi are both rated around 2200, but Valkyria shreds Pebbles a
> lot more often than Pachi:
>
> Pachi:   185 / 273 = 67.8%
> Pebbles: 429 / 503 = 85.3%
>
> There are a lot of lessons here...

So, I'm curious how well Pachi will do against Pebbles. ;-) I'm hoping
that I'm nearing another major improvement in strength soon, so the
current version may not stay on CGOS much longer.

(Curiously, even after many games the two identical Pachi instances were
very far apart in rating (about 100 ELO); somehow one instance was winning
about 80% of its games against the other. It corrected itself after a few
more days without my doing anything, though. Stochastic environments are
funny.)

-- 
Petr Pasky Baudis
A lot of people have my books on their bookshelves.
That's the problem, they need to read them. -- Don Knuth


Re: [computer-go] Rating variability on CGOS

2009-10-08 Thread Magnus Persson
When I get more time to work on Valkyria again, maybe I should look
closely at the games against Pachi...


-Magnus

--
Magnus Persson
Berlin, Germany


Re: [computer-go] Rating variability on CGOS

2009-10-08 Thread Don Dailey
One must be very careful about proclaiming wild transitivity issues. I'm
not saying it's not an issue, there is some going on with every program on
CGOS, but with less than 500 games between any two players you are going
to get error margins of +/- 30-50 ELO or something like that.

And CGOS ratings are always going to have high error margins because they
are incrementally rated. You should go by the bayeselo rating to get
something more stable, and you really need to have over 1000 games to begin
trusting the ratings within 20 or 30 ELO.
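
As a rough check on margins of this size, here is a minimal sketch of how a head-to-head error margin shrinks with the number of games. It assumes independent games with no draws, the normal approximation to the binomial, the standard logistic Elo formula, and a 95% interval (z = 1.96). The 0.79 win rate is illustrative, taken from Valkyria's combined record in the figures quoted in this thread (614 wins in 776 games); none of this is how CGOS or bayeselo actually computes its numbers.

```python
import math

def elo_margin(win_rate, games, z=1.96):
    """Approximate +/- Elo margin for a head-to-head win rate.

    Uses the binomial standard error of the win rate and the delta method
    on elo(p) = 400 * log10(p / (1 - p)).
    """
    se_p = math.sqrt(win_rate * (1.0 - win_rate) / games)
    # Derivative of elo(p) with respect to p.
    delo_dp = 400.0 / (math.log(10) * win_rate * (1.0 - win_rate))
    return z * se_p * delo_dp

# Illustrative win rate (assumption): Valkyria's combined record in the thread.
WIN_RATE = 0.79

for games in (273, 500, 1000):
    print(f"{games:5d} games: about +/- {elo_margin(WIN_RATE, games):.0f} Elo")
```

The margin scales with one over the square root of the game count, so halving it takes roughly four times as many games.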

- Don



[computer-go] Re: Rating variability on CGOS

2009-10-08 Thread Dave Dyer

In any rating scheme, who you play can be as important as how well you play.
This is especially true for small groups.

Suddenly adding or dropping a strong player will certainly cause
all the other players' ratings to shift.



[computer-go] Rating variability on CGOS

2009-10-08 Thread Brian Sheppard
> One must be very careful about proclaiming wild transitivity issues.  I'm
> not saying it's not an issue, there is some going on with every program on
> CGOS, but with less than 500 games between any two players you are going
> to get error margins of +/- 30-50 ELO or something like that.

Actually, we are certain that significant differences are being observed. If
we pool the Pachi and Pebbles data, then the null hypothesis is that
Valkyria defeats both programs 79% of the time. The observed data differs
by at least 3.5 standard deviations.

Note that we are talking about 150 rating points.
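
The pooling argument can be reproduced from the two records alone. The following is a minimal sketch, assuming independent games, no draws, and the normal approximation to the binomial; the per-opponent z-scores treat the pooled rate as fixed, and the two-proportion z-test is an alternative way to compare the two records directly. The Elo conversion assumes the standard logistic formula, and all function names are illustrative.

```python
import math

# (Valkyria wins, games played) against each opponent, from the thread.
RESULTS = {"Pachi": (185, 273), "Pebbles": (429, 503)}

def pooled_rate(results):
    """Single win rate assumed for both opponents under the null hypothesis."""
    wins = sum(w for w, _ in results.values())
    games = sum(n for _, n in results.values())
    return wins / games

def z_vs_pooled(wins, games, pooled):
    """Deviation of one observed win rate from the pooled rate, in std devs."""
    se = math.sqrt(pooled * (1.0 - pooled) / games)
    return (wins / games - pooled) / se

def z_two_proportions(first, second, pooled):
    """Two-proportion z-statistic for the difference between two records."""
    (w1, n1), (w2, n2) = first, second
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (w2 / n2 - w1 / n1) / se

def implied_elo(wins, games):
    """Rating gap implied by a record under the logistic Elo model."""
    return 400.0 * math.log10(wins / (games - wins))

pooled = pooled_rate(RESULTS)
print(f"pooled win rate: {pooled:.1%}")
for name, (wins, games) in RESULTS.items():
    print(f"{name}: z = {z_vs_pooled(wins, games, pooled):+.2f}")

z_diff = z_two_proportions(RESULTS["Pachi"], RESULTS["Pebbles"], pooled)
elo_gap = implied_elo(*RESULTS["Pebbles"]) - implied_elo(*RESULTS["Pachi"])
print(f"two-proportion z = {z_diff:.2f}, implied Elo gap = {elo_gap:.0f}")
```

The exact figures depend on which test and which rating model are used, so the output is best read as an order-of-magnitude check rather than a precise significance level.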






RE: [computer-go] Rating variability on CGOS

2009-10-08 Thread David Fotland
Many Faces also had more trouble against Pachi than you would expect from
its rating.  Perhaps Pachi is generally stronger, but throws away some
percentage of games (even against weak players) because of some bug.

David
