Yes, it performed well for Hex 8x8; I got a speedup of 60x compared to the CPU
when I tested it about 2 years ago on a not-so-modern GPU (128 cores, IIRC).
However, the playouts in Hex are much simpler than those in Go.
For instance, I check for game termination only once, when the board is
completely full.
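
To illustrate the "check once on a full board" idea, here is a minimal CPU-side sketch in Python. It is not the GPU code being discussed; the names (`playout`, `winner_of_full_board`, `neighbors`) and the board encoding (0 = empty, 1 = top-to-bottom player, 2 = left-to-right player) are assumptions made for this example. It relies on the fact that a completely filled Hex board always has exactly one winner, so the playout can fill the empty cells in random order and run a single connectivity check at the end.

```python
import random

N = 8  # board size, matching the Hex 8x8 mentioned above

def neighbors(r, c):
    # The six neighbours of a cell on a rhombic Hex board.
    for dr, dc in ((-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < N and 0 <= cc < N:
            yield rr, cc

def winner_of_full_board(board):
    # Player 1 wins if its stones connect the top row to the bottom row.
    # On a full board exactly one player is connected, so otherwise player 2 wins.
    stack = [(0, c) for c in range(N) if board[0][c] == 1]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        if r == N - 1:
            return 1
        for nb in neighbors(r, c):
            if nb not in seen and board[nb[0]][nb[1]] == 1:
                seen.add(nb)
                stack.append(nb)
    return 2

def playout(board, to_move, rng):
    # Fill all empty cells in random order, alternating players,
    # and only then check for the winner (one termination check per playout).
    board = [row[:] for row in board]
    empties = [(r, c) for r in range(N) for c in range(N) if board[r][c] == 0]
    rng.shuffle(empties)
    for r, c in empties:
        board[r][c] = to_move
        to_move = 3 - to_move
    return winner_of_full_board(board)
```

For example, `playout([[0] * N for _ in range(N)], 1, random.Random(0))` plays one random game from the empty board and returns 1 or 2. A Go playout cannot be structured this way, since captures and the end-of-game condition have to be handled during the playout itself.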
It is not exactly Go, but I have a Monte Carlo tree searcher on the GPU for
the game of Hex 8x8.
I got about a 60x speedup from it when I tested it about two years ago. I
specifically chose this game because
the moves and WDL rules are much simpler than those of Go.
Here is a GitHub link