Esteban Cervetto wrote:

Hi!

> I would like to do something with the ambiguous names in my base.

Who wouldn't ;)

> My idea is to perform the following algorithm
> 
> 1 - list all the games with ambiguous names (ex. "Karpov,
> A."), mark by "Original"

Currently we do not have a function for this. I think that
it is possible to write one if we find a volunteer.

If I get you correctly, you want to expand "A." to the
correct first name in the end, right? This is
"challenging".

> 2 - list all the games recommended in each of the
> ambiguous names (ex.  "Karpov, Anatoly", "Karpov,
> Alexander", "Karpov, Alexey") , mark by "Recommended"

I do not understand that precisely. Do you mean to flag all
games played by Karpov, Anatoly and Karpov, Alexander and
Karpov, Alexey?

> 3 - Merge 1 and 2 and get a list of names "N"

I can not really follow you. Do you mean to end up with a
list of player names like

Karpov, A               Original
Karpov, Anatoly         Recommended
Karpov, Alexander       Recommended
Karpov, Alexey          Recommended

Isn't this just

$ grep -e "^Karpov, A" ratings.ssp

giving something like

Karpov, Aleksey         #-   RUS [1986] 1987
Karpov, Alexander       #im  RUS/UZB [2425] 1959
Karpov, Alexander N     #-   RUS [1944] 1936
Karpov, Alexey          #-   UKR [2120] 1978
Karpov, Anatoly         #gm  URS/RUS [2780] 1951
Karpov, Andrey Ivan     #-   RUS [2159] 1966
Karpov, Arkadiy         #-   RUS [1988] 1959
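The same lookup can be sketched in Python. This is only a rough
illustration: it assumes that every line of ratings.ssp looks like the
sample output above (name first, then a "#" title column), and the
function name candidates() is made up for this example.

```python
# Assumed line layout, taken from the sample output above:
#   "Family, Given   #title  country [rating] year"
def candidates(lines, family, initial):
    """Return the full names in `lines` that start with "family, initial"."""
    prefix = f"{family}, {initial}"
    names = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        # The name is everything before the "#" title column.
        names.append(line.split("#", 1)[0].strip())
    return names

sample = [
    "Karpov, Aleksey         #-   RUS [1986] 1987",
    "Karpov, Alexander       #im  RUS/UZB [2425] 1959",
    "Karpov, Anatoly         #gm  URS/RUS [2780] 1951",
]
print(candidates(sample, "Karpov", "A"))
```

With more than one name coming back, the abbreviated "Karpov, A"
remains ambiguous and cannot be expanded automatically.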

> 4 - Perform  a query in the database and obtain Color,
> Opponent, year, tournament and result, where any of the
> players is in "N".

This is not done yet (except by the workaround I described
recently with "My Player names"). Given some time and a
coder it is possible, however. It's not even too difficult, I
think.

> 5 - Perform in this table a Group by opponent, tournament,
> color and year query

From my experience with the usual sources, player names are a
mess. Tournament names and dates are hell; depending on the
source, even the result is sort of an issue. Therefore, I
fear it is not reliable to group by tournament name. (Even
for the major ones it is hardly possible to get them unique
in an automatic fashion.) I remember e.g. this nice city
called Chalkidiki... I don't know how many spellings I have
(hopefully "had") around. Tournaments should only be held in
places like Washington, London, Paris, Berlin... No umlauts,
no funny chars, large enough to have the same spelling all
over the world and by default in Latin glyphs, i.e. 7-bit
ASCII. Moscow already starts to become difficult.
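To illustrate why folding names to 7-bit ASCII only solves part of the
problem, here is a small sketch using Python's standard unicodedata
module. The helper name fold_ascii() and the example spellings are
assumptions made for this illustration, not anything Scid provides.

```python
import unicodedata

def fold_ascii(name):
    """Lowercase `name` and strip diacritics down to 7-bit ASCII.

    Characters without an ASCII decomposition are simply dropped.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.lower().split())

# Umlauts are handled ...
print(fold_ascii("Zürich") == fold_ascii("Zurich"))
# ... but competing transcriptions of the same place are not.
print(fold_ascii("Chalkidiki") == fold_ascii("Halkidiki"))
```

The first comparison holds and the second does not, so grouping by a
folded site name would still split one tournament into several.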

> 6 - when this count == 2, it means that there is a duplicate
> game

Wouldn't comparing the moves be a better criterion?

I.e. take your final list of games and then check which games
have the same moves.

It is not perfect either, as I have some games around that are
obviously the same, but it seems that the move order is not
clear throughout the literature. This check could be performed
by the dupe checker. One could use a weaker criterion by
checking the position at various points in the game instead
of the move order to overcome this. Say if I have the same
position in 9 out of 10 points + the same result + ... I
could end up with some score to identify the dupes.
Probably some Bayesian statistics, like a spam filter.  Just
an idea, however. Nothing done in that direction at all.
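A very rough sketch of the weaker criterion, in Python. It is a plain
"fraction of matching checkpoints" threshold, not the Bayesian score
mentioned above. It assumes that each game has already been reduced to
a list of position strings sampled at the same move numbers; how those
strings are produced is left open, and all names here are hypothetical.

```python
def position_match_score(positions_a, positions_b):
    """Fraction of sampled checkpoints where both games show the same position."""
    pairs = list(zip(positions_a, positions_b))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a == b)
    return same / len(pairs)

def likely_duplicate(game_a, game_b, threshold=0.9):
    """Flag two games as probable dupes: same result and enough equal positions."""
    if game_a["result"] != game_b["result"]:
        return False
    score = position_match_score(game_a["positions"], game_b["positions"])
    return score >= threshold

# Ten checkpoints; the two games differ only at the third one
# (e.g. a transposed move order that rejoins later).
a = {"result": "1-0", "positions": [f"pos{i}" for i in range(10)]}
b = {"result": "1-0", "positions": [f"pos{i}" for i in range(10)]}
b["positions"][2] = "other"
print(position_match_score(a["positions"], b["positions"]))
print(likely_duplicate(a, b))
```

The threshold of 0.9 just mirrors the "9 out of 10 points" figure; a
real checker would have to tune it and weigh further header fields.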

Anyway, if you only want to have a base with almost valid
statistics a way to go would be to just throw out all games
that have the same moves, forgetting about all other
metadata. (This will, however, kill all crosstables and so
on.)
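Throwing out games with identical moves could look roughly like the
Python sketch below. It assumes the games are available as dictionaries
with a PGN-style "moves" string, and it only strips {comments}, move
numbers and the trailing result; variations and NAGs are not handled.
The helper names are made up for this example.

```python
import re

def move_key(movetext):
    """Reduce PGN-style movetext to a bare tuple of moves."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)               # drop {comments}
    text = re.sub(r"(1-0|0-1|1/2-1/2|\*)\s*$", " ", text)    # drop trailing result
    text = re.sub(r"\d+\.+", " ", text)                      # drop "1." and "1..."
    return tuple(text.split())

def unique_by_moves(games):
    """Keep the first game for each distinct move sequence, ignoring all headers."""
    seen = set()
    kept = []
    for game in games:
        key = move_key(game["moves"])
        if key not in seen:
            seen.add(key)
            kept.append(game)
    return kept

games = [
    {"white": "Player A", "moves": "1. e4 e5 2. Nf3 {best} Nc6 1-0"},
    {"white": "Player, A.", "moves": "1.e4 e5 2.Nf3 Nc6 1-0"},
    {"white": "Player B", "moves": "1. d4 d5 1/2-1/2"},
]
print(len(unique_by_moves(games)))
```

Which copy survives depends only on the order of the input, so the
better set of headers may well be the one that gets dropped.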

> 7 - replace the "Original" name by the  "Recommended" name

I do not get this, as your list of recommended names is
longer than one. In your example: is "Karpov, A" replaced by
the "Anatoly", "Alexander" or "Alexey"? Surely you wouldn't
want to replace "Karpov, Anatoly" by "Karpov, A", or?

Anyway, if I get you right, your idea sounds good and can work
if your header fields are quite good, i.e. you have the same
spellings for all your tournaments and so on. I fear this is
not the case and think one would have to work on the actual
game moves instead. Then it is pretty much the idea of VIAF
(while they do not have the equivalent of the game moves,
but decent metadata).

> I don't know how to manage the scid bases :(  -------->(help?)

I fear that for a really good deduping of a base you're left
with hand weeding. :( Especially if you want to have complete
tournaments in your base to allow for crosstables and so on.
There's no real automatism to help you here.

Game metadata in chess is "problematic", at best. E.g. take
"The Source"(tm): it probably has the worst game data
around, and you can be happy if they wrote the player names
correctly. Still, they only print the family name. Not even
the initials. You get something like

"Anand 2770 - Tkachiev 2632 Moscow (m/1) 2001"

:S

Even if you get better data, transcriptions from Cyrillic
script are very difficult in our game.

E.g. you spell it Karpo_v_, but in German bases/literature
you'll more likely find Karpo_w_. I never saw the Cyrillic
chars really transliterated, only transcribed. :( To get an
impression of the problem you might want to have a look at
"Karpov, Anatoly (1951-)":

http://www.viaf.org/viaf/95195768/marc21.xml

All category 700 lines are "Established Heading Linking
Entry - Personal Name". You've about 8(!). The fun, however,
starts at category 400, which is a list of other spellings of
the very same name. And beware, this compilation was done by
librarians all over the world, people who stick to quite
strict rules on how to write a non-Latin entity in the local
charset. Most people don't invest that much effort.

-- 

Kind regards,                /                 War is Peace.
                             |            Freedom is Slavery.
Alexander Wagner            |         Ignorance is Strength.
                             |
                             | Theory     : G. Orwell, "1984"
                            /  In practice:   USA, since 2001


_______________________________________________
Scid-users mailing list
Scid-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scid-users
