Esteban Cervetto wrote:
> Hi!
> I would like to do something with the ambiguous names in my base.

Who wouldn't? ;)

> My idea is to perform the following algorithm:
>
> 1 - list all the games with ambiguous names (e.g. "Karpov,
> A."), marked as "Original"

Currently we do not have a function for this. I think it is possible to write one if we find a volunteer. If I understand you correctly, in the end you want to expand "A." to the correct Christian name, right? This is "challenging".

> 2 - list all the games recommended for each of the
> ambiguous names (e.g. "Karpov, Anatoly", "Karpov,
> Alexander", "Karpov, Alexey"), marked as "Recommended"

I do not understand that precisely. Do you mean to flag all games played by Karpov, Anatoly and Karpov, Alexander and Karpov, Alexey?

> 3 - Merge 1 and 2 and get a list of names "N"

I cannot really follow you. Do you mean to end up with a list of player names like

  Karpov, A           Original
  Karpov, Anatoly     Recommended
  Karpov, Alexander   Recommended
  Karpov, Alexey      Recommended

Isn't this just

  $ grep -e "^Karpov, A" ratings.ssp

giving something like

  Karpov, Aleksey       #-   RUS      [1986] 1987
  Karpov, Alexander     #im  RUS/UZB  [2425] 1959
  Karpov, Alexander N   #-   RUS      [1944] 1936
  Karpov, Alexey        #-   UKR      [2120] 1978
  Karpov, Anatoly       #gm  URS/RUS  [2780] 1951
  Karpov, Andrey Ivan   #-   RUS      [2159] 1966
  Karpov, Arkadiy       #-   RUS      [1988] 1959

> 4 - Perform a query in the database and obtain Color,
> Opponent, year, tournament and result, where any of the
> players is in "N".

This is not done yet (except by the workaround I described recently with "My Player names"). Given some time and a coder it is possible, however. I don't even think it is too difficult.

> 5 - Perform in this table a Group by opponent, tournament,
> color and year query

From my experience with the usual sources, player names are a mess. Tournament names and dates are hell depending on the source, and even the result is something of an issue. Therefore, I fear it is not reliable to group by tournament name.
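For illustration only, Esteban's steps 1 to 6 might look roughly like the Python sketch below. It is not Scid code: the game records are assumed to be plain dicts with hypothetical `White`, `Black`, `Event` and `Year` keys, and `spellings` stands in for the names in a spelling file such as the `ratings.ssp` grepped above. It inherits the caveat just raised: the grouping only works if event names are spelled consistently.

```python
import re
from collections import defaultdict

# Surname, comma, a single capital initial, optional full stop: "Karpov, A."
ABBREVIATED = re.compile(r"^([^,]+), ([A-Z])\.?$")


def is_ambiguous(name):
    """Step 1: the name carries only an initial instead of a given name."""
    return ABBREVIATED.match(name) is not None


def expansions(name, spellings):
    """Step 2: full names from a spelling list sharing surname and initial,
    roughly what grep -e "^Karpov, A" does on a spelling file."""
    m = ABBREVIATED.match(name)
    if m is None:
        return []
    prefix = f"{m.group(1)}, {m.group(2)}"
    return [s for s in spellings if s.startswith(prefix) and not is_ambiguous(s)]


def suspected_duplicates(games, original, spellings):
    """Steps 3-6: build the name set N, group that player's games by
    (opponent, event, colour, year) and keep groups of exactly two games."""
    names = {original, *expansions(original, spellings)}  # step 3: the set "N"
    groups = defaultdict(list)
    for game in games:
        for colour, player, opponent in (("W", game["White"], game["Black"]),
                                         ("B", game["Black"], game["White"])):
            if player in names:  # step 4: any player in N
                key = (opponent, game["Event"], colour, game["Year"])  # step 5
                groups[key].append(game)
    return {key: g for key, g in groups.items() if len(g) == 2}  # step 6


if __name__ == "__main__":
    spellings = ["Karpov, Anatoly", "Karpov, Alexander", "Karpov, Alexey"]
    games = [
        {"White": "Karpov, A.", "Black": "Kasparov, G", "Event": "Moscow", "Year": 1985},
        {"White": "Karpov, Anatoly", "Black": "Kasparov, G", "Event": "Moscow", "Year": 1985},
        {"White": "Karpov, Alexey", "Black": "Smith, J", "Event": "Kiev", "Year": 2001},
    ]
    print(list(suspected_duplicates(games, "Karpov, A.", spellings)))
```

Run as a script it prints `[('Kasparov, G', 'Moscow', 'W', 1985)]`: the "Karpov, A." game and the "Karpov, Anatoly" game fall into the same group. Note that the sketch cannot tell which of several candidate expansions is the right one.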
Even for the major tournaments, it is hardly possible to get the names unique in an automatic fashion. I remember, e.g., this nice city called Chalkidiki... I don't know how many spellings of it I have (hopefully "had") around. Tournaments should only be held in places like Washington, London, Paris, Berlin... No umlauts, no funny characters, large enough to have the same spelling all over the world and by default in Latin glyphs, i.e. 7-bit ASCII. Moscow already starts to become difficult.

> 6 - when this count == 2, it means that there is a duplicate
> game

Wouldn't comparing the moves be a better criterion? I.e. take your final list of games and then check which games have the same moves. This could then be done by the dupe checker. It is not perfect either, as I have some games around that are obviously the same, but whose move order apparently differs throughout the literature.

One could use a weaker criterion to overcome this, checking the position at various points in the game instead of the move order. Say if I have the same position at 9 out of 10 points + the same result + ..., I could end up with some score to identify the dupes. Probably some Bayesian statistics, like a spam filter. Just an idea, however; nothing has been done in that direction at all.

Anyway, if you only want to have a base with almost valid statistics, one way to go would be to just throw out all games that have the same moves, forgetting about all other metadata. (This will, however, kill all crosstables and so on.)

> 7 - replace the "Original" name by the "Recommended" name

I do not get this, as your list of recommended names has more than one entry. In your example: is "Karpov, A" replaced by "Anatoly", "Alexander" or "Alexey"? Surely you wouldn't want to replace "Karpov, Anatoly" by "Karpov, A", would you?

Anyway, if I get you right, your idea sounds good and can work if your header tags are quite good, i.e. you have the same spellings for all your tournaments and so on.
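The move-based check from point 6, including the weaker "same position at 9 out of 10 points plus the same result" score, could be sketched as follows. This is only an illustration of the idea, not anything that exists in Scid: each game's `positions` is assumed to be a list with one key per ply (for example the piece-placement part of a FEN, which a chess library such as python-chess can produce), and the 10 checkpoints and the 0.9 threshold are arbitrary choices.

```python
from collections import defaultdict


def exact_move_dupes(games):
    """Strict criterion: group game ids whose move lists are identical."""
    by_moves = defaultdict(list)
    for game_id, moves in games.items():
        by_moves[tuple(moves)].append(game_id)
    return [ids for ids in by_moves.values() if len(ids) > 1]


def checkpoint_plies(n_plies, points=10):
    """Up to `points` evenly spaced ply numbers between 1 and n_plies."""
    if n_plies <= 0:
        return []
    return sorted({max(1, round(n_plies * k / points))
                   for k in range(1, points + 1)})


def position_score(positions_a, positions_b, points=10):
    """Fraction of checkpoints at which both games show the same position.
    positions_x[i] is a key for the position after ply i + 1; a checkpoint
    past the end of the shorter game counts as a mismatch."""
    plies = checkpoint_plies(max(len(positions_a), len(positions_b)), points)
    if not plies:
        return 0.0
    same = sum(1 for p in plies
               if p <= len(positions_a) and p <= len(positions_b)
               and positions_a[p - 1] == positions_b[p - 1])
    return same / len(plies)


def looks_like_dupe(a, b, threshold=0.9):
    """Weaker criterion: same result and same position at >= 90% of checkpoints."""
    return (a["result"] == b["result"]
            and position_score(a["positions"], b["positions"]) >= threshold)
```

Because positions are compared at the same ply number, an early transposition only costs the checkpoints that fall inside the transposed stretch. Turning this fixed threshold into a proper probabilistic score is left open.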
I fear consistent header data is rare in practice, so one would have to work on the actual game moves instead. That is then pretty much the idea behind VIAF (although they have no equivalent of the game moves, just decent metadata).

> I don't know how to manage the Scid bases :( -------->(help?)

I fear for a really good deduping of a base you're left with hand weeding. :( Especially if you want to have complete tournaments in your base to allow for crosstables and so on. There's no real automatism to help you here.

Game metadata in chess is "problematic", at best. E.g. take "The Source"(tm): it probably has the worst game data around, and you can be happy if they wrote the player names correctly. Still, they only print the family name, not even the initials. You get something like "Anand 2770 - Tkachiev 2632 Moscow (m/1) 2001" :S

Even if you get better data, transcriptions from the Cyrillic script are very difficult in our game. E.g. where you spell Karpo_v_, in German bases/literature you'll more likely find Karpo_w_. I never saw the Cyrillic characters really transliterated, only transcribed. :(

To give an impression of the problem, you might want to have a look at "Karpov, Anatoly (1951-)":

  http://www.viaf.org/viaf/95195768/marc21.xml

All category 700 lines are "Established Heading Linking Entry - Personal Name". There are about 8(!) of them. The fun, however, starts at category 400, which is a list of other spellings of the very same name. And beware, this compilation was done by librarians all over the world, people who stick to quite strict rules on how to write a non-Latin entity in the local charset. Most people don't invest that much effort.

--
Kind regards,
Alexander Wagner

War is Peace. Freedom is Slavery. Ignorance is Strength.
  Theory     : G. Orwell, "1984"
  In practice: USA, since 2001

_______________________________________________
Scid-users mailing list
Scid-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scid-users