I would caution against leaving Le Quang Liem uncorrected (that is, not applying Le Quang Liem >> Le, Quang Liem), even if the form without the comma is objectively correct. Even if it is incorrect, the comma style (Ding, Liren, for example) seems to be the convention in the chess world; TWIC PGNs use it. So if spellcheck is going to treat Le Quang Liem as correct, it would be important to change Le, Quang Liem >> Le Quang Liem as well, at least so that delete twin games works correctly.

I’m not sure how one would determine whether a name should be written in the Chinese style or the Western style. Checking whether the federation is CHN or VIE (or maybe other countries) wouldn’t work, as there are people with Western-style names in China and people with Chinese-style names in other countries. Checking whether the last name is among the 10,000 most common Chinese surnames wouldn’t work either, as there are people like the American player Joshua Mu, for whom Mu, Joshua is better than Mu Joshua.

I think it is better to leave this as it is.

Thanks,
Matthew Larson
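
P.S. To illustrate why the two spellings matter for twin detection, here is a minimal sketch of a comparison key that treats both forms as the same player. It is not how Scid compares names; `name_key` and `same_player` are hypothetical helpers, and ignoring commas, spaces, case and a trailing (wh)/(bl) is an assumption made only for this example.

```python
import re

# Hypothetical helpers for illustration only; not part of Scid or spelling.ssp.
# Matches a trailing " (wh)" or " (bl)" colour marker at the end of a name.
_COLOUR_SUFFIX = re.compile(r"\s*\((?:wh|bl)\)\s*$", re.IGNORECASE)


def name_key(name: str) -> str:
    """Build a comparison key that ignores commas, spacing, case and (wh)/(bl)."""
    without_suffix = _COLOUR_SUFFIX.sub("", name)
    # Drop every comma and whitespace run, then normalise case.
    return re.sub(r"[\s,]+", "", without_suffix).casefold()


def same_player(a: str, b: str) -> bool:
    """True when two spellings differ only in punctuation, spacing or case."""
    return name_key(a) == name_key(b)


if __name__ == "__main__":
    print(same_player("Le Quang Liem", "Le, Quang Liem"))
    print(same_player("DeFirmian, Nick", "De Firmian, Nick"))
    print(same_player("Mu, Joshua", "Joshua Mu"))
```

A key like this only helps with matching; it does not decide which spelling to store, and it cannot tell that Mu, Joshua and Joshua Mu are the same person because the word order differs. It could also merge two genuinely different players whose names differ only by a space.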
> On Jun 12, 2015, at 2:34 AM, Steve A <stevena...@gmail.com> wrote:
>
> >>> I am interested in manually improving the spelling.ssp file. I created my
> >>> reference database by importing every pgn I could find into one base, and
> >>> then using spellcheck and delete twin games to get rid of duplicates. I
> >>> have noticed that spellcheck does not correct some East Asian names (e.g.
> >>> Le Quang Liem does not go to Le, Quang Liem) and hyphenated names or names
> >>> with spaces in them (e.g. DeFirmian, Nick does not go to De Firmian, Nick).
> >>> Although I wouldn’t be able to find everything like this, I could correct
> >>> the spelling.ssp file when I run into something like this (by finding a
> >>> duplicate game not deleted because the names are different).
>
> Yah - improving our spelling correction is on the todo list. But i'm not sure
> if the spelling file is too bad. Franz has just released a new version of it
> (which i patch a little for release with ScidvsPC).
> https://sourceforge.net/projects/scid/files/Player%20Data/Latest%20data/
>
> Re not adding a comma to Asian names, i found this on wikipedia
> http://en.wikipedia.org/wiki/Chinese_name
> "According to the Chicago Manual of Style, Chinese names are indexed by the
> family name with no inversion and no comma"
> I am not familiar if (for eg) Vietnamese names follow a similar convention.
>
> Do you/anyone have any other issues with spelling.ssp ?
>
> We *do* need a list of improvements to be done to improve our spell checker
> (I probably would leave chinese names without a comma). Things that come to
> mind are names with a comma but no space, and name capitalisation.
>
> > I think I have discovered what seems to be a more important issue. It seems
> > as though ScidvsMac is not able to correct more than 2000 names at a time.
> > So, when I run spellcheck player names on my reference database, it detects
> > 260,000 corrections, but when I hit Make corrections it only corrects 2000
> > times.
>
> Yes - this needs investigating/addressing. I was not aware of it.
>
> > The spellcheck feature works beautifully, except that it doesn't remove
> > (wh) and (bl) at the end of the player name, which could be fixed easily by
> > adding a %Suffix " (wh)" "" line.
>
> Sounds reasonable, though i have never seen these used myself.
>
> Steven