I've implemented the PGN export extension because it is required for writing proper PGN files. (This extension will later also be used to write C/CIF archives.) I think a more detailed description of the problem and its solution is helpful:
-------------------------------------------------------------------------------------------
The current database version 4.0 of Scid still has some weaknesses concerning internationalization. The data (player names, site names, event names, comments inside move data, etc.) may be stored in any character encoding, depending on the following situations:

1. Older Linux/Unix distributions were installed with Latin-1 as the default encoding, and strings were stored in Latin-1 because older Tcl libraries did not support Unicode.
2. Newer Linux/Unix distributions are installed with UTF-8 as the default encoding, which means that all strings are stored in UTF-8.
3. Many applications (including Scid) have produced PGN files with unsuitable character encodings. It is not unusual for a PGN file to use extended ASCII (CP850, for example), or to be UTF-8 encoded but without a leading UTF-8 BOM. When importing PGN files, Scid interprets the content as being in the system encoding, and in such cases this may result in broken encodings. Often the text content of these games cannot be displayed correctly.
4. In some older databases the data was stored in Latin-1, and newer games were stored in UTF-8 after an upgrade of the Tcl library version. Such a database is now a mix of different encodings.
5. When importing PGN files, Latin-1 content is interpreted as UTF-8, so all data ends up stored Latin-1 encoded.
6. Older Windows versions stored CP1252 encoded data, but newer Windows versions store UTF-8 (depending on the Tcl library version).

This has an impact on the export of PGN files: quite often the written data is not properly Latin-1 encoded, which violates the PGN standard. Moreover, the Latin-1 character set can be unsuitable: for example, Russian comments exported to PGN with Latin-1 encoding will be unreadable, so in this case an export to UTF-8 is required.
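To make the situations above a bit more concrete, here is a small Python sketch. It is not Scid code, and the sample player name and Russian comment are invented for illustration. It only shows what happens when bytes written in one encoding are read back in another, and why Latin-1 cannot hold Russian text.

```python
# Illustration only: sample strings are made up, nothing here is Scid code.

name = "Grünfeld"                       # player name with a non-ASCII letter
utf8_bytes = name.encode("utf-8")       # bytes as a UTF-8 system would store them
latin1_bytes = name.encode("latin-1")   # bytes as a Latin-1 system would store them

# UTF-8 bytes read back as if they were Latin-1: decoding "succeeds",
# but the two bytes of the umlaut turn into two unrelated characters.
misread = utf8_bytes.decode("latin-1")

# Latin-1 bytes read back as UTF-8: the single umlaut byte is not a
# valid UTF-8 sequence, so a strict decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
    latin1_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_as_utf8_ok = False

# Russian text has no representation in Latin-1 at all.
comment = "Белые выигрывают"
try:
    comment.encode("latin-1")
    russian_fits_latin1 = True
except UnicodeEncodeError:
    russian_fits_latin1 = False

# UTF-8 can represent it without loss.
utf8_comment = comment.encode("utf-8")

print(repr(misread), latin1_as_utf8_ok, russian_fits_latin1)
```

The first case is the unpleasant one: nothing fails, and the damaged string is stored silently, which is how a database can end up with a mix of encodings.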
(PGN files with UTF-8 encoding do not conform to the PGN standard, but ChessBase introduced UTF-8 encoding with a leading UTF-8 BOM at the start of the file to mark the file content as UTF-8. This is now a de facto standard; most modern chess applications support this extension.)

The newer version of Scid vs PC has introduced some enhancements for a proper PGN export:

1. The user can choose between Latin-1 and UTF-8 encoding. Latin-1 is generally preferred, but in some cases, for example when exporting Russian content, Latin-1 is unsuitable and UTF-8 should be used instead.
2. The export uses a character set detector. This detector tries to detect the character set of the exported text and converts the content to either Latin-1 or UTF-8, depending on the user's choice. In many cases this detector is even able to convert the result of broken encodings into a proper character set.

Please note that English-speaking countries are in general not affected by these problems, because the English characters are contained in Latin-1, and thus also in UTF-8, but nearly all of the rest of the world is affected.
-------------------------------------------------------------------------------------------

Gregor

_______________________________________________
Scidvspc-users mailing list
Scidvspc-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scidvspc-users