Re: [sqlite] Re: Unicode Help
On 12/8/06, Kees Nuyt <[EMAIL PROTECTED]> wrote:
> On Fri, 8 Dec 2006 15:54:45 +, you wrote:
>
> > How do you set Notepad to Encoding = Unicode?
> > I can't see an option for that.
>
> Perhaps it listens to a BOM?

It does, and it will also try heuristics to detect the encoding if no BOM is present. But what I was referring to is File->Open; there's a dropdown at the bottom to choose the encoding type.

Anyway, glad you got it sorted :)

- To unsubscribe, send email to [EMAIL PROTECTED] -
Re: [sqlite] Re: Unicode Help
On Fri, 8 Dec 2006 15:54:45 +, you wrote:

> How do you set Notepad to Encoding = Unicode?
> I can't see an option for that.

Perhaps it listens to a BOM?
http://unicode.org/unicode/faq/utf_bom.html#22

That would mean you have to initialize your text file before editing, with some utility like awk:

BOF file initutf.cmd (line-wrapped by mail!)

    @echo off
    echo Build a few common BOM prefixed UTF files
    echo BOM for UTF-8
    awk "BEGIN{printf(\"\xEF\xBB\xBFUTF-8\"); exit 0}" >utf8.txt
    echo BOM for UTF-16 Little Endian
    awk "BEGIN{printf(\"\xFF\xFE\x55\x00\x54\x00\x46\x00\x2D\x00\x31\x00\x36\x00\x4C\x00\x45\x00\"); exit 0}" >utf16LE.txt
    echo BOM for UTF-16 Big Endian
    awk "BEGIN{printf(\"\xFE\xFF\x00\x55\x00\x54\x00\x46\x00\x2D\x00\x31\x00\x36\x00\x42\x00\x45\"); exit 0}" >utf16BE.txt

EOF file initutf.cmd

(tested, works with notepad.exe v5.1.2600.2180 Dutch)

HTH
--
( Kees Nuyt )
c[_]
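For readers without awk at hand, the same three BOM-prefixed sample files can be sketched in Python. This is only an illustration of the byte layout the batch script above produces; the names `SAMPLES`, `bom_sample` and `write_samples` are made up for this note, and only the standard `codecs` and `pathlib` modules are assumed.

```python
import codecs
from pathlib import Path

# File name -> (BOM bytes, codec used for the label, label text).
# The file names and labels mirror the batch script above.
SAMPLES = {
    "utf8.txt": (codecs.BOM_UTF8, "utf-8", "UTF-8"),
    "utf16LE.txt": (codecs.BOM_UTF16_LE, "utf-16-le", "UTF-16LE"),
    "utf16BE.txt": (codecs.BOM_UTF16_BE, "utf-16-be", "UTF-16BE"),
}


def bom_sample(filename: str) -> bytes:
    """Return the BOM followed by the label encoded in the matching encoding."""
    bom, codec, label = SAMPLES[filename]
    return bom + label.encode(codec)


def write_samples(directory: str = ".") -> None:
    """Write all three sample files into the given directory."""
    for filename in SAMPLES:
        Path(directory, filename).write_bytes(bom_sample(filename))


# Show the raw bytes so they can be compared with the awk escapes.
for name in SAMPLES:
    print(name, bom_sample(name).hex(" "))
```

Calling `write_samples()` creates the files in the current directory; opening them in an editor that honours the BOM should then select the matching encoding.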
Re: [sqlite] Re: Unicode Help
EUREKA!

OK, I got it working now. It turns out my source was UTF-8 encoded, so even when I used the UTF-16 functions it wasn't coming out right. I am now doing a conversion in Delphi from UTF-8 to UTF-16 and using all the UTF-16 SQLite functions, as recommended.

Thanks a million for all your help; it was all your suggestions which led me to the solution. Much appreciated.

Have a good weekend.

S

On 12/8/06, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:
>
> > Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
> > Unicode enabled (I believe). I use WideStrings throughout the entire
> > pipeline, from the XML I receive, into SQLite via the prepare16, back out
> > through column_text16, into the virtual tree. Well, that's true, the SQL
> > APIs are mapped to return PWideChar, which is then copied via System.Move
> > into a WideString as follows:
> >
> > [ DLL interfaces ]
> >
> > Previously (before my language headaches :-) ) I was doing the above
> > without the APIs ending in 16, and everything was string and PChar in the
> > above layer. The layer that used this class has always had "WideString".
> >
> > I realise you're probably not Delphi pros, but if you do spot something
> > stupid I am doing I would appreciate any help you can offer.
>
> I've never used Delphi, "but I did sleep at a Holiday Inn last night"...
>
> It looks fine to me. To help check it, one thing you can try is writing
> the result of FieldAsString directly to a file as raw bytes, then in
> Notepad open that with "encoding" set to "Unicode". E.g. something
> logically equivalent to:
>
>     size := Length(field) * 2;
>     SetLength(buffer, size);
>     System.Move(field^, buffer^, size);
>     file.Write(buffer, size);
>
> I imagine you don't have to jump through hoops like that, but hopefully
> you see what I have in mind. If the result looks good in Notepad, then
> you know this layer is fine, so the problem must be closer to the
> display layer.
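The fix described in this message (decode the UTF-8 source first, then hand UTF-16 to the *16 functions) was done in Delphi; that code was not posted. A minimal sketch of the same conversion step in Python, with the hypothetical helper name `utf8_to_utf16le`, might look like this:

```python
def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode UTF-8 input strictly, then re-encode it as UTF-16LE (no BOM).

    Strict decoding raises UnicodeDecodeError if the input is not valid
    UTF-8, which is a useful early warning that the source encoding
    assumption is wrong.
    """
    return raw.decode("utf-8").encode("utf-16-le")


source = "Städt.".encode("utf-8")      # 7 bytes: the umlaut takes two
converted = utf8_to_utf16le(source)     # 12 bytes: six 16-bit code units

print(len(source), len(converted))
print(converted.decode("utf-16-le"))
```

The point of the sketch is the order of operations: the bytes are interpreted as UTF-8 before anything treats them as 16-bit characters.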
Re: [sqlite] Re: Unicode Help
> It looks fine to me. To help check it, one thing you can try is writing
> the result of FieldAsString directly to a file as raw bytes, then in
> Notepad open that with "encoding" set to "Unicode". E.g. something
> logically equivalent to:
>
>     size := Length(field) * 2;
>     SetLength(buffer, size);
>     System.Move(field^, buffer^, size);
>     file.Write(buffer, size);
>
> I imagine you don't have to jump through hoops like that, but hopefully
> you see what I have in mind. If the result looks good in Notepad, then
> you know this layer is fine, so the problem must be closer to the
> display layer.

Hi

How do you set Notepad to Encoding = Unicode? I can't see an option for that.
Re: [sqlite] Re: Unicode Help
On Dec 5, 2006, at 8:42 AM, Igor Tandetnik wrote:
> Da Martian <[EMAIL PROTECTED]> wrote:
> > So if I look at a name with umlauts in the database via sqlite3.exe I get:
> >
> > St├ñdt. Klinikum Neunkirchen gGmbH
> >   --
> >   |
> >   an "a" with two dots on top
>
> "A with umlaut" is represented as two bytes in UTF-8.

This is a huge simplification. At a bare minimum, 'ä' can be represented as either one or two Unicode code points: one code point representing 'ä', or one representing 'a' and one representing the '¨' combining mark. How *that* is represented in the UTF-8 encoding of Unicode is another issue, which depends on the exact values of the code points involved. The particular example of 'ä' may be represented as two bytes in UTF-8 in both cases (I don't know offhand), but that's not something that can be generalized.

--
Chris
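The "I don't know offhand" above can be checked directly. The sketch below uses Python's standard `unicodedata` module to produce both forms of 'ä' and print their code points and UTF-8 bytes; the variable names are only for illustration. The precomposed form (U+00E4) takes two bytes in UTF-8, while the decomposed form ('a' followed by the combining diaeresis U+0308) takes three.

```python
import unicodedata

# Precomposed form: one code point, U+00E4.
composed = unicodedata.normalize("NFC", "a\u0308")
# Decomposed form: 'a' (U+0061) followed by combining diaeresis (U+0308).
decomposed = unicodedata.normalize("NFD", "\u00e4")

for label, text in (("NFC", composed), ("NFD", decomposed)):
    code_points = [f"U+{ord(ch):04X}" for ch in text]
    utf8_hex = text.encode("utf-8").hex(" ")
    print(label, code_points, utf8_hex)

# The two strings render the same but do not compare equal until normalized.
print(composed == decomposed)
print(composed == unicodedata.normalize("NFC", decomposed))
```

This is also why the thread later asks for Unicode normalization functions: a byte-wise comparison treats the two spellings as different text.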
Re: [sqlite] Re: Unicode Help
On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:
> Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
> Unicode enabled (I believe). I use WideStrings throughout the entire
> pipeline, from the XML I receive, into SQLite via the prepare16, back out
> through column_text16, into the virtual tree. Well, that's true, the SQL
> APIs are mapped to return PWideChar, which is then copied via System.Move
> into a WideString as follows:
>
> [ DLL interfaces ]
>
> Previously (before my language headaches :-) ) I was doing the above
> without the APIs ending in 16, and everything was string and PChar in the
> above layer. The layer that used this class has always had "WideString".
>
> I realise you're probably not Delphi pros, but if you do spot something
> stupid I am doing I would appreciate any help you can offer.

I've never used Delphi, "but I did sleep at a Holiday Inn last night"...

It looks fine to me. To help check it, one thing you can try is writing the result of FieldAsString directly to a file as raw bytes, then in Notepad open that with "encoding" set to "Unicode". E.g. something logically equivalent to:

    size := Length(field) * 2;
    SetLength(buffer, size);
    System.Move(field^, buffer^, size);
    file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but hopefully you see what I have in mind. If the result looks good in Notepad, then you know this layer is fine, so the problem must be closer to the display layer.
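The raw-bytes check suggested above can also be done without Notepad by looking at the 16-bit code units themselves. The following is a sketch only, with hypothetical helper names `utf16_units` and `notepad_bytes`: the first prints each UTF-16 code unit in hex, the second builds the bytes one would write to a file (BOM plus UTF-16LE data) so that Notepad's "Unicode" setting can read it.

```python
def utf16_units(text: str) -> str:
    """Return the UTF-16 code units of text as space-separated hex."""
    raw = text.encode("utf-16-le")
    units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
    return " ".join(f"{unit:04X}" for unit in units)


def notepad_bytes(text: str) -> bytes:
    """UTF-16LE data prefixed with the little-endian BOM (FF FE)."""
    return b"\xff\xfe" + text.encode("utf-16-le")


good = "St\u00e4dt"          # umlaut stored as one code unit, 00E4
suspect = "St\u00c3\u00a4dt"  # UTF-8 bytes C3 A4 widened one byte per unit

print(utf16_units(good))
print(utf16_units(suspect))
```

If a dump of the real data shows the pair `00C3 00A4` where `00E4` is expected, the text was UTF-8 that something widened byte by byte, which points at a layer before the display control.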
Re: [sqlite] Re: Unicode Help
> > I am still having issues trying to get my characters standardized. I spent
> > all of yesterday playing with ideas but I'm still in the dark.
>
> Whatever you were doing the first time was fine:

I have been having that very thought!

> > So if I look at a name with umlauts in the database via sqlite3.exe I get:
> >
> > Städt. Klinikum Neunkirchen gGmbH
> >   --
> >   |
> >   an "a" with two dots on top
>
> That text was properly encoded as UTF-8. The ONLY issue with that line is
> that the sqlite shell under Windows is incapable of displaying Unicode, so
> you need to retrieve the data from sqlite using a tool that is. The actual
> storage of it is perfect.

I know the console sqlite3 won't show it. The Delphi control I am using does Unicode.

> > If I add the text using the *16 prepare and then retrieve it using the *16
> > column_text, I still get the two separate characters instead of the umlaut
> > thingie. So I can only assume that somehow my source isn't UTF-16, or I am
> > converting it somewhere in the middle. This is possible since I am using
> > Delphi and it has some implicit conversions, but I think I have got that
> > under control.
>
> AFAIK Delphi has no built-in Unicode support at all; you will need to find
> third-party support for everything, from processing to display controls.
> It is likely you are ending up with UTF-8 data at some point in the
> pipeline, and whatever you're doing to process it does not understand
> UTF-8.

Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully Unicode enabled (I believe). I use WideStrings throughout the entire pipeline, from the XML I receive, into SQLite via the prepare16, back out through column_text16, into the virtual tree. Well, that's true, the SQL APIs are mapped to return PWideChar, which is then copied via System.Move into a WideString as follows:

> Show code :)

* DLL *

    function sqlite3_open(filename: PWideChar; var db: pointer): integer; cdecl; external 'sqlite3.dll' name 'sqlite3_open16';
    function sqlite3_close(db: pointer): integer; cdecl; external 'sqlite3.dll';
    function sqlite3_exec(db: pointer; sql: PWideChar; callback: pointer; userdata: PWideChar; var errmsg: PWideChar): integer; cdecl; external 'sqlite3.dll' name 'sqlite3_exec16';
    procedure sqlite3_free(ptr: PWideChar); cdecl; external 'sqlite3.dll';
    function sqlite3_prepare(db: pointer; sql: PWideChar; nBytes: integer; var stmt: pointer; var ztail: PWideChar): integer; cdecl; external 'sqlite3.dll' name 'sqlite3_prepare16';
    function sqlite3_column_bytes(stmt: pointer; col: integer): integer; cdecl; external 'sqlite3.dll' name 'sqlite3_column_bytes16';
    function sqlite3_column_text(stmt: pointer; col: integer): PWideChar; cdecl; external 'sqlite3.dll' name 'sqlite3_column_text16';

*CLASS*

    function TSqliteQueryResults.FieldAsString(i: integer): WideString;
    var
      size: integer;
      temp: PWideChar;
    begin
      { size is in bytes; a WideString length is in 2-byte characters }
      size := FieldSize(i);
      SetLength(result, size div 2);
      temp := sqlite3_column_text(Fstmt, i);
      { copy from the pointer fetched above instead of calling the API twice }
      System.Move(temp^, PWideChar(result)^, size);
    end;

    function TSqliteQueryResults.FieldSize(i: integer): integer;
    begin
      result := sqlite3_column_bytes(Fstmt, i);
    end;

* END

Previously (before my language headaches :-) ) I was doing the above without the APIs ending in 16, and everything was string and PChar in the above layer. The layer that used this class has always had "WideString".

I realise you're probably not Delphi pros, but if you do spot something stupid I am doing I would appreciate any help you can offer.

Thanks,
Re: [sqlite] Re: Unicode Help
On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:
> I am still having issues trying to get my characters standardized. I spent
> all of yesterday playing with ideas but I'm still in the dark.

Whatever you were doing the first time was fine:

> So if I look at a name with umlauts in the database via sqlite3.exe I get:
>
> Städt. Klinikum Neunkirchen gGmbH
>   --
>   |
>   an "a" with two dots on top

That text was properly encoded as UTF-8. The ONLY issue with that line is that the sqlite shell under Windows is incapable of displaying Unicode, so you need to retrieve the data from sqlite using a tool that is. The actual storage of it is perfect.

> Part of my problem is I don't have a clue what my source data is encoded
> as. Does anyone know of a tool which can try and guess the encoding?
> Basically it's a custom Java bean written by someone else. It takes
> reports from a third-party system and turns them into XML using a string
> buffer. They just append everything to a string buffer. The code which
> actually adds this to the output (the key piece) I can't actually see at
> this point. So my best guess, based on research, is that Java usually
> uses UTF-16. But if this is so, it should work.

It sounds as though it is UTF-16 and working fine.

> If I add the text using the *16 prepare and then retrieve it using the *16
> column_text, I still get the two separate characters instead of the umlaut
> thingie. So I can only assume that somehow my source isn't UTF-16, or I am
> converting it somewhere in the middle. This is possible since I am using
> Delphi and it has some implicit conversions, but I think I have got that
> under control.

AFAIK Delphi has no built-in Unicode support at all; you will need to find third-party support for everything, from processing to display controls. It is likely you are ending up with UTF-8 data at some point in the pipeline, and whatever you're doing to process it does not understand UTF-8.

> The problem is if I copy my source and paste it into Notepad, say, it
> shows correctly because Notepad then does its own stuff, and if I save
> the Notepad file and read that, it works fine. *sigh*.

Notepad does support Unicode in various encodings, but that doesn't mean anything in this test, since your system codepage may well support the characters you're testing with anyway.

> 2) When using the NON16 version of prepare:
>    If I add text which is in UTF-16, what happens?
>    16 version:
>    If I add UTF-16 text, what happens?
>    If I add UTF-8 text, what happens?
>    If I add ASCII text, what happens?

The answers to these depend on exactly how you're interfacing with it (what programming language, how the sqlite library functions are defined/declared, any use of library tools or auto-conversion semantics in the language, etc).

Show code :)
Re: [sqlite] Re: Unicode Help
I think standard functions for conversions would be very helpful.

I am still having issues trying to get my characters standardized. I spent all of yesterday playing with ideas but I'm still in the dark.

Part of my problem is I don't have a clue what my source data is encoded as. Does anyone know of a tool which can try and guess the encoding? Basically it's a custom Java bean written by someone else. It takes reports from a third-party system and turns them into XML using a string buffer. They just append everything to a string buffer. The code which actually adds this to the output (the key piece) I can't actually see at this point. So my best guess, based on research, is that Java usually uses UTF-16. But if this is so, it should work.

If I add the text using the *16 prepare and then retrieve it using the *16 column_text, I still get the two separate characters instead of the umlaut thingie. So I can only assume that somehow my source isn't UTF-16, or I am converting it somewhere in the middle. This is possible since I am using Delphi and it has some implicit conversions, but I think I have got that under control.

The problem is if I copy my source and paste it into Notepad, say, it shows correctly because Notepad then does its own stuff, and if I save the Notepad file and read that, it works fine. *sigh*.

So my questions are:

1) Any tools to determine encoding based on the data?

2) When using the NON16 version of prepare:
   If I add text which is in UTF-16, what happens?
   16 version:
   If I add UTF-16 text, what happens?
   If I add UTF-8 text, what happens?
   If I add ASCII text, what happens?

Thanks,
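On question 1, no tool can determine an encoding with certainty from bytes alone, but a few cheap checks go a long way. The sketch below is a rough heuristic, not a real detector, and the function name `guess_encoding` and the cp1252 fallback are assumptions made for this note: it looks for a BOM, then for the NUL-byte pattern typical of UTF-16 text in Western languages, then tries strict ASCII and UTF-8 decoding.

```python
import codecs

# BOMs checked first; the codec names are Python's.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def guess_encoding(raw: bytes) -> str:
    """Return a best-guess codec name for raw; a heuristic, not a proof."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    if not raw:
        return "ascii"

    # Mostly-Latin UTF-16 text has a NUL in every other byte.
    even_nuls = raw[0::2].count(0)
    odd_nuls = raw[1::2].count(0)
    half = len(raw) // 2
    if half and odd_nuls > half // 2 and even_nuls == 0:
        return "utf-16-le"
    if half and even_nuls > half // 2 and odd_nuls == 0:
        return "utf-16-be"

    for name in ("ascii", "utf-8"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            pass
    return "cp1252"  # assumed fallback for a Western European Windows system


for sample in ("utf-8", "utf-16-le", "utf-16-be", "cp1252"):
    print(sample, "->", guess_encoding("Städt.".encode(sample)))
```

The UTF-8 test is the most useful one in practice: text in a legacy single-byte code page that contains accented characters almost never happens to be valid UTF-8.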
Re: [sqlite] Re: Unicode Help
Nicolas Williams wrote:
> On Wed, Dec 06, 2006 at 10:06:12AM -0600, John Stanton wrote:
> > Marten Feldtmann wrote:
> > > But Tcl is not part of SQLite (and this is good) - this is just an
> > > add-on. The idea of the additional functions is pretty good!
> >
> > How does Sqlite become Sqbloated? By function creep, one little step at
> > a time.
>
> Well, let's see. Folks would implement user functions, and promoting
> code re-use would be good. OTOH keeping the SQLite core small is also
> good. So why not have a library of non-core user functions? That's
> almost like having them in the core disabled by default and with options
> to include them.
>
> Nico

Isn't that how Sqlite is designed and how it is used already? User functions are dynamically linked when used but otherwise do not bloat the core.
Re: [sqlite] Re: Unicode Help
On Wed, Dec 06, 2006 at 10:06:12AM -0600, John Stanton wrote:
> Marten Feldtmann wrote:
> > But Tcl is not part of SQLite (and this is good) - this is just an
> > add-on. The idea of the additional functions is pretty good!
>
> How does Sqlite become Sqbloated? By function creep, one little step at
> a time.

Well, let's see. Folks would implement user functions, and promoting code re-use would be good. OTOH keeping the SQLite core small is also good. So why not have a library of non-core user functions? That's almost like having them in the core disabled by default and with options to include them.

Nico
--
Re: [sqlite] Re: Unicode Help
Marten Feldtmann wrote:
> Ulrich Schöbel wrote:
> > SQLite includes a Tcl API. Tcl does all these conversions with ease.
> > See the encoding convertto/convertfrom commands and fconfigure.
>
> But Tcl is not part of SQLite (and this is good) - this is just an
> add-on. The idea of the additional functions is pretty good!
>
> Marten

How does Sqlite become Sqbloated? By function creep, one little step at a time.
Re: [sqlite] Re: Unicode Help
Ulrich Schöbel wrote:
> SQLite includes a Tcl API. Tcl does all these conversions with ease.
> See the encoding convertto/convertfrom commands and fconfigure.

But Tcl is not part of SQLite (and this is good) - this is just an add-on. The idea of the additional functions is pretty good!

Marten
Re: [sqlite] Re: Unicode Help
SQLite includes a Tcl API. Tcl does all these conversions with ease. See the encoding convertto/convertfrom commands and fconfigure.

On Tuesday 05 December 2006 20:42, Nicolas Williams wrote:
> On Tue, Dec 05, 2006 at 06:53:28PM +0100, Marten Feldtmann wrote:
> > Perhaps it would be nice to change sqlite3 in such a way that columns
> > with storage class text are converted to the host platform code page.
> > But actually even in that situation you may have strings which are not
> > displayable on your screen, because you have no suitable font.
>
> No, but having built-in functions that can do codeset conversion would
> be nice.
>
>     -- convert from the default SQLite codeset/encoding (UTF-8) to a given
>     -- codeset
>     select iconv(foo, NULL, 'ISO-8859-1') from ...;
>
>     -- convert to a codeset given by some row column
>     select iconv(foo, from_cs, to_cs) from ...;
>
>     -- convert to the current locale's codeset
>     select iconv(foo, NULL, NULL) from ...;
>
> And functions for Unicode normalization and what not would be nice as
> well.
>
> Nico
Re: [sqlite] Re: Unicode Help
Hello Nicolas Williams,

> No, but having built-in functions that can do codeset conversion would
> be nice.

SQLiteSpy can do this:

http://www.yunqa.de/delphi/sqlitespy/
Re: [sqlite] Re: Unicode Help
On Tue, Dec 05, 2006 at 06:53:28PM +0100, Marten Feldtmann wrote:
> Perhaps it would be nice to change sqlite3 in such a way that columns
> with storage class text are converted to the host platform code page.
> But actually even in that situation you may have strings which are not
> displayable on your screen, because you have no suitable font.

No, but having built-in functions that can do codeset conversion would be nice.

    -- convert from the default SQLite codeset/encoding (UTF-8) to a given
    -- codeset
    select iconv(foo, NULL, 'ISO-8859-1') from ...;

    -- convert to a codeset given by some row column
    select iconv(foo, from_cs, to_cs) from ...;

    -- convert to the current locale's codeset
    select iconv(foo, NULL, NULL) from ...;

And functions for Unicode normalization and what not would be nice as well.

Nico
--
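The `iconv()` SQL function proposed above is not part of SQLite, but the user-function mechanism lets an application register something similar itself. The sketch below does this through Python's standard `sqlite3` module; the semantics given to the NULL arguments (UTF-8 as the default source, the locale's preferred encoding as the default target) are this note's reading of the proposal, and the table `t` and column `foo` are placeholders.

```python
import locale
import sqlite3


def iconv(value, from_cs, to_cs):
    """Hypothetical iconv(value, from_cs, to_cs) SQL user function.

    Python's sqlite3 module passes TEXT values in as str (already decoded),
    so from_cs only matters when the value is a BLOB. A NULL from_cs means
    UTF-8; a NULL to_cs means the current locale's preferred encoding.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        value = value.decode(from_cs or "utf-8")
    target = to_cs or locale.getpreferredencoding(False)
    return value.encode(target)  # bytes go back to SQL as a BLOB


conn = sqlite3.connect(":memory:")
conn.create_function("iconv", 3, iconv)
conn.execute("create table t (foo text)")
conn.execute("insert into t values (?)", ("Städt. Klinikum",))

(latin1,) = conn.execute(
    "select iconv(foo, NULL, 'ISO-8859-1') from t"
).fetchone()
print(latin1)
```

The result comes back as a BLOB rather than TEXT, because once the data is no longer in the database encoding SQLite has no way to treat it as text; that trade-off would apply to a built-in version as well.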
Re: [sqlite] Re: Unicode Help
Igor Tandetnik wrote:
> Da Martian <[EMAIL PROTECTED]> wrote:
> > So if I look at a name with umlauts in the database via sqlite3.exe I get:
> >
> > Städt. Klinikum Neunkirchen gGmbH
> >   --
> >   |
> >   an "a" with two dots on top
>
> "A with umlaut" is represented as two bytes in UTF-8. sqlite3.exe just
> dumps these bytes onto the console, which is not smart enough to
> interpret them as a UTF-8 sequence. In other words, the data in the
> database is fine, it's just displayed incorrectly (in the wrong
> codepage). You don't need to worry.
>
> > So I tried the *16 versions, but now the field size returned by
> > "sqlite3_column_bytes16" always seems to be larger than the string I get
> > back, resulting in junk characters on the end.

In my storage framework for VASmalltalk I have to convert all my strings to get all the stuff right.

Each string you work with is characterized not only by the characters you have, but also by the code page the string is defined in. In general this is very often the code page of the operating system (1252 under Windows, 850 under OS/2, using them in a German locale). To make it more complicated, you also need a suitable font for that code page.

VASmalltalk, for example, works internally with code page 819 on all available platforms, but under Windows it has to support code page 1252. Therefore, when storing strings within this IDE, one has to convert the string from code page 819 to UTF-8.

UTF-8 is a special code page under Windows: 65001. Therefore I convert the strings from 819 to 65001 and then send the converted string to the API call.

Perhaps it would be nice to change sqlite3 in such a way that columns with storage class text are converted to the host platform code page. But actually even in that situation you may have strings which are not displayable on your screen, because you have no suitable font.

Marten
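The point that a string is only meaningful together with its code page can be made concrete. In the sketch below (Python, with a hypothetical helper `to_utf8`), the same conversion to UTF-8 is applied to bytes under three different assumptions about the source code page. Python calls code page 819 "iso-8859-1"; "cp850" and "cp1252" are the OS/2 and Windows pages mentioned in the message.

```python
def to_utf8(raw: bytes, source_codepage: str) -> bytes:
    """Interpret raw in the given code page, then re-encode it as UTF-8."""
    return raw.decode(source_codepage).encode("utf-8")


# 'ä' is byte E4 in code pages 819 and 1252, but byte 84 in code page 850.
print(to_utf8(b"St\xe4dt.", "iso-8859-1"))
print(to_utf8(b"St\x84dt.", "cp850"))

# The same byte 84 read as code page 1252 is a low double quote (U+201E),
# so converting with the wrong source code page silently stores wrong text.
print(to_utf8(b"St\x84dt.", "cp1252"))
```

Nothing fails in the third call, which is what makes this class of bug hard to spot: the conversion succeeds and the damage only shows up when the text is displayed.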
[sqlite] Re: Unicode Help
Da Martian <[EMAIL PROTECTED]> wrote:
> So if I look at a name with umlauts in the database via sqlite3.exe I get:
>
> Städt. Klinikum Neunkirchen gGmbH
>   --
>   |
>   an "a" with two dots on top

"A with umlaut" is represented as two bytes in UTF-8. sqlite3.exe just dumps these bytes onto the console, which is not smart enough to interpret them as a UTF-8 sequence. In other words, the data in the database is fine, it's just displayed incorrectly (in the wrong codepage). You don't need to worry.

> So I tried the *16 versions, but now the field size returned by
> "sqlite3_column_bytes16" always seems to be larger than the string I get
> back, resulting in junk characters on the end.

Show how you put the data in, and how you get it back out. Realize that the terminating NUL character is neither stored in nor retrieved from the DB: you might see garbage at the end simply because your string is not NUL-terminated.

Igor Tandetnik
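Both effects described in this reply can be reproduced in a few lines of Python. The first part reads correct UTF-8 bytes in code page 850, a common Western European console code page (assumed here; the poster's actual console code page is not stated), which turns the umlaut into two unrelated glyphs. The second part shows why a byte count such as the one from `sqlite3_column_bytes16` must be halved to get the number of 16-bit characters.

```python
stored = "Städt. Klinikum Neunkirchen gGmbH"

# What a console does when it treats UTF-8 bytes as code page 850 text:
# the two bytes C3 A4 of the umlaut become a box-drawing character and 'ñ'.
utf8_bytes = stored.encode("utf-8")
console_view = utf8_bytes.decode("cp850")
print(console_view)

# A UTF-16 byte count is twice the number of 16-bit code units, and it
# does not include any terminating NUL.
utf16_bytes = "Städt.".encode("utf-16-le")
byte_count = len(utf16_bytes)
unit_count = byte_count // 2
print(byte_count, unit_count)
```

Code that sizes a buffer from the byte count but then treats that number as a character count, or that expects a trailing NUL that was never copied, will show exactly the kind of junk at the end of the string described above.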