Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Trevor Talbot

On 12/8/06, Kees Nuyt <[EMAIL PROTECTED]> wrote:

On Fri, 8 Dec 2006 15:54:45 +, you wrote:

> How do you set Notepad to Encoding = Unicode?
> I can't see an option for that.

Perhaps it listens to a BOM?


It does, and will also try heuristics to detect the encoding if no BOM
is present.  But, what I was referring to is File->Open; there's a
dropdown at the bottom to choose the encoding type.
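
A minimal sketch of the BOM check mentioned above, in Python. It only illustrates the idea and is not Notepad's actual detection logic; the helper name `sniff_bom` is made up for this example.

```python
import codecs

# Checked longest-first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# Example: a UTF-16 LE file that starts with FF FE.
encoding, skip = sniff_bom(b"\xff\xfeS\x00t\x00")
```

When no BOM is found, a reader has to fall back on heuristics or on an explicit choice by the user, which is what the File->Open dropdown provides.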

Anyway, glad you got it sorted :)

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Kees Nuyt
On Fri, 8 Dec 2006 15:54:45 +, you wrote:

> How do you set Notepad to Encoding = Unicode?
> I can't see an option for that.

Perhaps it listens to a BOM?
http://unicode.org/unicode/faq/utf_bom.html#22

It would mean you have to initialize your textfile before
editing with some utility like awk:

 BOF file initutf.cmd 
@echo off
echo Build a few common BOM prefixed UTF files

echo BOM for UTF-8
awk "BEGIN{printf(\"\xEF\xBB\xBFUTF-8\"); exit 0}" >utf8.txt

echo BOM for UTF-16 Little Endian
awk "BEGIN{printf(\"\xFF\xFE\x55\x00\x54\x00\x46\x00\x2D\x00\x31\x00\x36\x00\x4C\x00\x45\x00\"); exit 0}" >utf16LE.txt

echo BOM for UTF-16 Big Endian
awk "BEGIN{printf(\"\xFE\xFF\x00\x55\x00\x54\x00\x46\x00\x2D\x00\x31\x00\x36\x00\x42\x00\x45\"); exit 0}" >utf16BE.txt

 EOF file initutf.cmd 

(tested, works with notepad.exe v5.1.2600.2180 Dutch)

HTH
-- 
  (  Kees Nuyt
  )
c[_]




Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Da Martian

EUREKA!

Ok, I got it working now. It turns out my source was UTF-8 encoded, so even
when I used the UTF-16 functions it wasn't coming out right. I am now doing
a conversion in Delphi from UTF-8 to UTF-16 and using all the UTF-16 SQLite
functions as recommended.
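
The conversion described here was done in Delphi, and that code was not posted. As a rough illustration only, the same UTF-8 to UTF-16 step can be sketched in Python; the function name `utf8_to_utf16le` is an assumption of this example.

```python
def utf8_to_utf16le(raw: bytes) -> bytes:
    """Decode UTF-8 bytes strictly, then re-encode as UTF-16LE without a BOM."""
    text = raw.decode("utf-8")       # raises UnicodeDecodeError on invalid input
    return text.encode("utf-16-le")  # two bytes per BMP character, no BOM

# "St\u00e4dt" is "Städt"; the umlaut is two bytes in UTF-8.
source = "St\u00e4dt. Klinikum".encode("utf-8")
wide = utf8_to_utf16le(source)
```

Decoding strictly is deliberate: if the input is not really UTF-8, the error shows up at the conversion step instead of as wrong characters in the database.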

Thanks a million for all your help; it was your suggestions which led
me to the solution.

Much appreciated.

Have a good weekend.

S


On 12/8/06, Trevor Talbot <[EMAIL PROTECTED]> wrote:


On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:

> Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
> Unicode enabled (I believe). I use WideStrings throughout the entire
> pipeline, from the XML I receive, into SQLite via prepare16, and back out
> through column_text16 into the virtual tree. Well, that's true: the SQL APIs
> are mapped to return PWideChar, which is then copied via System.Move into a
> WideString as follows:

[ DLL interfaces ]

> Previously (before my language headaches :-)  ) I was doing the above
> without the APIs ending in 16, and everything was string and PChar in the
> above layer. The layer that used this class has always had "WideString".
>
> I realise you're probably not Delphi pros, but if you do spot something
> stupid I am doing I would appreciate any help you can offer.

I've never used Delphi, "but I did sleep at a Holiday Inn last night"...

It looks fine to me.  To help check it, one thing you can try is
writing the result of FieldAsString directly to a file as raw bytes,
then in notepad open that with "encoding" set to "Unicode".  E.g.
something logically equivalent to:

  size := Length(field) * 2;
  SetLength(buffer, size );
  System.Move(field^, buffer^, size);
  file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but
hopefully you see what I have in mind.  If the result looks good in
notepad, then you know this layer is fine, so the problem must be
closer to the display layer.






Re: [sqlite] Re: Unicode Help

2006-12-08 Thread Da Martian



It looks fine to me.  To help check it, one thing you can try is
writing the result of FieldAsString directly to a file as raw bytes,
then in notepad open that with "encoding" set to "Unicode".  E.g.
something logically equivalent to:

  size := Length(field) * 2;
  SetLength(buffer, size );
  System.Move(field^, buffer^, size);
  file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but
hopefully you see what I have in mind.  If the result looks good in
notepad, then you know this layer is fine, so the problem must be
closer to the display layer.



Hi

How do you set Notepad to Encoding = Unicode? I can't see an option for that.


Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Chris Hanson

On Dec 5, 2006, at 8:42 AM, Igor Tandetnik wrote:


Da Martian <[EMAIL PROTECTED]> wrote:

So if I look at a name with umlauts in the database via sqlite3.exe
I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top


"A with umlaut" is represented as two bytes in UTF-8.


This is a huge simplification.  At a bare minimum, 'ä' can be
represented as either one or two Unicode code points -- one code point
representing 'ä', or one representing 'a' and one representing the '¨'
combining mark.  How *that* is represented in the UTF-8 encoding of
Unicode is another issue that depends on the exact values of the code
points involved.


The particular example of 'ä' may be represented as two bytes in UTF-8 in
both cases (I don't know offhand), but that's not something that can be
generalized.
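
The byte counts for this particular example can be checked directly. The sketch below uses Python's standard `unicodedata` module to build both forms and print their code points and UTF-8 bytes. The precomposed form (U+00E4) encodes to two bytes, while the decomposed form ('a' plus combining diaeresis U+0308) encodes to three.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "a\u0308")    # single code point U+00E4
decomposed = unicodedata.normalize("NFD", "\u00e4")   # 'a' followed by U+0308

for label, s in (("NFC", composed), ("NFD", decomposed)):
    code_points = " ".join(f"U+{ord(c):04X}" for c in s)
    utf8_hex = " ".join(f"{b:02x}" for b in s.encode("utf-8"))
    print(label, code_points, "->", utf8_hex)
```

So the two forms render identically but compare unequal byte for byte, which is why normalizing before storing or comparing text can matter.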


  -- Chris





Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Trevor Talbot

On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:


Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
Unicode enabled (I believe). I use WideStrings throughout the entire
pipeline, from the XML I receive, into SQLite via prepare16, and back out
through column_text16 into the virtual tree. Well, that's true: the SQL APIs
are mapped to return PWideChar, which is then copied via System.Move into a
WideString as follows:


[ DLL interfaces ]


Previously (before my language headaches :-)  ) I was doing the above
without the APIs ending in 16, and everything was string and PChar in the
above layer. The layer that used this class has always had "WideString".

I realise you're probably not Delphi pros, but if you do spot something stupid
I am doing I would appreciate any help you can offer.


I've never used Delphi, "but I did sleep at a Holiday Inn last night"...

It looks fine to me.  To help check it, one thing you can try is
writing the result of FieldAsString directly to a file as raw bytes,
then in notepad open that with "encoding" set to "Unicode".  E.g.
something logically equivalent to:

 size := Length(field) * 2;
 SetLength(buffer, size );
 System.Move(field^, buffer^, size);
 file.Write(buffer, size);

I imagine you don't have to jump through hoops like that, but
hopefully you see what I have in mind.  If the result looks good in
notepad, then you know this layer is fine, so the problem must be
closer to the display layer.
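
The same check can be sketched outside Delphi. The Python below is only an illustration of the idea: it writes a string as raw UTF-16LE bytes with no BOM, so the result can be opened in Notepad with the encoding set to "Unicode". The file name and the helper `dump_utf16le` are assumptions of this example.

```python
import os
import tempfile

def dump_utf16le(text: str, path: str) -> int:
    """Write text as raw UTF-16LE bytes (no BOM); return the number of bytes written."""
    raw = text.encode("utf-16-le")
    with open(path, "wb") as f:   # binary mode: no newline or codec translation
        f.write(raw)
    return len(raw)

# "St\u00e4dt" is "Städt", the sample value from this thread.
path = os.path.join(tempfile.gettempdir(), "field_dump.txt")
written = dump_utf16le("St\u00e4dt. Klinikum Neunkirchen gGmbH", path)
```

If the dumped file shows the umlaut correctly, the data was fine up to that point and the fault lies further along, towards the display layer.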




Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Da Martian

> I am still having issues trying to get my characters standardized. I spent
> all of yesterday playing with ideas but I am still in the dark.

Whatever you were doing the first time was fine:



I have been having that very thought!


> So if I look at a name with umlauts in the database via sqlite3.exe I get:
>
> Städt. Klinikum Neunkirchen gGmbH
>   --
>   |
>   an "a" with two dots on top

That text was properly encoded as UTF-8.  The ONLY issue with that
line is that the sqlite shell under Windows is incapable of displaying
Unicode, so you need to retrieve the data from sqlite using a tool
that is.  The actual storage of it is perfect.



I know the console sqlite3 won't show it. The Delphi control I am using does
Unicode.



> If I add the text using the *16 prepare and then retrieve it using the *16
> column_text, I still get the two separate characters instead of the umlaut
> thingie. So I can only assume that somehow my source isn't UTF-16, or I am
> converting it somewhere in the middle. This is possible since I am using
> Delphi and it has some implicit conversions, but I think I have got that
> under control.

AFAIK Delphi has no built-in Unicode support at all; you will need to
find third-party support for everything, from processing to display
controls.  It is likely you are ending up with UTF-8 data at some
point in the pipeline, and whatever you're doing to process it does
not understand UTF-8.



Yeah, I am currently using VirtualTree from Mike's Delphi Gems. It's fully
Unicode enabled (I believe). I use WideStrings throughout the entire
pipeline, from the XML I receive, into SQLite via prepare16, and back out
through column_text16 into the virtual tree. Well, that's true: the SQL APIs
are mapped to return PWideChar, which is then copied via System.Move into a
WideString as follows:


Show code :)


* DLL *
function  sqlite3_open(filename: PWideChar; var db: pointer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_open16';
function  sqlite3_close(db: pointer): integer;
  cdecl; external 'sqlite3.dll';
function  sqlite3_exec(db: pointer; sql: PWideChar; callback: pointer;
  userdata: PWideChar; var errmsg: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_exec16';
procedure sqlite3_free(ptr: PWideChar);
  cdecl; external 'sqlite3.dll';
function  sqlite3_prepare(db: pointer; sql: PWideChar; nBytes: integer;
  var stmt: pointer; var ztail: PWideChar): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_prepare16';
function  sqlite3_column_bytes(stmt: pointer; col: integer): integer;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_bytes16';
function  sqlite3_column_text(stmt: pointer; col: integer): PWideChar;
  cdecl; external 'sqlite3.dll' name 'sqlite3_column_text16';

*CLASS*
function TSqliteQueryResults.FieldAsString(i: integer): WideString;
var
   size: integer;
   temp: PWideChar;
begin
   size := FieldSize(i);                  // size in bytes, not characters
   SetLength(result, size div 2);         // two bytes per WideChar
   temp := sqlite3_column_text(Fstmt, i);
   System.Move(temp^, PWideChar(result)^, size);
end;

function TSqliteQueryResults.FieldSize(i: integer): integer;
begin
   result := sqlite3_column_bytes(Fstmt, i);
end;

* END 

Previously (before my language headaches :-)  ) I was doing the above
without the APIs ending in 16, and everything was string and PChar in the
above layer. The layer that used this class has always had "WideString".

I realise you're probably not Delphi pros, but if you do spot something stupid
I am doing I would appreciate any help you can offer.

Thanks,


Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Trevor Talbot

On 12/7/06, Da Martian <[EMAIL PROTECTED]> wrote:


I am still having issues trying to get my characters standardized. I spent
all of yesterday playing with ideas but I am still in the dark.


Whatever you were doing the first time was fine:


So if I look at a name with umlauts in the database via sqlite3.exe I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top


That text was properly encoded as UTF-8.  The ONLY issue with that
line is that the sqlite shell under Windows is incapable of displaying
Unicode, so you need to retrieve the data from sqlite using a tool
that is.  The actual storage of it is perfect.


Part of my problem is I don't have a clue what my source data is encoded as.
Does anyone know of a tool which can try and guess the encoding? Basically
it's a custom Java bean written by someone else. It takes reports from a
third-party system and turns them into XML using a string buffer. They just
append everything to a string buffer. The code which actually adds this to
the output (the key piece) I can't actually see at this point. So my best
guess, based on research, is that Java usually uses UTF-16. But if this is so,
it should work.


It sounds as though it is UTF-16 and working fine.


If I add the text using the *16 prepare and then retrieve it using the *16
column_text, I still get the two separate characters instead of the umlaut
thingie. So I can only assume that somehow my source isn't UTF-16, or I am
converting it somewhere in the middle. This is possible since I am using
Delphi and it has some implicit conversions, but I think I have got that
under control.


AFAIK Delphi has no built-in Unicode support at all; you will need to
find third-party support for everything, from processing to display
controls.  It is likely you are ending up with UTF-8 data at some
point in the pipeline, and whatever you're doing to process it does
not understand UTF-8.


The problem is if I copy my source and paste it into Notepad, say, it shows
correctly because Notepad then does its own stuff, and if I save the Notepad
file and read that, it works fine. *sigh*.


Notepad does support Unicode in various encodings, but that doesn't
mean anything in this test, since your system codepage may well
support the characters you're testing with anyway.


2)
When using the non-16 version of prepare:
 If I add text which is in UTF-16, what happens?

When using the 16 version:
 If I add UTF-16 text, what happens?
 If I add UTF-8 text, what happens?
 If I add ASCII text, what happens?


The answers to these depend on exactly how you're interfacing with it
(what programming language, how the sqlite library functions are
defined/declared, any use of library tools or auto-conversion
semantics in the language, etc).  Show code :)


Re: [sqlite] Re: Unicode Help

2006-12-07 Thread Da Martian

I think standard functions for conversions would be very helpful.

I am still having issues trying to get my characters standardized. I spent
all of yesterday playing with ideas but I am still in the dark.

Part of my problem is I don't have a clue what my source data is encoded as.
Does anyone know of a tool which can try and guess the encoding? Basically
it's a custom Java bean written by someone else. It takes reports from a
third-party system and turns them into XML using a string buffer. They just
append everything to a string buffer. The code which actually adds this to
the output (the key piece) I can't actually see at this point. So my best
guess, based on research, is that Java usually uses UTF-16. But if this is so,
it should work.
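
No specific tool is named in this thread, but a rough guess can be made from the bytes themselves. The Python sketch below is a heuristic only and can be wrong on short or unusual input. It treats NUL bytes as a sign of UTF-16, tries a strict UTF-8 decode, and otherwise falls back to an 8-bit code page. The function name and the choice of cp1252 as the fallback are assumptions of this example.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristically guess the encoding of BOM-less text bytes."""
    if b"\x00" in data:
        # Mostly-Latin UTF-16 text has a NUL in every other byte.
        even_nuls = data[0::2].count(0)
        odd_nuls = data[1::2].count(0)
        return "utf-16-le" if odd_nuls >= even_nuls else "utf-16-be"
    try:
        text = data.decode("utf-8")   # strict: malformed sequences raise
    except UnicodeDecodeError:
        return "cp1252"               # assumed legacy fallback, not detected
    return "ascii" if text.isascii() else "utf-8"

# "St\u00e4dt" is "Städt", the sample value from this thread.
sample = "St\u00e4dt".encode("utf-8")
guess = guess_encoding(sample)
```

Running such a check on the raw bytes received from the Java side, before any Delphi string conversion touches them, narrows down where in the pipeline the encoding changes.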

If I add the text using the *16 prepare and then retrieve it using the *16
column_text, I still get the two separate characters instead of the umlaut
thingie. So I can only assume that somehow my source isn't UTF-16, or I am
converting it somewhere in the middle. This is possible since I am using
Delphi and it has some implicit conversions, but I think I have got that
under control.

The problem is if I copy my source and paste it into Notepad, say, it shows
correctly because Notepad then does its own stuff, and if I save the Notepad
file and read that, it works fine. *sigh*.

So my questions are:

1) Any tools to determine the encoding based on the data?
2)
When using the non-16 version of prepare:
 If I add text which is in UTF-16, what happens?

When using the 16 version:
 If I add UTF-16 text, what happens?
 If I add UTF-8 text, what happens?
 If I add ASCII text, what happens?

Thanks,


Re: [sqlite] Re: Unicode Help

2006-12-06 Thread John Stanton

Nicolas Williams wrote:

On Wed, Dec 06, 2006 at 10:06:12AM -0600, John Stanton wrote:


Marten Feldtmann wrote:

But Tcl is not part of SQLite (and this is good) - this is just an
add-on. The idea of the additional functions is pretty good!



How does Sqlite become Sqbloated?  By function creep, one little step at 
a time.



Well, let's see.  Folks would implement user functions, and promoting
code re-use would be good.  OTOH keeping the SQLite core small is also
good.

So why not have a library of non-core user functions?

That's almost like having them in the core disabled by default and with
options to include them.

Nico

Isn't that how Sqlite is designed and how it is used already?  User
functions are dynamically linked when used but otherwise do not bloat 
the core.





Re: [sqlite] Re: Unicode Help

2006-12-06 Thread Nicolas Williams
On Wed, Dec 06, 2006 at 10:06:12AM -0600, John Stanton wrote:
> Marten Feldtmann wrote:
> >But Tcl is not part of SQLite (and this is good) - this is just an
> >add-on. The idea of the additional functions is pretty good!
> >
> How does Sqlite become Sqbloated?  By function creep, one little step at 
> a time.

Well, let's see.  Folks would implement user functions, and promoting
code re-use would be good.  OTOH keeping the SQLite core small is also
good.

So why not have a library of non-core user functions?

That's almost like having them in the core disabled by default and with
options to include them.

Nico
-- 




Re: [sqlite] Re: Unicode Help

2006-12-06 Thread John Stanton

Marten Feldtmann wrote:

Ulrich Schöbel schrieb:


SQLite includes a Tcl API. Tcl does all these conversions with ease.
See the encoding convertto/convertfrom commands and fconfigure


But Tcl is not part of SQLite (and this is good) - this is just an
add-on. The idea of the additional functions is pretty good!

Marten




How does Sqlite become Sqbloated?  By function creep, one little step at 
a time.





Re: [sqlite] Re: Unicode Help

2006-12-06 Thread Marten Feldtmann

Ulrich Schöbel schrieb:

> SQLite includes a Tcl API. Tcl does all these conversions with ease.
> See the encoding convertto/convertfrom commands and fconfigure

But Tcl is not part of SQLite (and this is good) - this is just an
add-on. The idea of the additional functions is pretty good!

Marten




Re: [sqlite] Re: Unicode Help

2006-12-05 Thread Ulrich Schöbel
SQLite includes a Tcl API. Tcl does all these conversions with ease.
See the encoding convertto/convertfrom commands and fconfigure.

On Tuesday 05 December 2006 20:42, Nicolas Williams wrote:
> On Tue, Dec 05, 2006 at 06:53:28PM +0100, Marten Feldtmann wrote:
> > Perhaps it would be nice to change sqlite3 in that way, that (when
> > columns with storage class text) these columns are converted to the host
> > platform code page. But actually even in that situation you may have
> > strings, which are not displayable on your screen - because you have no
> > suitable font.
>
> No, but having built-in functions that can do codeset conversion would
> be nice.
>
> -- convert from the default SQLite codeset/encoding (UTF-8) to a given
> -- codeset
> select iconv(foo, NULL, 'ISO-8859-1') from ...;
>
> -- convert to a codeset given by some row column
> select iconv(foo, from_cs, to_cs) from ...;
>
> -- convert to the current locale's codeset
> select iconv(foo, NULL, NULL) from ...;
>
> And functions for Unicode normalization and what not would be nice as
> well.
>
> Nico




Re: [sqlite] Re: Unicode Help

2006-12-05 Thread Ralf Junker
Hello Nicolas Williams,

>No, but having built-in functions that can do codeset conversion would
>be nice.

SQLiteSpy can do this: http://www.yunqa.de/delphi/sqlitespy/ 





Re: [sqlite] Re: Unicode Help

2006-12-05 Thread Nicolas Williams
On Tue, Dec 05, 2006 at 06:53:28PM +0100, Marten Feldtmann wrote:
> Perhaps it would be nice to change sqlite3 in that way, that (when columns
> with storage class text) these columns are converted to the host platform
> code page. But actually even in that situation you may have strings, which
> are not displayable on your screen - because you have no suitable font.

No, but having built-in functions that can do codeset conversion would
be nice.

-- convert from the default SQLite codeset/encoding (UTF-8) to a given
-- codeset
select iconv(foo, NULL, 'ISO-8859-1') from ...;

-- convert to a codeset given by some row column
select iconv(foo, from_cs, to_cs) from ...;

-- convert to the current locale's codeset
select iconv(foo, NULL, NULL) from ...;

And functions for Unicode normalization and what not would be nice as
well.
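
SQLite has no built-in iconv(), but the idea can be prototyped as an application-defined function. The sketch below uses Python's standard `sqlite3` module and `Connection.create_function`. The name `iconv` and the handling of NULL arguments follow the proposal above and are assumptions, not an existing SQLite API.

```python
import locale
import sqlite3

def iconv(value, from_cs, to_cs):
    """Hypothetical iconv(): return `value` re-encoded in `to_cs` as a BLOB."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # BLOB input: decode it from the stated codeset (default UTF-8).
        value = value.decode(from_cs or "utf-8")
    # TEXT input arrives already decoded, so from_cs is ignored for it.
    target = to_cs or locale.getpreferredencoding(False)
    return value.encode(target)

conn = sqlite3.connect(":memory:")
conn.create_function("iconv", 3, iconv)
conn.execute("CREATE TABLE t (foo TEXT)")
conn.execute("INSERT INTO t VALUES (?)", ("St\u00e4dt. Klinikum",))
row = conn.execute("SELECT iconv(foo, NULL, 'ISO-8859-1') FROM t").fetchone()
```

The result is returned as a BLOB rather than TEXT, because SQLite expects TEXT values to be in the database encoding and ISO-8859-1 bytes would not qualify.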

Nico
-- 




Re: [sqlite] Re: Unicode Help

2006-12-05 Thread Marten Feldtmann

Igor Tandetnik schrieb:

Da Martian <[EMAIL PROTECTED]> wrote:

So if I look at a name with umlauts in the database via sqlite3.exe
I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top


"A with umlaut" is represented as two bytes in UTF-8. sqlite3.exe just 
dumps these bytes onto the console, which is not smart enough to 
interpret them as UTF-8 sequence. In other words, the data in the 
database is fine, it's just displayed incorrectly (in the wrong 
codepage). You don't need to worry.



So I tried the *16 versions, but now the field size returned by
"sqlite3_column_bytes16" always seems to be larger than the string I
get back resulting in junk characters on the end.


In my storage framework for VASmalltalk I have to convert all my strings
to get all the stuff right.

Each string you work with is characterized not only by the characters you
have, but also by the code page the string is defined in.

In general this is very often the code page of the operating system (under
Windows 1252, under OS/2 850 - using them in a German locale).

To make it more complicated, you also need a suitable font for that
code page.

VASmalltalk for example works internally with code page 819 on all
available platforms, but under Windows it has to support the code
page 1252.

Therefore when storing strings within this IDE one has to convert the
string from code page 819 to UTF-8. UTF-8 is a special code page under
Windows: 65001. Therefore I convert the strings from 819 to 65001 and
then send this converted string to the API call.
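
As an illustration of that conversion step (in Python, not the VASmalltalk code, which is not shown here): code page 819 is ISO-8859-1, which Python calls `latin-1`. The function name is an assumption of this example.

```python
def cp819_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes from code page 819 (ISO-8859-1) as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")

legacy = b"St\xe4dt."          # the umlaut is the single byte 0xE4 in ISO-8859-1
utf8 = cp819_to_utf8(legacy)   # it becomes the two bytes 0xC3 0xA4 in UTF-8
```

Every byte value is valid in ISO-8859-1, so this direction never fails; the reverse conversion can fail for characters outside that code page.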

Perhaps it would be nice to change sqlite3 in that way, that (when columns
with storage class text) these columns are converted to the host platform
code page. But actually even in that situation you may have strings, which
are not displayable on your screen - because you have no suitable font.

Marten












[sqlite] Re: Unicode Help

2006-12-05 Thread Igor Tandetnik

Da Martian <[EMAIL PROTECTED]> wrote:

So if I look at a name with umlauts in the database via sqlite3.exe
I get:

Städt. Klinikum Neunkirchen gGmbH
  --
  |
  an "a" with two dots on top


"A with umlaut" is represented as two bytes in UTF-8. sqlite3.exe just 
dumps these bytes onto the console, which is not smart enough to 
interpret them as UTF-8 sequence. In other words, the data in the 
database is fine, it's just displayed incorrectly (in the wrong 
codepage). You don't need to worry.
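
The effect is easy to reproduce. The Python sketch below decodes the same UTF-8 bytes with two single-byte code pages and then with UTF-8. Which code page a given console actually uses is an assumption here; cp850 and cp1252 are simply common Western European examples.

```python
# "St\u00e4dt." is "Städt."; in UTF-8 the umlaut is the byte pair C3 A4.
utf8_bytes = "St\u00e4dt.".encode("utf-8")

# Misreading those two bytes as two single-byte characters:
as_cp850 = utf8_bytes.decode("cp850")     # OEM code page, common for consoles
as_cp1252 = utf8_bytes.decode("cp1252")   # ANSI code page, common for GUI apps

# Decoding with the right codec recovers the original text.
as_utf8 = utf8_bytes.decode("utf-8")

for label, text in (("cp850", as_cp850), ("cp1252", as_cp1252), ("utf-8", as_utf8)):
    print(label, ascii(text))
```

In both wrong readings the stored bytes are untouched; only the interpretation at display time differs.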



So I tried the *16 versions, but now the field size returned by
"sqlite3_column_bytes16" always seems to be larger than the string I
get back resulting in junk characters on the end.


Show how you put the data in, and how you get it back out. Realize that
the terminating NUL character is neither stored in nor retrieved from the
DB: you might see garbage at the end simply because your string is not
NUL-terminated.
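
The failure mode can be simulated without SQLite. In the Python sketch below, a byte buffer stands in for memory holding a UTF-16 string with unrelated bytes after it and no terminator. The names are made up for this example, and the SQLite C API itself is not called.

```python
def copy_exact(buffer: bytes, n_bytes: int) -> str:
    """Decode exactly n_bytes of UTF-16LE from buffer, ignoring whatever follows."""
    return buffer[:n_bytes].decode("utf-16-le")

payload = "St\u00e4dt".encode("utf-16-le")   # 5 characters, 10 bytes
n_bytes = len(payload)                        # plays the role of the reported byte count
buffer = payload + b"\x41\x00\x42\x00"        # unrelated memory after the string, no NUL

good = copy_exact(buffer, n_bytes)            # stops at the reported length
bad = buffer.decode("utf-16-le")              # runs past the end and picks up junk
```

The byte count has to be halved to get a UTF-16 character count, and the receiving string needs its own terminator if later code relies on one.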


Igor Tandetnik 


