Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-10 Thread ChaosKCW

Roger Binns wrote:


 No.  APSW converts it *to* Unicode.  SQLite only accepts Unicode
 so a Unicode string has to be supplied.  If you supply a non-Unicode
 string then conversion has to happen.  APSW asks Python to
 supply the string in Unicode.  If Python can't do that (eg
 it doesn't know the encoding) then you get an error.

If what you say is true, I have to ask why I get a converstion error
which states it cant convert to ASCII, not it cant convert to UNICODE?


  Ok if SQLite uses unicode internally why do you need to ignore
  everything greater than 127,

 I never said that.  I said that a special case is made so that
 if the string you supply only contains ASCII characters (ie =127)
 then the ASCII string is converted to Unicode.  (In fact it is
 valid UTF-8 hence the shortcut).

  the ascii table (256 bit one) fits into
  unicode just fine as far as I recall?

 No, ASCII characters have defined Unicode codepoints.  The ASCII
 character number just happens to be the same as the Unicode
 codepoints.  But there are only 127 ASCII characters.

  Or did I miss the boat here ?

 For bytes greater than 127, what character set is used?  There
 are hundreds of character sets that define those characters.
 You have to tell the computer which one to use.  See the Unicode
 article referenced above.

Yes I know there are a million extended ASCII charaters sets, which
happen to the bane of all existence. Most computers deal in bytes
nativly and the 7 bit coding still causes problems to this day. But
since the error I get is a converstion error to ASCII, not from ASCII,
I am willing to accept loss of information. You cant code unicode into
ascii without loss of information or two charcater codes. In my mind,
somewhere inside the cursor.execute function, it converts to ascii. I
say this because of the error msg recieved.  So I am missing how a
function which supposedly converts evereythin to unicode lands up doing
an ascii converstion ?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-10 Thread ChaosKCW

 There's an Oracle environment variable that appears to make a
 difference: NLS_CHARSET, perhaps - it's been a while since I've had to
 deal with Oracle, and I'm not looking for another adventure into
 Oracle's hideous documentation to find out.


That is an EVIL setting which should not be used. The NLS_CHARSET
environment variable causes so many headaches its not worth playing
with it at all.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-10 Thread Serge Orlov

ChaosKCW wrote:
 Roger Binns wrote:

 
  No.  APSW converts it *to* Unicode.  SQLite only accepts Unicode
  so a Unicode string has to be supplied.  If you supply a non-Unicode
  string then conversion has to happen.  APSW asks Python to
  supply the string in Unicode.  If Python can't do that (eg
  it doesn't know the encoding) then you get an error.

 If what you say is true, I have to ask why I get a converstion error
 which states it cant convert to ASCII, not it cant convert to UNICODE?

You do get error about convertion to unicode. Quote from you message:

SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: 
ordinal not in range(128)

Notice the name of the error: UnicodeDecode or in other words
ToUnicode.


  So I am missing how a
 function which supposedly converts evereythin to unicode lands up doing
 an ascii converstion ?

When python tries to concatenate a byte string and a unicode string, it
assumes that the byte string is encoded ascii and tries to convert from
encoded ascii to unicode. It calls ascii decoder to do the decoding. If
decoding fails you see message from ascii decoder about the error.

   Serge

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-10 Thread Paul Boddie
ChaosKCW wrote:
 
  There's an Oracle environment variable that appears to make a
  difference: NLS_CHARSET, perhaps - it's been a while since I've had to
  deal with Oracle, and I'm not looking for another adventure into
  Oracle's hideous documentation to find out.

 That is an EVIL setting which should not be used. The NLS_CHARSET
 environment variable causes so many headaches its not worth playing
 with it at all.

Well, at this very point in time I don't remember the preferred way of
getting Oracle, the client libraries and the database adapter to agree
on the character encoding used for communicating data between
applications and the database system. Nevertheless, what you need to do
is to make sure that you know which encoding is used so that if you
either get plain strings (ie. not Unicode objects) out of the database,
or if you need to write plain strings to the database, you can provide
the encoding to the unicode built-in function or to the decode/encode
methods; this is much better than just stripping out characters that
can't be represented by ASCII.

Anyway, despite my objections to digging through Oracle documentation,
I found the following useful documents: the Globalization Support
index [1], an FAQ about NLS_LANG [2], and a white paper about Unicode
support in Oracle [3]. It may well be the case that NLS_LANG might help
you do what you want, but since the database systems I have installed
(PostgreSQL, sqlite3) seem to do Unicode without such horsing around,
I'm not really able to offer much more advice on this subject.

Paul

[1]
http://www.oracle.com/technology/tech/globalization/index.html
[2]
http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang
faq.htm
[3]
http://www.oracle.com/technology/tech/globalization/pdf/TWP_AppDev_Unicode_10gR2.pdf

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-10 Thread ChaosKCW


 When python tries to concatenate a byte string and a unicode string, it
 assumes that the byte string is encoded ascii and tries to convert from
 encoded ascii to unicode. It calls ascii decoder to do the decoding. If
 decoding fails you see message from ascii decoder about the error.

Serge

Ok I get it now. Sorry for the slowness. I have to say as a lover of
python for its simplicity and clarity, the charatcer set thing has been
harder than I would have liked to figure out. 

Thanks for all the help.

-- 
http://mail.python.org/mailman/listinfo/python-list


Help on exceptions (was: Convertion of Unicode to ASCII NIGHTMARE)

2006-04-10 Thread Serge Orlov
ChaosKCW wrote:

 Ok I get it now. Sorry for the slowness. I have to say as a lover of
 python for its simplicity and clarity, the charatcer set thing has been
 harder than I would have liked to figure out.

I think there is a room for improvement here. In my opinion the message
is too confusing for newbies. It would be easier for them if there is a
mini tutorial available about what's going on with links to other more
broad tutorials (like unicode tutorial). Instead of generic error
--
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0:
ordinal not in range(128)
--

It can be like this. Notice special url, it is pointing to a
non-existing :) tutorial about why concatenating byte strings with
unicode strings can produce UnicodeDecodeError
--
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa5 in position 0:
ordinal
not in range(128)
For additional information about this exception see:
  http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str
--

Here is the sample code how it can be done:

---
extended_help = {
(concat, UnicodeDecodeError, str, unicode):
http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str;,
(concat, UnicodeDecodeError, unicode, str):
http://docs.python.org/2.4/exceptions/concat+UnicodeDecodeError+str;
}

def get_more_help(error, key):
if not extended_help.has_key(key):
return
error.reason += \nFor additional information about this exception
see:\n  
error.reason += extended_help[key]


def concat(s1,s2):
try:
return s1 + s2
except Exception, e:
key = concat, e.__class__, type(s1), type(s2)
get_more_help(e, key)
raise

concat(chr(0xA5),unichr(0x5432))

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-06 Thread Serge Orlov

Roger Binns wrote:
 Serge Orlov [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
  I have an impression that handling/production of byte order marks is
  pretty clear: they are produced/consumed only by two codecs: utf-16 and
  utf-8-sig. What is not clear?

 Are you talking about the C APIs in Python/SQLite (that is what I
 have been discussing) or the language level?

Both. Documentation for PyUnicode_DecodeUTF16 and PyUnicode_EncodeUTF16
is pretty clear when BOM is produced/removed. The only problem is that
you have to find out host endianess yourself. In python it's
sys.byteorder, in C you use hack like

unsigned long one = 1;
endianess = (*(char *) one) == 0) ? 1 : -1;

And then pass endianess to PyUnicode_(De/En)codeUTF16. So I still don't
see what is unclear about BOM production/handling.



 At the C level, SQLite doesn't accept boms.

It would be surprising if it did. Quote from
http://www.unicode.org/faq/utf_bom.html: Where the data is typed,
such as a field in a database, a BOM is unnecessary

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-05 Thread Roger Binns

Fredrik Lundh [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 Roger Binns wrote:

 SQLite only accepts Unicode so a Unicode string has to be supplied.

 fact or FUD?  let's see:

Note I said SQLite.  For APIs that take/give strings, you can either
supply/get a UTF-8 encoded sequence of bytes, or two bytes per character
host byte order sequence.  Any wrapper of SQLite that doesn't do
Unicode in/out is seriously breaking things.

I ended up using the UTF-8 versions of the API as Python can't quite
make its mind up how to represent Unicode strings at the C api level.
You can have two bytes per char or four, and the handling/production
of byte order markers isn't that clear either.

import pysqlite2.dbapi2 as DB

pysqlite had several unicode problems in the past.  It has since
been cleaned up as you saw.

Roger 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-05 Thread Fredrik Lundh
Roger Binns wrote:

  fact or FUD?  let's see:

 Note I said SQLite.  For APIs that take/give strings, you can either
 supply/get a UTF-8 encoded sequence of bytes, or two bytes per character
 host byte order sequence.  Any wrapper of SQLite that doesn't do
 Unicode in/out is seriously breaking things.

 I ended up using the UTF-8 versions of the API as Python can't quite
 make its mind up how to represent Unicode strings at the C api level.
 You can have two bytes per char or four, and the handling/production
 of byte order markers isn't that clear either.

sounds like your understanding of Unicode and Python's Unicode system
is a bit unclear.

/F



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-05 Thread Roger Binns

Fredrik Lundh [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 sounds like your understanding of Unicode and Python's Unicode system
 is a bit unclear.

Err, no.  Relaying unicode data between two disparate
C APIs requires being careful and thorough.  That means
paying attention to when conversions happen, byte ordering
(compile time) and boms (run time) and when the API
documentation isn't thorough, verifying the behaviour
yourself.  That requires a very clear understanding of
Unicode in order to do the requisite test cases, as well
as reading what the code does.

Roger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-05 Thread Serge Orlov
Roger Binns wrote:
 Fredrik Lundh [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
  Roger Binns wrote:
 
  SQLite only accepts Unicode so a Unicode string has to be supplied.
 
  fact or FUD?  let's see:

 Note I said SQLite.  For APIs that take/give strings, you can either
 supply/get a UTF-8 encoded sequence of bytes, or two bytes per character
 host byte order sequence.  Any wrapper of SQLite that doesn't do
 Unicode in/out is seriously breaking things.

 I ended up using the UTF-8 versions of the API as Python can't quite
 make its mind up how to represent Unicode strings at the C api level.
 You can have two bytes per char or four, and the handling/production
 of byte order markers isn't that clear either.

I have an impression that handling/production of byte order marks is
pretty clear: they are produced/consumed only by two codecs: utf-16 and
utf-8-sig. What is not clear?

  Serge

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-05 Thread Roger Binns

Serge Orlov [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 I have an impression that handling/production of byte order marks is
 pretty clear: they are produced/consumed only by two codecs: utf-16 and
 utf-8-sig. What is not clear?

Are you talking about the C APIs in Python/SQLite (that is what I
have been discussing) or the language level?

At the C level, SQLite doesn't accept boms.  You have to
provide UTF-8 or host byte order two bytes per char
UTF-16.

Roger 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread Roger Binns
Paul Boddie [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 It looks like you may have Unicode objects that you're presenting to
 sqlite. In any case, with earlier versions of pysqlite that I've used,
 you need to connect with a special unicode_results parameter,

He is using apsw.  apsw correctly handles unicode.  In fact it won't
accept a str with bytes 127 as they will be an unknown encoding and
SQLite only uses Unicode internally.  It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.

Roger 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread ChaosKCW
Hi

Thanks for all the posts. I am still digesting it all but here are my
initial comments.

Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.
http://www.sqlite.org/version3.html

As mentioned by the next poster, there is, its supposed to be encode
with the 'ignore' option. Thus you lose data, but thats just dandy with
me. As for SQLite supporting unicode, it probably does, but something
on the python side (probabyl in apsw) converts it to ascii at some
point before its handed to SQLite.

The .encode() method returns a new value; it does not change an object inplace.
  sql = sql.encode('utf-8')

Ah yes, big bistake on my part :-/

He is using apsw.  apsw correctly handles unicode.  In fact it won't
accept a str with bytes 127 as they will be an unknown encoding and
SQLite only uses Unicode internally.  It does have a blob type
using buffer for situations where binary data needs to be stored.
pysqlite's mishandling of Unicode is one of the things that drove
me to writing apsw in the first place.

Ok if SQLite uses unicode internally why do you need to ignore
everything greater than 127, the ascii table (256 bit one) fits into
unicode just fine as far as I recall? Or did I miss the boat here ?

Thanks,

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread John Machin
Thus you lose data, but thats just dandy with
me.: Please reconsider this attitude, before you perpetrate a nonsense
or even a disaster.

Wrt your last para:
1. Roger didn't say ignore -- he said won't accept (a major
difference).
2. The ASCII code comprises 128 characters, *NOT* 256.
3. What Roger means is: given a Python 8-bit string and no other
information, you don't have a clue what the encoding is. Most codes of
interest these days have the ASCII code (or a mild perversion thereof)
in the first 128 positions, but it's anyones guess what the writer of
the string had in mind with the next 128.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread Robert Kern
Roger Binns wrote:
 Paul Boddie [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 
It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter,
 
 He is using apsw.  apsw correctly handles unicode.  In fact it won't
 accept a str with bytes 127 as they will be an unknown encoding and
 SQLite only uses Unicode internally.  It does have a blob type
 using buffer for situations where binary data needs to be stored.
 pysqlite's mishandling of Unicode is one of the things that drove
 me to writing apsw in the first place.

Ah, I misread the OP's traceback.

Okay, the OP is getting regular strings, which are probably encoded in
ISO-8859-1 if I had to guess, from the Oracle DB. He is trying to pass them in
to SQLiteCur.execute() which tries to make a unicode string from the input:

In [1]: unicode('\xdc')
---
exceptions.UnicodeDecodeErrorTraceback (most recent call
last)

/Users/kern/ipython console

UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 0: ordinal
not in range(128)

*Now*, my advice to the OP is to figure out the encoding of the strings that are
being returned from Oracle. As I said, ISO-8859-1 is probably a good guess.
Then, he would *decode* the string to a unicode string using the encoding. E.g.:

  row = row.decode('iso-8859-1')

Then everything should be peachy. I hope.

-- 
Robert Kern
[EMAIL PROTECTED]

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread Paul Boddie
Robert Kern wrote:
 Roger Binns wrote:
  Paul Boddie [EMAIL PROTECTED] wrote
 It looks like you may have Unicode objects that you're presenting to
 sqlite. In any case, with earlier versions of pysqlite that I've used,
 you need to connect with a special unicode_results parameter,

Note that I've since mentioned client_encoding which seems to matter
for pysqlite 1.x.

  He is using apsw.  apsw correctly handles unicode.  In fact it won't
  accept a str with bytes 127 as they will be an unknown encoding and
  SQLite only uses Unicode internally.  It does have a blob type
  using buffer for situations where binary data needs to be stored.
  pysqlite's mishandling of Unicode is one of the things that drove
  me to writing apsw in the first place.

For pysqlite 2.x, it appears that Unicode objects can be handed
straight to the API methods, and I'd be interested to hear about your
problems with pysqlite, Unicode and what actually made you write apsw
instead.

 Ah, I misread the OP's traceback.

 Okay, the OP is getting regular strings, which are probably encoded in
 ISO-8859-1 if I had to guess, from the Oracle DB. He is trying to pass them in
 to SQLiteCur.execute() which tries to make a unicode string from the input:

[...]

There's an Oracle environment variable that appears to make a
difference: NLS_CHARSET, perhaps - it's been a while since I've had to
deal with Oracle, and I'm not looking for another adventure into
Oracle's hideous documentation to find out.

 *Now*, my advice to the OP is to figure out the encoding of the strings that 
 are
 being returned from Oracle. As I said, ISO-8859-1 is probably a good guess.
 Then, he would *decode* the string to a unicode string using the encoding. 
 E.g.:

   row = row.decode('iso-8859-1')

 Then everything should be peachy. I hope.

Yes, just find out what Oracle wants first, then set it all up, noting
that without looking into the Oracle wrapper being used, I can't
suggest an easier way.

Paul

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread Roger Binns

ChaosKCW [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]
 me. As for SQLite supporting unicode, it probably does,

No, SQLite *ONLY* supports Unicode.  It will *only* accept
strings in Unicode and only produces strings in Unicode.
All the functionality built into SQLite such as comparison
operators operate only on Unicode strings.

 but something
 on the python side (probabyl in apsw) converts it to ascii at some
 point before its handed to SQLite.

No.  APSW converts it *to* Unicode.  SQLite only accepts Unicode
so a Unicode string has to be supplied.  If you supply a non-Unicode
string then conversion has to happen.  APSW asks Python to
supply the string in Unicode.  If Python can't do that (eg
it doesn't know the encoding) then you get an error.

I strongly recommend reading this:

  The Absolute Minimum Every Software Developer Absolutely,
  Positively Must Know About Unicode and Character Sets

  http://www.joelonsoftware.com/articles/Unicode.html

 Ok if SQLite uses unicode internally why do you need to ignore
 everything greater than 127,

I never said that.  I said that a special case is made so that
if the string you supply only contains ASCII characters (ie =127)
then the ASCII string is converted to Unicode.  (In fact it is
valid UTF-8 hence the shortcut).

 the ascii table (256 bit one) fits into
 unicode just fine as far as I recall?

No, ASCII characters have defined Unicode codepoints.  The ASCII
character number just happens to be the same as the Unicode
codepoints.  But there are only 127 ASCII characters.

 Or did I miss the boat here ?

For bytes greater than 127, what character set is used?  There
are hundreds of character sets that define those characters.
You have to tell the computer which one to use.  See the Unicode
article referenced above.

Roger 


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-04 Thread Fredrik Lundh
Roger Binns wrote:

 SQLite only accepts Unicode so a Unicode string has to be supplied.

fact or FUD?  let's see:

import pysqlite2.dbapi2 as DB

db = DB.connect(test.db)

cur = db.cursor()

cur.execute(create table if not exists test (col text))
cur.execute(insert into test values (?), [this is an ascii string])
cur.execute(insert into test values (?), [uthis is a unicode string])
cur.execute(insert into test values (?), [uthïs ïs ö unicöde strïng])

cur.execute(select * from test)

for row in cur.fetchall():
print row

prints

(u'this is an ascii string',)
(u'this is a unicode string',)
(u'th\xefs \xefs \xf6 unic\xf6de str\xefng',)

which is correct behaviour under Python's Unicode model.

/F



-- 
http://mail.python.org/mailman/listinfo/python-list

Convertion of Unicode to ASCII NIGHTMARE

2006-04-03 Thread ChaosKCW
Hi

I am reading from an oracle database using cx_Oracle. I am writing to a
SQLite database using apsw.

The oracle database is returning utf-8 characters for euopean item
names, ie special charcaters from an ASCII perspective.

I get the following error:
SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: 
ordinal not in range(128)

I have googled for serval days now and still cant get it to encode to
ascii.

I encode the SQL as follows:

sql = insert into %s values %s % (SQLiteTable, paramstr)
sql.encode('ascii', 'ignore')

I then code each of the row values returned from Oracle like this:

 row = map(encodestr, row)
 SQLiteCur.execute(sql, row)

where encodestr is as follows:

def encodestr(item):
if isinstance(item, types.StringTypes):
   return unicodedata.normalize('NFKD', unicode(item, 'utf-8',
'ignore')).encode('ASCII', 'ignore')
else:
   return item

I have tried a thousand of similiar functions to the above,
permitations of the above from various google searches. But I still get
the above exception on the line:

 SQLiteCur.execute(sql, row)

and the exception is reslated to the data in one field.

Int the end I resorted to using oracles convert function in the SQL
statement but would like to  understand why this is happening and why
its so hard to convert the string in python. I have read many
complaints about this from other people some of whom have written
custom stripping routines. I havent tried a custom routine yet, cause I
think it should be possilble in python.

Thanks,

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-03 Thread Paul Boddie
ChaosKCW wrote:
 Hi

 I am reading from an oracle database using cx_Oracle. I am writing to a
 SQLite database using apsw.

 The oracle database is returning utf-8 characters for euopean item
 names, ie special charcaters from an ASCII perspective.

And does cx_Oracle return those as Unicode objects or as plain strings
containing UTF-8 byte sequences? It's very important to distinguish
between these two cases, and I don't have any experience with cx_Oracle
to be able to give advice here.

 I get the following error:
 SQLiteCur.execute(sql, row)
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: 
 ordinal not
  in range(128)

It looks like you may have Unicode objects that you're presenting to
sqlite. In any case, with earlier versions of pysqlite that I've used,
you need to connect with a special unicode_results parameter, although
later versions should work with Unicode objects without special
configuration. See here for a thread (in which I seem to have
participated, coincidentally):

http://mail.python.org/pipermail/python-list/2002-June/107526.html

 I have googled for serval days now and still cant get it to encode to
 ascii.

This is a tough thing to find out - whilst previous searches did
uncover some discussions about it, I just tried and failed to find the
enlightening documents - and I certainly didn't see many references to
it on the official pysqlite site.

Paul

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-03 Thread Robert Kern
ChaosKCW wrote:
 Hi
 
 I am reading from an oracle database using cx_Oracle. I am writing to a
 SQLite database using apsw.
 
 The oracle database is returning utf-8 characters for euopean item
 names, ie special charcaters from an ASCII perspective.

I'm not sure that you are using those terms correctly. From your description
below, it seems that your data is being returned from the Oracle database as
unicode strings rather than regular strings containing UTF-8 encoded data. These
European characters are not special characters from an ASCII perspective; they
simply aren't characters in the ASCII character set at all.

 I get the following error:
 
   SQLiteCur.execute(sql, row)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xdc in position 12: 
ordinal not in range(128)
 
 I have googled for serval days now and still cant get it to encode to
 ascii.

Don't. You can't. Those characters don't exist in the ASCII character set.
SQLite 3.0 deals with UTF-8 encoded SQL statements, though.

http://www.sqlite.org/version3.html

 I encode the SQL as follows:
 
 sql = insert into %s values %s % (SQLiteTable, paramstr)
 sql.encode('ascii', 'ignore')

The .encode() method returns a new value; it does not change an object inplace.

  sql = sql.encode('utf-8')

-- 
Robert Kern
[EMAIL PROTECTED]

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-03 Thread Diez B. Roggisch
 Don't. You can't. Those characters don't exist in the ASCII character set.
 SQLite 3.0 deals with UTF-8 encoded SQL statements, though.

That is not entirely correct - one can, if losing information is ok. The OPs
code that normalized UTF-8 to NFKD, an umlaut like ä is transformed to a
two-character-sequence basically saying a with two dots on top. With
'ignore' specified as parameter to the encoder, this should be result in
the letter a.


Regards,

Diez
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Convertion of Unicode to ASCII NIGHTMARE

2006-04-03 Thread Paul Boddie
Oh, and it occurs to me, as I seem to have mentioned a document about
PgSQL rather than pysqlite (although they both have the same principal
developer), that you might need to investigate the client_encoding
parameter when setting up your connection. The following message gives
some information (but not much):

http://groups.google.com/group/comp.lang.python/msg/f27fa9866c9b7b5f

Sadly, I can't find the information about getting result values as
Unicode objects, but I believe it involves some kind of SQL comment
that you send to the database system which actually tells pysqlite to
change its behaviour.

Paul

-- 
http://mail.python.org/mailman/listinfo/python-list