Re: Unicode and Sybase univarchar

2010-06-07 Thread Martin Evans
Alexander Foken wrote:
 On 04.06.2010 15:41, Dave Rolsky wrote:
 On Fri, 4 Jun 2010, Alexander Foken wrote:
 That's why I proposed to switch to DBD::ODBC: It is well tested and
 supports Unicode as good as the ODBC driver does.

 And as I said in private email, that's not really feasible for us.
 Yes, it would cause you a lot of work, starting by distributing an ODBC
 manager and an ODBC driver. Perhaps too much work. Think of it as a last
 resort.
 
 But for patching Unicode support into DBD::Sybase, DBD::ODBC could be
 helpful, both by showing how to patch DBD::Sybase, and by providing a
 Unicode-aware way to the database on the development machine.

 Now you just need to find someone who is willing and has the time
 to patch DBD::Sybase ... ;-)

 Which is why I'm hoping we can pay Michael to work on this. 
 Throwing money at the problem is definitly a good idea! And it would be
 great if that work would end on CPAN (and not in a secret private branch).
 Clearly, he's the most qualified. 
 
 Otherwise, I might have to do this, which is a scary thought.
 Well, I also was scared by DBD::ODBC, and I must admit that I still
 don't understand all of it. I'm still happy that MJE took over the
 stalled DBD::ODBC development and improved my patch (and many other
 parts of DBD::ODBC).
 
 A first step to patching DBD::Sybase would be like my first step with
 DBD::ODBC: Just make binding input parameters and fetching results work
 with Unicode. Don't care about Unicode in the SQL statement (use
 placeholders, so you don't need Unicode there), don't care about binding
 output parameters, connect strings, user names, passwords, private
 functions. Steal the Unicode tests from DBD::ODBC and make DBD::Sybase
 pass those tests without breaking the existing tests.
 
 
 Alexander
 

Just a little note on copying the unicode tests in DBD::ODBC. There is
nothing in particular wrong with them but they can cause some drivers
problems because they use functions on parameters and sometimes the
driver cannot (or fails) to work out the correct type of the parameter:

e.g.,

SELECT ? as roundtrip, length(?) as roundtriplen

Some drivers (MS SQL Server in particular) attempts to rearrange the
above SQL to a select on 2 columns in the database to work out the
parameter types. In the above case there are no real database columns
and it fails.

Martin
-- 
Martin J. Evans
Easysoft Limited
http://www.easysoft.com


Re: Unicode and Sybase univarchar

2010-06-05 Thread Alexander Foken

On 04.06.2010 15:41, Dave Rolsky wrote:

On Fri, 4 Jun 2010, Alexander Foken wrote:
That's why I proposed to switch to DBD::ODBC: It is well tested and 
supports Unicode as good as the ODBC driver does.


And as I said in private email, that's not really feasible for us.
Yes, it would cause you a lot of work, starting by distributing an ODBC 
manager and an ODBC driver. Perhaps too much work. Think of it as a last 
resort.


But for patching Unicode support into DBD::Sybase, DBD::ODBC could be 
helpful, both by showing how to patch DBD::Sybase, and by providing a 
Unicode-aware way to the database on the development machine.


Now you just need to find someone who is willing and has the time 
to patch DBD::Sybase ... ;-)


Which is why I'm hoping we can pay Michael to work on this. 
Throwing money at the problem is definitly a good idea! And it would be 
great if that work would end on CPAN (and not in a secret private branch).
Clearly, he's the most qualified. 



Otherwise, I might have to do this, which is a scary thought.
Well, I also was scared by DBD::ODBC, and I must admit that I still 
don't understand all of it. I'm still happy that MJE took over the 
stalled DBD::ODBC development and improved my patch (and many other 
parts of DBD::ODBC).


A first step to patching DBD::Sybase would be like my first step with 
DBD::ODBC: Just make binding input parameters and fetching results work 
with Unicode. Don't care about Unicode in the SQL statement (use 
placeholders, so you don't need Unicode there), don't care about binding 
output parameters, connect strings, user names, passwords, private 
functions. Steal the Unicode tests from DBD::ODBC and make DBD::Sybase 
pass those tests without breaking the existing tests.



Alexander

--
Alexander Foken
mailto:alexan...@foken.de  http://www.foken.de/alexander/



Re: Unicode and Sybase univarchar

2010-06-04 Thread Alexander Foken


... should just work. It doesn't quite, because the hex string is not 
just a dump of a 16 Bit Unicode encoding, but it is a UTF-8 byte 
stream written with a 16 Bit Hex Format for each byte. Each and every 
16-Bit-Word has its most significant byte set to 0.


If it was a dump of a 16 Bit Unicode encoding, it should read
0065006d00200064006100730068003a00202014
and not
0065006d00200064006100730068003a002000e200800094

Your call to decode() compensates that, probably because you encoded 
once too much before writing the data into the database.


I did not encode at all. I simply created a utf8 string in Perl land 
and inserted it into Sybase.
And Sybase / DBD::Sybase happily ignored Perl's UTF8 flag and stored the 
byte stream as if it were characters.


Really, DBD::Sybase needs to handle any character set translation, not 
the end user.


Right. (But remember that DBI was there before Unicode support was added 
to Perl, and also most DBDs are older that the Unicode support. Before 
Unicode was there, you just passed bytes around and everything just worked.)


For the same reason, but with a different DBD, I hacked the first 
Unicode patch for DBD::ODBC, just enough code to have proper Unicode 
support in bind parameters from Perl to the database, and in returned 
fetch values. DBD::ODBC has been improved since the patch was merged by 
Martin J. Evans into v1.14, and it now supports Unicode in many other 
places, too. The DBI API was also improved during that process, allowing 
to have Unicode SQL query strings.


That's why I proposed to switch to DBD::ODBC: It is well tested and 
supports Unicode as good as the ODBC driver does.


The raw patch is still available at 
http://www.alexander-foken.de/unicode-patch.txt.gz. DBD::Sybase 
obviously lacks such a patch. Sybase may have a Unicode API, but no part 
of DBD::Sybase uses it (properly). In DBD::ODBC, the patch just tests 
the UTF8 flag for all relevant data coming from Perl and converts UTF8 
to the UCS2 encoding required by the ODBC API, and converts UCS2 encoded 
data back to UTF8, settings the UTF8 flag when needed. Most of the new 
test just tests that all conversions work, even inside the database. The 
length check is very primitive, but a good indicator to test if the 
database saw bytes (length too large) or characters (length just right). 
The original patch has some minor problems in the test code with ancient 
perls that were fixed during DBD::ODBC development, see DBD::ODBC Changes.


Now you just need to find someone who is willing and has the time to 
patch DBD::Sybase ... ;-)
The new tests should work nearly unmodified with a properly patched 
DBD::Sybase.


Alexander


Re: Unicode and Sybase univarchar

2010-06-04 Thread Dave Rolsky

On Fri, 4 Jun 2010, Alexander Foken wrote:

Right. (But remember that DBI was there before Unicode support was added to 
Perl, and also most DBDs are older that the Unicode support. Before Unicode 
was there, you just passed bytes around and everything just worked.)


Only if by everything you exclude any and all operations that need to be 
character aware, such as length, substr, regexes, encoding, etc.


That's why I proposed to switch to DBD::ODBC: It is well tested and supports 
Unicode as good as the ODBC driver does.


And as I said in private email, that's not really feasible for us.

Now you just need to find someone who is willing and has the time to patch 
DBD::Sybase ... ;-)


Which is why I'm hoping we can pay Michael to work on this. Clearly, he's 
the most qualified. Otherwise, I might have to do this, which is a scary 
thought.



-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

I'm working on an i18n project, and we use Sybase (sigh).

Newer versions of Sybase have built-in support for Unicode with the 
univarchar (and other uni*) type.


However, it seems like DBD::Sybase doesn't have any support for this.

Specifically, if I take a Perl unicode string (utf8 flag is on) and insert 
it in a univarchar column, it seems to be inserted as raw bytes (or 
something).


What's really bizarre is that when I select the value back I get something 
like 0065006d00200064006100730068003a002000e200800094.


Yes, that's a literal string containing a series of 2-digit hex numbers!

I can translate this back to Perl unicode with this madness:

my $chars = do {
use bytes;

join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /()/g;
};

my $unicode = decode( 'utf8', $chars );

So the data is there, but not in a very usable form.

Has anyone researched or solved this problem?

Michael Peppler, if you're reading this, is there any work on supporting 
Perl's unicode format transparently in DBD::Sybase?


My employer might be able to pay to have this work done, if you're 
interested. Alternately, maybe you could give me some hints and I could 
try to figure it out.



-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Re: Unicode and Sybase univarchar

2010-06-03 Thread Michael Peppler
Hi,

Which version of Sybase, which version of Sybase OpenClient, and which version 
of DBD::Sybase?

Are you setting the connection charset to utf8 (in the connect() call?)

Thanks,

Michael


On Jun 3, 2010, at 5:22 PM, Dave Rolsky wrote:

 I'm working on an i18n project, and we use Sybase (sigh).
 
 Newer versions of Sybase have built-in support for Unicode with the 
 univarchar (and other uni*) type.
 
 However, it seems like DBD::Sybase doesn't have any support for this.
 
 Specifically, if I take a Perl unicode string (utf8 flag is on) and insert it 
 in a univarchar column, it seems to be inserted as raw bytes (or something).
 
 What's really bizarre is that when I select the value back I get something 
 like 0065006d00200064006100730068003a002000e200800094.
 
 Yes, that's a literal string containing a series of 2-digit hex numbers!
 
 I can translate this back to Perl unicode with this madness:
 
my $chars = do {
use bytes;
 
join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /()/g;
};
 
my $unicode = decode( 'utf8', $chars );
 
 So the data is there, but not in a very usable form.
 
 Has anyone researched or solved this problem?
 
 Michael Peppler, if you're reading this, is there any work on supporting 
 Perl's unicode format transparently in DBD::Sybase?
 
 My employer might be able to pay to have this work done, if you're 
 interested. Alternately, maybe you could give me some hints and I could try 
 to figure it out.
 
 
 -dave
 
 /*
 http://VegGuide.org   http://blog.urth.org
 Your guide to all that's veg  House Absolute(ly Pointless)
 */
 

--
Michael Peppler
Sybase on Linux FAQ: http://www.peppler.org/FAQ/linux.html

A successful [software] tool is one that was used to do something undreamed of 
by its author. -- S. C. Johnson   








Re: Unicode and Sybase univarchar

2010-06-03 Thread Michael Peppler

On Jun 3, 2010, at 7:29 PM, Michael Peppler wrote:

 Hi,
 
 Which version of Sybase, which version of Sybase OpenClient, and which 
 version of DBD::Sybase?
 
 Are you setting the connection charset to utf8 (in the connect() call?)
 


I just gave this a try - I'm under linux, with ASE 15.5. I created a table with 
a univarchar column, entered some data via isql, then wrote a minimal perl 
script to fetch the data.

If I use a UTF8 locale (i.e. LANG=en_us.UTF8) I get the correct output.
If I don't I do not get the correct output, at least for rows where non-ascii 
data has been entered into the table.

I'm using DBD::Sybase 1.10.

Michael


 
 
 On Jun 3, 2010, at 5:22 PM, Dave Rolsky wrote:
 
 I'm working on an i18n project, and we use Sybase (sigh).
 
 Newer versions of Sybase have built-in support for Unicode with the 
 univarchar (and other uni*) type.
 
 However, it seems like DBD::Sybase doesn't have any support for this.
 
 Specifically, if I take a Perl unicode string (utf8 flag is on) and insert 
 it in a univarchar column, it seems to be inserted as raw bytes (or 
 something).
 
 What's really bizarre is that when I select the value back I get something 
 like 0065006d00200064006100730068003a002000e200800094.
 
 Yes, that's a literal string containing a series of 2-digit hex numbers!
 
 I can translate this back to Perl unicode with this madness:
 
   my $chars = do {
   use bytes;
 
   join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /()/g;
   };
 
   my $unicode = decode( 'utf8', $chars );
 
 So the data is there, but not in a very usable form.
 
 Has anyone researched or solved this problem?
 
 Michael Peppler, if you're reading this, is there any work on supporting 
 Perl's unicode format transparently in DBD::Sybase?
 
 My employer might be able to pay to have this work done, if you're 
 interested. Alternately, maybe you could give me some hints and I could try 
 to figure it out.
 
 
 -dave
 
 /*
 http://VegGuide.org   http://blog.urth.org
 Your guide to all that's veg  House Absolute(ly Pointless)
 */
 
 
 --
 Michael Peppler
 Sybase on Linux FAQ: http://www.peppler.org/FAQ/linux.html
 
 A successful [software] tool is one that was used to do something undreamed 
 of by its author. -- S. C. Johnson   
 
 
 
 
 
 
 
 

--
Michael Peppler
Sybase on Linux FAQ: http://www.peppler.org/FAQ/linux.html

A successful [software] tool is one that was used to do something undreamed of 
by its author. -- S. C. Johnson   








Re: Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

On Thu, 3 Jun 2010, Michael Peppler wrote:


I just gave this a try - I'm under linux, with ASE 15.5. I created a table with 
a univarchar column, entered some data via isql, then wrote a minimal perl 
script to fetch the data.

If I use a UTF8 locale (i.e. LANG=en_us.UTF8) I get the correct output.
If I don't I do not get the correct output, at least for rows where non-ascii 
data has been entered into the table.


Define correct output here. From looking at the DBD::Sybase code, I don't 
see how it could possibly be right, because there's nothing in there to 
set the utf8 flag on the Perl string when the data is retrieved.


So even if I can work around the bizarro bytes as a string issue, I 
still won't have utf8 after the round trip.


My test script did something like this:

my $dbh = ...; # set charset to utf8

round_trip(em dash: \x{2014});

sub round_trip {
my $unicode = shift;

$dbh-do('DELETE FROM unicode');

check($unicode);

$dbh-do( 'INSERT INTO unicode (utest) VALUES (?)', {}, $unicode );

my $rows = $dbh-selectall_arrayref('SELECT * FROM unicode');

my $fromdb = $rows-[0][0];

check($fromdb);

my $chars = do {
use bytes;

join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /()/g;
};

check($chars);

my $decoded = decode( 'utf-8', $chars );

check($decoded);
}

sub check {
my $string = shift;

print $string is utf8? ,
( Encode::is_utf8($string) ? 'yes' : 'no' ),
\n;
}

__END__

CREATE TABLE unicode (
  utest  univarchar(250)  NOT NULL,
);


-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Re: Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

On Thu, 3 Jun 2010, Michael Peppler wrote:


Which version of Sybase, which version of Sybase OpenClient, and which version 
of DBD::Sybase?


Ah, I was using the old libs (11.0), which may have been the problem. I 
was also using DBD::Sybase 1.07.


I switch to Sybase 15.0 (OCS 15.0 if that makes sense), OpenClient 15.0 
libs, and DBD::Sybase 1.10.


Now it's closer to working. If I set charset=utf8 in the dsn, I get

 2010/06/03 14:08:11 unicode CRITICAL: FATAL: DBD::Sybase::db do failed: Server 
message number=2402 severity=16 state=1 line=1 server=HDATADEV1 text=Error 
converting characters into server's character set. Some character(s) could not 
be converted.

I'm not sure what that means.

If I _don't_ set that, the data goes in and comes out as bytes, rather 
than the bizarro hex string. However, the data does have the utf8 flag set 
when it comes back from Sybase, so I have to run it through 
Encode::decode.


I really don't think I can realistically tell the bazillion developers 
here just run all the data through Encode.


I'd really like see an end-to-end solution.

Also, it's not clear to me that the data is actually being stored as 
characters at the Sybase level. I'm not even sure how I'd figure this out. 
When I do a select from sqsh, I see the wacky hex string, but I can't tell 
if that's Sybase trying to present data to me in a format it thinks my 
environment can handle.


I did try setting LC_ALL=us_english.utf8 when running sqsh, but that 
didn't make a difference.


Basically, what I need is to be able to take Perl native unicode strings, 
store them in Sybase in Sybase's native format (utf16, I believe), and 
then retrieve them as Perl native unicode strings again.



-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Re: Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

On Thu, 3 Jun 2010, Dave Rolsky wrote:

If I _don't_ set that, the data goes in and comes out as bytes, rather than 
the bizarro hex string. However, the data does have the utf8 flag set when it 
comes back from Sybase, so I have to run it through Encode::decode.


Doh, the data _does not_ have the utf8 flag set. That's what I meant to 
say.



-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Re: Unicode and Sybase univarchar

2010-06-03 Thread Michael Peppler

On Jun 3, 2010, at 8:14 PM, Dave Rolsky wrote:

 On Thu, 3 Jun 2010, Michael Peppler wrote:
 
 Which version of Sybase, which version of Sybase OpenClient, and which 
 version of DBD::Sybase?
 
 Ah, I was using the old libs (11.0), which may have been the problem. I was 
 also using DBD::Sybase 1.07.
 
 I switch to Sybase 15.0 (OCS 15.0 if that makes sense), OpenClient 15.0 libs, 
 and DBD::Sybase 1.10.
 
 Now it's closer to working. If I set charset=utf8 in the dsn, I get
 
 2010/06/03 14:08:11 unicode CRITICAL: FATAL: DBD::Sybase::db do failed: 
 Server message number=2402 severity=16 state=1 line=1 server=HDATADEV1 
 text=Error converting characters into server's character set. Some 
 character(s) could not be converted.
 
 I'm not sure what that means.

Hmmm - is that on a query, or on an insert operation?


 If I _don't_ set that, the data goes in and comes out as bytes, rather than 
 the bizarro hex string. However, the data does have the utf8 flag set when it 
 comes back from Sybase, so I have to run it through Encode::decode.
 
 I really don't think I can realistically tell the bazillion developers here 
 just run all the data through Encode.
 
 I'd really like see an end-to-end solution.

I agree - I've just not had much opportunity (or requests) to ensure that this 
works 100%.

Ideally if you could send me some sample code, and a simple table structure and 
data that reproduces the problem for you I could try to look at it and see if I 
can fix it.


 Also, it's not clear to me that the data is actually being stored as 
 characters at the Sybase level. I'm not even sure how I'd figure this out. 
 When I do a select from sqsh, I see the wacky hex string, but I can't tell if 
 that's Sybase trying to present data to me in a format it thinks my 
 environment can handle.

When in doubt - use the Sybase tools (i.e. isql, and use -Jutf8 to force 
conversion to/from utf8 when reading/writing the data).

You can also do a convert(varbinary, the_univarchar_col) to see the hex 
representation of the data in the database.

 
 Basically, what I need is to be able to take Perl native unicode strings, 
 store them in Sybase in Sybase's native format (utf16, I believe), and then 
 retrieve them as Perl native unicode strings again.

If you store data in univarchar, etc columns, then it's utf16. You can also set 
up the dataserver to use utf8 throughout - but that's something you'd have to 
discuss with your DBAs.

Michael
--
Michael Peppler
Sybase on Linux FAQ: http://www.peppler.org/FAQ/linux.html

A successful [software] tool is one that was used to do something undreamed of 
by its author. -- S. C. Johnson   








Re: Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

On Thu, 3 Jun 2010, Michael Peppler wrote:


2010/06/03 14:08:11 unicode CRITICAL: FATAL: DBD::Sybase::db do failed: Server 
message number=2402 severity=16 state=1 line=1 server=HDATADEV1 text=Error 
converting characters into server's character set. Some character(s) could not 
be converted.

I'm not sure what that means.


Hmmm - is that on a query, or on an insert operation?


That was from an insert.


If I _don't_ set that, the data goes in and comes out as bytes, rather than the 
bizarro hex string. However, the data does have the utf8 flag set when it comes 
back from Sybase, so I have to run it through Encode::decode.

I really don't think I can realistically tell the bazillion developers here just 
run all the data through Encode.

I'd really like see an end-to-end solution.


I agree - I've just not had much opportunity (or requests) to ensure that this 
works 100%.

Ideally if you could send me some sample code, and a simple table 
structure and data that reproduces the problem for you I could try to 
look at it and see if I can fix it.


See my previous email. I included some code, and there was a table 
definition after the __END__ marker.


As I said, my employer might be willing to pay to have this done. If 
that's something you're interested, let's talk off list and I can try to 
help coordinate that.


Also, it's not clear to me that the data is actually being stored as 
characters at the Sybase level. I'm not even sure how I'd figure this 
out. When I do a select from sqsh, I see the wacky hex string, but I 
can't tell if that's Sybase trying to present data to me in a format it 
thinks my environment can handle.


When in doubt - use the Sybase tools (i.e. isql, and use -Jutf8 to force 
conversion to/from utf8 when reading/writing the data).


Unfortunately, from the terminal on my work dev machine, this gives me the 
hex string. I think something is being helpful here ;)


I fiddled with -Jutf8 and setting LC_ALL, to no avail.


-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/


Re: Unicode and Sybase univarchar

2010-06-03 Thread Alexander Foken

Sorry, forgot to CC to the list ...

 Original Message 
Subject:Re: Unicode and Sybase univarchar
Date:   Thu, 03 Jun 2010 22:20:37 +0200
From:   Alexander Foken alexan...@foken.de
To: Dave Rolsky auta...@urth.org
References: alpine.deb.0..1006031018000.9...@urth.org



On 03.06.2010 17:22, Dave Rolsky wrote:


What's really bizarre is that when I select the value back I get 
something like 0065006d00200064006100730068003a002000e200800094.


Yes, that's a literal string containing a series of 2-digit hex numbers!

I can translate this back to Perl unicode with this madness:

my $chars = do {
use bytes;

join q{}, map { chr( eval '0x' . $_ ) } $fromdb =~ /()/g;
};

my $unicode = decode( 'utf8', $chars );

Really strange way to avoid pack()/unpack(). At least, you can get rid 
of the evil string-eval, simply use hex($_).  The combination of use 
bytes, chr() with an argument larger than 0x00FF, and decode() also 
looks very strange. $fromdb contains only hex digits, so there should be 
no need to use bytes. Unless you force byte mode, chr() should already 
return perfect Unicode, flagged as such. So ...


my $unicode=join('',map { chr hex $_ } $fromdb=~/([0-9a-fA-F]{4})/g);

... should just work. It doesn't quite, because the hex string is not 
just a dump of a 16 Bit Unicode encoding, but it is a UTF-8 byte stream 
written with a 16 Bit Hex Format for each byte. Each and every 
16-Bit-Word has its most significant byte set to 0.


If it was a dump of a 16 Bit Unicode encoding, it should read
0065006d00200064006100730068003a00202014
and not
0065006d00200064006100730068003a002000e200800094

Your call to decode() compensates that, probably because you encoded 
once too much before writing the data into the database.


Appart from that, have a look at the tests in DBD::Oracle, there are a 
few tests for Unicode round trips in 40UnicodeRoundTrip.t and 
41Unicode.t, try to run them on DBD::Sybase.


Also consider using DBD::ODBC, a Unicode capable ODBC manager (the 
one(s) on Windows is/are fine) and an Unicode capable ODBC driver for 
Sybase. It may cost you a few CPU cycles for the extra layers, but 
DBD::ODBC supports Unicode quite well (on non-Windows, you need to 
explicitly enable Unicode).


Alexander

--

Alexander Foken
mailto:alexan...@foken.de  http://www.foken.de/alexander/



Re: Unicode and Sybase univarchar

2010-06-03 Thread Dave Rolsky

On Thu, 3 Jun 2010, Alexander Foken wrote:


Really strange way to avoid pack()/unpack(). At least, you can get rid of the


Yes, yes, this is hack code, not production.

... should just work. It doesn't quite, because the hex string is not just a 
dump of a 16 Bit Unicode encoding, but it is a UTF-8 byte stream written with 
a 16 Bit Hex Format for each byte. Each and every 16-Bit-Word has its most 
significant byte set to 0.


If it was a dump of a 16 Bit Unicode encoding, it should read
0065006d00200064006100730068003a00202014
and not
0065006d00200064006100730068003a002000e200800094

Your call to decode() compensates that, probably because you encoded once too 
much before writing the data into the database.


I did not encode at all. I simply created a utf8 string in Perl land and 
inserted it into Sybase.


Really, DBD::Sybase needs to handle any character set translation, not the 
end user.



-dave

/*
http://VegGuide.org   http://blog.urth.org
Your guide to all that's veg  House Absolute(ly Pointless)
*/