Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread aldnin
>output_handler=mb_output_handler

This fixed the output to the browser, so I no longer need to call
utf8_decode() there. Thanks.

> Setting it to "7" won't let me even echo something else.

Right, it's strange, but true... :-(

> mbstring.detect_order = UTF-8,eucjp-win,sjis-win

That solved the problem of mb_detect_encoding() returning ASCII. It now
reports "UTF-8", BUT only when I run the script on the console; in the
browser it still reports ASCII. Well, not important.

But the comparison test still says "not equal", so utf8_decode() is still
needed when data comes from the database. The result is the same in the
browser and on the console (even though the console shows UTF-8 as detected).

>   The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console.  So unless you're certain yours does,
> I wouldn't use it as a test.

I know, that's why I use the comparison test ;-)

Niel wrote:
> Hi
> 
> You still haven't answered whether you're using any output handler, and
> if so which one.  I use
> 
>output_handler=mb_output_handler
> 
>> I overloaded the mbstring variables with:
>> mbstring.func_overload = 6
>> Setting it to "7" won't let me even echo something else.
> 
> Very strange, the only additional function overloaded is mail() and that
> shouldn't stop you using echo.
> 
> As well as setting the internal encoding and enabling it with
> mbstring.encoding_translation = On
>mbstring.internal_encoding = UTF-8
> 
> I would also use:
> mbstring.language = English 
> ; or German in your case
> mbstring.detect_order = UTF-8,eucjp-win,sjis-win
> mbstring.http_input = UTF-8,SJIS,EUC-JP
> mbstring.http_output = UTF-8
> 
>> Is it possible for mbstring to overload the pg-functions I need?
>  No, and it shouldn't be needed. Those functions should be UTF-8 enabled
> in order to communicate with the database and supply the correct data
> 
>   You're still referring to 'UTF8' which as I pointed out isn't the
> official name of the encoding system.  I have no idea if PHP will
> recognise  it, but to be safe I suggest you use the official 'UTF-8'
> (hyphen between letters and number) in case it's causing problems.
>   The other thing to be wary of, is output to the console. Some OSes do
> not support unicode in the console.  So unless you're certain yours does,
> I wouldn't use it as a test.
> 
> --
> Niel Archer

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread aldnin
Thanks a lot. What you describe is really necessary to handle these problems
in the future.

I was looking for a faster solution because I have to handle huge amounts of
data that is sometimes UTF-8 and sometimes not. You understand what I
mean? ;-)


Bruno Lustosa wrote:
> On 7/21/07, aldnin <[EMAIL PROTECTED]> wrote:
>> When I try to send this query (select 'lacarrière' as test;) to a UTF8
>> initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:
>>
>> ERROR:  invalid byte sequence for encoding "UTF8": 0xe87265
> 
> Short answer: start using utf-8 for just everything, and your problems
> will be gone.
> 
> Long explanation:
> This is usually the case when you get data from a form and put it in
> the database, and the two aren't using the same encoding.
> I guess your pg connection is using unicode (so the db expects unicode
> input), and your html is set to something else. To fix this, you have
> two choices:
> 
> 1-Run utf8_encode() on the input from your forms; or
> 2-Set all your html pages to use utf-8 encoding.
> 
> IMHO, option 2 is the way to go. I've been using utf-8 for everything
> for quite some time, and it has solved all my problems dealing with
> accents, and so on.
> You will need:
> - All your HTML files encoded to utf-8 (quite easy with iconv, if you
> are using Linux);
> - Add a "Content-type: text/html; charset=utf-8" to all your pages.
> This is easily done using PHP's header() function in a file included
> by all your scripts.
> 
> This way, the pages will be unicode, any data entered will be posted
> as unicode, and you will have no problems sending them to a database
> that uses unicode.
> Forget the <meta> tag that sets the encoding. It's only used in case
> the server doesn't send a Content-type header, which isn't the case
> normally. By default, I think at least apache sends the content-type
> as iso8859-1.
> 




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread Bruno Lustosa

On 7/21/07, aldnin <[EMAIL PROTECTED]> wrote:

When I try to send this query (select 'lacarrière' as test;) to a UTF8 
initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this error:

ERROR:  invalid byte sequence for encoding "UTF8": 0xe87265


Short answer: start using utf-8 for just everything, and your problems
will be gone.

Long explanation:
This is usually the case when you get data from a form and put it in
the database, and the two aren't using the same encoding.
I guess your pg connection is using unicode (so the db expects unicode
input), and your html is set to something else. To fix this, you have
two choices:

1-Run utf8_encode() on the input from your forms; or
2-Set all your html pages to use utf-8 encoding.

IMHO, option 2 is the way to go. I've been using utf-8 for everything
for quite some time, and it has solved all my problems dealing with
accents, and so on.
You will need:
- All your HTML files encoded to utf-8 (quite easy with iconv, if you
are using Linux);
- Add a "Content-type: text/html; charset=utf-8" to all your pages.
This is easily done using PHP's header() function in a file included
by all your scripts.

This way, the pages will be unicode, any data entered will be posted
as unicode, and you will have no problems sending them to a database
that uses unicode.
Forget the <meta> tag that sets the encoding. It's only used in case
the server doesn't send a Content-type header, which isn't the case
normally. By default, I think at least apache sends the content-type
as iso8859-1.
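
A minimal sketch of option 2, assuming a bootstrap file that every script
includes. The helper names send_utf8_header() and is_valid_utf8() are made up
for illustration; header(), headers_sent() and mb_check_encoding() are
standard PHP, and the last one needs the mbstring extension.

```php
<?php
// Hypothetical bootstrap file, included at the top of every script.

// Declare the page as UTF-8 so browsers also submit form data as UTF-8.
function send_utf8_header()
{
    // header() only works before any output has been sent.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=utf-8');
    }
}

// Check that a submitted value really is UTF-8 before it goes into a query.
function is_valid_utf8($value)
{
    return mb_check_encoding($value, 'UTF-8');
}

send_utf8_header();
```

With this in place, is_valid_utf8("lacarri\xE8re") should return false,
because a lone 0xE8 byte is Latin-1 and not a complete UTF-8 sequence.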

--
Bruno Lustosa <[EMAIL PROTECTED]>
ZCE - Zend Certified Engineer - PHP!
http://www.lustosa.net/


Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread Niel
Hi

You still haven't answered whether you're using any output handler, and
if so which one.  I use

   output_handler=mb_output_handler

> I overloaded the mbstring variables with:
> mbstring.func_overload = 6
> Setting it to "7" won't let me even echo something else.

Very strange, the only additional function overloaded is mail() and that
shouldn't stop you using echo.

As well as setting the internal encoding and enabling it with
mbstring.encoding_translation = On
   mbstring.internal_encoding = UTF-8

I would also use:
mbstring.language = English 
; or German in your case
mbstring.detect_order = UTF-8,eucjp-win,sjis-win
mbstring.http_input = UTF-8,SJIS,EUC-JP
mbstring.http_output = UTF-8

> Is it possible for mbstring to overload the pg-functions I need?
 No, and it shouldn't be needed. Those functions should be UTF-8 enabled
in order to communicate with the database and supply the correct data

  You're still referring to 'UTF8' which as I pointed out isn't the
official name of the encoding system.  I have no idea if PHP will
recognise  it, but to be safe I suggest you use the official 'UTF-8'
(hyphen between letters and number) in case it's causing problems.
  The other thing to be wary of, is output to the console. Some OSes do
not support unicode in the console.  So unless you're certain yours does,
I wouldn't use it as a test.
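
The same settings can also be applied at run time instead of in php.ini. This
is only a sketch: the ISO-8859-1 fallback in the detection order is an
assumption for Western European data and replaces the Japanese encodings
listed above. Note that mbstring.func_overload was deprecated in PHP 7.2 and
removed in PHP 8.0, so it has no counterpart here.

```php
<?php
// Run-time counterparts of the php.ini settings (requires ext/mbstring).

// Same effect as mbstring.internal_encoding = UTF-8
mb_internal_encoding('UTF-8');

// Same effect as mbstring.http_output = UTF-8
mb_http_output('UTF-8');

// Same idea as mbstring.detect_order; ISO-8859-1 is an assumed fallback.
mb_detect_order(array('UTF-8', 'ISO-8859-1'));

echo mb_internal_encoding(), "\n";          // UTF-8
echo implode(',', mb_detect_order()), "\n"; // the detection order now in use
```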

--
Niel Archer




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread aldnin
> You did not answer the most important question. What, if any, output
> buffering are you using?  Are you using the mbstring module?  If so, is
> it set to overload the old string functions?

Well, I checked for the Multibyte String functions: mbstring was enabled at
compile time, configured with "=all".

After performing the query with pg_query, fetching the result with
pg_fetch_all and putting the UTF-8 string into $dbString, I tried to detect
the encoding with:

mb_detect_encoding($dbString)

It tells me:
ASCII

The content of $dbString is:
lacarrière

I overloaded the mbstring variables with:
mbstring.func_overload = 6
Setting it to "7" won't let me even echo something else.

mbstring.encoding_translation = On
mbstring.internal_encoding = UTF8

That's it, rest is default.

Is it possible for mbstring to overload the pg-functions I need?




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread Niel
Hi

> 
> Well, I searched all the source code of phpPgAdmin for charsets and I found:
> 
> "echo "\t charset={$data->codemap[$dbEncoding]}\" />\r\n";"
> 
> So this means phpPgAdmin sets the output charset to the charset used
> by the database it is connected to - but that's still not the
> problem, because I also know how to fix charset output in browsers.

Not exactly. As far as I can see, it only changes the value of the
Content-type: header in the HTML; it doesn't change the encoding of the
actual output.

> > Once again indicating your data needs to be converted from some other
> > character set.
> 
> It's already converted to be compatible with UTF-8 when fetching it from
> some other resources.

I didn't mean the content of the database. I was referring to the data
that PHP is actually processing, which appears to have been converted
within PHP.

> Well, I've set the default_charset to UTF8, it was set before to "" (empty) -
> but the output on console (cli) and the problem is still the same also
> after changing this to UTF8, so: this is not the problem,

It should be "UTF-8"; this is the official designation from Unicode,
although case will likely be ignored. As far as I know, "UTF8" is not a
recognised encoding.
  This, however, is only the value that will be output as the
Content-Type charset, as noted above.


> I fetch something from the database, which looks like "lacarriÃ¨re" when I
> output it in PHP - well, let's not get confused by PHP's output. Then I
> fetch something from another resource looking like "lacarrière" - when I
> compare both strings in PHP it tells me that they are "not equal".

As I said before, many of PHP's functions (the string ones for
comparing, for example) are NOT multi-byte aware, so they are NOT guaranteed
to work correctly.
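
To illustrate the difference, here is a small sketch comparing a
byte-oriented function with its mbstring counterpart. The escape sequence
\xC3\xA8 is the UTF-8 encoding of è, written out so the example does not
depend on the encoding of the source file. As far as I know, == on
non-numeric strings compares bytes, so it is reliable as long as both strings
use the same encoding.

```php
<?php
$utf8 = "lacarri\xC3\xA8re";            // "lacarrière" encoded as UTF-8

echo strlen($utf8), "\n";              // 11: strlen() counts bytes
echo mb_strlen($utf8, 'UTF-8'), "\n";  // 10: mb_strlen() counts characters

// mb_strtoupper() knows that è (C3 A8) maps to È (C3 88).
$upper = mb_strtoupper($utf8, 'UTF-8');
echo bin2hex($upper), "\n";
```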

You did not answer the most important question. What, if any, output
buffering are you using?  Are you using the mbstring module?  If so, is
it set to overload the old string functions?

--
Niel Archer




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread aldnin
> Please configure your email client so we don't receive 5 copies of your
> mail.

Just fixed that issue; no need to worry about it in the future.

> This indicates that PHP not using UTF-8.  That output is typical of
> UTF-8 output as Latin characters.

Well, maybe the output is not correct - when I run the PHP script on the
console (cli), it prints the content in the wrong charset. But that's not the
problem: calling utf8_decode() lets me output it in the right charset.

> Not true, it only indicates that phpPgAdmin is configured to handle
> UTF-8 correctly.

Well, I searched all the source code of phpPgAdmin for charsets and I found:

"echo "\t charset={$data->codemap[$dbEncoding]}\" />\r\n";"

So this means phpPgAdmin sets the output charset to the charset used by the
database it is connected to - but that's still not the problem, because I
also know how to fix charset output in browsers.

> Once again indicating your data needs to be converted from some other
> character set.

It's already converted to be compatible with UTF-8 when fetching it from
some other resources.

> I had similar problems getting PHP to work with UTF-8 and MySQL.  Many
> of PHP's functions are not multibyte aware and assume a Latin character set.
> What, if any, output buffering are you using? What is your
> default_charset set to?

Well, I've set the default_charset to UTF8; before, it was set to "" (empty).
But the output on the console (cli) and the problem are still the same after
changing this to UTF8. So this is not the problem, and I don't need proper
output on the console without utf8_decode() - if I want proper output there I
just do a decode, like I do when I want it displayed properly in the browser.

Maybe a cleaner explanation of the problem:

I fetch something from the database, which looks like "lacarriÃ¨re" when I
output it in PHP - well, let's not get confused by PHP's output. Then I fetch
something from another resource looking like "lacarrière" - when I compare
both strings in PHP it tells me that they are "not equal".

So I HAVE TO do either a utf8_encode() on the string from the other resource
OR a utf8_decode() on the string from the database for them to compare as
"equal".

...and THIS means a lot more code in my classes.

Hint: The other resource is a socket connection (API) to another server.

The problem is quite simple, I think: everything coming from the database is
UTF-8 encoded and needs to be UTF-8 decoded before you can work with it
properly.

The default_charset seems to affect only the output buffer, so the solution
to that problem could only be a mechanism that tells PHP to handle all
strings as UTF-8, which would take a lot more resources - I understand that
this is not a solution.

So the only solutions could be:

a) Decode and encode UTF-8 data properly, and keep track of whether content
is UTF-8 encoded, so that it is decoded before it is used with other strings.

b) A mechanism to tell the pg-functions in PHP to decode all data which is
UTF-8 encoded. The ADODB layers seem to do that properly, but the
pg-functions don't, as far as I can see.

You can use this to reproduce it:

1. Create a table in postgres, in a UTF8 initialized database, and insert
something like "lacarrière" into it.

2. Check the normal output with psql; you should get either "lacarrière" or
"lacarriÃ¨re", so you can be sure it's inserted correctly.

3. Write a script which fetches the string from the database into $dbString.

4. Set a string $phpString = "lacarrière";

5. Compare both strings with "==" - you'll get "false".

Another hint:

Try to send "select 'lacarrière' as test;" with pg_query to any postgres
database. You'll get an error; if not... well, then I'm wrong and I've set up
PHP wrongly for handling UTF-8.

If you send "select '".utf8_encode('lacarrière')."' as test;" to your database,
this should work.

Also, as described above, $phpString is NOT EQUAL to the result you would get
from "select '".utf8_encode('lacarrière')."' as test;"; you would need to
compare it to utf8_decode($dbString) for them to be EQUAL.
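
The comparison in steps 4 and 5 can be reproduced without a database. In this
sketch the byte strings are written with escape sequences. That $phpString is
stored as ISO-8859-1 (because the script file is saved in that encoding) is
an assumption, though it matches the 0xe87265 in the error message.
mb_convert_encoding() is used instead of utf8_encode(), which is deprecated
as of PHP 8.2.

```php
<?php
$dbString  = "lacarri\xC3\xA8re"; // what a UTF8 database returns: è = C3 A8
$phpString = "lacarri\xE8re";     // assumed: literal from a Latin-1 source file

var_dump($dbString == $phpString); // bool(false): the bytes differ

// Convert the Latin-1 string once, then compare UTF-8 with UTF-8.
$converted = mb_convert_encoding($phpString, 'UTF-8', 'ISO-8859-1');
var_dump($dbString == $converted); // bool(true)
```

Converting once at the boundary where the Latin-1 data enters (here, the
socket API) would keep the rest of the code free of encode/decode calls.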




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread Niel
Hi

Please configure your email client so we don't receive 5 copies of your
mail.

> I already did this and all encoding settings are right, but I figured out 
> something more.
> 
> 1) Using pg_query for fetching UTF8 data from the database works properly.
> Of course, when I try to output it directly I get something like
> "lacarriÃ¨re" - but when I use utf8_decode() on the UTF-8 bytes I get
> it the right way: "lacarrière".

This indicates that PHP is not using UTF-8.  That output is typical of
UTF-8 bytes displayed as Latin characters.

> 2) I found another PHP application which is able to insert UTF8 data
> properly, phpPgAdmin, but it seems that it uses the ADODB layers for
> executing SQL statements.
> Well, the fact that phpPgAdmin runs on the same machine and handles UTF8
> data properly means that my PHP is configured correctly for UTF8.

Not true, it only indicates that phpPgAdmin is configured to handle
UTF-8 correctly.

> 3) When I add to my DB-Class utf8_encode() on the querystring I send to the 
> database, it works properly, the insert is fine, so that's a temporary 
> solution for my first problem.

> 4) When I get data from the database I would usually have to do a
> utf8_decode() on EVERY string fetched from it. So my solution now is to
> keep all strings coming from the database as UTF-8 bytes, and to decode
> them only when I really need to for further use.

Once again indicating your data needs to be converted from some other
character set.


I had similar problems getting PHP to work with UTF-8 and MySQL.  Many
of PHP's functions are not multibyte aware and assume a Latin character set.

What, if any, output buffering are you using? What is your
default_charset set to?

--
Niel Archer




Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread aldnin
> My guess is that your PHP is not set up to handle UTF8, and is really
> sending something else. UTF8 is the default client encoding because that
> is the encoding of the database. It does not mean that PHP has set the
> right one. Before running your test, try executing this: "SET
> client_encoding TO LATIN1;" and see if that fixes it.

I already did this and all encoding settings are right, but I figured out 
something more.

1) Using pg_query for fetching UTF8 data from the database works properly. Of
course, when I try to output it directly I get something like
"lacarriÃ¨re" - but when I use utf8_decode() on the UTF-8 bytes I get it
the right way: "lacarrière".

2) I found another PHP application which is able to insert UTF8 data properly,
phpPgAdmin, but it seems that it uses the ADODB layers for executing
SQL statements.
Well, the fact that phpPgAdmin runs on the same machine and handles UTF8 data
properly means that my PHP is configured correctly for UTF8.

3) When I add utf8_encode() in my DB class to the query string I send to the
database, it works properly and the insert is fine, so that's a temporary
solution for my first problem.

4) When I get data from the database I would usually have to do a
utf8_decode() on EVERY string fetched from it. So my solution now is to keep
all strings coming from the database as UTF-8 bytes, and to decode them only
when I really need to for further use.

Problem:

Just declaring the string 'lacarrière' 10 million times takes 5 seconds;
calling utf8_encode() on it each time takes 13 seconds. So always calling
utf8_encode() on a string needs 2-3 times more resources, even when the
string does not include special characters. These resources are wasted
whenever the strings don't need to be UTF-8 encoded.

Workaround:
---
To avoid wasting resources, you have to call utf8_encode() only when you
"guess" that there might be special characters - have fun with that, but it's
the only way I see to work properly with those special characters in
combination with postgres.
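
One way to avoid guessing is to test each string and convert only when it is
not already valid UTF-8. This is a sketch under an assumption: anything that
fails the check is treated as ISO-8859-1. The function name ensure_utf8() is
made up; mb_check_encoding() and mb_convert_encoding() come from the mbstring
extension. A Latin-1 string whose bytes happen to form valid UTF-8 would slip
through unconverted, so this is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: return $s as UTF-8, converting only when necessary.
function ensure_utf8($s)
{
    // Pure ASCII and valid UTF-8 pass through untouched.
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;
    }
    // Assumption: everything else is ISO-8859-1 (Latin-1).
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

echo bin2hex(ensure_utf8("lacarri\xE8re")), "\n";     // Latin-1 input, converted
echo bin2hex(ensure_utf8("lacarri\xC3\xA8re")), "\n"; // UTF-8 input, unchanged
```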











Re: [PHP-DB] PHP + PostgreSQL: invalid byte sequence for encoding "UTF8"

2007-07-21 Thread John DeSoi


On Jul 21, 2007, at 7:53 AM, aldnin wrote:

When I try to send this query (select 'lacarrière' as test;) to a  
UTF8 initialized pgsql-database (8.2.4) from PHP 5.2.3 I get this  
error:


ERROR:  invalid byte sequence for encoding "UTF8": 0xe87265

I use pg_query for the query delivery.

Client Encoding is set to:
 client_encoding
-----------------
 UTF8
(1 row)

pg_client_encoding() also returns "UTF8".



My guess is that your PHP is not set up to handle UTF8, and is really
sending something else. UTF8 is the default client encoding because
that is the encoding of the database. It does not mean that PHP has
set the right one. Before running your test, try executing this: "SET
client_encoding TO LATIN1;" and see if that fixes it.
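
The bytes in the error message support this guess: 0xe8 0x72 0x65 is "ère" in
ISO-8859-1, and 0xe8 followed by "r" is not a valid UTF-8 sequence. A sketch
of the check and of the suggested fix follows. The helper name
declare_client_encoding() is made up, and the commented-out connection string
is a placeholder; pg_set_client_encoding() and pg_client_encoding() are part
of PHP's pgsql extension and should have the same effect as running SET
client_encoding yourself.

```php
<?php
// The last three bytes of the Latin-1 string match the error message.
$sent = "lacarri\xE8re";
echo bin2hex(substr($sent, 7)), "\n"; // e87265

// Hypothetical helper: tell PostgreSQL which encoding the client sends.
function declare_client_encoding($conn, $encoding)
{
    // pg_set_client_encoding() returns 0 on success and -1 on error.
    if (pg_set_client_encoding($conn, $encoding) !== 0) {
        throw new RuntimeException("cannot set client_encoding to $encoding");
    }
    return pg_client_encoding($conn);
}

// Usage, assuming a reachable server (placeholder connection string):
// $conn = pg_connect('dbname=test');
// echo declare_client_encoding($conn, 'LATIN1'), "\n";
// pg_query($conn, "select 'lacarri\xE8re' as test;");
```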





John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL
