On mån, 2009-11-16 at 22:37 +0200, Peter Eisentraut wrote:
On ons, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
Sure. Client encoding is declared in body of a file, but BOM is
in head of the file. So, we should always ignore BOM sequence
at the file head no matter what client encoding
On ons, 2009-11-18 at 12:52 +0900, Itagaki Takahiro wrote:
Peter Eisentraut pete...@gmx.net wrote:
Together, that should cover a lot of cases. Not perfect, but far from
useless.
For Japanese users on Windows, the client encoding is always set to SJIS
because of the restriction of
On tis, 2009-11-17 at 23:22 -0500, Andrew Dunstan wrote:
Itagaki Takahiro wrote:
I don't want user to check the encoding of scripts before executing --
it is far from fail-safe.
That's what we require in all other cases. Why should UTF8 be special?
But now we're back to the
Peter Eisentraut wrote:
But now we're back to the original problem. Certain editors insert BOMs
at the beginning of the file. And that is by any definition before the
embedded client encoding declaration. I think the only ways to solve
this are:
1) Ignore the BOM if a client encoding
On ons, 2009-11-18 at 08:52 -0500, Andrew Dunstan wrote:
4) set the client encoding before the file is read in any of the ways
that have already been discussed and then allow psql to eat the BOM.
This is certainly a workaround, just like piping the file through a
suitable sed expression would
Peter Eisentraut pete...@gmx.net writes:
This is certainly a workaround, just like piping the file through a
suitable sed expression would be, but conceptually, the client encoding
is a property of the file and should therefore be marked in the file.
In a perfect world things would be like
I don't know what the best solution is here. The BOM encoded as UTF-8
is valid data in other encodings. Of course, there is your point that
such data cannot be at the start of an SQL command.
Is the UTF-8 BOM ( EF BB BF ) actually valid data in any other multi-byte
encoding (other than
Itagaki Takahiro wrote:
Multi-byte scripts
without encoding are always dangerous whether BOM is present or not.
I'd say we can always throw an error when we find queries that contain
multi-byte characters if no prior encoding declaration.
You will break a gazillion scripts that today
Peter Eisentraut pete...@gmx.net writes:
I think I could support using the presence of the BOM as a fall-back
indicator of encoding in absence of any other declaration. It seems to
me, however, that the description above ignores the existence of
encodings other than SQL_ASCII and UTF8.
Yeah.
On tis, 2009-11-17 at 09:31 +0900, Itagaki Takahiro wrote:
Peter Eisentraut pete...@gmx.net wrote:
OK, I think the consensus here is:
- Eat BOM at beginning of file (as you implemented)
- Only when client encoding is UTF-8 -- please fix that
Are they an AND condition? If so, this patch
On tis, 2009-11-17 at 00:59 -0800, Chuck McDevitt wrote:
Or is there a plan to read and convert the UTF-16 or UTF-32 to UTF-8,
so psql and PostgreSQL understand it?
(BTW, that would actually be nice on Windows, where UTF-16 is common).
Well, someone could implement UTF-16 or UTF-whatever as
-Original Message-
From: Peter Eisentraut [mailto:pete...@gmx.net]
Sent: Tuesday, November 17, 2009 9:05 AM
To: Chuck McDevitt
Cc: Itagaki Takahiro; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] UTF8 with BOM support in psql
On tis, 2009-11-17 at 00:59 -0800, Chuck McDevitt
Peter Eisentraut wrote:
On tis, 2009-11-17 at 00:59 -0800, Chuck McDevitt wrote:
Or is there a plan to read and convert the UTF-16 or UTF-32 to UTF-8,
so psql and PostgreSQL understand it?
(BTW, that would actually be nice on Windows, where UTF-16 is common).
Well, someone could
-Original Message-
From: Andrew Dunstan [mailto:and...@dunslane.net]
Sent: Tuesday, November 17, 2009 9:15 AM
To: Peter Eisentraut
Cc: Chuck McDevitt; Itagaki Takahiro; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] UTF8 with BOM support in psql
Peter Eisentraut wrote
Andrew Dunstan and...@dunslane.net writes:
Peter Eisentraut wrote:
Well, someone could implement UTF-16 or UTF-whatever as client encoding.
But I have not heard of any concrete proposals about that.
Doesn't the nul byte problem make that seriously hard?
Just about impossible. It would
Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
Peter Eisentraut wrote:
Well, someone could implement UTF-16 or UTF-whatever as client encoding.
But I have not heard of any concrete proposals about that.
Doesn't the nul byte problem make that seriously hard?
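
A small illustration of the nul byte concern raised above, as a sketch rather than anything taken from the thread: ASCII text encoded as UTF-16 contains zero bytes, so any code that treats the buffer as a nul-terminated string sees it end after the first character. The helper name c_string_view is hypothetical.

```python
# Sketch: why UTF-16 text is awkward for code that assumes nul-terminated strings.
query = "SELECT 1;"

utf8_bytes = query.encode("utf-8")
utf16_bytes = query.encode("utf-16-le")  # little-endian, no BOM prepended

has_nul_utf8 = b"\x00" in utf8_bytes
has_nul_utf16 = b"\x00" in utf16_bytes


def c_string_view(raw: bytes) -> bytes:
    """Return what a nul-terminated string reader would see (hypothetical helper)."""
    return raw.split(b"\x00", 1)[0]
```

With UTF-8 the whole query survives; with UTF-16 only the first byte does.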
Andrew Dunstan and...@dunslane.net writes:
Well, it might be a good idea to provide at least some support in libpq.
Making each client do it from scratch seems a bit inefficient.
Encoding conversion seems far outside libpq's charter, and as for
from scratch there are other libraries for that.
Peter Eisentraut pete...@gmx.net wrote:
Together, that should cover a lot of cases. Not perfect, but far from
useless.
For Japanese users on Windows, the client encoding is always set to SJIS
because of the restriction of cmd.exe. But the script file can be written
in UTF8 with BOM. I don't
Andrew Dunstan and...@dunslane.net wrote:
Itagaki Takahiro wrote:
Multi-byte scripts
without encoding are always dangerous whether BOM is present or not.
I'd say we can always throw an error when we find queries that contain
multi-byte characters if no prior encoding declaration.
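
The check suggested above could look roughly like the following sketch. It is not the patch from the thread: the function name check_query and the encoding_declared flag are hypothetical, and "multi-byte characters" is approximated here as "any byte outside 7-bit ASCII".

```python
# Sketch: refuse non-ASCII queries when no client encoding was declared.
def check_query(query: bytes, encoding_declared: bool) -> None:
    """Raise ValueError for non-ASCII input without a prior encoding declaration."""
    if encoding_declared:
        return
    # Assumption: any byte >= 0x80 counts as part of a multi-byte character.
    if any(byte >= 0x80 for byte in query):
        raise ValueError("non-ASCII query without a client encoding declaration")
```

As the replies in this thread point out, a rule like this would reject existing scripts that rely on an implicit encoding.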
Itagaki Takahiro wrote:
I don't want user to check the encoding of scripts before executing --
it is far from fail-safe.
That's what we require in all other cases. Why should UTF8 be special?
If I have a script in Latin1 and Postgres thinks it's UTF8 it will
probably explode. Same for
Andrew Dunstan and...@dunslane.net wrote:
Itagaki Takahiro wrote:
I don't want user to check the encoding of scripts before executing --
it is far from fail-safe.
That's what we require in all other cases. Why should UTF8 be special?
No. I didn't think about UTF-8 nor BOM in that
On ons, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
Sure. Client encoding is declared in body of a file, but BOM is
in head of the file. So, we should always ignore BOM sequence
at the file head no matter what client encoding is used.
The attached patch replaces BOM with white spaces,
Peter Eisentraut pete...@gmx.net writes:
I'm not sure if replacing a BOM by three spaces is a good way to
implement eating, because it might throw off a column indicator
somewhere, say, but I couldn't reproduce a problem. Note that the U+FEFF
character is defined as *zero-width* non-breaking
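
To make the column concern concrete, here is a minimal sketch comparing the two ways of "eating" the BOM. The function names are hypothetical and this is not the code from the patch under discussion.

```python
# Sketch: replacing the BOM with spaces versus removing it outright.
UTF8_BOM = b"\xef\xbb\xbf"


def replace_bom_with_spaces(line: bytes) -> bytes:
    """Overwrite a leading BOM with three spaces, keeping the byte length."""
    if line.startswith(UTF8_BOM):
        return b"   " + line[len(UTF8_BOM):]
    return line


def strip_bom(line: bytes) -> bytes:
    """Remove a leading BOM entirely."""
    if line.startswith(UTF8_BOM):
        return line[len(UTF8_BOM):]
    return line


def column_of(line: bytes, token: bytes) -> int:
    """1-based byte column at which token starts."""
    return line.index(token) + 1


first_line = UTF8_BOM + b"SELECT 1;"
```

After replacement the statement starts at column 4; after stripping it starts at column 1, which matches what an editor shows for a zero-width character.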
Peter Eisentraut pete...@gmx.net wrote:
OK, I think the consensus here is:
- Eat BOM at beginning of file (as you implemented)
- Only when client encoding is UTF-8 -- please fix that
Are they an AND condition? If so, this patch will be useless.
Please remember \encoding or SET client_encoding
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
Please remember \encoding or SET client_encoding appear
*after* BOM at beginning of file. I'll agree if the condition is
Eat BOM at beginning of file and set client encoding to UTF-8,
As has been stated multiple times, that will not get
Itagaki Takahiro wrote:
Peter Eisentraut pete...@gmx.net wrote:
OK, I think the consensus here is:
- Eat BOM at beginning of file (as you implemented)
- Only when client encoding is UTF-8 -- please fix that
Are they an AND condition? If so, this patch will be useless.
Please remember
Andrew Dunstan and...@dunslane.net writes:
As for when it can be set, unless I'm mistaken you should be able to set
it before any file is opened, if you need to, using PGOPTIONS or psql
"dbname=mydb options='-c client_encoding=utf8'". Of course, if the
server encoding is utf8 then, in the
Tom Lane t...@sss.pgh.pa.us wrote:
Andrew Dunstan and...@dunslane.net writes:
if you need to, using PGOPTIONS or psql
"dbname=mydb options='-c client_encoding=utf8'".
It could also be set in ~/.psqlrc, which would probably be the most
convenient method for regular users of UTF8 files who
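
One way to apply the PGOPTIONS approach mentioned above from a wrapper script, as a sketch only: the function name psql_invocation and the file and database names are hypothetical, and the command is built but not executed here.

```python
# Sketch: prepare a psql call with the client encoding set before the file is read.
import os


def psql_invocation(script_path: str, dbname: str):
    """Return (argv, env) for running a script with client_encoding=utf8."""
    env = dict(os.environ)
    # PGOPTIONS passes command-line options to the server at connection start.
    env["PGOPTIONS"] = "-c client_encoding=utf8"
    argv = ["psql", "-d", dbname, "-f", script_path]
    return argv, env


argv, env = psql_invocation("load.sql", "mydb")
```

The pair can then be handed to subprocess.run(argv, env=env) on a machine where psql is installed.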
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
If encoding setting is reverted,
Eat BOM at beginning of file and set client encoding to UTF-8
will be much safer.
This isn't going to happen, so please stop wasting our time arguing
about it.
regards, tom lane
Tom Lane t...@sss.pgh.pa.us wrote:
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
If encoding setting is reverted,
Eat BOM at beginning of file and set client encoding to UTF-8
will be much safer.
This isn't going to happen, so please stop wasting our time arguing
about it.
On tis, 2009-11-17 at 14:19 +0900, Itagaki Takahiro wrote:
The attached patch is a new proposal of the feature.
When we find a BOM at the beginning of a file, set expected_encoding to UTF8.
Before every execution of a query, if pset.encoding is not UTF8, we check the
query string not to contain any
Peter Eisentraut pete...@gmx.net wrote:
I think I could support using the presence of the BOM as a fall-back
indicator of encoding in absence of any other declaration.
What is the difference between the fall-back and "set client encoding to UTF-8
if BOM found"? I read this discussion that we cannot
On ons, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
Client encoding is declared in body of a file, but BOM is
in head of the file. So, we should always ignore BOM sequence
at the file head no matter what client encoding is used.
The attached patch replaces BOM with white spaces, but it
Peter Eisentraut wrote:
On ons, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
Client encoding is declared in body of a file, but BOM is
in head of the file. So, we should always ignore BOM sequence
at the file head no matter what client encoding is used.
The attached patch replace
On ons, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
So, we should always ignore BOM sequence
at the file head no matter what client encoding is used.
I think we can't do that. That byte sequence might be valid user data
in other encodings.
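
The point that the BOM bytes can be ordinary data elsewhere is easy to check for a single-byte encoding. This sketch uses Latin-1 only; it says nothing about the multi-byte encodings asked about elsewhere in the thread.

```python
# Sketch: the same three bytes read as UTF-8 and as Latin-1.
UTF8_BOM = b"\xef\xbb\xbf"

# Python's plain "utf-8" codec keeps the BOM as the character U+FEFF.
as_utf8 = UTF8_BOM.decode("utf-8")

# In Latin-1 every byte is a character, so the bytes become three visible ones.
as_latin1 = UTF8_BOM.decode("latin-1")
```

In Latin-1 the result is the three-character string "ï»¿", which is legal text, so dropping those bytes unconditionally could remove user data.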
On Wed, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
The attached patch replaces BOM with white spaces, but it does not
change client encoding automatically. I think we can always ignore
client encoding at the replacement because SQL command cannot start
with BOM sequence. If we don't
Peter Eisentraut wrote:
On Wed, 2009-10-21 at 13:11 +0900, Itagaki Takahiro wrote:
The attached patch replaces BOM with white spaces, but it does not
change client encoding automatically. I think we can always ignore
client encoding at the replacement because SQL command cannot start
with
Bruce Momjian br...@momjian.us wrote:
Itagaki Takahiro wrote:
When psql opens a file with -f or \i, it checks first 3 bytes of the
file. If they are BOM, discard the 3 bytes and change client encoding
to UTF8 automatically.
Seems there is community support for accepting BOM:
On Tue, 2009-10-20 at 14:41 +0900, Itagaki Takahiro wrote:
UTF8 encoding text files with BOM (Byte Order Mark) are commonly
used in Windows, though BOM was designed for UTF16 text originally.
However, psql cannot read such format even if we set client encoding
to UTF8. Is it worth supporting
Bruce Momjian br...@momjian.us writes:
Seems there is community support for accepting BOM:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01625.php
That discussion has approximately nothing to do with the
much-more-invasive change that Itagaki-san is suggesting.
In particular I
Tom Lane wrote:
Bruce Momjian br...@momjian.us writes:
Seems there is community support for accepting BOM:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg01625.php
That discussion has approximately nothing to do with the
much-more-invasive change that Itagaki-san is
Andrew Dunstan and...@dunslane.net writes:
What I think we might sensibly do is to eat the leading BOM of an SQL
file iff the client encoding is UTF8, and otherwise treat it as just
bytes in whatever the encoding is.
That seems relatively non-risky.
Should we also do the same for files
Andrew Dunstan and...@dunslane.net wrote:
What I think we might sensibly do is to eat the leading BOM of an
SQL file iff the client encoding is UTF8, and otherwise treat it as
just bytes in whatever the encoding is.
Only at the beginning of the file or stream? What happens when people
2009/10/20 Tom Lane t...@sss.pgh.pa.us:
Andrew Dunstan and...@dunslane.net writes:
What I think we might sensibly do is to eat the leading BOM of an SQL
file iff the client encoding is UTF8, and otherwise treat it as just
bytes in whatever the encoding is.
That seems relatively non-risky.
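
The rule proposed above can be sketched as follows. This is an illustration, not psql's implementation: the function name eat_leading_bom is hypothetical, and the loose matching of encoding names such as "utf-8" and "UTF8" is an assumption made for convenience.

```python
# Sketch: drop a leading BOM only when the client encoding is UTF8.
UTF8_BOM = b"\xef\xbb\xbf"


def eat_leading_bom(data: bytes, client_encoding: str) -> bytes:
    """Return data without a leading UTF-8 BOM if, and only if, the client
    encoding is UTF8; otherwise return the bytes unchanged."""
    # Assumption: accept common spellings of the encoding name.
    normalized = client_encoding.replace("-", "").replace("_", "").upper()
    if normalized == "UTF8" and data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data
```

Only the very start of the input is examined; the same byte sequence later in the data, or under any other client encoding, is passed through untouched.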
On Oct 20, 2009, at 10:51 AM, Tom Lane wrote:
Andrew Dunstan and...@dunslane.net writes:
What I think we might sensibly do is to eat the leading BOM of an SQL
file iff the client encoding is UTF8, and otherwise treat it as just
bytes in whatever the encoding is.
That seems relatively
David Christensen da...@endpoint.com wrote:
Is that only when the default client encoding is set to UTF8
(PGCLIENTENCODING, whatever), or will it be coded to work with the
following:
$ psql -f file
Where file is:
BOM
SET client_encoding = 'utf8';
Sure. Client encoding is declared in
Itagaki Takahiro wrote:
UTF8 encoding text files with BOM (Byte Order Mark) are commonly
used in Windows, though BOM was designed for UTF16 text originally.
However, psql cannot read such format even if we set client encoding
to UTF8. Is it worth supporting those format in psql?
When psql