Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-14 Thread Sergey Burladyan
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:

> There are some issues:
> * Is it possible to determine the platform encoding?

There is no single platform encoding on Linux. File name encoding depends on
the user's locale, so different users can use different encodings for file names.
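A minimal POSIX sketch of this point, assuming a glibc-style system: the codeset used for text such as file names is a property of the current LC_CTYPE locale, not of the machine. The helper name codeset_for_locale() is hypothetical; setlocale() and nl_langinfo(CODESET) are standard calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: switch LC_CTYPE to the named locale and report the
 * character set that locale uses for text such as file names.
 * Passing "" means "use whatever LC_ALL / LC_CTYPE / LANG say".
 * Returns NULL if the locale is not installed (LC_CTYPE is left unchanged).
 */
const char *codeset_for_locale(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    /* e.g. "UTF-8" for ja_JP.UTF-8, "EUC-JP" for ja_JP.eucJP,
       "ANSI_X3.4-1968" for the C locale on glibc */
    return nl_langinfo(CODESET);
}
```

Two users running the same program with LANG=ja_JP.UTF-8 and LANG=ja_JP.eucJP would get "UTF-8" and "EUC-JP" respectively, so the bytes of a file name carry no encoding of their own.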

-- 
Sergey Burladyan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-13 Thread Itagaki Takahiro
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote:

> Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> 
> > Um, I was focusing on helping with the problem that cannot be avoided.
> > I was not thinking about cases where it can be avoided depending on usage.
> > However, I would gladly work on it voluntarily if it would help a lot.
> 
> I'll research whether the encoding of filesystem paths is affected by
> locale settings on some platforms. We also need to research
> where we should get the system encoding from when the locale is set to C,
> which is popular among Japanese users.

Here is a patch to implement GetPlatformEncoding() and convert absolute
file paths from the database encoding to the platform encoding. Since paths
are converted in AllocateFile() and BasicOpenFile(), not only COPY TO/FROM
but also almost all other file operations are covered by the patch.
Callers of the file access functions don't have to modify their code.
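In outline, the idea is to convert only absolute paths, and only when the two encodings differ. The following is a hedged sketch of that idea using POSIX iconv(), not the code from the attached patch. The function name convert_path_encoding() is hypothetical, and it treats only "/"-prefixed paths as absolute, whereas the patch uses is_absolute_path(), which also has to handle Windows drive letters.

```c
#define _POSIX_C_SOURCE 200809L
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch (not the actual patch): convert an absolute path from
 * the database encoding to the platform encoding before it is handed to
 * open()/fopen().  Relative paths (the normal case for files under PGDATA)
 * are returned unchanged, as are paths when both encodings are the same.
 * Returns a malloc'd string, or NULL if the path cannot be converted.
 */
char *convert_path_encoding(const char *path, const char *db_enc,
                            const char *platform_enc)
{
    if (path[0] != '/' || strcmp(db_enc, platform_enc) == 0)
        return strdup(path);

    iconv_t cd = iconv_open(platform_enc, db_enc);   /* (to, from) */
    if (cd == (iconv_t) -1)
        return NULL;                  /* conversion not supported */

    size_t inleft = strlen(path);
    size_t outsize = inleft * 4 + 8;  /* generous worst case */
    char *out = malloc(outsize);
    if (out == NULL)
    {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *) path;         /* iconv() takes a non-const char ** */
    char *dst = out;
    size_t outleft = outsize - 1;     /* keep room for the terminator */

    /* Convert, then flush any shift state (matters for e.g. ISO-2022-JP). */
    if (iconv(cd, &in, &inleft, &dst, &outleft) == (size_t) -1 ||
        iconv(cd, NULL, NULL, &dst, &outleft) == (size_t) -1)
    {
        free(out);                    /* e.g. EILSEQ: character not representable */
        iconv_close(cd);
        return NULL;
    }
    iconv_close(cd);
    *dst = '\0';
    return out;
}
```

For example, convert_path_encoding("/tmp/caf\xc3\xa9.txt", "UTF-8", "ISO-8859-1") yields "/tmp/caf\xe9.txt", a relative path such as "base/1/1259" comes back unchanged, and a character with no Latin-1 equivalent makes the function return NULL. A Japanese setup would pass names like "EUC-JP" instead; Latin-1 is used here only because every iconv implementation supports it.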

Please test the patch on a variety of platforms. I tested it on Windows
and Linux, and found that {PG_UTF8, ANSI_X3.4-1968} is required in
encoding_match_list in src/port/chklocale.c on Linux (FC6).
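The fix described above amounts to one more row in a table that maps the codeset names reported by the C library to PostgreSQL encodings. Below is a self-contained sketch of that kind of table-driven, case-insensitive lookup; the enum, struct, and function names are hypothetical stand-ins, not the real contents of src/port/chklocale.c. "ANSI_X3.4-1968" is glibc's name for plain ASCII, which is what the C locale reports; mapping it to UTF-8 is workable because ASCII is a subset of UTF-8.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for PostgreSQL's encoding identifiers. */
enum demo_enc { DEMO_UNKNOWN = -1, DEMO_UTF8, DEMO_EUC_JP, DEMO_SJIS };

/* Modeled loosely on encoding_match_list: (database encoding, codeset name). */
struct demo_encoding_match
{
    enum demo_enc enc;
    const char *system_enc_name;
};

static const struct demo_encoding_match demo_match_list[] = {
    {DEMO_EUC_JP, "EUC-JP"},
    {DEMO_EUC_JP, "eucJP"},
    {DEMO_SJIS, "SJIS"},
    {DEMO_SJIS, "CP932"},
    {DEMO_UTF8, "UTF-8"},
    {DEMO_UTF8, "utf8"},
    {DEMO_UTF8, "ANSI_X3.4-1968"},   /* the kind of entry reported as needed on FC6 */
    {DEMO_UNKNOWN, NULL}             /* end marker */
};

/* Map a codeset name (e.g. from nl_langinfo(CODESET)) to a database encoding. */
enum demo_enc demo_encoding_from_codeset(const char *codeset)
{
    for (const struct demo_encoding_match *m = demo_match_list;
         m->system_enc_name != NULL; m++)
        if (strcasecmp(codeset, m->system_enc_name) == 0)
            return m->enc;
    return DEMO_UNKNOWN;
}
```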

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



GetPlatformEncoding.patch
Description: Binary data



Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-13 Thread Tom Lane
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> Here is a patch to implement GetPlatformEncoding() and convert absolute
> file paths from database encoding to platform encoding.

This seems like a fairly significant overhead added to solve a really
minor problem (if it's not minor why has it never come up before?).

I'm also not convinced by any of the details --- why are GetACP and
pg_get_encoding_from_locale the things to look at, and why is fd.c an
appropriate place to hook in?  Surely if we need it here, we need it in
places like initdb as well.  But really this is much too low a level to
be solving the problem at.  If we have to convert path encodings in the
backend, we should be doing it once somewhere around the place where we
identify the value of PGDATA.  It should not be necessary to repeat all
this for every file access within the database directory.

regards, tom lane



Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-13 Thread Hiroshi Saito

Hi.

Anyhow, I appreciate the discussion. 

----- Original Message ----- 
From: Tom Lane t...@sss.pgh.pa.us

> Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
>> Here is a patch to implement GetPlatformEncoding() and convert absolute
>> file paths from database encoding to platform encoding.
> 
> This seems like a fairly significant overhead added to solve a really
> minor problem (if it's not minor why has it never come up before?).
> 
> I'm also not convinced by any of the details --- why are GetACP and
> pg_get_encoding_from_locale the things to look at, and why is fd.c an
> appropriate place to hook in?  Surely if we need it here, we need it in
> places like initdb as well.  But really this is much too low a level to
> be solving the problem at.  If we have to convert path encodings in the
> backend, we should be doing it once somewhere around the place where we
> identify the value of PGDATA.  It should not be necessary to repeat all
> this for every file access within the database directory.


Ahh, I think this is a sensitive problem that requires careful handling.
However, I am showing the following test to help your understanding.
This case cannot work unless Itagaki-san's patch is applied. 


C:\work>set PGDATA=C:\tmp\日本語 data

C:\work>set PGPORT=5444

C:\work>set PGHOME=C:\MinGW\local\pgsql

C:\work>cmd.exe
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\work>initdb -E UTF-8 --no-locale
The files belonging to this database system will be owned by user "HIROSHI".
This user must also own the server process.

The database cluster will be initialized with locale C.
The default text search configuration has been set to english.

setting permissions on directory C:/tmp/日本語 data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in C:/tmp/日本語 data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating directories ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: trust authentication is enabled for local connections.
You can change this by editing pg_hba.conf or by using the -A option
the next time you run initdb.

Success. You can now start the database server using:

   postmaster -D C:/tmp/日本語 data
or
   pg_ctl -D C:/tmp/日本語 data -l logfile start


C:\work>set PGCLIENTENCODING=SJIS

C:\work>psql postgres
psql (8.4beta1)
Type "help" for help.

postgres=# create table 日本語(きー text);
CREATE TABLE
postgres=# insert into 日本語 values('いれた');
INSERT 0 1
postgres=# copy 日本語 to 'C:/tmp/日本語 data/日本語utf8.txt';
COPY 1
postgres=# delete from 日本語;
DELETE 1
postgres=# copy 日本語 from 'C:/tmp/日本語 data/日本語utf8.txt';
COPY 1
postgres=# select * from 日本語;
 きー
--------
 いれた
(1 row)

C:\work>dir C:\tmp\日本語 data
 Volume in drive C is SYS
 Volume Serial Number is 1433-2C7C

 Directory of C:\tmp\日本語 data

2009/04/13  23:22    <DIR>          .
2009/04/13  23:22    <DIR>          ..
2009/04/13  23:18    <DIR>          base
2009/04/13  23:19    <DIR>          global
2009/04/13  23:17    <DIR>          pg_clog
2009/04/13  23:17             3,616 pg_hba.conf
2009/04/13  23:17             1,611 pg_ident.conf
2009/04/13  23:17    <DIR>          pg_multixact
2009/04/13  23:23    <DIR>          pg_stat_tmp
2009/04/13  23:17    <DIR>          pg_subtrans
2009/04/13  23:17    <DIR>          pg_tblspc
2009/04/13  23:17    <DIR>          pg_twophase
2009/04/13  23:17                 4 PG_VERSION
2009/04/13  23:17    <DIR>          pg_xlog
2009/04/13  23:17            17,112 postgresql.conf
2009/04/13  23:19                38 postmaster.opts
2009/04/13  23:19                24 postmaster.pid
2009/04/13  23:22                 8 日本語utf8.txt
               7 File(s)         22,413 bytes
              11 Dir(s)  42,780,246,016 bytes free






Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-13 Thread Itagaki Takahiro

Tom Lane t...@sss.pgh.pa.us wrote:

> Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> > Here is a patch to implement GetPlatformEncoding() and convert absolute
> > file paths from database encoding to platform encoding.
> 
> This seems like a fairly significant overhead added to solve a really
> minor problem (if it's not minor why has it never come up before?).

It's not always a minor problem in Japan. It has been discussed several
times in the Japanese users' group. However, I should certainly pay attention
to performance. One solution might be to cache the encoding
in GetPlatformEncoding(). There will be no overhead when the database
encoding and the platform encoding are the same, which would be the typical case.
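A sketch of the caching idea, assuming a POSIX system and single-threaded use (as in a backend process): the locale lookup runs once and later calls reuse the stored name, and conversion is skipped when the names match. get_platform_encoding(), needs_path_conversion() and the platform_lookups counter are hypothetical. Comparing names with strcmp() is a simplification; real code would first map the C library's codeset name to a PostgreSQL encoding ("UTF-8" versus UTF8, for example).

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

static char cached_encoding[64];   /* platform encoding name, once known */
static int  encoding_cached = 0;
int platform_lookups = 0;          /* counts real lookups; for illustration only */

/* Hypothetical GetPlatformEncoding()-style function with a per-process cache. */
const char *get_platform_encoding(void)
{
    if (!encoding_cached)
    {
        platform_lookups++;
        setlocale(LC_CTYPE, "");                 /* honor LC_ALL / LC_CTYPE / LANG */
        const char *cs = nl_langinfo(CODESET);
        snprintf(cached_encoding, sizeof(cached_encoding), "%s", cs ? cs : "");
        encoding_cached = 1;
    }
    return cached_encoding;
}

/* Fast path: no conversion is needed when the encodings already match. */
int needs_path_conversion(const char *db_encoding)
{
    return strcmp(db_encoding, get_platform_encoding()) != 0;
}
```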

> It should not be necessary to repeat all
> this for every file access within the database directory.

That's why I added a check with is_absolute_path() there. We can
avoid conversion for normal file access under PGDATA because it uses
relative paths. But I should have checked all file accesses, not only
in backends but also in client programs. I'll research them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-08 Thread Hiroshi Inoue
Tom Lane wrote:
> Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
> > I want to solve one problem before the release of 8.4.
> > However, since it may also count as a new feature,
> > if it is not ready for 8.4, feel free to suggest it for 8.5.
> 
> I'm not too clear on what this is really supposed to accomplish, but
> we are hardly going to put code like that into every single file access
> in Postgres, which is what seems to be the logical implication.
> Shouldn't we just tell people to use a database encoding that matches
> their system environment?

Unfortunately (as usual) under Japanese Windows there's no database
encoding that matches the system environment.
As for the file name in a COPY command, there's little point in
converting it to the server encoding because the file name is irrelevant
to the database. Because Windows is Unicode (UTF-16) based, it seems
natural to convert the file name to wide characters once.
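On Windows that conversion would typically be MultiByteToWideChar(CP_UTF8, ...) followed by a wide-character call such as _wfopen(). To keep the sketch portable, the following hand-rolled function performs the same UTF-8 to UTF-16 step. utf8_to_utf16() is hypothetical; it uses uint16_t because wchar_t is 32 bits on most Unix systems, and it is simplified in that it does not reject overlong forms or encoded surrogates as the Windows API would.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Portable sketch of the UTF-8 -> UTF-16 step.  Writes at most `cap` code
 * units to `out` and returns the number written, or -1 on malformed input
 * or insufficient space.  The output is not NUL-terminated.
 */
int utf8_to_utf16(const unsigned char *s, uint16_t *out, size_t cap)
{
    size_t n = 0;
    while (*s)
    {
        uint32_t cp;
        int extra;
        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;               /* invalid lead byte */
        s++;
        for (int i = 0; i < extra; i++, s++)
        {
            if ((*s & 0xC0) != 0x80)
                return -1;            /* missing continuation byte (also catches NUL) */
            cp = (cp << 6) | (*s & 0x3F);
        }
        if (cp > 0x10FFFF)
            return -1;
        if (cp >= 0x10000)
        {                             /* outside the BMP: needs a surrogate pair */
            if (n + 2 > cap)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t) (0xD800 | (cp >> 10));
            out[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
        else
        {
            if (n + 1 > cap)
                return -1;
            out[n++] = (uint16_t) cp;
        }
    }
    return (int) n;
}
```

For instance, the UTF-8 bytes of 日 (E6 97 A5) become the single code unit 0x65E5, while U+1F600 becomes the surrogate pair 0xD83D 0xDE00. A real caller would append a terminating 0 before passing the buffer to a wide-character API.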

regards,
Hiroshi Inoue




Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-08 Thread Hiroshi Saito

Hi.

----- Original Message ----- 
From: Hiroshi Inoue in...@tpf.co.jp

> Tom Lane wrote:
>> Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
>>> I want to solve one problem before the release of 8.4.
>>> However, since it may also count as a new feature,
>>> if it is not ready for 8.4, feel free to suggest it for 8.5.
>>
>> I'm not too clear on what this is really supposed to accomplish, but
>> we are hardly going to put code like that into every single file access
>> in Postgres, which is what seems to be the logical implication.
>> Shouldn't we just tell people to use a database encoding that matches
>> their system environment?
>
> Unfortunately (as usual) under Japanese Windows there's no database
> encoding that matches the system environment.
> As for the file name in a COPY command, there's little point in
> converting it to the server encoding because the file name is irrelevant
> to the database. Because Windows is Unicode (UTF-16) based, it seems
> natural to convert the file name to wide characters once.


Yes. If the server encoding could be chosen to match Windows, these
facilities would work properly. Regrettably, that was not possible. 


Regards,
Hiroshi Saito




Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-08 Thread Hiroshi Saito

Hi Itagaki-san.

Um, I was focusing on helping with the problem that cannot be avoided.
I was not thinking about cases where it can be avoided depending on usage.
However, I would gladly work on it voluntarily if it would help a lot.


Regards,
Hiroshi Saito

----- Original Message ----- 
From: Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp

> Hi,
> 
> Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> 
>> At present, the COPY file name is in UTF-8, which causes trouble
>> when it is handled. :-(  So I have made this proposal patch.
> 
> I think the problem is not only in Windows but also in all platforms
> where the database encoding doesn't match the OS's encoding.
> 
> Instead of Windows-specific code, how about adding GetPlatformEncoding()
> and converting all *absolute* paths? It would be performed at the lowest
> API layer, i.e., BasicOpenFile(). Standard database file accesses with
> RelFileNode are not affected because they use *relative* paths.
> 
> There are some issues:
>   * Is it possible to determine the platform encoding?
>   * The above cannot handle non-ASCII paths under $PGDATA.
>     Is that acceptable?
>   * In Windows, the native encoding is UTF-16, but we will use SJIS
>     if we take the above approach. Is that limitation acceptable?
> 
> Comments welcome.
> 
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center




Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-08 Thread Itagaki Takahiro

Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:

> Um, I was focusing on helping with the problem that cannot be avoided.
> I was not thinking about cases where it can be avoided depending on usage.
> However, I would gladly work on it voluntarily if it would help a lot.

I'll research whether the encoding of filesystem paths is affected by
locale settings on some platforms. We also need to research
where we should get the system encoding from when the locale is set to C,
which is popular among Japanese users.

I'll report my progress to you :)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-07 Thread Tom Lane
Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
> I want to solve one problem before the release of 8.4.
> However, since it may also count as a new feature,
> if it is not ready for 8.4, feel free to suggest it for 8.5.

I'm not too clear on what this is really supposed to accomplish, but
we are hardly going to put code like that into every single file access
in Postgres, which is what seems to be the logical implication.
Shouldn't we just tell people to use a database encoding that matches
their system environment?

regards, tom lane



Re: [HACKERS] Solution of the file name problem of copy on windows.

2009-04-07 Thread Itagaki Takahiro
Hi,

Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:

> At present, the COPY file name is in UTF-8, which causes trouble
> when it is handled. :-(  So I have made this proposal patch.

I think the problem is not only in Windows but also in all platforms
where the database encoding doesn't match the OS's encoding.

Instead of Windows-specific code, how about adding GetPlatformEncoding()
and converting all *absolute* paths? It would be performed at the lowest
API layer, i.e., BasicOpenFile(). Standard database file accesses with
RelFileNode are not affected because they use *relative* paths.

There are some issues:
* Is it possible to determine the platform encoding?
* The above cannot handle non-ASCII paths under $PGDATA.
  Is that acceptable?
* In Windows, the native encoding is UTF-16, but we will use SJIS
  if we take the above approach. Is that limitation acceptable?
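On Windows, the "platform encoding" in this sense would be the ANSI code page reported by GetACP() (932, i.e. Shift-JIS, on Japanese Windows), which is distinct from the UTF-16 used by the wide-character APIs. Below is a hedged, deliberately incomplete sketch of mapping such code page numbers to PostgreSQL encoding names. The table and the function name pg_encoding_for_codepage() are illustrative assumptions, and the code page is passed in as a parameter (rather than calling GetACP()) so the sketch compiles on any platform.

```c
#include <stddef.h>

/* Hypothetical, incomplete table: Windows ANSI code page -> PostgreSQL encoding name. */
static const struct
{
    unsigned int codepage;
    const char *pg_encoding;
} acp_map[] = {
    {932,   "SJIS"},      /* Japanese */
    {936,   "GBK"},       /* Simplified Chinese */
    {949,   "UHC"},       /* Korean */
    {950,   "BIG5"},      /* Traditional Chinese */
    {1251,  "WIN1251"},   /* Cyrillic */
    {1252,  "WIN1252"},   /* Western European */
    {65001, "UTF8"},
};

/* Return the PostgreSQL encoding name for a code page, or NULL if unknown. */
const char *pg_encoding_for_codepage(unsigned int codepage)
{
    for (size_t i = 0; i < sizeof(acp_map) / sizeof(acp_map[0]); i++)
        if (acp_map[i].codepage == codepage)
            return acp_map[i].pg_encoding;
    return NULL;
}
```

Note that SJIS is only a client encoding in PostgreSQL, which is why a Japanese Windows server cannot simply use a database encoding matching the code page.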

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


