Re: [HACKERS] Solution of the file name problem of copy on windows.
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> There are some issues:
> * Is it possible to determine the platform encoding?

There is no platform encoding on Linux. File name encoding depends on the user's locale, so different users can have different file name encodings.

--
Sergey Burladyan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Solution of the file name problem of copy on windows.
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp wrote:
> Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> > Um, my focus was on helping with a problem that cannot be avoided. I am less concerned about a problem that can be avoided depending on usage. However, I am willing to work on it voluntarily if it would help much.
>
> I'll research whether the encoding of filesystem paths is affected by locale settings on some platforms. Also, we need to research where we should get the system encoding from when the locale is set to C, which is popular among Japanese users.

Here is a patch to implement GetPlatformEncoding() and convert absolute file paths from database encoding to platform encoding. Since the encoding of paths is converted in AllocateFile() and BasicOpenFile(), the patch covers not only COPY TO/FROM but almost all file operations. Callers of file access methods don't have to modify their code.

Please test the patch on a variety of platforms. I tested it on Windows and Linux, and found that {PG_UTF8, ANSI_X3.4-1968} is required for encoding_match_list in src/port/chklocale.c on Linux (FC6).

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

GetPlatformEncoding.patch
Description: Binary data
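
The idea described in this message (convert only absolute paths, at the point where a file is opened) can be sketched roughly as follows. This is a minimal illustration in Python, not the attached patch: the helper name to_platform_path is hypothetical, and cp932 (Shift_JIS on Japanese Windows) is assumed as the platform encoding purely for the example.

```python
import ntpath


def to_platform_path(path: bytes, db_encoding: str, platform_encoding: str) -> bytes:
    """Re-encode an absolute path from the database encoding to the platform encoding.

    Hypothetical helper for illustration only. Relative paths (such as those
    used for files under PGDATA) are returned unchanged.
    """
    if db_encoding == platform_encoding:
        return path  # nothing to convert
    text = path.decode(db_encoding)
    # ntpath applies Windows path rules, so "C:/tmp/..." counts as absolute
    # regardless of the OS this sketch runs on.
    if not ntpath.isabs(text):
        return path
    return text.encode(platform_encoding)


# An absolute COPY target is re-encoded; a relative data file path is not.
copy_target = "C:/tmp/日本語 data/日本語utf8.txt".encode("utf-8")
converted = to_platform_path(copy_target, "utf-8", "cp932")
unchanged = to_platform_path(b"base/1/1259", "utf-8", "cp932")
```

Doing the check on every open keeps callers unchanged, at the price of one extra test per file access.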
Re: [HACKERS] Solution of the file name problem of copy on windows.
Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> Here is a patch to implement GetPlatformEncoding() and convert absolute file paths from database encoding to platform encoding.

This seems like a fairly significant overhead added to solve a really minor problem (if it's not minor why has it never come up before?). I'm also not convinced by any of the details --- why are GetACP and pg_get_encoding_from_locale the things to look at, and why is fd.c an appropriate place to hook in? Surely if we need it here, we need it in places like initdb as well.

But really this is much too low a level to be solving the problem at. If we have to convert path encodings in the backend, we should be doing it once somewhere around the place where we identify the value of PGDATA. It should not be necessary to repeat all this for every file access within the database directory.

regards, tom lane
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hi.

Anyhow, I appreciate the discussion.

----- Original Message -----
From: Tom Lane t...@sss.pgh.pa.us

> Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> > Here is a patch to implement GetPlatformEncoding() and convert absolute file paths from database encoding to platform encoding.
>
> This seems like a fairly significant overhead added to solve a really minor problem (if it's not minor why has it never come up before?). I'm also not convinced by any of the details --- why are GetACP and pg_get_encoding_from_locale the things to look at, and why is fd.c an appropriate place to hook in? Surely if we need it here, we need it in places like initdb as well.
>
> But really this is much too low a level to be solving the problem at. If we have to convert path encodings in the backend, we should be doing it once somewhere around the place where we identify the value of PGDATA. It should not be necessary to repeat all this for every file access within the database directory.

Ahh, I think this is a sensitive problem that requires careful handling too. However, the following test is shown to help your understanding. This is a case that does not work unless Itagaki-san's patch is applied.

C:\work>set PGDATA=C:\tmp\日本語 data
C:\work>set PGPORT=5444
C:\work>set PGHOME=C:\MinGW\local\pgsql
C:\work>cmd.exe
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\work>initdb -E UTF-8 --no-locale
The files belonging to this database system will be owned by user HIROSHI.
This user must also own the server process.

The database cluster will be initialized with locale C.
The default text search configuration will be set to english.

setting permissions on directory C:/tmp/日本語 data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in C:/tmp/日本語 data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system object definitions ... ok
creating conversions ... ok
creating directories ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: trust authentication is enabled for local connections.
You can change this by editing pg_hba.conf or by using the -A option the next time you run initdb.

Success. You can now start the database server using:

    postmaster -D C:/tmp/日本語 data
or
    pg_ctl -D C:/tmp/日本語 data -l logfile start

C:\work>set PGCLIENTENCODING=SJIS
C:\work>psql postgres
psql (8.4beta1)
Type help for help.

postgres=# create table 日本語(きー text);
CREATE TABLE
postgres=# insert into 日本語 values('いれた');
INSERT 0 1
postgres=# copy 日本語 to 'C:/tmp/日本語 data/日本語utf8.txt';
COPY 1
postgres=# delete from 日本語;
DELETE 1
postgres=# copy 日本語 from 'C:/tmp/日本語 data/日本語utf8.txt';
COPY 1
postgres=# select * from 日本語;
  きー
--------
 いれた
(1 row)

C:\work>dir C:\tmp\日本語 data
 Volume in drive C is SYS
 Volume Serial Number is 1433-2C7C

 Directory of C:\tmp\日本語 data

2009/04/13  23:22    <DIR>          .
2009/04/13  23:22    <DIR>          ..
2009/04/13  23:18    <DIR>          base
2009/04/13  23:19    <DIR>          global
2009/04/13  23:17    <DIR>          pg_clog
2009/04/13  23:17             3,616 pg_hba.conf
2009/04/13  23:17             1,611 pg_ident.conf
2009/04/13  23:17    <DIR>          pg_multixact
2009/04/13  23:23    <DIR>          pg_stat_tmp
2009/04/13  23:17    <DIR>          pg_subtrans
2009/04/13  23:17    <DIR>          pg_tblspc
2009/04/13  23:17    <DIR>          pg_twophase
2009/04/13  23:17                 4 PG_VERSION
2009/04/13  23:17    <DIR>          pg_xlog
2009/04/13  23:17            17,112 postgresql.conf
2009/04/13  23:19                38 postmaster.opts
2009/04/13  23:19                24 postmaster.pid
2009/04/13  23:22                 8 日本語utf8.txt
               7 File(s)         22,413 bytes
              11 Dir(s)  42,780,246,016 bytes free
Re: [HACKERS] Solution of the file name problem of copy on windows.
Tom Lane t...@sss.pgh.pa.us wrote:
> Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp writes:
> > Here is a patch to implement GetPlatformEncoding() and convert absolute file paths from database encoding to platform encoding.
>
> This seems like a fairly significant overhead added to solve a really minor problem (if it's not minor why has it never come up before?).

It's not always a minor problem in Japan. It has been discussed in the Japanese users group several times.

However, I should certainly pay attention to performance. One solution might be to cache the encoding in GetPlatformEncoding(). There will be no overhead when the database encoding and the platform encoding are the same, which would be the typical case.

> It should not be necessary to repeat all this for every file access within the database directory.

That's why I added a check with is_absolute_path() there. We can avoid conversion in normal file access under PGDATA because relative paths are used for it.

But I should have checked all file access, not only in backends but also in client programs. I'll research them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
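
The caching idea mentioned in this message can be illustrated with a short Python sketch. It is not the code from the patch: get_platform_encoding and needs_conversion are hypothetical names, and the locale's preferred encoding is used here as a stand-in for whatever GetPlatformEncoding() would actually consult.

```python
import codecs
import functools
import locale


@functools.lru_cache(maxsize=None)
def get_platform_encoding() -> str:
    """Look up the locale's preferred encoding once and cache the result."""
    # codecs.lookup() maps aliases to a canonical codec name, so differently
    # spelled names for the same encoding compare equal below.
    return codecs.lookup(locale.getpreferredencoding(False)).name


def needs_conversion(db_encoding: str) -> bool:
    """Return True only when the database and platform encodings differ."""
    return codecs.lookup(db_encoding).name != get_platform_encoding()


# The first call performs the lookup; later calls are served from the cache.
first = get_platform_encoding()
second = get_platform_encoding()
```

When needs_conversion() returns False, a caller could skip the conversion step entirely, which corresponds to the "no overhead in the typical case" argument above.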
Re: [HACKERS] Solution of the file name problem of copy on windows.
Tom Lane wrote:
> Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
> > I want to solve one problem before the release of 8.4. However, since it also looks like a new feature, if it is not ready for 8.4, you may suggest deferring it to 8.5.
>
> I'm not too clear on what this is really supposed to accomplish, but we are hardly going to put code like that into every single file access in Postgres, which is what seems to be the logical implication. Shouldn't we just tell people to use a database encoding that matches their system environment?

Unfortunately (as usual) under Japanese Windows there's no database encoding that matches the system environment.

As for the file name in the COPY command, there is little point in converting it to the server encoding, because the file name is irrelevant to the database. Because Windows is Unicode (UTF-16) based, it seems natural to convert the file name to wide characters once.

regards,
Hiroshi Inoue
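
To illustrate the wide-character suggestion: whatever encoding the file name arrives in, converting it once to UTF-16 gives the same result. The Python sketch below only shows that property. The helper name to_wide_path is hypothetical, and in C on Windows the equivalent step would presumably go through an API such as MultiByteToWideChar followed by a wide-character open call; that mapping is an assumption, not something stated in the thread.

```python
def to_wide_path(path: bytes, source_encoding: str) -> bytes:
    """Convert an encoded path to UTF-16LE code units (hypothetical helper)."""
    return path.decode(source_encoding).encode("utf-16-le")


file_name = "C:/tmp/日本語 data/日本語utf8.txt"

# The same name as a UTF-8 client or a Shift_JIS (cp932) client would send it.
utf8_path = file_name.encode("utf-8")
sjis_path = file_name.encode("cp932")

# Both byte strings map to one and the same wide-character path.
wide_from_utf8 = to_wide_path(utf8_path, "utf-8")
wide_from_sjis = to_wide_path(sjis_path, "cp932")
```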
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hi.

----- Original Message -----
From: Hiroshi Inoue in...@tpf.co.jp

> Tom Lane wrote:
> > Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
> > > I want to solve one problem before the release of 8.4. However, since it also looks like a new feature, if it is not ready for 8.4, you may suggest deferring it to 8.5.
> >
> > I'm not too clear on what this is really supposed to accomplish, but we are hardly going to put code like that into every single file access in Postgres, which is what seems to be the logical implication. Shouldn't we just tell people to use a database encoding that matches their system environment?
>
> Unfortunately (as usual) under Japanese Windows there's no database encoding that matches the system environment.
>
> As for the file name in the COPY command, there is little point in converting it to the server encoding, because the file name is irrelevant to the database. Because Windows is Unicode (UTF-16) based, it seems natural to convert the file name to wide characters once.

Yes. If a matching server encoding could be chosen on Windows, everything would work well. Regrettably, that is not possible.

Regards,
Hiroshi Saito
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hi Itagaki-san.

Um, my focus was on helping with a problem that cannot be avoided. I am less concerned about a problem that can be avoided depending on usage. However, I am willing to work on it voluntarily if it would help much.

Regards,
Hiroshi Saito

----- Original Message -----
From: Itagaki Takahiro itagaki.takah...@oss.ntt.co.jp

> Hi,
>
> Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> > At present, a COPY file name is in UTF-8, which is troublesome to handle. :-( So I made this proposed patch.
>
> I think the problem is not only in Windows but also in all platforms where the database encoding doesn't match the OS's encoding. Instead of Windows-specific code, how about adding GetPlatformEncoding() and converting all *absolute* paths? It would be performed at the lowest API layer, i.e., BasicOpenFile(). Standard database file accesses with RelFileNode are not affected because they use *relative* paths.
>
> There are some issues:
> * Is it possible to determine the platform encoding?
> * The above cannot handle non-ASCII paths under $PGDATA. Is it acceptable?
> * In Windows, the native encoding is UTF-16, but we will use SJIS if we take the above method. Is the limitation acceptable?
>
> Comments welcome.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> Um, my focus was on helping with a problem that cannot be avoided. I am less concerned about a problem that can be avoided depending on usage. However, I am willing to work on it voluntarily if it would help much.

I'll research whether the encoding of filesystem paths is affected by locale settings on some platforms. Also, we need to research where we should get the system encoding from when the locale is set to C, which is popular among Japanese users.

I'll report the progress to you :)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hiroshi Saito z-sa...@guitar.ocn.ne.jp writes:
> I want to solve one problem before the release of 8.4. However, since it also looks like a new feature, if it is not ready for 8.4, you may suggest deferring it to 8.5.

I'm not too clear on what this is really supposed to accomplish, but we are hardly going to put code like that into every single file access in Postgres, which is what seems to be the logical implication. Shouldn't we just tell people to use a database encoding that matches their system environment?

regards, tom lane
Re: [HACKERS] Solution of the file name problem of copy on windows.
Hi,

Hiroshi Saito z-sa...@guitar.ocn.ne.jp wrote:
> At present, a COPY file name is in UTF-8, which is troublesome to handle. :-( So I made this proposed patch.

I think the problem is not only in Windows but also in all platforms where the database encoding doesn't match the OS's encoding. Instead of Windows-specific code, how about adding GetPlatformEncoding() and converting all *absolute* paths? It would be performed at the lowest API layer, i.e., BasicOpenFile(). Standard database file accesses with RelFileNode are not affected because they use *relative* paths.

There are some issues:
* Is it possible to determine the platform encoding?
* The above cannot handle non-ASCII paths under $PGDATA. Is it acceptable?
* In Windows, the native encoding is UTF-16, but we will use SJIS if we take the above method. Is the limitation acceptable?

Comments welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
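
The underlying mismatch can be made concrete with a small Python sketch. It assumes, for illustration only, a UTF-8 database and a platform that interprets file names as cp932 (Shift_JIS on Japanese Windows); the file name is the one used elsewhere in this thread.

```python
name = "日本語utf8.txt"

# Bytes a UTF-8 database would hand to the file API for this name.
utf8_bytes = name.encode("utf-8")
# Bytes a cp932 (Shift_JIS) platform expects for the same name.
sjis_bytes = name.encode("cp932")

# If the platform reads the UTF-8 bytes as cp932, it sees a different name,
# so the file is created or looked up under the wrong name.
misread = utf8_bytes.decode("cp932", errors="replace")

print(len(utf8_bytes), len(sjis_bytes))
print(misread == name)
```

Since the two byte sequences differ even in length, passing one where the other is expected cannot work without a conversion step somewhere.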