Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-21 Thread Kyotaro HORIGUCHI
Hello, At Tue, 13 Sep 2016 11:44:01 +0300, Heikki Linnakangas wrote in <7ff67a45-a53e-4d38-e25d-3a121afea...@iki.fi> > On 09/08/2016 09:35 AM, Kyotaro HORIGUCHI wrote: > > Returning in UTF-8 bloats the result string by about 1.5 times so > > it doesn't seem to make sense

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-13 Thread Heikki Linnakangas
On 09/08/2016 09:35 AM, Kyotaro HORIGUCHI wrote: Returning in UTF-8 bloats the result string by about 1.5 times so it doesn't seem to make sense comparing with it. But it takes real = 47.35s. Nice! I was hoping that this would also make the binaries smaller. A few dozen kB of storage is

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-12 Thread Kyotaro HORIGUCHI
At Thu, 8 Sep 2016 07:09:51 +, "Tsunakawa, Takayuki" wrote in <0A3221C70F24FB45833433255569204D1F5E7D4A@G01JPEXMBYT05> > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro > > HORIGUCHI > > > > $

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-08 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro > HORIGUCHI > > $ time psql postgres -c 'select t.a from t, generate_series(0, )' > > /dev/null > > real 0m22.696s > user 0m16.991s > sys 0m0.182s> > > Using binsearch the result

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-08 Thread Kyotaro HORIGUCHI
Hello, At Wed, 07 Sep 2016 16:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI wrote in <20160907.161304.112519789.horiguchi.kyot...@lab.ntt.co.jp> > > Implementing radix tree code, then redefining the format of mapping table > > > to support radix tree,

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-07 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro > Thanks, by the way, there's another issue related to SJIS conversion. MS932 > has several characters that have multiple code points. By converting texts > in this encoding to and from

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-07 Thread Kyotaro HORIGUCHI
Hello, At Tue, 6 Sep 2016 03:43:46 +, "Tsunakawa, Takayuki" wrote in <0A3221C70F24FB45833433255569204D1F5E66CE@G01JPEXMBYT05> > > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro > > HORIGUCHI >

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kyotaro > HORIGUCHI Implementing radix tree code, then redefining the format of mapping table > to support radix tree, then modifying mapping generator script are needed. > > If no one opposes to

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Kyotaro HORIGUCHI
Hello, At Mon, 5 Sep 2016 19:38:33 +0300, Heikki Linnakangas wrote in <529db688-72fc-1ca2-f898-b0b99e300...@iki.fi> > On 09/05/2016 05:47 PM, Tom Lane wrote: > > "Tsunakawa, Takayuki" writes: > >> Before digging into the problem, could you share

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tom Lane
"Tsunakawa, Takayuki" writes: > Using multibyte-functions like mb... to process characters would solve > the problem? Well, sure. The problem is (1) finding all the places that need that (I'd estimate dozens to hundreds of places in the core code, and then

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tsunakawa, Takayuki
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Heikki > But one thing that would help a little, would be to optimize the UTF-8 > -> SJIS conversion. It uses a very generic routine, with a binary search > over a large array of mappings. I bet you
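
For readers following the thread, here is a minimal sketch of the two lookup strategies under discussion: a binary search over a sorted array of mappings, and a byte-by-byte tree walk in the spirit of the radix tree proposed elsewhere in the thread. It is not PostgreSQL's implementation. The names (`SAMPLE`, `lookup_bsearch`, `lookup_tree`) are hypothetical, the table holds only five characters, and nested Python dicts stand in for the fixed-size arrays a real radix tree would use.

```python
from bisect import bisect_left

# Tiny illustrative mapping; a real conversion table has thousands of entries.
SAMPLE = "あいうソ表"


def utf8_key(ch: str) -> int:
    """Pack the UTF-8 bytes of one character into an integer sort key."""
    return int.from_bytes(ch.encode("utf-8"), "big")


# Strategy 1: sorted array of (UTF-8 key, SJIS bytes) pairs, searched by bisection.
PAIRS = sorted((utf8_key(c), c.encode("shift_jis")) for c in SAMPLE)
KEYS = [k for k, _ in PAIRS]


def lookup_bsearch(ch: str):
    """Return the SJIS bytes for ch, or None, using O(log n) comparisons."""
    key = utf8_key(ch)
    i = bisect_left(KEYS, key)
    if i < len(KEYS) and KEYS[i] == key:
        return PAIRS[i][1]
    return None


# Strategy 2: one tree level per UTF-8 byte (dicts stand in for arrays).
def build_tree(chars: str) -> dict:
    root: dict = {}
    for ch in chars:
        raw = ch.encode("utf-8")
        node = root
        for b in raw[:-1]:
            node = node.setdefault(b, {})
        node[raw[-1]] = ch.encode("shift_jis")
    return root


TREE = build_tree(SAMPLE)


def lookup_tree(ch: str):
    """Return the SJIS bytes for ch, or None, in one step per input byte."""
    node = TREE
    for b in ch.encode("utf-8"):
        if not isinstance(node, dict):
            return None
        node = node.get(b)
        if node is None:
            return None
    return node if isinstance(node, bytes) else None


if __name__ == "__main__":
    for ch in "表ソ漢":
        print(ch, lookup_bsearch(ch), lookup_tree(ch))
```

The tree walk costs a fixed number of steps per character (at most the length of its UTF-8 sequence), while the binary search cost grows with the logarithm of the table size.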

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us] > "Tsunakawa, Takayuki" writes: > > Before digging into the problem, could you share your impression on > > whether PostgreSQL can support SJIS? Would it be hopeless? > > I think it's pretty much hopeless. Even if we

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Heikki Linnakangas
On 09/05/2016 05:47 PM, Tom Lane wrote: "Tsunakawa, Takayuki" writes: Before digging into the problem, could you share your impression on whether PostgreSQL can support SJIS? Would it be hopeless? I think it's pretty much hopeless. Agreed. But one thing

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tom Lane
"Tsunakawa, Takayuki" writes: > Before digging into the problem, could you share your impression on > whether PostgreSQL can support SJIS? Would it be hopeless? I think it's pretty much hopeless. Even if we were willing to make every bit of code that looks for

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tatsuo Ishii
> Before digging into the problem, could you share your impression on whether > PostgreSQL can support SJIS? Would it be hopeless? Can't we find any > direction to go? Can I find relevant source code by searching specific words > like "ASCII", "HIGH_BIT", "\\" etc? For starters, you could

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tsunakawa, Takayuki
> From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii > > But what I'm wondering is why PostgreSQL doesn't support SJIS. Was there > any technical difficulty? Is there anything you are worried about if adding > SJIS? > > Yes, there's

Re: [HACKERS] Supporting SJIS as a database encoding

2016-09-05 Thread Tatsuo Ishii
> But what I'm wondering is why PostgreSQL doesn't support SJIS. Was there any > technical difficulty? Is there anything you are worried about if adding SJIS? Yes, there's a technical difficulty with backend code. In many places it is assumed that any string is "ASCII compatible", which means
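
The snippet above is cut off, so the following is only an illustration of one well-known hazard of this kind, not a reconstruction of the message. In Shift_JIS the second byte of a two-byte character can fall in the ASCII range, so code that scans a byte string for an ASCII character such as a backslash can match in the middle of a character. The helper name `contains_ascii_backslash` is hypothetical. The sketch uses Python's standard `shift_jis` and `utf-8` codecs.

```python
BACKSLASH = 0x5C  # ASCII code of "\"


def contains_ascii_backslash(data: bytes) -> bool:
    """Naive byte scan, as code written for ASCII-compatible encodings might do."""
    return BACKSLASH in data


text = "表示"  # contains no backslash character at all

sjis = text.encode("shift_jis")  # "表" encodes as 0x95 0x5C in Shift_JIS
utf8 = text.encode("utf-8")      # every byte of a multibyte UTF-8 char is >= 0x80

if __name__ == "__main__":
    print("SJIS bytes :", sjis.hex(" "))
    print("UTF-8 bytes:", utf8.hex(" "))
    print("naive scan finds backslash in SJIS :", contains_ascii_backslash(sjis))
    print("naive scan finds backslash in UTF-8:", contains_ascii_backslash(utf8))
    # Splitting on the byte breaks the first character in two.
    print("SJIS split on 0x5C:", sjis.split(b"\\"))
```

In UTF-8 and EUC-JP, every byte of a multibyte character has the high bit set, so a byte-level search for an ASCII character cannot produce this kind of false match.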