from:"John Hansen"

[HACKERS] Optional postgres database not so optional in 8.1

2005-11-17 Thread John Hansen

On a fresh installation of postgrsql 8.1if you drop the
'postgres' database,
psql, createdb, etc. no longer works.

psql -l; ignores -dtemplate1, and createdb doesn't have such an option.

Maybe it should fallback to template1 if the postgres database doesn't
exist?

... John


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] someone working to add merge?

2005-11-11 Thread John Hansen

Jaime Casanova Wrote:
 
 But MERGE isn't REPLACE...
 
 REPLACE will delete old records to insert new ones; MERGE try 
 to insert and if the record exists then can UPDATE just a few 
 values, maybe incrementing them with a value (all the 
 calculation are doing by the MERGE)

That sounds like MySQL's 'INSERT INTO ... ON DUPLICATE KEY UPDATE',
which they recommend over REPLACE anyways.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] someone working to add merge?

2005-11-11 Thread John Hansen

I Wrote:

 From the mysql manual:
 
 'REPLACE works exactly like INSERT, except that if an old 
 record in the table has the same value as a new record for a 
 PRIMARY KEY or a UNIQUE index, the old record is deleted 
 before the new record is inserted. See Section 13.2.4, 
 INSERT Syntax.'

It also says:

 Note that unless the table has a PRIMARY KEY  or UNIQUE index, using a
REPLACE statement makes no sense. It becomes equivalent to INSERT,
because there is no index to be used to determine whether a new row
duplicates another. 


... John

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Re: [HACKERS] someone working to add merge?

2005-11-11 Thread John Hansen

Tom Lane Wrote:

 Surely they require a unique constraint --- else the behavior 
 isn't even well defined, is it?

From the mysql manual:

'REPLACE works exactly like INSERT, except that if an old record in the
table has the same value as a new record for a PRIMARY KEY or a UNIQUE
index, the old record is deleted before the new record is inserted. See
Section 13.2.4, INSERT Syntax.'

... John

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

[HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen

Good people,

Just had a thought!

Might it be worth while protecting the postmaster from an OOM Kill on
Linux by setting /proc/{pid}/oom_adj to -17 ?
(Described vaguely in mm/oom_kill.c)

Kind Regards,

John Hansen


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen

Martijn van Oosterhout Wrote:
 
 Has it actually happened to you? PostgreSQL is pretty good 
 about its memory usage. Besides, seems to me it should be an 
 system admisitrator descision.

No, Just came across this by chance, and thought it might be a good
idea.
Perhaps as a postgresql.conf setting.

... John


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] PG Killed by OOM Condition

2005-10-03 Thread John Hansen

Tom Lane Wrote:

 (a) wouldn't that require root privilege?  (b) how would we 
 determine whether we are on a system to which this applies?  
 (c) is it actually documented in a way that makes you think 
 it'll be a permanently supported feature (ie, somewhere 
 outside the source code)?

(a) No, /proc/{pid}/* is owned by the process
(b) /proc/{pid}/oom_adj exists ?
(c) No, from the source: (not docbooked, we don't want this one
cluttering up the manual)

... John

---(end of broadcast)---
TIP 6: explain analyze is your friend

Re: [HACKERS] Does anybody use ORDER BY x USING y?

2005-09-18 Thread John Hansen

Martijn van Oosterhout Wrote:

   All we lose is the ability to say USING [arbitrary op]. Does
anybody 
   use this. Would people object to requiring the operator after
USING 
   to be part of an operator class?
  
  Hmmm ... would this prevent the hackish workaround for
case-insensitive sort?
 
 Err, which hackish workaround would that be? The right 
 solution is citext which creates it's own operator class. 
 This doesn't have anything to do with functional indexes either.
 
 I've been using Google to find any interesting use of the 
 USING clause but havn't found any yet.

I was actually of the impression that that was exacty what it was for:
specifying what op(class) to use for the sort in case you wanted to use
a non-default opclass for the type, and/or if the less-than operator
wasn't called ''.

... John

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [ANNOUNCE] Welcome Core Team member Dave Page

2005-08-26 Thread John Hansen

Good on ya, Dave!

... John 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Gavin M. Roy
 Sent: Friday, August 26, 2005 1:51 PM
 To: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [ANNOUNCE] Welcome Core Team member Dave Page

 Congrats Dave!

 On Aug 25, 2005, at 5:59 PM, Josh Berkus wrote:

  Project members:

  On behalf of the PostgreSQL Core Team, I welcome Dave Page. 
  Dave has 
  been the head of the pgODBC project for a couple of years, 
 started the 
  pgAdmin project in 1998, has been our lead webmaster for 
 three years, 
  and is now (split with Magnus) in charge of the Win32 
 packaging.  In 
  these latter two roles, Dave's help is indispensable to the 
 PostgreSQL 
  release process, so we decided it was time for him to have 
 the title 
  for the work
  he's already doing.   We believe that anyone who's been around the  
  project
  for a few years will agree.

  So, welcome Dave Page as the newest member of Core!

  --
  --Josh Berkus

  Josh Berkus
  PostgreSQL Project Core Team
  www.postgresql.org

  ---(end of
  broadcast)---
  TIP 2: Don't 'kill -9' the postmaster

 Gavin M. Roy
 800 Pound Gorilla
 [EMAIL PROTECTED]

 ---(end of 
 broadcast)---
 TIP 5: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] [ANNOUNCE] Welcome Core Team member Dave Page

2005-08-26 Thread John Hansen

Merlin Moncure Wrote:

 ... Be sure to mix in a request for 
 better Unicode support at the same time, Dave loves that.

As do I... :)

... John

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] FreeBSD ICU was Win32 unicode vs ICU

2005-08-24 Thread John Hansen

Kevin McArthur Wrote:

 Should the postgresql project also be looking at CLDR for 
 cross-platform unicode support?

Afaict, from the ICU website, ICU too uses CLDR.
Why reinvent the wheel?

... John

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly

Re: [HACKERS] contrib/rtree_gist into core system?

2005-06-27 Thread John Hansen

Tom Lane [mailto:[EMAIL PROTECTED] Wrote:

 There's no HOWTO for rtree either.  Again, my point is not 
 that one couldn't be written; it's that we would probably be 
 better off spending the effort on a HOWTO for gist.

No, but the _current_ implementation of the rtree operators are ver much
self explaining and need no howto.

Union(x,y) = x + y
Intersect(x,y) = the values that are present in both x and y, or
_overlapping_region_
Size(x) = the size of the area/length of the line, number of elements,
etc...

Now, how simple is that compared to gist?

I for one, is yet to produce a working example of something as simple as
indexing an array of 2 elements [x y] represented by a custom type as
'[x y]' in string format (returned by type_out) internally stored as a
char[2], so that I can fetch all rows where [x y] = ':y' (:y meaning 2nd
element in array, x: meaning first element in array.

I chose this as something simple to play with, having no practical
application for me, but to get an understanding of gist, For now,. I
have put it in the too hard basket.

I did however in about half a day implement rtree support for inet/cidr
(ipv4 only) as you might recall.

Kind Regards,

John


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] contrib/rtree_gist into core system?

2005-06-26 Thread John Hansen

Tom Lane Wrote:

 ... but rtree has always 
 been marginal, and it's very hard to see where it can win over gist.

Simplicity!

Implementing rtree operators and support functions is FAR simpler than
implementing the GiST equivalents.

For example, suppose all you want to implement is the ~ operator for a
custom type, then technically all you need is 4 functions (well, 5
including the stub operators)

bool contains(type,type);
type intersect(type,type);
type union(type,type);
void size(type,*float);

And the 6 other operators simply defined as:
bool false(type) { return false; }

For GiST you still need 7 support functions + the operator function,
some of which aren't exactly simple to implement, the picksplit for
instance.

So I'd not recommend getting rid of rtree just yet. At least not until
someone has written an extensive howto on the subject of GiST
implementation.

... John

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [PATCHES] Function's LEAST, GREATEST and DECODE (Oracle vararg polymorphic functions)

2005-06-24 Thread John Hansen

I'd vote that these functions should follow the semantics of the , and
 operators.

(NULL  x) is NULL;

... John 


 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Friday, June 24, 2005 11:21 PM
 To: Pavel Stehule
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [PATCHES] Function's LEAST, GREATEST 
 and DECODE (Oracle vararg polymorphic functions) 
 
 [ moving to -hackers for a wider audience ]
 
 Today's issue: should the GREATEST/LEAST functions be strict 
 (return null if any input is null) or not (return null only 
 if all inputs are null, else return the largest/smallest of 
 the non-null inputs)?
 
 Pavel Stehule [EMAIL PROTECTED] writes:
  On Thu, 23 Jun 2005, Tom Lane wrote:
  Pavel Stehule [EMAIL PROTECTED] writes:
  +  /* If any argument is null, then result 
 is null (for GREATEST 
  + and LEAST)*/
  
  Are you sure about that?  The only reference I could find 
 says that 
  these functions are not strict in Oracle:
  
  
 http://download-east.oracle.com/otn_hosted_doc/rdb/pdf/sql_ref_v71_vo
  l1.pdf
  on page 2-185:
  
  The NULL keyword can appear in the list but is ignored. 
 However, not 
  all value expressions can be specified as NULL. That is, 
 a non-NULL 
  value expression must be in the list so that the data 
 type for the 
  expression can be determined.
  The GREATEST and LEAST functions can result in NULL only 
 if at run 
  time all value expressions result in NULL.
  
  The strict interpretation is mathematically cleaner, no doubt, but 
  offhand it seems less useful.
  
 
  I know it, But when moustly PostgreSQL function is strict I 
 desided so 
  greatest and least will be strict. There is two analogy:
 
  one, normal comparing which implicate strinct aggregate 
 function which 
  ignore NULL.
 
  Tom I don't know, what is better. Maybe Oracle,
 
  because
 
  least(nullif(col2, +max), nullif(col2, +max)) isn't really 
 readable, 
  but it's precedens for PostgreSQL. I selected more conservative 
  solution, but my patches are only start points for 
 discussion (really) :).
 
  Please, if You think, so Oracle way is good, correct it.
 
 I'm still favoring non-strict but it deserves more than two votes.
 Anybody else have an opinion?
 
   regards, tom lane
 
 ---(end of 
 broadcast)---
 TIP 8: explain analyze is your friend
 
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] GiST rtree logic is not right

2005-06-23 Thread John Hansen

 I'll look at problem after GiST concurrency. Fixing 
 rtree_gist is bug a fix, not a new feature, so I'm not 
 limited by 1 July.

Wont fixing rtree(_gist) require initdb, since the behaviour of the
operators will change?

... John


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match

Re: [HACKERS] query plan ignoring check constraints

2005-06-21 Thread John Hansen

Bruno Wolff III [mailto:[EMAIL PROTECTED] Wrote

 I think the real problem is that check constraints on tables 
 aren't used by the optimizer. Given that, what you have below 
 is expected.
 There has been talk about that in the past, but I haven't 
 heard anything recently about someone considering implenting that.
 
 For your problem consider not using a partial index. It isn't 
 going to save anything if it has a constraint matching that 
 of the table.


Ahh, I get it now,... 

  If a column has a CHECK (col IN (1,2,3)) and a query says .. WHERE col
= 4; then the planner should 
  know that the query will return 0 rows, right?

... John

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] query plan ignoring check constraints

2005-06-21 Thread John Hansen

Bruno Wolff III [mailto:[EMAIL PROTECTED] Wrote:

 You only want to use partial indexes when they don't cover 
 the whole table. They make sense to enforce uniqueness of a 
 column under some condition and when you can save significant 
 space (becuase the condition is only satisfied for a small 
 fraction of rows).


Yes, I know that,. 

I misunderstood the original post as a request for queries NOT to use
indexes where it doesn't match the table contents.

.. John

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] query plan ignoring check constraints

2005-06-20 Thread John Hansen

Someone Wrote:

 Should not check constraint act as the first filter? The index should 
 ideally be scanned only when the check constraint is passed by the
search 
 criteria but surprisingly it did not happen. The explain analyze
showed 
 cost for index scans of subtables that cannot contain rows matching
the 
 search criteria.

Obviously, indexes on columns with a check constraint, should be
qualified with the same check constraint.

test=# CREATE TABLE test (
   foo text check(foo IN ('YES','NO'))
);
CREATE TABLE
test=# CREATE INDEX text_foo_idx ON test (foo) WHERE foo IN('YES','NO');
CREATE INDEX
test=# INSERT INTO test VALUES ('YES');
INSERT 280188 1
test=# INSERT INTO test VALUES ('NO');
INSERT 280189 1
test=# INSERT INTO test VALUES ('no');
ERROR:  new row for relation test violates check constraint
test_foo_check
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE foo = 'YES';
 QUERY PLAN



 Index Scan using text_foo_idx on test  (cost=0.00..5.82 rows=7
width=32) (actual time=0.369..0.376 rows=1 loops=1)
   Index Cond: (foo = 'YES'::text)
 Total runtime: 0.490 ms
(3 rows)
test=# EXPLAIN ANALYZE SELECT * FROM test WHERE foo = 'no';
   QUERY PLAN



 Seq Scan on test  (cost=0.00..25.38 rows=7 width=32) (actual
time=0.358..0.358 rows=0 loops=1)
   Filter: (foo = 'no'::text)
 Total runtime: 0.421 ms
(3 rows)
test=# 

... John

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] Returning Composite Types from C functions

2005-06-18 Thread John Hansen

Hi all,

CREATE TYPE my_type AS (
a int,
b int,
c int,
d int,
e int
);

CREATE FUNCTION text_to_my_type(text)
RETURNS my_type
AS 'my_lib.so'
LANGUAGE 'C' IMMUTABLE STRICT;


CREATE CAST (text AS my_type) WITH FUNCTION text_to_my_type (text);

SELECT ('1:2:3:4:5'::text::my_type).*;

This results in the text_to_my_type(text) function being called no less
than 5 times. Once for each element.

Is this the desired behaviour, or a bug?


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Returning Composite Types from C functions

2005-06-18 Thread John Hansen

Michael Fuhr [mailto:[EMAIL PROTECTED] Wrote:
 Sent: Saturday, June 18, 2005 9:56 PM
 To: John Hansen
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Returning Composite Types from C functions

 On Sat, Jun 18, 2005 at 09:18:34PM +1000, John Hansen wrote:

  SELECT ('1:2:3:4:5'::text::my_type).*;

  This results in the text_to_my_type(text) function being called no 
  less than 5 times. Once for each element.

  Is this the desired behaviour, or a bug?

 It's a known behavior with functions that return composite types.
 Apparently it's not easy to fix:

 http://archives.postgresql.org/pgsql-hackers/2005-04/msg00971.php

There is a workaround tho, so should be fixable:

SELECT (a.b).* FROM (SELECT ('1:2:3:4:5'::text::my_type) AS b) AS a;

Or am I missing something?

 --
 Michael Fuhr
 http://www.fuhr.org/~mfuhr/

... John

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Returning Composite Types from C functions

2005-06-18 Thread John Hansen

Yes, it worked for me,... 

But my point is the workaround shouldn't be nescessary 

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, June 18, 2005 11:36 PM
 To: John Hansen
 Cc: Michael Fuhr; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Returning Composite Types from C functions 
 
 John Hansen [EMAIL PROTECTED] writes:
  There is a workaround tho, so should be fixable:
  SELECT (a.b).* FROM (SELECT ('1:2:3:4:5'::text::my_type) AS 
 b) AS a; 
  Or am I missing something?
 
 Try it ;-)
 
   regards, tom lane
 
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] LGPL

2005-06-15 Thread John Hansen

Josh,

 Both of these would be fine as add-ins to be distributed *separately*
through 
 pgFoundry or even the mirrors if they prove popular.   
 Bundling them in unified distribution binaries with PostgreSQL would
be a 
 significant problem. 
 
 You see this in other projects all the time:  Requriements: 
 __, which is GPL and can be downloaded from __ . 
  We've managed so far to avoid needing external libraries 
 which are not standard on most POSIX platforms, and it would 
 be nice to keep it that way instead of doing the component 
 easter egg hunt (which users of Linux multimedia apps are 
 familiar with).
 
 This means that you're unlikely to be able to use glib unless 
 it becomes standard on POSIX platforms, and someone makes a 
 Windows port.  Out of curiosity, what did you want to use it *for*?

Ohh,. Just felt like a cleaner interface than ICU,

 As for a URI type, I don't see the problem with doing that as 
 a PostgreSQL add-in downloadable from PGFoundry.  Given the 
 variety of URI implementations, I'm not sure we'd want a 
 single URI type as standard anyway.  

That I don't know, Yet... However what I've come up with so far, has
proven quite useful.

Remember the all famous 'email' type?
This has the same functionality, if not better

create table email_addresses (email_address text,uri uri);
insert into email_addresses (email_address) VALUES
('[EMAIL PROTECTED]');
update email_addresses set uri = 'mailto:'||email_address::text; --
could be a rule on insert!
select (uri).username,(uri).host from email_addresses ;
 username |  host
--+
 john | geeknet.com.au
(1 row)

And here's the really funky bit:

select email_address = '[EMAIL PROTECTED]' from email_addresses ;
 ?column?
--
 f
(1 row)

select uri = 'mailto:[EMAIL PROTECTED]'::text::uri from
email_addresses ;
 ?column?
--
 t
(1 row)

As it should, since email sent to the two email addresses would end up
in the same mailbox
The same applies to other URI formats of course. It appears this uri
library is fully spec compliant.

 According to the FSF's junior licensing maven, building in a 
 GPL data type or other plug-in would make *your instance* of 
 PostgreSQL GPL, but so does PL/R and PostGIS, so that's 
 nothing new.  It just needs to be distributed separately.
 
 FYI, the reason the GPL linking issue is vague is that it 
 depends on local copyright law, which varies from country to 
 country and in the US from state to state.  This is 
 deliberate by the FSF because an agreement which depends on 
 local copyright law is stronger in court than one which sets its own 
 explicit terms.   If anyone has nuts-and-bolts questions 
 about GPL/LGPL 
 issues, I have some friends at the FSF and can get answers 
 from the horse's mouth.
 

Thanks for the explanation

 -- 
 Josh Berkus
 Aglio Database Solutions
 San Francisco
 
 

... John

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] max_fsm_pages 800k ... ?

2005-06-15 Thread John Hansen

INFO:  analyzing pg_catalog.pg_depend
INFO:  pg_depend: 27 pages, 3866 rows sampled, 3866 estimated total
rows
INFO:  free space map: 423 relations, 88475 pages stored; 431200 total
pages needed
DETAIL:  Allocated FSM size: 4000 relations + 8 pages = 705 kB
shared memory. 

This, on a database with a moderate amount of rows (10 Million)
So,.. I'd say fairly normal.

... John
 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Marc 
 G. Fournier
 Sent: Thursday, June 16, 2005 3:57 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] max_fsm_pages 800k ... ?
 
 
 Without more information to provide at this time, does the 
 following seem unusual?
 
 INFO:  free space map: 252 relations, 411494 pages stored; 
 738640 total pages needed
 DETAIL:  Allocated FSM size: 2000 relations + 40 pages = 
 2463 kB shared memory.
 
 I know there are alot of factors involved in the above, which 
 I'm looking into, but when I first saw the above, I just 
 about had a heartattack, only because i've never seen such 
 high #s for total pages needed ...
 
 ... Is this something that others are seeing as relatively normal?
 
 
 
 
 
 Marc G. Fournier   Hub.Org Networking Services 
 (http://www.hub.org)
 Email: [EMAIL PROTECTED]   Yahoo!: yscrappy 
  ICQ: 7615664
 
 ---(end of 
 broadcast)---
 TIP 8: explain analyze is your friend
 
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

[HACKERS] PG_FREE_IF_COPY()

2005-06-14 Thread John Hansen

Given the following snippet:

HeapTupleHeader tuple;
Datum   temp;
bool  isnull;

tuple = PG_GETARG_HEAPTUPLEHEADER(0);
temp  = GetAttributeByName(tuple, data, isnull);


When using this for a btree operator functions, you need to
PG_FREE_IF_COPY(?,?);

Which of the above parameters need I free?

tuple, or temp, Or both?

... John

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] PG_FREE_IF_COPY()

2005-06-14 Thread John Hansen

Tom Lane [mailto:[EMAIL PROTECTED] wrote:
 John Hansen [EMAIL PROTECTED] writes:
  Given the following snippet:
  HeapTupleHeader tuple;
  Datum   temp;
  bool  isnull;
   
  tuple = PG_GETARG_HEAPTUPLEHEADER(0);
  temp  = GetAttributeByName(tuple, data, isnull);
 
  When using this for a btree operator functions, you need to 
  PG_FREE_IF_COPY(?,?);
 
  Which of the above parameters need I free?
 
 It'd probably be wise to do a FREE_IF_COPY on the tuple.
 
 GetAttributeByName, however, hasn't copied anything (if it's 
 a pass-by-ref type you'll just get a pointer into the tuple).
 If you do a copy further down that might be a reason to free 
 something, but this code alone isn't making a copy.

Cool, thanks...

 
   regards, tom lane
 
 

... John

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

[HACKERS] LGPL

2005-06-14 Thread John Hansen

Is there any reason why we would not be able to use LGPL code in PG?

... John

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

What about GPL ?
I assume that's out of the question!

 -Original Message-
 From: Marc G. Fournier [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, June 15, 2005 11:59 AM
 To: John Hansen
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] LGPL
 
 
 We already do ... libreadline ...
 
 On Wed, 15 Jun 2005, John Hansen wrote:
 
  Is there any reason why we would not be able to use LGPL code in PG?
 
  ... John
 
  ---(end of 
  broadcast)---
  TIP 3: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] 
 so that your
   message can get through to the mailing list cleanly
 
 
 
 
 
 Marc G. Fournier   Hub.Org Networking Services 
 (http://www.hub.org)
 Email: [EMAIL PROTECTED]   Yahoo!: yscrappy 
  ICQ: 7615664
 
 

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

Ooooh

I got the impression that using GPL libraries was a Bad Thing(tm)

... John 

 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, June 15, 2005 12:15 PM
 To: Marc G. Fournier
 Cc: John Hansen; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] LGPL

 Er, no. It's GPL, not LGPL software. My readline.h says:

The GNU Readline Library is free software; you can redistribute it
and/or modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2, or
(at your option) any later version.

 see this ancient thread: 
 http://archives.postgresql.org/pgsql-hackers/2000-12/msg01029.php

 cheers

 andrew

 Marc G. Fournier wrote:

  We already do ... libreadline ...

  On Wed, 15 Jun 2005, John Hansen wrote:

  Is there any reason why we would not be able to use LGPL 
 code in PG?

  ... John

  ---(end of 
  broadcast)---
  TIP 3: if posting/reading through Usenet, please send an 
 appropriate
   subscribe-nomail command to [EMAIL PROTECTED] 
 so that your
   message can get through to the mailing list cleanly

  Marc G. Fournier   Hub.Org Networking Services 
  (http://www.hub.org)
  Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 
  7615664

  ---(end of 
  broadcast)---
  TIP 4: Don't 'kill -9' the postmaster

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

So, what's the story with readline? 

 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, June 15, 2005 12:11 PM
 To: John Hansen
 Cc: Marc G. Fournier; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] LGPL
 
 John Hansen wrote:
  What about GPL ?
  I assume that's out of the question!
 
 If we add some GPL code, the entire binary becomes GPL, and 
 that prevents closed-source commercial versions from being produced.
 
 --
 -
 
 
  
   -Original Message-
   From: Marc G. Fournier [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, June 15, 2005 11:59 AM
   To: John Hansen
   Cc: pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] LGPL
   
   
   We already do ... libreadline ...
   
   On Wed, 15 Jun 2005, John Hansen wrote:
   
Is there any reason why we would not be able to use 
 LGPL code in PG?
   
... John
   
---(end of
broadcast)---
TIP 3: if posting/reading through Usenet, please send 
 an appropriate
 subscribe-nomail command to [EMAIL PROTECTED]
   so that your
 message can get through to the mailing list cleanly
   
   
   
   
   
   Marc G. Fournier   Hub.Org Networking Services 
   (http://www.hub.org)
   Email: [EMAIL PROTECTED]   Yahoo!: yscrappy 
ICQ: 7615664
   
   
  
  ---(end of 
  broadcast)---
  TIP 1: subscribe and unsubscribe commands go to 
  [EMAIL PROTECTED]
  
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

 Agreed.
 
 With libreadline, we are not taking their code or 
 distributing it, but merely linking to it if it exists.  Now, 
 some say that is enough to make us GPL, but many don't agree 
 with that interpretation.


Right,. That's actually exactly what I meant: using GPL/LGPL libraries
by linking to them.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

Tom Lane [mailto:[EMAIL PROTECTED] Wrote:
 John Hansen [EMAIL PROTECTED] writes:
  Is there any reason why we would not be able to use LGPL code in PG?
 
 Another point of view on this: it's OK to use LGPL code if 
 it's available on the local platform, so long as we don't 
 *require* it to be present.  It's even safer if the LGPL code 
 is merely one implementation of an API that has other 
 implementations under different licenses.
 For instance I have no fear at all of linking to glibc, and 
 little of linking to libreadline (the latter because we can 
 also use the BSD libedit).
 
 If we could not build without libreadline then we would have 
 a very big problem.  And we certainly aren't going to 
 textually incorporate any new LGPL (or GPL) code into our 
 distribution.

Right,... Let me be more specific then,

What are your thoughts on using the glib
(http://developer.gnome.org/doc/API/2.2/glib/index.html) library for
some functionality in pg?

Additionally,. I came across this fine library
(http://home.gna.org/uri/uri.en.html) which I'd like to use as a base
for a new URI type, unfortunately it's GPL, so based on the above I'm
guessing using it as is, is out of the question?

 
   regards, tom lane
 
 

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] LGPL

2005-06-14 Thread John Hansen

Tom Lane [mailto:[EMAIL PROTECTED] Wrote:
 John Hansen [EMAIL PROTECTED] writes:
  Right,... Let me be more specific then,
 
  What are your thoughts on using the glib
  (http://developer.gnome.org/doc/API/2.2/glib/index.html) 
 library for 
  some functionality in pg?
 
 Right offhand that seems like a nonstarter.  Exactly how 
 would you use it in a way that didn't turn it into a required 
 component?  It looks to me like a collection of bits that are 
 pretty useful but also very low-level, and hence not easily separable.

K, that's what confused me as I got the impression it was ok to require
LGPL libraries but not GPL.

 
  Additionally,. I came across this fine library
  (http://home.gna.org/uri/uri.en.html) which I'd like to use 
 as a base 
  for a new URI type, unfortunately it's GPL, so based on the 
 above I'm 
  guessing using it as is, is out of the question?
 
 Sure, you can do whatever you like with that ... as long as 
 you're not expecting us to distribute the combined code as 
 part of Postgres.
 
 It's worth reiterating here that GPL/LGPL code plus BSD code 
 is no problem whatever for local development and use.  It's 
 only if you want to redistribute the result that you have to 
 worry about what the licenses require.  Since Postgres is a 
 BSD-license project, *we* are not going to redistribute any 
 GPL or LGPL code, nor any code that fundamentally depends on 
 code that is so licensed.  But you can pretty much do what 
 you like in your own sandbox.  In particular, you could 
 develop a datatype that requires a GPL/LGPL library, and then 
 distribute that code by itself as GPL/LGPL, and neither the 
 GPL nor BSD camps would have any problem with that.  Just 
 don't expect us to put such code in a BSD distribution ...

That's what I was afraid of

 
   regards, tom lane
 
 

... John

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] unicode upper/lower functions

2005-06-06 Thread John Hansen

Yes,

Thank you! :)

.. John 

 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, June 07, 2005 10:07 AM
 To: John Hansen
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] unicode upper/lower functions
 
 
 I think we have decided to use the ICU library to implement 
 multiple locales.
 
 --
 -
 
 John Hansen wrote:
  Hi list,
  
  
  Attached for your perusal, unicode versions of upper/lower, 
 that work 
  independent of locale except for the following languages:
  
  Turkish, Azeri, and Lithuanian.
  There are 15 locale specific cases in total not covered.
  
  
  --
  John Hansen [EMAIL PROTECTED]
  GeekNET
 
 [ Attachment, skipping... ]
 
  
  ---(end of 
  broadcast)---
  TIP 7: don't forget to increase your free space map settings
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] unicode upper/lower functions

2005-06-06 Thread John Hansen

... Except,.. It was never decided if the 'C' locale optimisations was
going to be removed if/when implementing ICU.

Tho I think the  conclusion was a postgresql.conf parameter to
enable/disable the optimisations.
Either way, this code is now obsolete.

... John 

 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, June 07, 2005 10:07 AM
 To: John Hansen
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] unicode upper/lower functions
 
 
 I think we have decided to use the ICU library to implement 
 multiple locales.
 
 --
 -
 
 John Hansen wrote:
  Hi list,
  
  
  Attached for your perusal, unicode versions of upper/lower, 
 that work 
  independent of locale except for the following languages:
  
  Turkish, Azeri, and Lithuanian.
  There are 15 locale specific cases in total not covered.
  
  
  --
  John Hansen [EMAIL PROTECTED]
  GeekNET
 
 [ Attachment, skipping... ]
 
  
  ---(end of 
  broadcast)---
  TIP 7: don't forget to increase your free space map settings
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] unicode upper/lower functions

2005-06-06 Thread John Hansen

Bruce Momjian wrote:
 John Hansen wrote:
  ... Except,.. It was never decided if the 'C' locale 
 optimisations was 
  going to be removed if/when implementing ICU.
 
 Uh, why would we remove it?  Oh, meaning if the locale is C 
 we bypass locale lookups?  I think we will have to see what 
 performance we have with things.

Uhh, not quite: If locale is 'C' the current assumption is 7-bit ASCII
for upper/lower/initcap.
ICU is capable of properky doing upper/lower/initcap except for the
cases described in this (obsolete) patch.

 
  Tho I think the  conclusion was a postgresql.conf parameter to 
  enable/disable the optimisations.
  Either way, this code is now obsolete.
 
 Thanks.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [PATCHES] Unicode characters above 0x10000 #2

2005-06-04 Thread John Hansen

Bruce,

Attached patch replaces the original, applied today against CVS HEAD.
Fixes the surrogates, and limits to 4 byte utf8 as per spec.

Also extends UtfToLocal to 4 byte characters (tho, it does not add any,
just enables the code to handle them. If my interpretation of this code
is wrong, please let me know, and correct it).

... John

 -Original Message-
 From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, June 05, 2005 11:23 AM
 To: pgman@candle.pha.pa.us
 Cc: John Hansen; pgsql-hackers@postgresql.org; PostgreSQL-patches
 Subject: Re: [PATCHES] Unicode characters above 0x1 #2
 
 
 Your patch has been added to the PostgreSQL unapplied patches list at:
 
   http://momjian.postgresql.org/cgi-bin/pgpatches
 
 It will be applied as soon as one of the PostgreSQL 
 committers reviews and approves it.
 
 --
 -
 
 
 pgman wrote:
  
  I have backed out this patch.  It is unclear it is a bug fix.
  
  It will be saved for 8.1.
  
  
 --
  -
  
  pgman wrote:
   
   Patch applied.  Thanks.
   
   
 
   ---
   
   
   John Hansen wrote:
3 times lucky?

Last one broke utf8 G

This one works, Too tired, sorry for the inconvenience..

... John
   
   Content-Description: cvs.diff
   
   [ Attachment, skipping... ]
   

---(end of 
broadcast)---
TIP 9: the planner will ignore your desire to choose an 
 index scan if your
  joining column's datatypes do not match
   
   -- 
 Bruce Momjian|  http://candle.pha.pa.us
 pgman@candle.pha.pa.us   |  (610) 359-1001
 +  If your life is a hard drive, |  13 Roberts Road
 +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
  
  -- 
Bruce Momjian|  http://candle.pha.pa.us
pgman@candle.pha.pa.us   |  (610) 359-1001
+  If your life is a hard drive, |  13 Roberts Road
+  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
  ===
  RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/wchar.c,v
  retrieving revision 1.38
  diff -c -r1.38 wchar.c
  *** src/backend/utils/mb/wchar.c17 Sep 2004 21:59:57 
 - 1.38
  --- src/backend/utils/mb/wchar.c21 Nov 2004 09:58:36 -
  ***
  *** 343,348 
  --- 343,373 
  return (pg_euc_dsplen(s));
}

  + bool isLegalUTF8(const UTF8 *source, int len) {
  + UTF8 a;
  + const UTF8 *srcptr = source+len;
  + if(!source || (pg_utf_mblen(source) != len)) return false;
  + switch (len) {
  + default: return false;
  + /* Everything else falls through when true... */
  + case 6: if ((a = (*--srcptr))  0x80 || a  
 0xBF) return false;
  + case 5: if ((a = (*--srcptr))  0x80 || a  
 0xBF) return false;
  + case 4: if ((a = (*--srcptr))  0x80 || a  
 0xBF) return false;
  + case 3: if ((a = (*--srcptr))  0x80 || a  
 0xBF) return false;
  + case 2: if ((a = (*--srcptr))  0xBF) return false;
  + switch (*source) {
  + /* no fall-through in this inner switch */
  + case 0xE0: if (a  0xA0) return false; break;
  + case 0xF0: if (a  0x90) return false; break;
  + case 0xF4: if (a  0x8F) return false; break;
  + default:  if (a  0x80) return false;
  + }
  + case 1: if (*source = 0x80  *source  
 0xC2) return false;
  + if (*source  0xFD) return false;
  + }
  + return true;
  + }
  + 
/*
 * convert UTF-8 string to pg_wchar (UCS-2)
 * caller should allocate enough space for to
  ***
  *** 398,404 
 * returns the byte length of a UTF-8 word pointed to by s
 */
int
  ! pg_utf_mblen(const unsigned char *s)
{
  int len = 1;

  --- 423,429 
 * returns the byte length of a UTF-8 word pointed to by s
 */
int
  ! pg_utf_mblen(const UTF8 *s)
{
  int len = 1;

  ***
  *** 406,418 
  len = 1;
  else if ((*s  0xe0) == 0xc0)
  len = 2;
  !   else if ((*s  0xe0) == 0xe0)
  !   len = 3;
  return (len);
}

static int
  ! pg_utf_dsplen(const unsigned char *s)
{
  return 1;   /* XXX 
 fix me! */
}
  --- 431,449 
  len = 1;
  else if ((*s  0xe0) == 0xc0)
  len = 2;
  ! else if ((*s  0xf0) == 0xe0)
  ! len = 3

Re: [HACKERS] executing OS programs from pg

2005-06-03 Thread John Hansen

Look at peter eisentraut's procedural language PL/sh
It's on pgfoundry.

... John

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Saturday, June 04, 2005 5:16 AM
 To: Gevik babakhani
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] executing OS programs from pg 
 
 Gevik babakhani [EMAIL PROTECTED] writes:
  Does anyone know how to execute an OS command from pgsql. I 
 would like 
  to create a trigger that op on firing would run/execute an 
 external program.
 
 Use any of the untrusted PLs to execute system() or the like.
 
 Whether this is a good idea or not is a different question 
 --- there are excellent reasons why it is a *bad* idea to 
 execute outside-the-database actions from within a trigger.  
 Mainly that the actions won't be undone if the transaction 
 later rolls back, and now your database state is inconsistent 
 with outside-the-database state.  See the list archives for 
 many past discussions of this point and safer ways to design 
 your application.
 
 (BTW, this is hardly material for -hackers.)
 
   regards, tom lane
 
 ---(end of 
 broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 
 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] [PATCHES] character type value is not padded with spaces

2005-05-23 Thread John Hansen

Ahemm,...

UNICODE DB:

create table t (a char(10));
set client_encoding = iso88591;
insert into t VALUES ('æøå');

select a, octet_length(a),length(a) from t;
 a  | octet_length | length
+--+
 æøå|   13 |  3
(1 row)

This is with 8.0.2.

Just FYI.

... John

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tatsuo Ishii
 Sent: Tuesday, May 24, 2005 8:52 AM
 To: [EMAIL PROTECTED]
 Cc: pgsql-patches@postgresql.org; pgsql-hackers@postgresql.org
 Subject: Re: [PATCHES] character type value is not padded with spaces
 
 Hackers,
 
 The problem he found is not only existing in Japanese 
 characters but also in any multibyte encodings including 
 UTF-8. For me the patch looks good and I will commit it to 
 7.3, 7.4, 8.0 stables and current if there's no objection.
 --
 Tatsuo Ishii
 
  Character type value including multibyte characters is not 
 padded with 
  spaces. It reproduces at 7.3.x, 7.4.x and 8.0.x.
  
 create table t (a char(10));
 insert into t values ('X'); -- X is 2byte character.
  
  I expect that 'X ' is inserted. But 'X' is inserted.
  
 select a, octed_length(a) from t;
  
a   | octet_length 
 ---+--
  X |   10
  
  If padded with spaces, octet_length(a) is 15. This problem 
 is caused 
  that string length is calculated by byte length(VARSIZE) in 
  exprTypmod().
  
  I attache the patch for this problem.
  
  Regards,
  
  --
  Yoshiyuki Asaba
  [EMAIL PROTECTED]
 
 ---(end of 
 broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to 
 [EMAIL PROTECTED])
 
 

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] [PATCHES] character type value is not padded with spaces

2005-05-23 Thread John Hansen

Ahhh... 

 -Original Message-
 From: Tatsuo Ishii [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 24, 2005 9:26 AM
 To: John Hansen
 Cc: [EMAIL PROTECTED]; pgsql-patches@postgresql.org; 
 pgsql-hackers@postgresql.org
 Subject: Re: [PATCHES] character type value is not padded with spaces
 
 I think you need to test with 5 characters, not 3.
 --
 Tatsuo Ishii
 
  Ahemm,...
  
  UNICODE DB:
  
  create table t (a char(10));
  set client_encoding = iso88591;
  insert into t VALUES ('æøå');
  
  select a, octet_length(a),length(a) from t;
   a  | octet_length | length
  +--+
   æøå|   13 |  3
  (1 row)
  
  This is with 8.0.2.
  
  Just FYI.
  
  ... John
  
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Tatsuo 
   Ishii
   Sent: Tuesday, May 24, 2005 8:52 AM
   To: [EMAIL PROTECTED]
   Cc: pgsql-patches@postgresql.org; pgsql-hackers@postgresql.org
   Subject: Re: [PATCHES] character type value is not padded with 
   spaces
   
   Hackers,
   
   The problem he found is not only existing in Japanese 
 characters but 
   also in any multibyte encodings including UTF-8. For me the patch 
   looks good and I will commit it to 7.3, 7.4, 8.0 stables 
 and current 
   if there's no objection.
   --
   Tatsuo Ishii
   
Character type value including multibyte characters is not
   padded with
spaces. It reproduces at 7.3.x, 7.4.x and 8.0.x.

   create table t (a char(10));
   insert into t values ('X'); -- X is 2byte character.

I expect that 'X ' is inserted. But 'X' is inserted.

   select a, octed_length(a) from t;

  a   | octet_length 
   ---+--
X |   10

If padded with spaces, octet_length(a) is 15. This problem
   is caused
that string length is calculated by byte length(VARSIZE) in 
exprTypmod().

I attache the patch for this problem.

Regards,

--
Yoshiyuki Asaba
[EMAIL PROTECTED]
   
   ---(end of
   broadcast)---
   TIP 2: you can get off all lists at once with the 
 unregister command
   (send unregister YourEmailAddressHere to
   [EMAIL PROTECTED])
   
   
  
 
 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Returning the name of a primary key

2005-05-17 Thread John Hansen

Tom, Juan,

Wouldn't this simple SQL do the trick?

CREATE OR REPLACE FUNCTION pk_column(text) RETURNS SETOF text
AS '
SELECT attname::text
FROM pg_class, pg_constraint, pg_attribute
WHERE pg_class.oid = conrelid
AND contype=''p''
AND attrelid = pg_class.oid
AND attnum = ANY (conkey)
AND relname=$1;
'
LANGUAGE sql VOLATILE STRICT; 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Tuesday, May 17, 2005 4:49 AM
 To: Juan Pablo Espino
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Returning the name of a primary key 
 
 Juan Pablo Espino [EMAIL PROTECTED] writes:
  I need to write a function that retrieve the name of at least one 
  table primary key, if it exists.  The only argument passed to the 
  function is the table name.  I have thought something like this:
 
 You need to be searching the list of indexes, not the 
 attributes per se.
 ATExecDropNotNull() might be a useful example.
 
   regards, tom lane
 
 ---(end of 
 broadcast)---
 TIP 9: the planner will ignore your desire to choose an index 
 scan if your
   joining column's datatypes do not match
 
 

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] SQL_ASCII vs. 7-bit ASCII encodings

2005-05-12 Thread John Hansen

 Personally, I'd like UTF8 to be the default encoding :)  This 
 is the 21st century :D

I concur.

... John

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] Patch for collation using ICU

2005-05-10 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Tuesday, May 10, 2005 5:45 PM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

  Tatsuo Ishii wrote:
   Sent: Tuesday, May 10, 2005 12:32 AM
   To: John Hansen
   Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
   pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] Patch for collation using ICU

 -Original Message-
 From: Tatsuo Ishii [mailto:[EMAIL PROTECTED]
 Sent: Sunday, May 08, 2005 11:08 PM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

   I don't buy it. If current conversion tables does the
 right thing,
   why we need to replace. Or if conversion tables are not
 correct, why
   don't you fix it? I think the rule of character
 conversion will not
   change frequently, especially for LATIN languages. Thus
 maintaining
   cost is not too high.

  I never said we need to, but if we're going to implement
 ICU, then we
  might as well go all the way.

 So you admit there's no benefit using ICU for 
 replacing existing 
 conversions?

 Besides ICU does not support all existing conversions, I
   think ICU
 has serious flaw for using conversion. If I understand 
 correctly, ICU uses UNICODE internally to do the 
 conversion. For 
 example, to implement
 SJIS-EUC_JP conversion, ICU first converts SJIS to 
 UNICODE then
 converts UNICODE to EUC_JP. Problem is these conversion
   is not roud
 trip(conversion between SJIS/EUC_JP and UNICODE will 
 lose some 
 information). Thus SJIS-EUC_JP-SJIS conversion using
   ICU does not
 preserve original text.

Just for the record, I fetched a web page encoded in sjis, and 
converted it to euc-jp and back using uconv from ICU 
 3.2, and the 
result is the original is identical to the transformed file.

 uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
   uconv -f
EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc  diff
   index.html
index.html.sjis

   Not all SJIS/EUC_JP characters have the problem. You might want to
   try: Shift_JIS 0x81e6, 0x879a, 0xfa5b.

   BTW, I got this with ICU 3.2:

   $ uconv -f EUC_JP -t Shift_JIS /tmp/a.txt -o /tmp/b.txt 
 Conversion 
   from Unicode to codepage failed at input byte position 0. 
 Unicode: 
   301c Error: Invalid character found

   The contents of a.txt is 0xa1c1 which is a valid EUC_JP character.

  That actually makes perfect sense, since according to unicode.org's
  database:
  301C ~ WAVE DASH
 This character was encoded to match JIS C 6226-1978 
 1-33 wave 
  dash.
 The JIS standards and some industry practise 
 disagree in mapping.
   - 3030 wavy dash
   - FF5E full width tilde

  In PG FF5E is the mapping currently used. That is obviously wrong 
  (according to the standards), as that is only a 'similar character'.

  Unfortunately, there is no mapping from 301C to shift_jis, as 
  shift_jis doesn't define WAVE DASH.
  In all, I believe this behaviour to be correct according to the 
  standards.

  There'd be nothing to stop us from defining alternative 
 mappings for 
  the cases where we deviate from the standard, but the question is, 
  should we be non-standard?

 You missed the point. EUC_JP 0xa1c1 is a perfect valid data 
 and uconv -f EUC_JP -t Shift_JIS should convert it to 
 Shift_JIS 0x8160 regardless of the internal of uconv.

Studying ICU forther, I found that it works fine, provided you use the
_correct_ charset for the conversion..

a.txt contains 0x81 0x60
uconv -f ibm-943_P130-1999 -t EUC_JP a.txt -o b.txt
b.txt now contains 0xa1 0xc1
uconv -t ibm-943_P130-1999 -f EUC_JP b.txt -o a.txt
a.txt still contains 0x81 0x60

The mapping table you want is ibm-943_P130-1999
Similar, we'd need to find the right euc-jp (and plain jis) mapping,
assuming we want the one that strictly defines JIS X 0208 right?

I trust this to put your fears to rest...

 --
 Tatsuo Ishii

... John

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Patch for collation using ICU

2005-05-09 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Tuesday, May 10, 2005 12:32 AM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

   -Original Message-
   From: Tatsuo Ishii [mailto:[EMAIL PROTECTED]
   Sent: Sunday, May 08, 2005 11:08 PM
   To: John Hansen
   Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
   pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] Patch for collation using ICU

 I don't buy it. If current conversion tables does the
   right thing,
 why we need to replace. Or if conversion tables are not
   correct, why
 don't you fix it? I think the rule of character
   conversion will not
 change frequently, especially for LATIN languages. Thus
   maintaining
 cost is not too high.

I never said we need to, but if we're going to implement
   ICU, then we
might as well go all the way.

   So you admit there's no benefit using ICU for replacing existing 
   conversions?

   Besides ICU does not support all existing conversions, I 
 think ICU 
   has serious flaw for using conversion. If I understand correctly, 
   ICU uses UNICODE internally to do the conversion. For example, to 
   implement
   SJIS-EUC_JP conversion, ICU first converts SJIS to UNICODE then
   converts UNICODE to EUC_JP. Problem is these conversion 
 is not roud 
   trip(conversion between SJIS/EUC_JP and UNICODE will lose some 
   information). Thus SJIS-EUC_JP-SJIS conversion using 
 ICU does not 
   preserve original text.

  Just for the record, I fetched a web page encoded in sjis, and 
  converted it to euc-jp and back using uconv from ICU 3.2, and the 
  result is the original is identical to the transformed file.

   uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html  
 uconv -f 
  EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc  diff 
 index.html 
  index.html.sjis

 Not all SJIS/EUC_JP characters have the problem. You might want to
 try: Shift_JIS 0x81e6, 0x879a, 0xfa5b.

 BTW, I got this with ICU 3.2:

 $ uconv -f EUC_JP -t Shift_JIS /tmp/a.txt -o /tmp/b.txt 
 Conversion from Unicode to codepage failed at input byte 
 position 0. Unicode: 301c Error: Invalid character found

 The contents of a.txt is 0xa1c1 which is a valid EUC_JP character.

That actually makes perfect sense, since according to unicode.org's
database:
301C ~ WAVE DASH
   This character was encoded to match JIS C 6226-1978 1-33 wave
dash.
   The JIS standards and some industry practise disagree in mapping.
 - 3030 wavy dash
 - FF5E full width tilde

In PG FF5E is the mapping currently used. That is obviously wrong
(according to the standards), as that is only a 'similar character'.

Unfortunately, there is no mapping from 301C to shift_jis, as shift_jis
doesn't define WAVE DASH.
In all, I believe this behaviour to be correct according to the
standards.

There'd be nothing to stop us from defining alternative mappings for the
cases where we deviate from the standard, but the question is, should we
be non-standard?

 This makes me nervous in using ICU...
 --
 Tatsuo Ishii

... John

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

Tatsuo Ishii
 Sent: Sunday, May 08, 2005 3:41 PM
 To: John Hansen
 Cc: [EMAIL PROTECTED]; pgman@candle.pha.pa.us; 
 [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

  Alvaro Herrera wrote:
   Sent: Sunday, May 08, 2005 2:49 PM
   To: John Hansen
   Cc: Tatsuo Ishii; pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
   pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] Patch for collation using ICU

   On Sun, May 08, 2005 at 02:07:29PM +1000, John Hansen wrote:
Tatsuo Ishii wrote:

 So Japanese(including ASCII)/UNICODE behavior is
   perfectly correct
 at this moment.

Right, so you _never_ use accented ascii characters in 
 Japanese? 
(like è for example, whose uppercase is È)

   That isn't ASCII.  It's latin1 or some other ASCII extension.

  Point taken...
  But...

  If you want EUC_JP (Japanese + ASCII) then use that as your 
 backend encoding, not UTF-8 (unicode).
  UTF-8 encoded databases are very useful for representing multiple 
  languages in the same database, but this usefulness 
 vanishes if functions like upper/lower doesn't work correctly.

 I'm just curious if Germany/French/Spanish mixed text can be 
 sorted correctly. I think these languages need their own 
 locales even with UNICODE/ICU.

No, they will not sort correctly, for that you still need the locale.

  So optimizing for 3 languages breaks more than a hundred, 
 that's doesn't seem fair!

That is a compromise I'd be willing to agree on. :)

 Why don't you add a GUC variable or some such to control the 
 upper/lower behavior?
 --
 Tatsuo Ishii

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-08 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 3:31 PM
 To: John Hansen
 Cc: [EMAIL PROTECTED]; pgsql-general@postgresql.org; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

  Tatsuo Ishii wrote:
   Sent: Sunday, May 08, 2005 12:01 PM
   To: [EMAIL PROTECTED]
   Cc: pgsql-general@postgresql.org; pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

   We have developed patches which relaxes the character 
 validation so 
   that PostgreSQL accepts invalid characters. It works like this:

  That is just plain 100% wrong!!

  Under no circumstances should there be invalid data in a database.
  And if you're trying to make a database of invalid data, 
 then at least 
  encode it using a valid encoding.

  In fact, I've proposed strengthening the validation 
 routines for UTF-8.

 Actually I myself thought as you are before. Later I found 
 that it was not so good idea. People already have invalid 
 encoded data in their precious database and have very hard 
 time to migrate to newer version of PostgreSQL because of 
 encoding validation.

 Think about this kind of situation:

 There is a table t1(member_id integer primary key, 
 member_name text, address text, phone text, email text). I 
 have to reach each member by either adress, phone or email. 
 Unfortunately some of address field have wrong encoded data. 
 In this case I will use phone or email to reach them. 

 Now I need to upgrade to newer PostgreSQL within 1 day. I 
 know I have to fix wrong encoded field but it will take more 
 than 1 day. So I would like to import the data first then fix 
 wrong encoded field on running database since I can reach 
 members by phone or email even with wrong encoded address field...

Actually would be very simple, create function isvalidutf8(text) in your
preferred language.
C source is available from unicode.org.
Create function converttoutf8(text) using whatever code is required to
transform the _wrong_ encoding (SQL_ASCII - UTF8 for instance) to
utf-8.
Update table set field=converttoutf8(field) where !isvalidutf8(field);

Now sit back and relax while your invalid data is converted to utf-8.

When done, pg_dump the database, upgrade, and reload.

This should take less than a day.

 I saw this kind of situation in the real world and that's why 
 we developed the patches.
 --
 Tatsuo Ishii

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

 The source for ICU 3.2 is 9.8Mb in .tar.gz. PostgreSQL 8.0.2 is 13.2.
 That means the size of the distribution would almost *double* 
 if we bundled ICU.

Ermm,. Don't forget to remove the current charset conversions and locale
support before making your size estimation.

 
 It's probably fine bundling it in the binary distributions 
 (at least we'd probably do it on win32, since not many ppl 
 will have it already there), but bundling the source seems a 
 bit excessive to me.
 
 //Magnus
 
 

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 11:19 PM
 To: John Hansen
 Cc: [EMAIL PROTECTED]; pgman@candle.pha.pa.us; 
 [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

 On Sun, May 08, 2005 at 02:07:29PM +1000, John Hansen wrote:
  Tatsuo Ishii wrote:

   So Japanese(including ASCII)/UNICODE behavior is
 perfectly correct
   at this moment.

  Right, so you _never_ use accented ascii characters in
   Japanese? 
  (like è for example, whose uppercase is È)

 That isn't ASCII.  It's latin1 or some other ASCII extension.

Point taken...
But...

If you want EUC_JP (Japanese + ASCII) then use that as your
   backend encoding, not UTF-8 (unicode).
UTF-8 encoded databases are very useful for 
 representing multiple 
languages in the same database, but this usefulness
   vanishes if functions like upper/lower doesn't work correctly.

   I'm just curious if Germany/French/Spanish mixed text can 
 be sorted 
   correctly. I think these languages need their own locales 
 even with 
   UNICODE/ICU.

  No, they will not sort correctly, for that you still need 
 the locale.

 I'm confused. I thought the ICU patches is intended for using 
 on broken locale platforms?

Initially yes, but why duplicate code?
What I meant was, that they will not sort correctly using the C locale.
Locale _name_ needs to be known to ICU for it to sort correctly.

 --
 Tatsuo Ishii

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

Tom Lane wrote:
 Sent: Monday, May 09, 2005 2:47 AM
 To: Palle Girgensohn
 Cc: Tatsuo Ishii; John Hansen; [EMAIL PROTECTED]; 
 pgman@candle.pha.pa.us; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU 

 Palle Girgensohn [EMAIL PROTECTED] writes:
  I'm confused. I thought the ICU patches is intended for using on 
  broken locale platforms?

  It will sort correctly in *one* locale, using ICU. You still cannot 
  mix different locales in the same database cluster, the collation 
  locale is still fixed at initdb time, unfortunately.

 I thought the point of using ICU was to be able to dig out 
 from under that restriction?  It's a bit of a large pill to 
 swallow if we will still have to throw it away someday to 
 become SQL spec compliant.

That is not a limitation of ICU but of postgresql.
I don't know what the specs say, but imagine something like:
SELECT foo FROM bar ORDER BY foo WITH LOCALE 'en_US', foobar WITH LOCALE
'jp_JP';

Which would be less difficult to implement using ICU.

   regards, tom lane

... John

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 11:08 PM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

   I don't buy it. If current conversion tables does the 
 right thing, 
   why we need to replace. Or if conversion tables are not 
 correct, why 
   don't you fix it? I think the rule of character 
 conversion will not 
   change frequently, especially for LATIN languages. Thus 
 maintaining 
   cost is not too high.

  I never said we need to, but if we're going to implement 
 ICU, then we 
  might as well go all the way.

 So you admit there's no benefit using ICU for replacing 
 existing conversions?

 Besides ICU does not support all existing conversions, I 
 think ICU has serious flaw for using conversion. If I 
 understand correctly, ICU uses UNICODE internally to do the 
 conversion. For example, to implement
 SJIS-EUC_JP conversion, ICU first converts SJIS to UNICODE then
 converts UNICODE to EUC_JP. Problem is these conversion is 
 not roud trip(conversion between SJIS/EUC_JP and UNICODE will 
 lose some information). Thus SJIS-EUC_JP-SJIS conversion 
 using ICU does not preserve original text.

Could you please send me a sample text as an attachment encoded in SJIS
where this would happen?

 --
 Tatsuo Ishii

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Patch for collation using ICU

2005-05-08 Thread John Hansen

 -Original Message-
 From: Tatsuo Ishii [mailto:[EMAIL PROTECTED] 
 Sent: Sunday, May 08, 2005 11:08 PM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

   I don't buy it. If current conversion tables does the 
 right thing, 
   why we need to replace. Or if conversion tables are not 
 correct, why 
   don't you fix it? I think the rule of character 
 conversion will not 
   change frequently, especially for LATIN languages. Thus 
 maintaining 
   cost is not too high.

  I never said we need to, but if we're going to implement 
 ICU, then we 
  might as well go all the way.

 So you admit there's no benefit using ICU for replacing 
 existing conversions?

 Besides ICU does not support all existing conversions, I 
 think ICU has serious flaw for using conversion. If I 
 understand correctly, ICU uses UNICODE internally to do the 
 conversion. For example, to implement
 SJIS-EUC_JP conversion, ICU first converts SJIS to UNICODE then
 converts UNICODE to EUC_JP. Problem is these conversion is 
 not roud trip(conversion between SJIS/EUC_JP and UNICODE will 
 lose some information). Thus SJIS-EUC_JP-SJIS conversion 
 using ICU does not preserve original text.

Just for the record, I fetched a web page encoded in sjis, and converted
it to euc-jp and back using uconv from ICU 3.2, and the result is the
original is identical to the transformed file.

 uconv -f Shift_JIS -t EUC-JP -o index.html.euc index.html
 uconv -f EUC-JP -t Shift_JIS -o index.html.sjis index.html.euc
 diff index.html index.html.sjis

... John

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Where'd you get the licence from?
None of that is in the licence I'm reading!

(http://www-306.ibm.com/software/globalization/icu/index.jsp)
(http://www-306.ibm.com/software/globalization/icu/license.jsp) 

... John

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
 Sent: Saturday, May 07, 2005 3:17 PM
 To: Bruce Momjian
 Cc: Palle Girgensohn; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU 
 
 Bruce Momjian pgman@candle.pha.pa.us writes:
  Tom Lane wrote:
  Not until ICU is released under a BSD license ...
 
  Well, readline isn't BSD either, but we use it.  It is any 
 different?
 
 Did you read the license?  Some of the more troubling bits:
 
 : It is the understanding of INTERNATIONAL BUSINESS MACHINES 
 CORPORATION
 : that the purpose for which its publications are being reproduced is
 : accurate and true as stated in your attached request.
 
 (er, which attached request would that be?)
 
 : Permission to quote from or reprint IBM publications is 
 limited to the
 : purpose and quantities originally requested and must not be 
 construed as
 : a blanket license to use the material for other purposes or 
 to reprint
 : other IBM copyrighted material.
 
 : IBM reserves the right to withdraw permission to reproduce 
 copyrighted
 : material whenever, in its discretion, it feels that the privilege of
 : reproducing its material is being used in a way detrimental to its
 : interest or the above instructions are not being followed 
 properly to
 : protect its copyright.
 
 : IBM may have patents or pending patent applications covering subject
 : matter in this document. The furnishing of this document 
 does not give
 : you any license to these patents. You can send license inquiries, in
 : writing, to:
 
 : For license inquiries regarding double-byte (DBCS) 
 information, contact
 : the IBM Intellectual Property Department in your country or send
 : inquiries, in writing, to:
 
   regards, tom lane
 
 ---(end of 
 broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to 
 [EMAIL PROTECTED])
 
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 
 I use this patch in production on one FreeBSD 4.10 server at 
 the moment. 
 With the latest version, I've had no problems. Logging is 
 swithed on for 
 now, and it shows no signs of ICU complaining. I'd like more 
 reports on 
 Linux, though.

I currently use this on gentoo with ICU3.2 unmasked.

Works a dream, even with locale C and UNICODE database.

Small test:

createdb --encoding UNICODE --locale C test
psql test
set client_encoding=iso88591;
CREATE TABLE test (t text);
INSERT INTO test (t) VALUES ('æøå');
set client_encoding=unicode;
INSERT INTO test (t) SELECT upper(t) FROM test;
set client_encoding=iso88591;
SELECT * FROM test;
  t
-
 æøå
 ÆØÅ
(2 rows)

Just as I'd expect, as upper/lower/initcap are locale independent for these 
characters.


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Errm,... initdb --encoding UNICODE --locale C

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of John Hansen
 Sent: Saturday, May 07, 2005 10:23 PM
 To: Palle Girgensohn; Bruce Momjian
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

  I use this patch in production on one FreeBSD 4.10 server at the 
  moment.
  With the latest version, I've had no problems. Logging is 
 swithed on 
  for now, and it shows no signs of ICU complaining. I'd like more 
  reports on Linux, though.

 I currently use this on gentoo with ICU3.2 unmasked.

 Works a dream, even with locale C and UNICODE database.

 Small test:

 createdb --encoding UNICODE --locale C test psql test set 
 client_encoding=iso88591; CREATE TABLE test (t text); INSERT 
 INTO test (t) VALUES ('æøå'); set client_encoding=unicode; 
 INSERT INTO test (t) SELECT upper(t) FROM test; set 
 client_encoding=iso88591; SELECT * FROM test;
   t
 -
  æøå
  ÆØÅ
 (2 rows)

 Just as I'd expect, as upper/lower/initcap are locale 
 independent for these characters.

 ---(end of 
 broadcast)---
 TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Bruce Momjian wrote:
 Palle Girgensohn wrote:
  
   Is this patch ready for application?
  
  I don't think so, not quite. I have not had any positive 
 reports from 
  linux users, this is only tested in a FreeBSD environment. 
 I'd say it 
  needs some more testing.
 
 OK.
 
  Also, apparently, ICU is installed by default in many linux 
  distributions, and usually it is version 2.8. Some linux users have 
  asked me if there are plans for a patch that works with ICU 2.8. 
  That's probably a good idea. IBM and the ICU folks seem to consider 
  3.2 to be the stable version, older versions are hard to 
 find on their 
  sites, but most linux distributers seem to consider it too bleeding 
  edge, even gentoo. I don't know why they don't agree.
 
 Good point.  Why would linux folks need ICU?  Doesn't their 
 OS support encodings natively?  I am particularly excited 
 about this for OSs that don't have such encodings, like UTF8 
 support for Win32.
 
 Because ICU will not be used unless enabled by configure, it 
 seems we are fine with only supporting the newest version.  
 Do Linux users need to use ICU for any reason?

Yes, because on many linux platforms locale support is broken.
Also, ICU enables full unicode support, particularly in multi-language
situations where locale is C, and makes upper/lower/initcap work as
expected, except where it depends on locale information.

There are also many other useful things in ICU that could be
implemented. Transliteration, and break-iterators for example.
Break-iteration particularly interresting for converting a text to a
list of words. Another is it's builtin substring searches.

 
   I do have a few questions:
  
   Why don't you use the lc_ctype_is_c() part of this test?
  
 if (pg_database_encoding_max_length()  1  !lc_ctype_is_c())
  
  Um, well, I didn't think about that. :)  What would be the 
 locale in 
  this case? c_C.UTF-8? ;)  Hmm, it is possible to have 
 CTYPE=C and use 
  a wide encoding, indeed. Then the strings will be handled 
 like byte-wide chars.
  Yeah, it's a bug. I'll fix it! Thanks.
 
 The additional test is more of an optmization, and it fixes a 
 problem with some OSs that have processing problems with UTF8 
 when the locale is supposed to be turned off, like in C.  I 
 realize ICU might be fine with it but the optimization still 
 is an issue.

That the locale is supposed to be turned off, doesn't mean it shouldn't
use ICU.
ICU is more than just locales.

   Why is so much code added, for example, in lower()?  The existing 
   multibyte code is much smaller, and lots of code is added 
 in other 
   places too.
  
  ICU uses UTF-16 internally, so all strings must be 
 converted from the 
  database encoding to UTF-16. Since that means the strings 
 need to be 
  copied, I took the same approach as in 
 varlena.c:varstr_cmp(), where 
  small strings use the heap and only larger strings use a palloc. 
  Comments in varstr_cmp about performance made me use that approach.
 
 Oh, interesting.   I think you need to create new functions that
 factor out that common code so the patch is smaller and 
 easier to maintain.
 
  Also, in the latest patch, I also added checks and logging 
 for *every* 
  status returned from ICU. I hope this will help debugging 
 on debian, 
  where previous version didn't work. That excessive status 
 checking is 
  hardly be necessary once the stuff is better tested.
  
  I think the string copying and heap/palloc choices stands 
 for most of 
  the code bloat, together with the excessive status checking 
 and logging.
 
 OK, move that into some common functions and I think it will 
 be better.
 
   Why do you need to add a mapping of encoding names from 
 iana to our 
   names?
  
  This was already answered by John Hansen... There's an old 
 thread here 
  about the choice of the name UNICODE to describe an 
 encoding, which 
  it doesn't. There's half a dozen unicode based encodings... 
 UTF-8 is 
  used by postgresql, that would have been a better name... Similarly 
  for most other encodings, really. ICU expect a setlocale(3) string 
  (i.e. IANA). PostgreSQL can't provide it, so a mapping 
 table is required.
 
 We have depricated UNICODE in 8.1 in favor of UTF8 (no dash). 
  Does that help?
 
  I use this patch in production on one FreeBSD 4.10 server 
 at the moment. 
  With the latest version, I've had no problems. Logging is 
 swithed on 
  for now, and it shows no signs of ICU complaining. I'd like more 
  reports on Linux, though.
 
 OK, I certainly would like this all done for 8.1 which should 
 have feature freeze on July 1.
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073
 
 ---(end of 
 broadcast)---
 TIP 4: Don't 'kill -9

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 --On lördag, maj 07, 2005 22.53.46 +1000 John Hansen 
 [EMAIL PROTECTED]
 wrote:
 
  Errm,... initdb --encoding UNICODE --locale C
 
 You mean that ICU *shall* be used even for the C locale, and 
 not as Bruce suggested here:

Yes, that's exactly what I mean.

 
  I do have a few questions:
 
  Why don't you use the lc_ctype_is_c() part of this test?
 
 if (pg_database_encoding_max_length()  1  !lc_ctype_is_c())
 
  Um, well, I didn't think about that. :)  What would be the 
 locale in 
  this case? c_C.UTF-8? ;)  Hmm, it is possible to have 
 CTYPE=C and use 
  a wide encoding, indeed. Then the strings will be handled 
 like byte-wide chars.
  Yeah, it's a bug. I'll fix it! Thanks.
 
 John disagrees here, and I'm obliged to agree. Using the C 
 locale, one will expect C collation, but upper/lower is 
 better off still using ICU. Hence, the above stuff is *not* a 
 bug. Do we agree?
 
 /Palle
 
 
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of 
 John Hansen
  Sent: Saturday, May 07, 2005 10:23 PM
  To: Palle Girgensohn; Bruce Momjian
  Cc: pgsql-hackers@postgresql.org
  Subject: Re: [HACKERS] Patch for collation using ICU
 
  
   I use this patch in production on one FreeBSD 4.10 server at the 
   moment.
   With the latest version, I've had no problems. Logging is
  swithed on
   for now, and it shows no signs of ICU complaining. I'd like more 
   reports on Linux, though.
 
  I currently use this on gentoo with ICU3.2 unmasked.
 
  Works a dream, even with locale C and UNICODE database.
 
  Small test:
 
  createdb --encoding UNICODE --locale C test psql test set 
  client_encoding=iso88591; CREATE TABLE test (t text); INSERT INTO 
  test (t) VALUES ('æøå'); set client_encoding=unicode; INSERT INTO 
  test (t) SELECT upper(t) FROM test; set client_encoding=iso88591; 
  SELECT * FROM test;
t
  -
   æøå
   ÆØÅ
  (2 rows)
 
  Just as I'd expect, as upper/lower/initcap are locale 
 independent for 
  these characters.
 
 
  ---(end of
  broadcast)---
  TIP 5: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/docs/faq
 
 
 
 
 
 
 
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Btw, I had been planning to propose replacing every single one of the built in 
charset conversion functions with calls to ICU (thus making pg _depend_ on 
ICU), as this would seem like a cleaner solution than for us to maintain our 
own conversion tables.

ICU also has a fair few conversions that we do not have at present.

Any thoughts?

... John

 -Original Message-
 From: John Hansen 
 Sent: Saturday, May 07, 2005 11:09 PM
 To: 'Palle Girgensohn'; 'Bruce Momjian'
 Cc: 'pgsql-hackers@postgresql.org'
 Subject: RE: [HACKERS] Patch for collation using ICU
 
  --On lördag, maj 07, 2005 22.53.46 +1000 John Hansen 
  [EMAIL PROTECTED]
  wrote:
  
   Errm,... initdb --encoding UNICODE --locale C
  
  You mean that ICU *shall* be used even for the C locale, and not as 
  Bruce suggested here:
 
 Yes, that's exactly what I mean.
 
  
   I do have a few questions:
  
   Why don't you use the lc_ctype_is_c() part of this test?
  
if (pg_database_encoding_max_length()  1  
 !lc_ctype_is_c())
  
   Um, well, I didn't think about that. :)  What would be the
  locale in
   this case? c_C.UTF-8? ;)  Hmm, it is possible to have
  CTYPE=C and use
   a wide encoding, indeed. Then the strings will be handled
  like byte-wide chars.
   Yeah, it's a bug. I'll fix it! Thanks.
  
  John disagrees here, and I'm obliged to agree. Using the C 
 locale, one 
  will expect C collation, but upper/lower is better off still using 
  ICU. Hence, the above stuff is *not* a bug. Do we agree?
  
  /Palle
  
  
  
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of
  John Hansen
   Sent: Saturday, May 07, 2005 10:23 PM
   To: Palle Girgensohn; Bruce Momjian
   Cc: pgsql-hackers@postgresql.org
   Subject: Re: [HACKERS] Patch for collation using ICU
  
   
I use this patch in production on one FreeBSD 4.10 
 server at the 
moment.
With the latest version, I've had no problems. Logging is
   swithed on
for now, and it shows no signs of ICU complaining. I'd 
 like more 
reports on Linux, though.
  
   I currently use this on gentoo with ICU3.2 unmasked.
  
   Works a dream, even with locale C and UNICODE database.
  
   Small test:
  
   createdb --encoding UNICODE --locale C test psql test set 
   client_encoding=iso88591; CREATE TABLE test (t text); 
 INSERT INTO 
   test (t) VALUES ('æøå'); set client_encoding=unicode; 
 INSERT INTO 
   test (t) SELECT upper(t) FROM test; set 
 client_encoding=iso88591; 
   SELECT * FROM test;
 t
   -
æøå
ÆØÅ
   (2 rows)
  
   Just as I'd expect, as upper/lower/initcap are locale
  independent for
   these characters.
  
  
   ---(end of
   broadcast)---
   TIP 5: Have you checked our extensive FAQ?
  
  http://www.postgresql.org/docs/faq
  
  
  
  
  
  
  
  

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Palle Girgensohn wrote:
 I'm aware of that. It might help for unicode, but there are a 
 bunch of 
 other encodings. IANA has decided that utf-8 has *no* 
 aliases, hence only 
 utf-8 (with dash, but case insensitve) is accepted. Perhaps ICU is 
 fogiving, I don't remember/know, but I think we need the mappings, 
 unfortunately.
 

Here is the list of encoding names and aliases the ICU accepts as of
3.2:
(it's a bit long...)

UTF-8 ibm-1208 ibm-1209 ibm-5304 ibm-5305 windows-65001 cp1208
UTF-16 ISO-10646-UCS-2 unicode csUnicode ucs-2
UTF-16BE x-utf-16be ibm-1200 ibm-1201 ibm-5297 ibm-13488 ibm-17584
windows-1201 cp1200 cp1201 UTF16_BigEndian
UTF-16LE x-utf-16le ibm-1202 ibm-13490 ibm-17586 UTF16_LittleEndian
windows-1200
UTF-32 ISO-10646-UCS-4 csUCS4 ucs-4
UTF-32BE UTF32_BigEndian ibm-1232 ibm-1233
UTF-32LE UTF32_LittleEndian ibm-1234
UTF16_PlatformEndian
UTF16_OppositeEndian
UTF32_PlatformEndian
UTF32_OppositeEndian
UTF-7 windows-65000
IMAP-mailbox-name
SCSU
BOCU-1 csBOCU-1
CESU-8
ISO-8859-1 ibm-819 IBM819 cp819 latin1 8859_1 csISOLatin1 iso-ir-100
ISO_8859-1:1987 l1 819
US-ASCII ASCII ANSI_X3.4-1968 ANSI_X3.4-1986 ISO_646.irv:1991
iso_646.irv:1983 ISO646-US us csASCII iso-ir-6 cp367 ascii7 646
windows-20127
gb18030 ibm-1392 windows-54936
ibm-367_P100-1995 ibm-367 IBM367
ibm-912_P100-1995 ibm-912 iso-8859-2 ISO_8859-2:1987 latin2 csISOLatin2
iso-ir-101 l2 8859_2 cp912 912 windows-28592
ibm-913_P100-2000 ibm-913 iso-8859-3 ISO_8859-3:1988 latin3 csISOLatin3
iso-ir-109 l3 8859_3 cp913 913 windows-28593
ibm-914_P100-1995 ibm-914 iso-8859-4 latin4 csISOLatin4 iso-ir-110
ISO_8859-4:1988 l4 8859_4 cp914 914 windows-28594
ibm-915_P100-1995 ibm-915 iso-8859-5 cyrillic csISOLatinCyrillic
iso-ir-144 ISO_8859-5:1988 8859_5 cp915 915 windows-28595
ibm-1089_P100-1995 ibm-1089 iso-8859-6 arabic csISOLatinArabic
iso-ir-127 ISO_8859-6:1987 ECMA-114 ASMO-708 8859_6 cp1089 1089
windows-28596 ISO-8859-6-I ISO-8859-6-E
ibm-813_P100-1995 ibm-813 iso-8859-7 greek greek8 ELOT_928 ECMA-118
csISOLatinGreek iso-ir-126 ISO_8859-7:1987 8859_7 cp813 813
windows-28597
ibm-916_P100-1995 ibm-916 iso-8859-8 hebrew csISOLatinHebrew iso-ir-138
ISO_8859-8:1988 ISO-8859-8-I ISO-8859-8-E 8859_8 cp916 916 windows-28598
ibm-920_P100-1995 ibm-920 iso-8859-9 latin5 csISOLatin5 iso-ir-148
ISO_8859-9:1989 l5 8859_9 cp920 920 windows-28599 ECMA-128
ibm-921_P100-1995 ibm-921 iso-8859-13 8859_13 cp921 921
ibm-923_P100-1998 ibm-923 iso-8859-15 Latin-9 l9 8859_15 latin0
csisolatin0 csisolatin9 iso8859_15_fdis cp923 923 windows-28605
ibm-942_P12A-1999 ibm-942 ibm-932 cp932 shift_jis78 sjis78
ibm-942_VSUB_VPUA ibm-932_VSUB_VPUA
ibm-943_P15A-2003 ibm-943 Shift_JIS MS_Kanji csShiftJIS windows-31j
csWindows31J x-sjis x-ms-cp932 cp932 windows-932 cp943c IBM-943C ms932
pck sjis ibm-943_VSUB_VPUA
ibm-943_P130-1999 ibm-943 Shift_JIS cp943 943 ibm-943_VASCII_VSUB_VPUA
ibm-33722_P12A-1999 ibm-33722 ibm-5050 EUC-JP
Extended_UNIX_Code_Packed_Format_for_Japanese csEUCPkdFmtJapanese
X-EUC-JP eucjis windows-51932 ibm-33722_VPUA IBM-eucJP
ibm-33722_P120-1999 ibm-33722 ibm-5050 cp33722 33722
ibm-33722_VASCII_VPUA
ibm-954_P101-2000 ibm-954 EUC-JP
ibm-1373_P100-2002 ibm-1373 windows-950
windows-950-2000 Big5 csBig5 windows-950 x-big5
ibm-950_P110-1999 ibm-950 cp950 950
macos-2566-10.2 Big5-HKSCS big5hk HKSCS-BIG5
ibm-1375_P100-2003 ibm-1375 Big5-HKSCS
ibm-1386_P100-2002 ibm-1386 cp1386 windows-936 ibm-1386_VSUB_VPUA
windows-936-2000 GBK CP936 MS936 windows-936
ibm-1383_P110-1999 ibm-1383 GB2312 csGB2312 EUC-CN ibm-eucCN hp15CN
cp1383 1383 ibm-1383_VPUA
ibm-5478_P100-1995 ibm-5478 GB_2312-80 chinese iso-ir-58 csISO58GB231280
gb2312-1980 GB2312.1980-0
ibm-964_P110-1999 ibm-964 EUC-TW ibm-eucTW cns11643 cp964 964
ibm-964_VPUA
ibm-949_P110-1999 ibm-949 cp949 949 ibm-949_VASCII_VSUB_VPUA
ibm-949_P11A-1999 ibm-949 cp949c ibm-949_VSUB_VPUA
ibm-970_P110-1995 ibm-970 EUC-KR KS_C_5601-1987 windows-51949 csEUCKR
ibm-eucKR KSC_5601 5601 ibm-970_VPUA
ibm-971_P100-1995 ibm-971 ibm-971_VPUA
ibm-1363_P11B-1998 ibm-1363 KS_C_5601-1987 KS_C_5601-1989 KSC_5601
csKSC56011987 korean iso-ir-149 5601 cp1363 ksc windows-949
ibm-1363_VSUB_VPUA
ibm-1363_P110-1997 ibm-1363 ibm-1363_VASCII_VSUB_VPUA
windows-949-2000 windows-949 KS_C_5601-1987 KS_C_5601-1989 KSC_5601
csKSC56011987 korean iso-ir-149 ms949
ibm-1162_P100-1999 ibm-1162
ibm-874_P100-1995 ibm-874 ibm-9066 cp874 TIS-620 tis620.2533 eucTH
cp9066
windows-874-2000 TIS-620 windows-874 MS874
ibm-437_P100-1995 ibm-437 IBM437 cp437 437 csPC8CodePage437 windows-437
ibm-850_P100-1995 ibm-850 IBM850 cp850 850 csPC850Multilingual
windows-850
ibm-851_P100-1995 ibm-851 IBM851 cp851 851 csPC851
ibm-852_P100-1995 ibm-852 IBM852 cp852 852 csPCp852 windows-852
ibm-855_P100-1995 ibm-855 IBM855 cp855 855 csIBM855 csPCp855
ibm-856_P100-1995 ibm-856 cp856 856
ibm-857_P100-1995 ibm-857 IBM857 cp857 857 csIBM857 windows-857
ibm-858_P100-1997 ibm-858 IBM00858 CCSID00858 CP00858
PC-Multilingual-850+euro cp858
ibm-860_P100-1995 ibm-860 IBM860 cp860 860 csIBM860

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 -Original Message-
 From: Palle Girgensohn [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, May 07, 2005 11:30 PM
 To: John Hansen; Bruce Momjian
 Cc: pgsql-hackers@postgresql.org
 Subject: RE: [HACKERS] Patch for collation using ICU

 --On lördag, maj 07, 2005 23.25.15 +1000 John Hansen 
 [EMAIL PROTECTED]
 wrote:

  Palle Girgensohn wrote:
  I'm aware of that. It might help for unicode, but there 
 are a bunch 
  of other encodings. IANA has decided that utf-8 has *no* aliases, 
  hence only
  utf-8 (with dash, but case insensitve) is accepted. Perhaps ICU is 
  fogiving, I don't remember/know, but I think we need the mappings, 
  unfortunately.

  Here is the list of encoding names and aliases the ICU accepts as of
  3.2:
  (it's a bit long...)

  UTF-8 ibm-1208 ibm-1209 ibm-5304 ibm-5305 windows-65001 cp1208

 No UTF8 in there. I think that's good, charset aliases are a hassle.

Yup! :)

 /Palle

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 -Original Message-
 From: Palle Girgensohn [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, May 07, 2005 11:33 PM
 To: John Hansen; Bruce Momjian
 Cc: pgsql-hackers@postgresql.org
 Subject: RE: [HACKERS] Patch for collation using ICU

 --On lördag, maj 07, 2005 22.22.52 +1000 John Hansen 
 [EMAIL PROTECTED]
 wrote:

  I use this patch in production on one FreeBSD 4.10 server at the 
  moment.
  With the latest version, I've had no problems. Logging is 
 swithed on 
  for now, and it shows no signs of ICU complaining. I'd like more 
  reports on Linux, though.

  I currently use this on gentoo with ICU3.2 unmasked.

  Works a dream, even with locale C and UNICODE database.

 Ah, good to hear, John. I beleive your report about linux is 
 what's keeping this back. Did you also manage to get it 
 running on Debian?

Not without ICU3.2

 /Palle

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 Did you try the latest patch? Maybe it will help, and if not, it will
 (hopefully) give a lot more informative error messages.

No, and I got rid of my debian boxes @ home.
The patch required a certain amount of modifications too, to even
compile with 2.8.

So I guess it's a valid question to ask: it it worth supporting 2.8?
It is of course an option to bundle icu 3.2 with pg!


... John

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 

 -Original Message-
 From: Andrew Dunstan [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, May 07, 2005 11:39 PM
 To: John Hansen
 Cc: Palle Girgensohn; Bruce Momjian; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU
 
 
 
 John Hansen wrote:
 
 Here is the list of encoding names and aliases the ICU accepts as of
 3.2:
 (it's a bit long...)
 
 UTF-8 ibm-1208 ibm-1209 ibm-5304 ibm-5305 windows-65001 cp1208
 UTF-16 ISO-10646-UCS-2 unicode csUnicode ucs-2
 
   
 
 [snip]
 
 Don't we use unicode as an alias for UTF-8 ?

Yes, unfortunately!

 
 cheers
 
 andrew
 
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Bruce Momjian wrote:
 
 There are two reasons for that optimization --- first, some 
 locale support is broken and Unicode encoding with a C locale 
 crashes (not an issue for ICU), and second, it is an 
 optimization for languages like Japanese that want to use 
 unicode, but don't need a locale because upper/lower means 
 nothing in those character sets.

No, upper/lower means nothing in those languages, so why would you need
to optimize upper/lower if they're not used??
And if they are, it's obviously because the text contains characters
from other languages (probably english) and as such they should behave
correctly.

Did I mention that for japanese and the like, ICU would also offer
transliteration...

 
 So, the first issue doesn't apply for ICU, and the second 
 might not depending on what characters you are using in the 
 Unicode character set.
 
 I guess I am little confused how ICU can do upper() when the 
 locale is C.  What is it using to determine A is upper for a? 
  Am I confused?

Simple, UNICODE basically consist of a table of characters
(http://www.unicode.org/Public/UNIDATA/UnicodeData.txt)

Excerpt:

0041;LATIN CAPITAL LETTER A;Lu;0;L;N0061;
...
0061;LATIN SMALL LETTER A;Ll;0;L;N;;;0041;;0041

From this you can see, that for 0041, which is capital letter A, there
is a mapping to it's lowercase counterpart, 0061
Likewise, there is a mapping for 0061 which says it's uppercase
counterpart is 0041.
There is also SpecialCasing.txt which covers those mappings that haven't
got a 1-1 mapping, such as the german SS.

These mappings are fixed, independent of locale, only a few cases from
specialcasing.txt depend on locale/context.



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 It seems 3.2 has much more support and bug fixes, I'm not 
 sure if we should really consider 2.8?

As I said, probably not worth the effort.

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Bruce Momjian wrote:
 Palle Girgensohn wrote:
  
  --On l?rdag, maj 07, 2005 23.15.29 +1000 John Hansen 
  [EMAIL PROTECTED]
  wrote:
  
   Btw, I had been planning to propose replacing every single one of 
   the built in charset conversion functions with calls to ICU (thus 
   making pg _depend_ on ICU), as this would seem like a cleaner 
   solution than for us to maintain our own conversion tables.
  
   ICU also has a fair few conversions that we do not have 
 at present.
 
 That is a much larger issue, similar to our shipping our own 
 timezone database.  What does it buy us?
   
   o  Do we ship it in our tarball?
   o  Is the license compatible?
   o  Does it remove utils/mb conversions?
   o  Does it allow us to index LIKE (next high char)?
   o  Does it allow us to support multiple encodings in
  a single database easier?
   o  performance?
 
  I just had a similar though. And why use ICU only for 
 multibyte charsets? 
  If I use LATIN1, I still expect upper('?') = SS, and I 
 don't get it... 
  Same for the Turkish example.
 
 We assume the native toupper() can handle single-byte 
 character encodings.  We use towupper() only for wide character sets.

That assumption is wrong,...

Encoding latin1
Locale  de*

Select Upper('ß'); (lowercase german SS)
Should return SS, but returns ß

... John



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Tom Lane wrote:
 John Hansen [EMAIL PROTECTED] writes:
  Where'd you get the licence from?
 
 It was the first thing I came across in their docs:
 
 http://icu.sourceforge.net/userguide/intro.html
 
 Looking more closely, it may be that this license is only 
 intended to apply to the documentation and not the code ... 
 though free code with un-free documentation isn't real useful.
 

Right, it seems to apply only to the resources found on sourceforge.

   regards, tom lane
 
 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Tom Lane wrote:
 John Hansen [EMAIL PROTECTED] writes:
  Btw, I had been planning to propose replacing every single 
 one of the 
  built in charset conversion functions with calls to ICU 
 (thus making 
  pg _depend_ on ICU),
 
 I find that fairly unacceptable ... especially given the 
 licensing questions, but in any case.

The licencing seems pretty clear to me.
http://www-306.ibm.com/software/globalization/icu/license.jsp

 
 It might be OK to rip out the existing conversion support and 
 say that *if* you want encoding conversion, you have to use 
 ICU.  But I don't want to be told you cannot build PG without 
 ICU period.

Right, that could be done, but I think the issue at heart is _are_ 
we going to use it at all, and if so, locale support would certainly 
benefit from going that way as well.

 
 The 3.2 vs 2.8 business is disturbing also; specifically, I 
 don't think we get to require 3.2 on a platform where 2.8 is 
 installed.

There seems to be nothing in the ICU licence that would prevent us from
bundling it.
This would solve both the 3.2 vs 2.8 problems, and would remove the
'dependency'.

 People just aren't going to hold still for that, even 
 assuming that ICU supports installing both versions at once, 
 which isn't clear to me at the moment ...

There's no problems with having both installed.
I did that on debian to get the patch going.
Tho, bundling it seems cleaner to me.

 
   regards, tom lane
 
 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

 I don't buy it. If current conversion tables does the right 
 thing, why we need to replace. Or if conversion tables are 
 not correct, why don't you fix it? I think the rule of 
 character conversion will not change frequently, especially 
 for LATIN languages. Thus maintaining cost is not too high.

I never said we need to, but if we're going to implement ICU,
then we might as well go all the way.

... John


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 10:09 AM
 To: John Hansen
 Cc: pgman@candle.pha.pa.us; [EMAIL PROTECTED]; 
 pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

  Bruce Momjian wrote:

   There are two reasons for that optimization --- first, 
 some locale 
   support is broken and Unicode encoding with a C locale 
 crashes (not 
   an issue for ICU), and second, it is an optimization for 
 languages 
   like Japanese that want to use unicode, but don't need a locale 
   because upper/lower means nothing in those character sets.

  No, upper/lower means nothing in those languages, so why would you 
  need to optimize upper/lower if they're not used??
  And if they are, it's obviously because the text contains 
 characters 
  from other languages (probably english) and as such they 
 should behave 
  correctly.

 Yes, Japanese (and probably Chinese and Korean) languages 
 include ASCII character. More precisely ASCII is part of Japanese
 encodings(LATIN1 is not, however). And we have no problem at 
 all with glibc/C locale. See below(unitest is an UNICODE database).

 unitest=# create table t1(t text);
 CREATE TABLE
 unitest=# \encoding EUC_JP
 unitest=# insert into t1 values('abc');
 INSERT 1842628 1
 unitest=# select upper(t) from t1;
upper   
 ---
  ABC
 (1 row)

 So Japanese(including ASCII)/UNICODE behavior is perfectly 
 correct at this moment. 

Right, so you _never_ use accented ascii characters in Japanese? 
(like  for example, whose uppercase is )

 So I strongly object removing that optimization.

I'm guessing this would call for a vote then, since if implementing ICU, then
I'd have to object to leaving it in.

Changing the bahaviour of ICU doesn't seem right. Changing the behaviour of pg, 
so that it works as it should when using unicode, seems the right solution to 
me.

 --
 Tatsuo Ishii

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-07 Thread John Hansen

Tatsuo Ishii wrote:
 Sent: Sunday, May 08, 2005 12:01 PM
 To: [EMAIL PROTECTED]
 Cc: pgsql-general@postgresql.org; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

 We have developed patches which relaxes the character 
 validation so that PostgreSQL accepts invalid characters. It 
 works like this:

That is just plain 100% wrong!!
Under no circumstances should there be invalid data in a database.
And if you're trying to make a database of invalid data, then at 
least encode it using a valid encoding.

In fact, I've proposed strengthening the validation routines for UTF-8.

... John

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] [GENERAL] Invalid unicode in COPY problem

2005-05-07 Thread John Hansen

Madison Kelly wrote:
Under most circumstances I would agree with you completely. In my 
 case though I have to decide between risking a loss of a 
 user's data or 
 attempt to store the file name in some manner that would 
 return the same 
 name used by the file system.
 
The user (or one of his/her users in the case of an admin) may be 
 completely unaware of the file name being an invalid unicode 
 name. The 
 file itself though may still be quite valid and contain information 
 worthy of backing up. I could notify the user/admin that the 
 name is not 
 valid but there is no way I could rely on the name being 
 changed. Given 
 the choices, I would prefer to attempt to store/use the file 
 name with 
 the invalid unicode character than simply ignore the file.
 
Is there a way to store the name in raw binary? If so, 
 would this not 
 be safe because to postgresql it should no longer matter what 
 data is or 
 represents, right? Maybe there is a third option I am not yet 
 concidering?

Set the client_encoding to ascii when storing that name, and again when
retrieving it.
Or, use a bytea column.

 
 Madison

... John

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Patch for collation using ICU

2005-05-07 Thread John Hansen

Alvaro Herrera wrote:
 Sent: Sunday, May 08, 2005 2:49 PM
 To: John Hansen
 Cc: Tatsuo Ishii; pgman@candle.pha.pa.us; 
 [EMAIL PROTECTED]; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Patch for collation using ICU

 On Sun, May 08, 2005 at 02:07:29PM +1000, John Hansen wrote:
  Tatsuo Ishii wrote:

   So Japanese(including ASCII)/UNICODE behavior is 
 perfectly correct 
   at this moment.

  Right, so you _never_ use accented ascii characters in Japanese? 
  (like è for example, whose uppercase is È)

 That isn't ASCII.  It's latin1 or some other ASCII extension.

Point taken...
But...

If you want EUC_JP (Japanese + ASCII) then use that as your backend encoding, 
not UTF-8 (unicode).
UTF-8 encoded databases are very useful for representing multiple languages in 
the same database,
but this usefulness vanishes if functions like upper/lower doesn't work 
correctly.

So optimizing for 3 languages breaks more than a hundred, that's doesn't seem 
fair!

 --
 Alvaro Herrera ([EMAIL PROTECTED]) La principal 
 característica humana es la tontería
 (Augusto Monterroso)

... John

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

Re: [HACKERS] Patch for collation using ICU

2005-05-06 Thread John Hansen


 Why do you need to add a mapping of encoding names from iana 
 to our names?
 

The pg encoding names are not recognized by ICU, hence the mappings

Install ICU 3.2 on your system, and run uconv -l, that will give you a
list of valid ICU encoding names.

... John

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Patch for collation using ICU

2005-05-06 Thread John Hansen

Btw,

Does it feel right to have pg depend on the bleeding edge version of
ICU?
On many distro's, even gentoo (known for being bleeding edge) 2.8 is
still the default.


2.8 and 3.2 are however incompatible, and supporting both, would bloat
the source somewhat.

... John


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] A proper fix for the conversion-function problem

2005-05-03 Thread John Hansen

 Are there any encodings we care about that require embedded zero
bytes?

UTF-8 does!

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] A proper fix for the conversion-function problem

2005-05-03 Thread John Hansen

Errm.. UTF-16/32

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of John Hansen
 Sent: Wednesday, May 04, 2005 1:22 PM
 To: Tom Lane; Tatsuo Ishii
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] A proper fix for the 
 conversion-function problem 
 
  Are there any encodings we care about that require embedded zero
 bytes?
 
 UTF-8 does!
 
 ---(end of 
 broadcast)---
 TIP 6: Have you searched our list archives?
 
http://archives.postgresql.org
 
 

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Call to build-in operator from new operator

2005-04-18 Thread John Hansen

Use the DirecFunctionCall1, DirecFunctionCall2, etc. functions.

... John

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
 Sent: Monday, April 18, 2005 10:40 PM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Call to build-in operator from new operator

 Greetings,

 I don't really know if this is the correct place to ask this 
 question, if not please direct me to the correct mailing list.

 I'm trying to develop a new operator for PostGreSQL (actually 
 for TelegraphCQ, which is an extension of PSQL). Part of the 
 operator's procedure is the @-operator. So now my question 
 is: How do I call the on_pb function from inside my own function?.
 The on_pb of course takes the argument 'PG_FUNCTION_ARGS' 
 which is defined in fmgr.h as 'FunctionCallInfo fcinfo' which 
 is defined as pointer to 'struct FunctionCallInfoData', so my 
 question boils down to:

 What do I put into this struct to call 'on_pb' with two 
 arguments from the call to my function?

 Further, is there a way to access data in tables in the 
 database other than those given as arguments to the function? And how?

 Sincerely
 Kim Bille
 Department of Computer Science
 Aalborg University
 Denmark

 --
 Mind are like parachutes --- they only work when open

 ---(end of 
 broadcast)---
 TIP 6: Have you searched our list archives?

http://archives.postgresql.org

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq

[HACKERS] Recursive types?

2005-04-11 Thread John Hansen

Consider the following:

create table foo ( id serial primary key, path text);
alter table foo add ref foo;

   Table public.foo
 Column |  Type   |  Modifiers
+-+-
 id | integer | not null default nextval('public.foo_id_seq'::text)
 path   | text|
 ref| foo |
Indexes:
foo_pkey PRIMARY KEY, btree (id)


insert into foo (path) VALUES ('/');
insert into foo (path,ref) VALUES ('/foo',ROW(1,'/',NULL));
insert into foo (path,ref) VALUES
('/foo/bar',ROW(2,'/foo',ROW(1,'/',NULL)));

select * from foo;
 id |   path   |ref
+--+---
  1 | /|
  2 | /foo | (1,/,)
  3 | /foo/bar | (2,/foo,(1,/,))
(3 rows)

However, it is not possible to create such a type using create type, as
there is no alter type, even tho alter table does the same thing.

Is this a bug or a random feature?

It is also not possible to dump and restore this using pg_dump.

... John

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread John Hansen

On 2005-04-10, Tom Lane tgl ( at ) sss ( dot ) pgh ( dot ) pa ( dot )
us wrote:
 Andrew - Supernews andrew+nonews ( at ) supernews ( dot ) com
writes:
 I think you will find that this impression is actually false. Or
that at
 the very least, _correct_ verification of UTF-8 sequences will still
 catch essentially all cases of non-utf-8 input mislabelled as utf-8
 while allowing the full range of Unicode codepoints.

 Yeah?  Cool.  Does John's proposed patch do it correctly?

 http://candle.pha.pa.us/mhonarc/patches2/msg00076.html

It looks correct to me. The only thing I think that code will let
through
incorrectly are encoded surrogates; those could be fixed by adding one
line:

  switch (*source) {
  /* no fall-through in this inner switch */
  case 0xE0: if (a  0xA0) return false; break;
+ case 0xED: if (a  0x9F) return false; break;
  case 0xF0: if (a  0x90) return false; break;
  case 0xF4: if (a  0x8F) return false; break;


That's right, dono how I missed that one, but looks correct to me, and
is in line with the code in ConvertUTF.c from unicode.org, on which I
based the patch, extended to support 6 byte utf8 characters.

(Accepting encoded surrogates in utf-8 was always forbidden by most
specifications that used utf-8, though the Unicode specs originally
were
not absolute about it (but forbade generating them). Current Unicode
specifications define those sequences as malformed. Surrogates are the
code points from 0xD800 - 0xDFFF, which are used in UTF-16 to encode
characters 0x1 - 0x10 as two 16-bit values; UTF-8 requires that
such characters are encoded directly rather than via surrogate pairs.)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

... John

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] Unicode problems on IRC

2005-04-09 Thread John Hansen

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of Bruce Momjian
 Sent: Sunday, April 10, 2005 8:18 AM
 To: Christopher Kings-Lynne
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Unicode problems on IRC

 Christopher Kings-Lynne wrote:
  Hey guys,

  The 'Unicode characters above 0x1' issue keeps rearing its ugly 
  head in the IRC channel.  I propose that it be fixed, even 
 backported...

  This is John Hansen's most recent patch to fix it:

  http://archives.postgresql.org/pgsql-patches/2004-11/msg00259.php

  And from what I can tell it was committed, then reverted because it 
  wasn't a bug.  It was going to go in for 8.1.

  We on the channel are starting to think that it is in fact a bug.  
  There are are people with legitimately utf-8 encoded XML documents 
  that they cannot store in PostgreSQL.  Apparently in the 
 distant past, 
  Unicode was limited to 0x1, but then was extended.

  Perhaps we can reopen this case...

 Uh, I thought we fixed this another way, buy not using 
 Unicode-aware functions for upper/lower/initcap when the 
 locale is C or POSIX. 
 That is backpatched to 8.0.X.  Does that not fix the problem reported?

No, as andrew said, what this patch does, is allow values  0x and
at the same time validates the input to make sure it's valid utf8.

... John

 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   pgman@candle.pha.pa.us   |  (610) 359-1001
   +  If your life is a hard drive, |  13 Roberts Road
   +  Christ can be your backup.|  Newtown Square, 
 Pennsylvania 19073

 ---(end of 
 broadcast)---
 TIP 9: the planner will ignore your desire to choose an index 
 scan if your
   joining column's datatypes do not match

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Patch for collation using ICU

2005-03-25 Thread John Hansen

 --On fredag, mars 25, 2005 16.34.41 +1100 John Hansen 
 [EMAIL PROTECTED]
 wrote:
 
  Useful if it's going to support earlier releases of ICU
 
  Not all os's come with ICU3.2, debian for example, 
 currently has 2.1 
  in testing, and 2.6 in unstable.
 
 Oh, OK. FreeBSD has only the 3.2 as port. I can check the 
 older version, I doubt it would too much difference. Some 
 autoconf sorcery needed, perhaps.

Naww, it's no biggie, we'll just need to include ICU with pg I think.
I tried that, there are several functions from ICU that you use, that
are not in ICU2.1

Dono about 2.6.

However, ICU3.2 compiles on debian with a small change to the
debian/rules file.
debian/tmp/etc is missing, so add mkdir debian/tmp/etc

... John

 
 /Palle
 
 
  ... John
 
  -Original Message-
  From: [EMAIL PROTECTED]
  [mailto:[EMAIL PROTECTED] On Behalf Of Palle 
  Girgensohn
  Sent: Friday, March 25, 2005 10:40 AM
  To: pgsql-hackers@postgresql.org
  Subject: [HACKERS] Patch for collation using ICU
 
  Hi!
 
  I've put together a patch for using IBM's ICU package for 
 collation.
 
  If your OS does not have full support for collation ur 
  uppercase/lowercase in multibyte locales, this might be useful. If 
  you are using a multibyte character encoding in your database and 
  want collation, i.e. order by, and also lower(), upper() and 
  initcap() to work properly, this patch will do just that.
 
  This patch is needed for FreeBSD, since this OS has no support for 
  collation of for example unicode locales (that is, wcscoll(3) does 
  not do what you expect if you set LC_ALL=sv_SE.UTF-8, for 
 example). 
  AFAIK the patch is *not* necessary for Linux, although IBM 
 claims ICU 
  collation to be about twice as fast as glibc for simple western 
  locales.
 
  It adds a configure switch, `--with-icu', which will set 
 up the code 
  to use ICU instead of wchar_t and wcscoll.
 
  This has been tested only on FreeBSD-4.11  
 FreeBSD-5-stable, where 
  it seems to run well. I've not had the time to do any comparative 
  performance tests yet, but it seems it is at least not slower than 
  using LATIN1 with
  sv_SE.ISO8859-1 locale, perhaps even faster.
 
  I'd be delighted if some more experienced postgresql hackers would 
  review this stuff. The patch is pretty compact, so it's 
 fast reading 
  :)  I'm planning to add this patch as an option (tagged 
  experimental) to FreeBSD's postgresql port. Any ideas 
 about whether 
  this is a good idea or not?
 
  Any thoughts or ideas are welcome!
 
  Cheers,
  Palle
 
  Patch at:
  http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
  005-03-14.diff
 
  ICU at sourceforge: http://icu.sf.net/
 
 
  ---(end of
  broadcast)---
  TIP 7: don't forget to increase your free space map settings
 
 
 
 
 
 
 
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Patch for collation using ICU

2005-03-25 Thread John Hansen

Ok,.. tested on debian sarge with ICU 3.2 
UNICODE Database, C locale.

upper() and lower() returns an empty string for any input, including
7bit ascii, regardless of client_encoding, so something is obviously
broken.

Have you tested this patch on a UNICODE DB with locale C/POSIX ?

... John

 -Original Message-
 From: John Hansen 
 Sent: Friday, March 25, 2005 10:27 PM
 To: 'Palle Girgensohn'; 'pgsql-hackers@postgresql.org'
 Subject: RE: [HACKERS] Patch for collation using ICU
 
  --On fredag, mars 25, 2005 16.34.41 +1100 John Hansen 
  [EMAIL PROTECTED]
  wrote:
  
   Useful if it's going to support earlier releases of ICU
  
   Not all os's come with ICU3.2, debian for example,
  currently has 2.1
   in testing, and 2.6 in unstable.
  
  Oh, OK. FreeBSD has only the 3.2 as port. I can check the older 
  version, I doubt it would too much difference. Some 
 autoconf sorcery 
  needed, perhaps.
 
 Naww, it's no biggie, we'll just need to include ICU with pg I think.
 I tried that, there are several functions from ICU that you 
 use, that are not in ICU2.1
 
 Dono about 2.6.
 
 However, ICU3.2 compiles on debian with a small change to the 
 debian/rules file.
 debian/tmp/etc is missing, so add mkdir debian/tmp/etc
 
 ... John
 
  
  /Palle
  
  
   ... John
  
   -Original Message-
   From: [EMAIL PROTECTED]
   [mailto:[EMAIL PROTECTED] On Behalf Of Palle 
   Girgensohn
   Sent: Friday, March 25, 2005 10:40 AM
   To: pgsql-hackers@postgresql.org
   Subject: [HACKERS] Patch for collation using ICU
  
   Hi!
  
   I've put together a patch for using IBM's ICU package for
  collation.
  
   If your OS does not have full support for collation ur 
   uppercase/lowercase in multibyte locales, this might be 
 useful. If 
   you are using a multibyte character encoding in your 
 database and 
   want collation, i.e. order by, and also lower(), upper() and
   initcap() to work properly, this patch will do just that.
  
   This patch is needed for FreeBSD, since this OS has no 
 support for 
   collation of for example unicode locales (that is, 
 wcscoll(3) does 
   not do what you expect if you set LC_ALL=sv_SE.UTF-8, for
  example). 
   AFAIK the patch is *not* necessary for Linux, although IBM
  claims ICU
   collation to be about twice as fast as glibc for simple western 
   locales.
  
   It adds a configure switch, `--with-icu', which will set
  up the code
   to use ICU instead of wchar_t and wcscoll.
  
   This has been tested only on FreeBSD-4.11 
  FreeBSD-5-stable, where
   it seems to run well. I've not had the time to do any 
 comparative 
   performance tests yet, but it seems it is at least not 
 slower than 
   using LATIN1 with
   sv_SE.ISO8859-1 locale, perhaps even faster.
  
   I'd be delighted if some more experienced postgresql 
 hackers would 
   review this stuff. The patch is pretty compact, so it's
  fast reading
   :)  I'm planning to add this patch as an option (tagged
   experimental) to FreeBSD's postgresql port. Any ideas
  about whether
   this is a good idea or not?
  
   Any thoughts or ideas are welcome!
  
   Cheers,
   Palle
  
   Patch at:
   http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
   005-03-14.diff
  
   ICU at sourceforge: http://icu.sf.net/
  
  
   ---(end of
   broadcast)---
   TIP 7: don't forget to increase your free space map settings
  
  
  
  
  
  
  
  

---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Patch for collation using ICU

2005-03-25 Thread John Hansen

 --On fredag, mars 25, 2005 23.39.33 +1100 John Hansen 
 [EMAIL PROTECTED]
 wrote:
 
  Ok,.. tested on debian sarge with ICU 3.2 UNICODE Database, 
 C locale.
 
  upper() and lower() returns an empty string for any input, 
 including 
  7bit ascii, regardless of client_encoding, so something is 
 obviously 
  broken.
 
  Have you tested this patch on a UNICODE DB with locale C/POSIX ?

FYI, I also found that initdb crashes with error 139 on any locale other
than C/POSIX.

 
 No, honestly not. Mostly tested it with my needs, sv_SE.UTF-8 
 and UNICODE, and also de_DE.UTF-8.
 
 How will PostgreSQL react to this combo? A database cluster 
 initdb:ed with locale=C/POSIX, and then a database in UNICODE 
 (really utf-8) representation... hmm... I think I might have 
 made a false assumption that the locale string would contain 
 the character encoding. I do something like encoding = 
 strchr(locale, '.') + 1... That code will be confused by a 'C' 
 locale, indeed. I'll check it out!
 
 /Palle
 
 
 
 
  ... John
 
  -Original Message-
  From: John Hansen
  Sent: Friday, March 25, 2005 10:27 PM
  To: 'Palle Girgensohn'; 'pgsql-hackers@postgresql.org'
  Subject: RE: [HACKERS] Patch for collation using ICU
 
   --On fredag, mars 25, 2005 16.34.41 +1100 John Hansen 
   [EMAIL PROTECTED]
   wrote:
  
Useful if it's going to support earlier releases of ICU
   
Not all os's come with ICU3.2, debian for example,
   currently has 2.1
in testing, and 2.6 in unstable.
  
   Oh, OK. FreeBSD has only the 3.2 as port. I can check the older 
   version, I doubt it would too much difference. Some
  autoconf sorcery
   needed, perhaps.
 
  Naww, it's no biggie, we'll just need to include ICU with 
 pg I think.
  I tried that, there are several functions from ICU that 
 you use, that 
  are not in ICU2.1
 
  Dono about 2.6.
 
  However, ICU3.2 compiles on debian with a small change to the 
  debian/rules file.
  debian/tmp/etc is missing, so add mkdir debian/tmp/etc
 
  ... John
 
  
   /Palle
  
   
... John
   
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf 
 Of Palle 
Girgensohn
Sent: Friday, March 25, 2005 10:40 AM
To: pgsql-hackers@postgresql.org
Subject: [HACKERS] Patch for collation using ICU
   
Hi!
   
I've put together a patch for using IBM's ICU package for
   collation.
   
If your OS does not have full support for collation ur 
uppercase/lowercase in multibyte locales, this might be
  useful. If
you are using a multibyte character encoding in your
  database and
want collation, i.e. order by, and also lower(), upper() and
initcap() to work properly, this patch will do just that.
   
This patch is needed for FreeBSD, since this OS has no
  support for
collation of for example unicode locales (that is,
  wcscoll(3) does
not do what you expect if you set LC_ALL=sv_SE.UTF-8, for
   example).
AFAIK the patch is *not* necessary for Linux, although IBM
   claims ICU
collation to be about twice as fast as glibc for 
 simple western 
locales.
   
It adds a configure switch, `--with-icu', which will set
   up the code
to use ICU instead of wchar_t and wcscoll.
   
This has been tested only on FreeBSD-4.11 
   FreeBSD-5-stable, where
it seems to run well. I've not had the time to do any
  comparative
performance tests yet, but it seems it is at least not
  slower than
using LATIN1 with
sv_SE.ISO8859-1 locale, perhaps even faster.
   
I'd be delighted if some more experienced postgresql
  hackers would
review this stuff. The patch is pretty compact, so it's
   fast reading
:)  I'm planning to add this patch as an option (tagged
experimental) to FreeBSD's postgresql port. Any ideas
   about whether
this is a good idea or not?
   
Any thoughts or ideas are welcome!
   
Cheers,
Palle
   
Patch at:
http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
005-03-14.diff
   
ICU at sourceforge: http://icu.sf.net/
   
   
---(end of
broadcast)---
TIP 7: don't forget to increase your free space map settings
   
   
  
  
  
  
  
  
 
 
 
 
 
 

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] Patch for collation using ICU

2005-03-25 Thread John Hansen

 -Original Message-
 From: Palle Girgensohn [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 26, 2005 1:10 PM
 To: pgsql-hackers@postgresql.org
 Cc: John Hansen; Andrew Dunstan
 Subject: Re: [HACKERS] Patch for collation using ICU

 --On fredag, mars 25, 2005 00.40.04 +0100 Palle Girgensohn 
 [EMAIL PROTECTED] wrote:

  Hi!

  I've put together a patch for using IBM's ICU package for collation.

  If your OS does not have full support for collation ur 
  uppercase/lowercase in multibyte locales, this might be 
 useful. If you 
  are using a multibyte character encoding in your database and want 
  collation, i.e. order by, and also lower(), upper() and 
 initcap() to 
  work properly, this patch will do just that.

  This patch is needed for FreeBSD, since this OS has no support for 
  collation of for example unicode locales (that is, 
 wcscoll(3) does not 
  do what you expect if you set LC_ALL=sv_SE.UTF-8, for 
 example). AFAIK 
  the patch is *not* necessary for Linux, although IBM claims ICU 
  collation to be about twice as fast as glibc for simple 
 western locales.

  It adds a configure switch, `--with-icu', which will set up 
 the code 
  to use ICU instead of wchar_t and wcscoll.

  This has been tested only on FreeBSD-4.11  
 FreeBSD-5-stable, where it 
  seems to run well. I've not had the time to do any comparative 
  performance tests yet, but it seems it is at least not slower than 
  using
  LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.

  I'd be delighted if some more experienced postgresql hackers would 
  review this stuff. The patch is pretty compact, so it's 
 fast reading 
  :)  I'm planning to add this patch as an option (tagged 
  experimental) to FreeBSD's postgresql port. Any ideas 
 about whether 
  this is a good idea or not?

  Any thoughts or ideas are welcome!

  Cheers,
  Palle

  Patch at:

 http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-1
  4.d
  iff

  ICU at sourceforge: http://icu.sf.net/

 Hi!

 There's a new patch to fix some reported problems.

 http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
005-03-26.diff

 This version uses the DatabaseEncoding and sets the ICU 
 encoding at the same time. I had to create a conversion table 
 from PostgreSQL's own, somewhat odd and non-standard, names 
 of encodings, into the prefered IANA names. On or two of the 
 more odd ones might be slightly incorrect, hopefully not too 
 far off anyway?

 I've noticed a couple of things about using the ICU patch vs. pristine
 pg-8.0.1:

 - ORDER BY is case insensitive when using ICU. This might 
 break the SQL standard (?), but sure is nice :)

This would mean that indexes are also case insensitive right?
Which makes it a Bad Thing(tm).

 - When the database is initialized using the C locale, 
 upper() and lower() normally does not work at all for 
 non-ASCII characters even if the database's encoding is say 
 LATIN1 or UNICODE. (does not work for me anyway, on FreeBSD, 
 and this is probably correct since the locale is still `C', I 
 believe?). The ICU patch changes nothing for the LATIN1 case, 
 since it does not act on single byte encodings, but for the 
 UNICODE representation, it works and does what I expect it 
 to, namely upper() and lower() neatly
 upper- or lowercase diacritical characters, i.e. lower('ÅÄÖ') 
 - 'åäö'. 
 This is a good thing, although I'm surprised that upper/lower 
 is dragged along with the LC_COLLATE fixation at initdb. I 
 never run initdb in the C locale, but only now do I realize 
 how broken that really is if you need to store anything else 
 than English :-)

That is what I would have expected. However, it probably won't work for the 
more exotic cases, like turkish I, which depends on the locale.

 I'd be delighted to get more feedback about this stuff.

 Thanks,
 Palle

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] Patch for collation using ICU

2005-03-25 Thread John Hansen

Still doesn't work for me :(
UNICODE DB
C locale

set client_encoding = iso88591;
select upper('æ');
 upper
---

(1 row)

Trying to initdb with en_IN.utf8

/usr/lib/postgresql/bin/initdb -D /var/lib/postgres/data/ -E UNICODE 
--locale=en_IN.utf8
The files belonging to this database system will be owned by user postgres.
This user must also own the server process.

The database cluster will be initialized with locale en_IN.utf8.

fixing permissions on existing directory /var/lib/postgres/data ... ok
creating directory /var/lib/postgres/data/global ... ok
creating directory /var/lib/postgres/data/pg_xlog ... ok
creating directory /var/lib/postgres/data/pg_xlog/archive_status ... ok
creating directory /var/lib/postgres/data/pg_clog ... ok
creating directory /var/lib/postgres/data/pg_subtrans ... ok
creating directory /var/lib/postgres/data/base ... ok
creating directory /var/lib/postgres/data/base/1 ... ok
creating directory /var/lib/postgres/data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /var/lib/postgres/data/base/1 ... ok
initializing pg_shadow ... ok
enabling unlimited row size for system tables ... ok
initializing pg_depend ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... child process exited with exit code 139
initdb: removing contents of data directory /var/lib/postgres/data
 
... John


 -Original Message-
 From: Palle Girgensohn [mailto:[EMAIL PROTECTED] 
 Sent: Saturday, March 26, 2005 1:10 PM
 To: pgsql-hackers@postgresql.org
 Cc: John Hansen; Andrew Dunstan
 Subject: Re: [HACKERS] Patch for collation using ICU
 
 --On fredag, mars 25, 2005 00.40.04 +0100 Palle Girgensohn 
 [EMAIL PROTECTED] wrote:
 
  Hi!
 
  I've put together a patch for using IBM's ICU package for collation.
 
  If your OS does not have full support for collation ur 
  uppercase/lowercase in multibyte locales, this might be 
 useful. If you 
  are using a multibyte character encoding in your database and want 
  collation, i.e. order by, and also lower(), upper() and 
 initcap() to 
  work properly, this patch will do just that.
 
  This patch is needed for FreeBSD, since this OS has no support for 
  collation of for example unicode locales (that is, 
 wcscoll(3) does not 
  do what you expect if you set LC_ALL=sv_SE.UTF-8, for 
 example). AFAIK 
  the patch is *not* necessary for Linux, although IBM claims ICU 
  collation to be about twice as fast as glibc for simple 
 western locales.
 
  It adds a configure switch, `--with-icu', which will set up 
 the code 
  to use ICU instead of wchar_t and wcscoll.
 
  This has been tested only on FreeBSD-4.11  
 FreeBSD-5-stable, where it 
  seems to run well. I've not had the time to do any comparative 
  performance tests yet, but it seems it is at least not slower than 
  using
  LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
 
  I'd be delighted if some more experienced postgresql hackers would 
  review this stuff. The patch is pretty compact, so it's 
 fast reading 
  :)  I'm planning to add this patch as an option (tagged 
  experimental) to FreeBSD's postgresql port. Any ideas 
 about whether 
  this is a good idea or not?
 
  Any thoughts or ideas are welcome!
 
  Cheers,
  Palle
 
  Patch at:
  
 http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-1
  4.d
  iff
 
  ICU at sourceforge: http://icu.sf.net/
 
 
 Hi!
 
 There's a new patch to fix some reported problems.
 
 http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
 005-03-26.diff
 
 This version uses the DatabaseEncoding and sets the ICU 
 encoding at the same time. I had to create a conversion table 
 from PostgreSQL's own, somewhat odd and non-standard, names 
 of encodings, into the prefered IANA names. On or two of the 
 more odd ones might be slightly incorrect, hopefully not too 
 far off anyway?
 
 I've noticed a couple of things about using the ICU patch vs. pristine
 pg-8.0.1:
 
 - ORDER BY is case insensitive when using ICU. This might 
 break the SQL standard (?), but sure is nice :)
 
 - When the database is initialized using the C locale, 
 upper() and lower() normally does not work at all for 
 non-ASCII characters even if the database's encoding is say 
 LATIN1 or UNICODE. (does not work for me anyway, on FreeBSD, 
 and this is probably correct since the locale is still `C', I 
 believe?). The ICU patch changes nothing for the LATIN1 case, 
 since it does not act on single byte encodings, but for the 
 UNICODE representation, it works and does what I expect it 
 to, namely upper() and lower() neatly
 upper- or lowercase diacritical characters, i.e. lower('ÅÄÖ') 
 - 'åäö'. 
 This is a good thing, although I'm surprised that upper/lower 
 is dragged along with the LC_COLLATE fixation at initdb. I 
 never run initdb

Re: [HACKERS] Patch for collation using ICU

2005-03-24 Thread John Hansen

Useful if it's going to support earlier releases of ICU

Not all os's come with ICU3.2, debian for example, currently has 2.1 in
testing, and 2.6 in unstable.

... John 

 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Palle Girgensohn
 Sent: Friday, March 25, 2005 10:40 AM
 To: pgsql-hackers@postgresql.org
 Subject: [HACKERS] Patch for collation using ICU
 
 Hi!
 
 I've put together a patch for using IBM's ICU package for collation.
 
 If your OS does not have full support for collation ur 
 uppercase/lowercase in multibyte locales, this might be 
 useful. If you are using a multibyte character encoding in 
 your database and want collation, i.e. order by, and also 
 lower(), upper() and initcap() to work properly, this patch 
 will do just that.
 
 This patch is needed for FreeBSD, since this OS has no 
 support for collation of for example unicode locales (that 
 is, wcscoll(3) does not do what you expect if you set 
 LC_ALL=sv_SE.UTF-8, for example). AFAIK the patch is *not* 
 necessary for Linux, although IBM claims ICU collation to be 
 about twice as fast as glibc for simple western locales.
 
 It adds a configure switch, `--with-icu', which will set up 
 the code to use ICU instead of wchar_t and wcscoll.
 
 This has been tested only on FreeBSD-4.11  FreeBSD-5-stable, 
 where it seems to run well. I've not had the time to do any 
 comparative performance tests yet, but it seems it is at 
 least not slower than using LATIN1 with
 sv_SE.ISO8859-1 locale, perhaps even faster.
 
 I'd be delighted if some more experienced postgresql hackers 
 would review this stuff. The patch is pretty compact, so it's 
 fast reading :)  I'm planning to add this patch as an option 
 (tagged experimental) to FreeBSD's postgresql port. Any 
 ideas about whether this is a good idea or not?
 
 Any thoughts or ideas are welcome!
 
 Cheers,
 Palle
 
 Patch at:
 http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2
005-03-14.diff
 
 ICU at sourceforge: http://icu.sf.net/
 
 
 ---(end of 
 broadcast)---
 TIP 7: don't forget to increase your free space map settings
 
 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-25 Thread John Hansen

 John Hansen wrote:
  currently, upper/lower does not work with 2+ byte unicode 
 characters, 
  on any OS under the C locale.
 
 Sure it does.  It's just that the defined behavior of the C 
 locale is often useless in practice.

select upper('æøå');
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the database 
encoding.

Consequently it seems that is does not work.

... John

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-25 Thread John Hansen

  select upper('æøå');
  ERROR:  invalid multibyte character for locale
  HINT:  The server's LC_CTYPE locale is probably 
 incompatible with the database encoding.
 
  Consequently it seems that is does not work.
 
 It fails on my machine should not be read as it doesn't 
 work for anyone.
 It all depends on how your local mbstowcs() works.

Ok,... Do you have an example of a system on which it works?


... John

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-25 Thread John Hansen

 On HPUX 10.20, mbstowcs seems to treat all byte values as 
 single-byte characters in C locale, so my sample-of-one says 
 that it works everywhere ;-).

Right, so for the sample SQL I sent earlier, the result would be the same as 
the input?
That's hardly a working upper/lower

If a character doesn't have case then fine, but one that does, should at least 
produce a warning if it cannot be converted.


... John

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-24 Thread John Hansen

   To fix UTF8, the data needs to be converted to 
 UTF16 and then
   the Win32 wcscoll() can be used, and perhaps other functions
   like towupper().  However, UTF8 already works with normal
   locales but provides no ordering.

Right,. So if that's fixed, then UTF8 will work only on windows?
(currently, upper/lower does not work with 2+ byte unicode characters, on any 
OS)

... John


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-24 Thread John Hansen

K, let me rephrase:

currently, upper/lower does not work with 2+ byte unicode characters, on any OS 
under the C locale.

... John

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] [pgsql-hackers-win32] UNICODE/UTF-8 on win32

2005-02-24 Thread John Hansen

 currently, upper/lower does not work with 2+ byte unicode 
 characters, on any OS under the C locale.

Btw,...

There are only 15 cases in the utf8 repertoire that depends on locale, these 
are the only cases where pg should report:

ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the database 
encoding.

When doing a select upper/lower (col)
All others should work just fine.

The error should probably also be changed to a warning, and just return the 
offending character unmodified.


... John

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org

Re: [HACKERS] Terminating a SETOF function call sequence

2005-02-17 Thread John Hansen

 ... c) would be very bad since it 
 doesn't give me any chance to release the resources that 
 where used in order to produce the rows.

You are supposed to free resources used to produce the rows before 
srf_return_next();
The actual rows are pfree()'d by pg. (an dso are any other palloc()'d 
resources, but I'd recommend freeing them anyway, especially if youre going to 
use the function in an index or transactions, since resources a not freed till 
the end of the transaction)

... JOhn

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

[HACKERS] Schema name of function

2005-02-14 Thread John Hansen

Just got reminded...

Is there a way for a C function to determine the name of the schema in which is 
was created?

... John



---(end of broadcast)---
TIP 8: explain analyze is your friend

Re: [HACKERS] Schema name of function

2005-02-14 Thread John Hansen

Beautiful, just what I was looking for.

Thnx,

John

 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, February 15, 2005 6:31 AM
 To: Michael Fuhr
 Cc: John Hansen; pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] Schema name of function 

 Michael Fuhr [EMAIL PROTECTED] writes:
  On Mon, Feb 14, 2005 at 11:11:53AM -0500, Tom Lane wrote:
  In C, it'd be a lot easier (and faster) to do a couple of 
  SearchSysCache calls than to use SPI to get those rows.

  The following appears to work -- does it look right, aside from the 
  missing error checking?

tuple = SearchSysCache(PROCOID,
   ObjectIdGetDatum(fcinfo-flinfo-fn_oid),
   0, 0, 0);
nsoid = SysCacheGetAttr(PROCOID, tuple,
Anum_pg_proc_pronamespace, isnull);
schemaname = get_namespace_name(nsoid);
ReleaseSysCache(tuple);

 That would work.  Since pronamespace is one of the fixed 
 non-nullable columns of pg_proc, you don't really need to use 
 SysCacheGetAttr: you can just map the C struct onto the tuple 
 and grab the field directly.

   nsoid = ((Form_pg_proc) GETSTRUCT(tuple))-pronamespace;

 utils/cache/lsyscache.c contains lots of examples of this 
 sort of thing.
 (get_namespace_name is one, in fact.)

   regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend

[HACKERS] unicode upper/lower functions

2005-02-02 Thread John Hansen

Hi list,


Attached for your perusal, unicode versions of upper/lower, that work
independent of locale except for the following languages:

Turkish, Azeri, and Lithuanian.
There are 15 locale specific cases in total not covered.


-- 
John Hansen [EMAIL PROTECTED]
GeekNET


collate.tar.gz
Description: application/compressed-tar

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] [NOVICE] Last ID Problem

2005-02-02 Thread John Hansen

 Attempts to return the id of the last value to be inserted into a table.
 You can either provide a sequence name (preferred) or provide a table
 name with optional schema. The $catalog and $field arguments are always 
 ignored.
 The current value of the sequence is returned by a call to the
 'currval' PostgreSQL function. This will fail if the sequence has not yet
 been used in the current database connection.


This suffers from the same problems that currval does when using
connection pools tho.
 
I previously suggested a function similar to last_insert_id in behaviour,
and have attached it to this email for reference.

Even so, this also suffers from the same problems when using a connection pool.

The solution I proposed, namely having the tuple returned by
inserts/updates (perhaps even deletes?) would only mean changing the
client library to handle this, and as an example, libpg could easily
figure out the OID of said tuple and return that if it's present for
PQExec() (for backwards compatibility just as it does today,) and add a
separate PQExecSelect() that instead returns the tuple(s) as if they had
been SELECTed.

-- 
John Hansen [EMAIL PROTECTED]
GeekNET
#include postgres.h
#include fmgr.h
#include storage/relfilenode.h
#include commands/sequence.h

static int64 _lastval = 0;

PG_FUNCTION_INFO_V1(nextval_new);
Datum nextval_new(PG_FUNCTION_ARGS) {
_lastval = DatumGetInt64(nextval(fcinfo));
PG_RETURN_INT64(_lastval);
}

PG_FUNCTION_INFO_V1(lastval);
Datum lastval(PG_FUNCTION_ARGS) {
PG_RETURN_INT64(_lastval);
}
SET search_path = pg_catalog;

BEGIN;

DELETE FROM pg_catalog.pg_proc WHERE proname = 'nextval';

CREATE FUNCTION nextval(text)
RETURNS bigint
AS 'lastval.so','nextval_new'
LANGUAGE 'C';

COMMENT ON FUNCTION nextval(text) IS 'sequence next value';

CREATE FUNCTION lastval()
RETURNS bigint
AS 'lastval.so','lastval'
LANGUAGE 'C';

COMMENT ON FUNCTION lastval() IS 'sequence last value';

COMMIT;
# -

lastval :
	gcc -I /usr/include/postgresql/server/ -I /usr/include/postgresql/ -shared -o lastval.so lastval.c
	strip lastval.so

install :
	install -s -m 755 lastval.so $(DESTDIR)/usr/lib/postgresql/lib/;

clean :
	rm -f *.o *~ core *.so;


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] subselects in the target list

2005-02-02 Thread John Hansen

On Wed, 2005-02-02 at 23:22 -0500, Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  neilc=# select a, (select * from abc) from abc;
  ERROR:  subquery must return only one column
 
  Is there a reason we can't treat a subselect in the target list as
  returning a composite type?
 
 Given the 8.0 infrastructure for unnamed record types it might be
 possible to do that; it was surely never possible before.  Whether it's
 a good idea is another question.  The syntax you are showing is designed
 to return a scalar.  It will (and should) barf on multiple rows as well
 as multiple columns.

Right, the point is, that is does not, if said srf-function is written
in say, C.

However, this is somewhat similar to the WITH LATERAL clause previously
discussed in connection with UNNEST and multisets, so perhaps it's not
such a bad idea after all?

  For that matter, is this behavior also intentional?
 
  neilc=# select a, foo_abc2() FROM abc;
  ERROR:  set-valued function called in context that cannot accept a set
  CONTEXT:  PL/pgSQL function foo_abc2 line 1 at return next
 
 It's an implementation restriction in plpgsql: we didn't make it support
 the old-style SRF API.  I'm unconvinced that it's worth fixing
 considering that this whole behavior (SRFs in the targetlist) is
 deprecated.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 7: don't forget to increase your free space map settings
-- 
John Hansen [EMAIL PROTECTED]
GeekNET


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match

Re: [HACKERS] unicode upper/lower functions

2005-02-02 Thread John Hansen

uhmm,...

Forgot to change the copyright.

Please accept this under the same terms as postgresql itself.


... John


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])

Re: [HACKERS] [NOVICE] Last ID Problem

2005-02-01 Thread John Hansen

  Since OID's are now deprecated, and will eventually disappear, 
  wouldn't it be a good idea, to have INSERT and UPDATE 
 return a copy of 
  the tuple that was inserted/updated?
 
 How about the TID?

Yea, that'd work. As long as you can get an arbitrary column back out, 'as it 
was at the time it was committed'.

Since not everything runs in a transaction,. And someone might have modified 
the row by the time you get to fetching it back out

Or in terms of tuples,. No longer exist, if vacuum full have run...

... JOhn

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

Re: [HACKERS] [NOVICE] Last ID Problem

2005-02-01 Thread John Hansen

 No one has stated that they will disappear.

Ohh,... just the impression I've been getting when speaking with people.

... John


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

1 2 >

1 - 100 of 164 matches

Mail list logo