date:20050410

Re: [HACKERS] Question regarding clock-sweep

2005-04-10 Thread Tom Lane

Josh Berkus  writes:
> Now that I'm beginning serious performance testing of clock-sweep, I
> was going back through the lock discussion and am not sure what the
> patch that actually went in 3 weeks ago consisted of.  Is it
> clock-sweep with a used/unused bit or a counter?  How is it handling
> seq scans?

It's clock-sweep with a counter.  The counter increments on reference,
up to a small maximum value (BM_MAX_USAGE_COUNT in buf_internals.h),
and decrements when the clock hand passes over the buffer.  I'd be
interested to see trials with different values of BM_MAX_USAGE_COUNT
... I made it 5 to start with but that was a WAG.

There's not any special smarts for seqscans, but the counter should
handle that.

> Oh, and incidentally, can I use the same database files for 8.0.2 and 8.1cvs 
> 3/10/05?

Sorry, we forced initdb already several times...

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org
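
As an illustration of the scheme described in the message above, here is a
minimal Python sketch of clock-sweep with a usage counter. It is not the
PostgreSQL implementation: the class and method names are made up for this
example, pins, locking and dirty buffers are ignored, and starting a newly
loaded page at a count of 1 is an assumption. Only BM_MAX_USAGE_COUNT and its
starting value of 5 come from the message.

```python
BM_MAX_USAGE_COUNT = 5  # starting value mentioned in the message


class Buffer:
    def __init__(self):
        self.page = None
        self.usage_count = 0


class ClockSweepPool:
    """Toy buffer pool (hypothetical names; no pins, locks or dirty pages)."""

    def __init__(self, nbuffers):
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.hand = 0

    def reference(self, buf):
        # The counter increments on reference, up to a small maximum.
        buf.usage_count = min(buf.usage_count + 1, BM_MAX_USAGE_COUNT)

    def find_victim(self):
        # Advance the hand; decrement each counter it passes over and
        # stop at the first buffer whose counter is already zero.
        while True:
            buf = self.buffers[self.hand]
            self.hand = (self.hand + 1) % len(self.buffers)
            if buf.usage_count == 0:
                return buf
            buf.usage_count -= 1

    def access(self, page):
        for buf in self.buffers:
            if buf.page == page:
                self.reference(buf)
                return buf
        victim = self.find_victim()
        victim.page = page
        victim.usage_count = 1  # assumption: a fresh page starts at 1
        return victim
```

In this toy model a page that was referenced several times survives a pass of
the hand that evicts pages touched only once, which is the property expected
to blunt the effect of sequential scans.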

Re: [HACKERS] Question regarding clock-sweep

2005-04-10 Thread Neil Conway

Josh Berkus wrote:
> Oh, and incidentally, can I use the same database files for 8.0.2 and 8.1cvs
> 3/10/05?

No.

-Neil

[HACKERS] Question regarding clock-sweep

2005-04-10 Thread Josh Berkus

Tom,

Now that I'm beginning serious performance testing of clock-sweep, I was going 
back through the lock discussion and am not sure what the patch that actually 
went in 3 weeks ago consisted of.   Is it clock-sweep with a used/unused bit 
or a counter?   How is it handling seq scans?

Oh, and incidentally, can I use the same database files for 8.0.2 and 8.1cvs 
3/10/05?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread Andrew - Supernews

On 2005-04-10, "John Hansen" <[EMAIL PROTECTED]> wrote:
> That's right, dunno how I missed that one, but looks correct to me, and
> is in line with the code in ConvertUTF.c from unicode.org, on which I
> based the patch, extended to support 6 byte utf8 characters.

Frankly, you should probably de-extend it back down to 4 bytes. That's
enough to encode the Unicode range of 0x000000 - 0x10FFFF, and enough
other stuff would break if anyone allocated a character outside that
range that I don't think it is worth worrying about. (Even the ISO
people have agreed to conform to that limitation.) Even if insanity
struck simultaneously at both standards bodies, 4 bytes is enough to
go to 0x1FFFFF so there is still substantial slack. (A number of other
specifications based on utf-8 have removed the 5 and 6 byte sequences
too, so there is substantial precedent for this.)

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
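
The arithmetic behind the 4-byte argument can be checked with a short Python
sketch. It assumes the original UTF-8 scheme (lead byte 110xxxxx, 1110xxxx,
11110xxx and so on, each continuation byte carrying 6 bits); the function name
is made up for this example.

```python
def max_code_point(nbytes):
    """Largest value encodable in an nbytes-long sequence (1 to 6 bytes)."""
    if not 1 <= nbytes <= 6:
        raise ValueError("UTF-8 sequences are 1 to 6 bytes long in the original scheme")
    if nbytes == 1:
        return 0x7F  # plain ASCII: 0xxxxxxx
    lead_bits = 7 - nbytes  # payload bits left in the lead byte
    total_bits = lead_bits + 6 * (nbytes - 1)
    return (1 << total_bits) - 1


for n in range(1, 7):
    print(n, hex(max_code_point(n)))
```

Four bytes carry 21 payload bits, which covers the whole Unicode range up to
0x10FFFF with room to spare.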


Re: [HACKERS] Case Sensitivity

2005-04-10 Thread Bruno Wolff III

On Sat, Apr 09, 2005 at 21:02:34 +0200,
  [EMAIL PROTECTED] wrote:
> Is there a way to set case sensitivity on?

In what context?

If you are talking about mixed case table or column names, then you need
to quote them with double quotes (").
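
A small Python sketch of the folding rule behind this advice: PostgreSQL folds
unquoted identifiers to lower case and keeps quoted ones exactly as written.
The function name is hypothetical and the sketch ignores everything else the
parser does (length limits, Unicode escapes and so on).

```python
def fold_identifier(token):
    """Return the name the server would look up for an identifier token."""
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        # Quoted: case is preserved; a doubled quote stands for one quote.
        return token[1:-1].replace('""', '"')
    # Unquoted: folded to lower case.
    return token.lower()


print(fold_identifier('MyTable'))    # unquoted form
print(fold_identifier('"MyTable"'))  # quoted form
```

So a table created as "MyTable" can only be reached by quoting it again,
while MyTable, mytable and MYTABLE all refer to the same lower-case name.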


Re: [HACKERS] [PERFORM] Functionscan estimates

2005-04-10 Thread Josh Berkus

People:

(HACKERS: Please read this entire thread at 
http://archives.postgresql.org/pgsql-performance/2005-04/msg00179.php 
Sorry for crossing this over.)

> > The larger point is that writing an estimator for an SRF is frequently a
> > task about as difficult as writing the SRF itself
>
> True, although I think this doesn't necessarily kill the idea. If
> writing an estimator for a given SRF is too difficult, the user is no
> worse off than they are today. Hopefully there would be a fairly large
> class of SRFs for which writing an estimator would be relatively simple,
> and result in improved planner behavior.

For that matter, even supplying an estimate constant would be a vast 
improvement over current functionality.  I would suggest, in fact, that we 
allow the use of either a constant number, or an estimator function, in that 
column.  Among other things, this would allow implementing the constant 
number right now and the use of an estimating function later, in case we can 
do the one but not the other for 8.1.

To be more sophisticated about the estimator function, it could take a subset 
of the main function's arguments, based on $1 numbering, for example:
CREATE FUNCTION some_func ( INT, TEXT, TEXT, INT, INT ) ...
ALTER FUNCTION some_func WITH ESTIMATOR some_func_est( $4, $5 )

This would make it easier to write estimators that work for several 
functions.   Estimators would be a special type of function which would take 
any params and RETURN ESTIMATOR, which would be implicitly castable from some 
general numeric type (like INT or FLOAT).

> > I don't foresee a whole lot of use of an estimator hook designed as
> > proposed here.  In particular, if the API is such that we can only
> > use the estimator when all the function arguments are plan-time
> > constants, it's not going to be very helpful.

Actually, 95% of the time I use SRFs they are accepting constants and not row 
references.  And I use a lot of SRFs.

>
> Yes :( One approach might be to break the function's domain into pieces
> and have the estimator function calculate the estimated result set size
> for each piece. So, given a trivial function like:
>
> foo(int):
> if $1 < 10 then produce 100 rows
> else produce 1 rows
>
> If the planner has encoded the distribution of input tuples to the
> function as a histogram, it could invoke the SRF's estimator function
> for the boundary values of each histogram bucket, and use that to get an
> idea of the function's likely result set size at runtime.
>
> And yes, the idea as sketched is totally unworkable :) For one thing,
> the difficulty of doing this grows rapidly as the number of arguments to
> the function increases. But perhaps there is some variant of this idea
> that might work...
>
> Another thought is that the estimator could provide information on the
> cost of evaluating the function, the number of tuples produced by the
> function, and even the distribution of those tuples.

Another possibility would be to support default values for all estimator 
functions and have functions called in row context passed DEFAULT, thus 
leaving it up to the estimator writer to supply median values for context 
cases.  Or to simply take the "first" values and use those. 

While none of these possibilities is ideal, they are an improvement over the 
current "flat 1000" estimate.   As I said, even the ability to set a 
per-function flat constant estimate would be an improvement.
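
To make the quoted histogram idea concrete, here is a rough Python sketch. It
is only an illustration of the proposal, not anything the planner does: the
function names are invented, the histogram is assumed to be equi-depth (each
bucket holds the same share of input rows), and averaging the two boundary
estimates per bucket is an arbitrary choice.

```python
def foo_estimator(x):
    # Hypothetical estimator for the quoted foo(int) example:
    # 100 rows when the argument is below 10, otherwise 1 row.
    return 100 if x < 10 else 1


def estimate_rows_per_call(estimator, bucket_bounds):
    """Average an estimator over the buckets of an equi-depth histogram.

    bucket_bounds is a sorted list of boundary values; each adjacent pair
    delimits one bucket.
    """
    per_bucket = []
    for lo, hi in zip(bucket_bounds, bucket_bounds[1:]):
        per_bucket.append((estimator(lo) + estimator(hi)) / 2)
    return sum(per_bucket) / len(per_bucket)


def resolve_estimate(spec, bucket_bounds):
    # Accept either a flat constant or an estimator function,
    # as suggested earlier in this message.
    if callable(spec):
        return estimate_rows_per_call(spec, bucket_bounds)
    return float(spec)


bounds = [0, 5, 10, 50, 100]
print(resolve_estimate(foo_estimator, bounds))
print(resolve_estimate(250, bounds))
```

With several arguments the number of boundary combinations multiplies, which
is the scaling problem the quoted text already points out.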

> BTW, why is this on -performance? It should be on -hackers.

'cause I spend more time reading -performance, and I started the thread.  
Crossed over now.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread Oliver Jowett

Tom Lane wrote:

> Yeah?  Cool.  Does John's proposed patch do it "correctly"?
> 
> http://candle.pha.pa.us/mhonarc/patches2/msg00076.html

Some comments on that patch:

Doesn't pg_utf2wchar_with_len need changes for the longer sequences?

UtfToLocal also appears to need changes.

If we support sequences >4 bytes (>U+10FFFF), then UtfToLocal/LocalToUtf
and the associated translation tables need a redesign as they currently
assume the sequence fits in an unsigned int. (IIRC, Unicode doesn't use
>U+10FFFF, but UTF-8 can encode it?)

-O


[HACKERS] System vs non-system casts

2005-04-10 Thread Jim C. Nasby

In working on the newsysviews project we've discovered that there's no
definitive way to determine if a cast is a system cast (system as in
part of postgresql/created by createdb) or not. What pg_dump does (and
what we're doing now for lack of a better solution) is to consider any
cast that doesn't involve a user-created type or a user-created
conversion function to be a system cast. This means if a user creates a
cast between two different system types using a system function (to use
a bad example, say text->int), that cast won't show up in pg_user_casts,
and more importantly, it won't be backed up by pg_dump.

This seems sub-optimal. :)

Is there a reasonable way to fix this? For most objects, you can
determine if it's a system object or not based on the schema it lives
in. So, one possibility is to put casts into schemas. This would have
the added effect of allowing you to 'hide' a cast by removing its
schema from search_path.

Another possibility would be to add an is_system column to pg_cast.
Casts created by the system as part of database creation (or at least
the initial creation of the template databases) would have this field
set to true, whereas user created casts would have it set to false.
Instead of having two separate methods to create casts, you could do a
bulk update of pg_cast as part of database creation.
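
The heuristic and its blind spot can be shown with a few lines of Python. This
is a sketch of the rule as described in this message, not pg_dump's code; the
Cast tuple, the sets of user-created objects and the function name
"builtin_fn" are all made up for the example.

```python
from collections import namedtuple

Cast = namedtuple("Cast", "source target func")


def looks_like_system_cast(cast, user_types, user_funcs):
    """Heuristic: system unless a user-created type or function is involved."""
    return (cast.source not in user_types
            and cast.target not in user_types
            and cast.func not in user_funcs)


user_types = {"mytype"}
user_funcs = {"my_fn"}

# A user-created cast between two system types using a system function
# is indistinguishable from a built-in cast under this rule.
user_cast = Cast("text", "int4", "builtin_fn")
print(looks_like_system_cast(user_cast, user_types, user_funcs))

# A cast that touches a user-created type is classified correctly.
other_cast = Cast("mytype", "text", "builtin_fn")
print(looks_like_system_cast(other_cast, user_types, user_funcs))
```

An explicit is_system flag, or schema membership, would replace this guess
with recorded information.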

Thoughts?
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


Re: [HACKERS] [PATCHES] DELETE ... USING

2005-04-10 Thread Josh Berkus

Bruce,

> If everyone else is OK with having it fail, that is fine with me, but I
> wanted to make sure folks saw this was happening.  I basically saw no
> discussion that we were disabling that syntax.  [CC moved to hackers.]

I believe we hashed this out when we added add_missing_from back in 7.3.

In any case, yes, making that kind of query fail is intentional.  So it should 
go in the release notes as a warning.  Suggested text:


add_missing_from now defaults to "false".  This means that queries such as the 
following:
SELECT pg_class.*;
DELETE FROM table_1 WHERE table_2.fk = table_1.key AND table_2.col3 = TRUE;
... will now fail with default settings.   Either set add_missing_from to TRUE 
to re-enable them, or modify your application to support the correct syntax, 
such as the new DELETE FROM ... USING (see below).


-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: [HACKERS] Case Sensitivity

2005-04-10 Thread Euler Taveira de Oliveira

Hello juan,

> Is there a way to set case sensitivity on?
> 
No. Discussions on this topic are in the archives
(http://archives.postgresql.org). Take a look at:
http://www.postgresql.org/docs/8.0/interactive/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS


Euler Taveira de Oliveira
euler[at]yahoo_com_br






Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread John Hansen

>On 2005-04-10, Tom Lane wrote:
>> Andrew - Supernews writes:
>>> I think you will find that this impression is actually false. Or that at
>>> the very least, _correct_ verification of UTF-8 sequences will still
>>> catch essentially all cases of non-utf-8 input mislabelled as utf-8
>>> while allowing the full range of Unicode codepoints.
>>
>> Yeah?  Cool.  Does John's proposed patch do it "correctly"?
>>
>> http://candle.pha.pa.us/mhonarc/patches2/msg00076.html
>
>It looks correct to me. The only thing I think that code will let through
>incorrectly are encoded surrogates; those could be fixed by adding one line:
>
>  switch (*source) {
>  /* no fall-through in this inner switch */
>  case 0xE0: if (a < 0xA0) return false; break;
>+ case 0xED: if (a > 0x9F) return false; break;
>  case 0xF0: if (a < 0x90) return false; break;
>  case 0xF4: if (a > 0x8F) return false; break;
>

That's right, dunno how I missed that one, but looks correct to me, and
is in line with the code in ConvertUTF.c from unicode.org, on which I
based the patch, extended to support 6 byte utf8 characters.

>(Accepting encoded surrogates in utf-8 was always forbidden by most
>specifications that used utf-8, though the Unicode specs originally were
>not absolute about it (but forbade generating them). Current Unicode
>specifications define those sequences as malformed. Surrogates are the
>code points from 0xD800 - 0xDFFF, which are used in UTF-16 to encode
>characters 0x10000 - 0x10FFFF as two 16-bit values; UTF-8 requires that
>such characters are encoded directly rather than via surrogate pairs.)
>
>-- 
>Andrew, Supernews
>http://www.supernews.com - individual and corporate NNTP services

... John
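
For readers unfamiliar with surrogates, the UTF-16 mapping mentioned in the
quoted text can be sketched in Python as follows. The function name is made up
for this example; the arithmetic is the standard UTF-16 rule.

```python
def to_surrogate_pair(cp):
    """Split a code point in 0x10000 - 0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above 0xFFFF use surrogate pairs")
    v = cp - 0x10000            # 20 bits remain
    high = 0xD800 + (v >> 10)   # top 10 bits
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))
```

UTF-8 encodes such a character directly as one 4-byte sequence, so a 3-byte
sequence that decodes to a value in 0xD800 - 0xDFFF is never valid input.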


Re: [HACKERS] Tab-completion feature ?

2005-04-10 Thread Greg Sabino Mullane



> After the "TO" there is one space and the cursor is after that space
> I press tab and I get
>
> leda=# ALTER TABLE any_table RENAME TO TO

What is happening is that psql is simply assuming that the first "TO"
may be the name of a column you are about to rename. It's the same
as:

ALTER TABLE any_table RENAME mycolumn TO

We can probably have the tab-completion code call up a list of column
names for comparison: not sure if it is worth the trouble though.

-- 
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200504101714
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
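
The behaviour described above, and the possible refinement, can be modelled in
a few lines of Python. This is a toy model, not psql's tab-completion code:
the function, its arguments and the check_columns switch are all hypothetical.

```python
def suggest(tokens, column_names, check_columns=False):
    """Completions for the words typed after 'ALTER TABLE <table>'.

    tokens is the list of words typed so far, e.g. ["RENAME", "TO"].
    column_names is a set of lower-case column names of the table.
    """
    upper = [t.upper() for t in tokens]
    if len(upper) == 2 and upper[0] == "RENAME":
        word = tokens[1]
        if check_columns and upper[1] == "TO" and word.lower() not in column_names:
            # "TO" is the keyword here, so a new table name follows
            # and there is nothing sensible to offer.
            return []
        # Otherwise assume the word is a column about to be renamed.
        return ["TO"]
    return []


print(suggest(["RENAME", "TO"], {"id", "name"}))
print(suggest(["RENAME", "TO"], {"id", "name"}, check_columns=True))
```

The first call reproduces the reported "RENAME TO TO" completion; the second
shows what a lookup against the column list would change.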


[HACKERS] Case Sensitivity

2005-04-10 Thread juan




Is there a way to set case sensitivity on?

Thanks in advance
juan

Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan

2005-04-10 Thread John A Meinel

Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
>> On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote:
>>> Can anyone suggest a more general rule?  Do we need for example to
>>> consider whether the relation membership is the same in two clauses
>>> that might be opposite sides of a range restriction?  It seems like
>>> a.x > b.y AND a.x < b.z
>>
>> In a case like this, you could actually look at the data in b and see
>> what the average range size is.
>
> Not with the current statistics --- you'd need some kind of cross-column
> statistics involving both y and z.  (That is, I doubt it would be
> helpful to estimate the average range width by taking the difference of
> independently-calculated mean values of y and z ...)  But yeah, in
> principle it would be possible to make a non-default estimate.
>
> regards, tom lane

Actually, I think he was saying do a nested loop, and for each item in
the nested loop, re-evaluate if an index or a sequential scan is more
efficient.

I don't think postgres re-plans once it has started, though you could
test this in a plpgsql function.

John
=:->



[HACKERS] Catching DDL events (or equivalent functionality)

2005-04-10 Thread Master of the beasts

Hi,

I know that you can not (and maybe should not) install triggers
on system catalogs. But, if I want to catch certain DDL events 
(such as adding a column), is there any way to do it in PostgreSQL?

Maybe it would be useful if triggers (installed on normal tables) 
could be fired not only by DML but also by DDL events. 

Thanks in advance.

Regards,
The F. Jackal.


Re: [HACKERS] Should we still require RETURN in plpgsql?

2005-04-10 Thread Terry Yapt

Hello...

On Tue, 05 Apr 2005 02:28:23 -0400, [EMAIL PROTECTED] (Tom Lane)
wrote:
>
>How does Oracle's PL/SQL handle this?

On ORACLE a FUNCTION MUST return a value.  If the FUNCTION doesn't
return a value, Oracle gives a 'hint' on FUNCTION compilation and an error
on SELECT function invocation:  ORA-06503.

When we don't want to return any result on ORACLE we must use the
PROCEDURE statement instead of FUNCTION.

Example:

SQL> CREATE OR REPLACE FUNCTION F_test RETURN NUMBER IS
  2  BEGIN
  3    NULL;
  4  END F_TEST;
  5  /

Function created.

SQL> SELECT F_TEST FROM DUAL;
SELECT TUZSA.F_TEST FROM DUAL
   *
ERROR at line 1:
ORA-06503: PL/SQL: Function returned without value
ORA-06512: at "F_TEST", line 3

SQL> 

Greetings.


[HACKERS] Raise Exception

2005-04-10 Thread Mario Reis


  Dear Sir,

 I have recently joined the PostgreSQL community. I'm testing it on a local
network and I'm very fond of it.
However, there are a few things that I'd like to understand better.

 As far as I can tell, every time the server finds an invalid value for an
input, it raises an exception for that check failure, one input at a time,
for each record.
 For example, for each invalid foreign key it automatically raises an
exception. If you have a large record with 20 fields to validate before an
insert, it validates them one at a time and raises an exception for each
failure.
 As I see it, it should collect all the validation failures for the fields
of a record and then display/notify them all at once after the insert or
update statement, for each record of course.
 This would save time and communication resources, especially on a large
network with a large number of users.

 Sorry if I got it wrong. I also apologise if this isn't the right place to
ask this question, but I don't know where else to put it.

 I hope you will understand what I mean. Sorry for my poor English.

 Thanks anyway,

 Mário
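
The "collect everything, report once" behaviour asked for above can be
illustrated with a client-side Python sketch. It says nothing about how the
server works (a failing statement is aborted at the first error it hits); the
function name, the checks and the field names are invented for the example.

```python
def validate_record(record, checks):
    """Run every check and return all failure messages at once."""
    errors = []
    for field, check, message in checks:
        if not check(record.get(field)):
            errors.append(f"{field}: {message}")
    return errors


known_customers = {1, 2, 3}  # stand-in for a foreign key lookup

checks = [
    ("customer_id", lambda v: v in known_customers, "no such customer"),
    ("quantity", lambda v: isinstance(v, int) and v > 0, "must be a positive integer"),
]

for problem in validate_record({"customer_id": 9, "quantity": 0}, checks):
    print(problem)
```

An application can run checks like these before sending the INSERT or UPDATE,
so that the user sees every problem with a record in one round trip.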



Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread Andrew - Supernews

On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
> Andrew - Supernews <[EMAIL PROTECTED]> writes:
>> I think you will find that this impression is actually false. Or that at
>> the very least, _correct_ verification of UTF-8 sequences will still
>> catch essentially all cases of non-utf-8 input mislabelled as utf-8
>> while allowing the full range of Unicode codepoints.
>
> Yeah?  Cool.  Does John's proposed patch do it "correctly"?
>
> http://candle.pha.pa.us/mhonarc/patches2/msg00076.html

It looks correct to me. The only thing I think that code will let through
incorrectly are encoded surrogates; those could be fixed by adding one line:

  switch (*source) {
  /* no fall-through in this inner switch */
  case 0xE0: if (a < 0xA0) return false; break;
+ case 0xED: if (a > 0x9F) return false; break;
  case 0xF0: if (a < 0x90) return false; break;
  case 0xF4: if (a > 0x8F) return false; break;

(Accepting encoded surrogates in utf-8 was always forbidden by most
specifications that used utf-8, though the Unicode specs originally were
not absolute about it (but forbade generating them). Current Unicode
specifications define those sequences as malformed. Surrogates are the
code points from 0xD800 - 0xDFFF, which are used in UTF-16 to encode
characters 0x10000 - 0x10FFFF as two 16-bit values; UTF-8 requires that
such characters are encoded directly rather than via surrogate pairs.)
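
The second-byte checks in the switch above, including the added 0xED case,
can be restated in Python to see what each one rejects. This is a sketch of
the range rules only, not a translation of the patch; the function name is
made up, and "a" is the second byte of the sequence as in the C fragment.

```python
def second_byte_ok(lead, a):
    """Range check on the second byte of a multi-byte UTF-8 sequence."""
    if not 0x80 <= a <= 0xBF:
        return False          # not a continuation byte at all
    if lead == 0xE0:
        return a >= 0xA0      # below this is an overlong 3-byte form
    if lead == 0xED:
        return a <= 0x9F      # above this is a surrogate (0xD800 - 0xDFFF)
    if lead == 0xF0:
        return a >= 0x90      # below this is an overlong 4-byte form
    if lead == 0xF4:
        return a <= 0x8F      # above this exceeds 0x10FFFF
    return True


print(second_byte_ok(0xED, 0x9F))  # last code point before the surrogates
print(second_byte_ok(0xED, 0xA0))  # first surrogate
```

Python's own strict UTF-8 decoder applies the same restriction, so
bytes([0xED, 0xA0, 0x80]).decode("utf-8") raises UnicodeDecodeError.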

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: [HACKERS] static genericcostestimate

2005-04-10 Thread Ramy M. Hassan

Tom Lane wrote:
> "Ramy M. Hassan" <[EMAIL PROTECTED]> writes:
>> The genericcostestimate function is currently static. This limits the
>> development of new access methods as loadable modules without touching
>> pgsql sources. Currently I have to include a copy of the function in the
>> module, which is obviously too bad.
>> Is there any reason to keep this function static ?
>
> Is it really of much use for your access method?  It's such a crude hack
> that I didn't want to encourage people to use it ... it is really just a
> stopgap until someone gets around to thinking harder about the actual
> access behavior of the existing index AMs.
>
> BTW, what are you working on?  I had no idea that anyone was
> experimenting with new index methods.

I am currently working on porting SP-GiST to postgresql.
SP-GiST is an adaptation of GiST to support space partitioning trees
( http://www.cs.purdue.edu/homes/aref/dbsystems_files/SP-GiST/ ).
The current standalone SP-GiST implementation is based on libgist v1.0
from berkeley ( http://gist.cs.berkeley.edu/libgistv1/ ).
The core SP-GiST is being implemented as a module to be loaded before any
spgist extension module.
I am expecting the first alpha release in early May.
Currently, no effort has gone into cost estimation for SP-GiST, so
genericcostestimate seems to be ok for now.

> regards, tom lane



Re: [HACKERS] Three-byte Unicode characters

2005-04-10 Thread Marc G. Fournier

On Sun, 10 Apr 2005, Peter Eisentraut wrote:
> Bruce Momjian wrote:
>> So, we do have a bug, and we are probably going to need to fix it in
>> 8.0.X.
>
> This has never worked in all the years we have had Unicode
> functionality, so I don't understand why we have to rush to fix it now.
> Certainly, it ought to be fixed, but not in a minor release.

Agreed ... this is extending an existing feature to include a broader
charset, not fixing a bug ...


Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

Re: [HACKERS] static genericcostestimate

2005-04-10 Thread Tom Lane

"Ramy M. Hassan" <[EMAIL PROTECTED]> writes:
> The genericcostestimate function is currently static. This limits the 
> development of new access methods as loadable modules without touching 
> pgsql sources. Currently I have to include a copy of the function in the 
> module, which is obviously too bad.
> Is there any reason to keep this function static ?

Is it really of much use for your access method?  It's such a crude hack
that I didn't want to encourage people to use it ... it is really just a
stopgap until someone gets around to thinking harder about the actual
access behavior of the existing index AMs.

BTW, what are you working on?  I had no idea that anyone was
experimenting with new index methods.

regards, tom lane


Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread Tom Lane

Andrew - Supernews <[EMAIL PROTECTED]> writes:
> On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
>> The impression I get is that most of the 'Unicode characters above
>> 0x10000' reports we've seen did not come from people who actually needed
>> more-than-16-bit Unicode codepoints, but from people who had screwed up
>> their encoding settings and were trying to tell the backend that Latin1
>> was Unicode or some such.

> I think you will find that this impression is actually false. Or that at
> the very least, _correct_ verification of UTF-8 sequences will still
> catch essentially all cases of non-utf-8 input mislabelled as utf-8
> while allowing the full range of Unicode codepoints.

Yeah?  Cool.  Does John's proposed patch do it "correctly"?

http://candle.pha.pa.us/mhonarc/patches2/msg00076.html

regards, tom lane


Re: [HACKERS] Three-byte Unicode characters

2005-04-10 Thread Tom Lane

Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Bruce Momjian wrote:
>> So, we do have a bug, and we are probably going to need to fix it in
>> 8.0.X.

> This has never worked in all the years we have had Unicode 
> functionality, so I don't understand why we have to rush to fix it now.  
> Certainly, it ought to be fixed, but not in a minor release.

The reasons why we rejected applying John's patch at the tail end
of the 8.0 cycle are still valid: it is a new feature and there
is nontrivial risk of introducing new bugs (more specifically,
exposing bits of the system that aren't prepared for more-than-16-bit
characters).

I'm fine with changing it in the 8.1 cycle, but I think a back-patch
would be folly. 

regards, tom lane


[HACKERS] static genericcostestimate

2005-04-10 Thread Ramy M. Hassan

Hi,
The genericcostestimate function is currently static. This limits the 
development of new access methods as loadable modules without touching 
pgsql sources. Currently I have to include a copy of the function in the 
module, which is obviously too bad.
Is there any reason to keep this function static ?

Thanks

Re: [HACKERS] Unicode problems on IRC

2005-04-10 Thread Andrew - Supernews

On 2005-04-10, Tom Lane <[EMAIL PROTECTED]> wrote:
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and were trying to tell the backend that Latin1
> was Unicode or some such.  So I'm a bit worried that extending the
> backend support to full 32-bit Unicode will do more to mask encoding
> mistakes than it will do to create needed functionality.

I think you will find that this impression is actually false. Or that at
the very least, _correct_ verification of UTF-8 sequences will still
catch essentially all cases of non-utf-8 input mislabelled as utf-8
while allowing the full range of Unicode codepoints. (The current check
will report the "characters above 0x10000" error even on input which is
blatantly not utf-8 at all.)

One of UTF-8's nicest properties is that other encodings are almost never
also valid utf-8. I did some tests on this myself some years ago, feeding
hundreds of thousands of short non-utf-8 strings (taken from Usenet
subject lines in non-english-speaking hierarchies) into a utf-8 decoder.
The false accept rate was on the order of 0.01%, and going back and
re-checking my old data, _none_ of the incorrectly detected sequences
would have been interpreted as characters over 0xFFFF.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
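
The property described above is easy to reproduce on a small scale in Python.
The sample strings are made up for this sketch and are far too few to say
anything about false accept rates; the point is only that Latin-1 bytes for
accented text usually fail strict UTF-8 decoding.

```python
def is_valid_utf8(data):
    """True if the byte string decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


samples = ["café", "Grüße", "naïve señor"]

# Encode each sample as Latin-1 and see whether it passes as UTF-8.
latin1_accepted = [s for s in samples if is_valid_utf8(s.encode("latin-1"))]
print(latin1_accepted)

# A rare false accept: these two Latin-1 characters happen to form
# the valid UTF-8 sequence C3 A9.
print(is_valid_utf8("Ã©".encode("latin-1")))
```

Pure ASCII text passes in either encoding, which is harmless because the
bytes mean the same thing in both.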


Re: [HACKERS] Three-byte Unicode characters

2005-04-10 Thread Peter Eisentraut

Bruce Momjian wrote:
> So, we do have a bug, and we are probably going to need to fix it in
> 8.0.X.

This has never worked in all the years we have had Unicode 
functionality, so I don't understand why we have to rush to fix it now.  
Certainly, it ought to be fixed, but not in a minor release.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


[HACKERS] Three-byte Unicode characters

2005-04-10 Thread Bruce Momjian

[ This email to hackers from last night got lost so I am remailing.]

Tom Lane wrote:
> "John Hansen" <[EMAIL PROTECTED]> writes:
> >> That is backpatched to 8.0.X.  Does that not fix the problem reported?
> 
> > No, as andrew said, what this patch does, is allow values > 0xFFFF and
> > at the same time validates the input to make sure it's valid utf8.
> 
> The impression I get is that most of the 'Unicode characters above
> 0x10000' reports we've seen did not come from people who actually needed
> more-than-16-bit Unicode codepoints, but from people who had screwed up
> their encoding settings and were trying to tell the backend that Latin1
> was Unicode or some such.  So I'm a bit worried that extending the
> backend support to full 32-bit Unicode will do more to mask encoding
> mistakes than it will do to create needed functionality.
> 
> Not that I'm against adding the functionality.  I'm just doubtful that
> the reports we've seen really indicate that we need it, or that adding
> it will cut down on the incidence of complaints :-(

OK, I got on the IRC server and talked to folks who actually understand
this.  They say there are Chinese who are reporting this problem, so I
Googled and found this:

http://www.yale.edu/chinesemac/pages/charset_encoding.html#Unicode

See the paragraph with "Supplementary Ideographic Plane".  You will see
that paragraph says:

The Supplementary Ideographic Plane (SIP) currently contains 42,711
additional characters in "CJK Unified Ideographs Extension B"
(U+20000-2A6D6). The PDF chart for this is available at:
http://www.unicode.org/charts/PDF/U20000.pdf

I assume it is that U+20000-2A6D6 range that people are complaining
about.
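
As a quick check of why this range trips the existing limit, the following
Python lines encode the first character of CJK Unified Ideographs Extension B.
Nothing here is PostgreSQL code; it only shows the size of the code point and
of its UTF-8 form.

```python
cp = 0x20000                       # first code point of Extension B
encoded = chr(cp).encode("utf-8")  # Python's standard UTF-8 encoder

print(cp > 0xFFFF)                 # does not fit in 16 bits
print(len(encoded), encoded.hex())
```

The character needs a 4-byte UTF-8 sequence, so a backend that stops at
3-byte sequences rejects it even though the input is valid.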

So, we do have a bug, and we are probably going to need to fix it in
8.0.X.

I apologize to people who reported this problem and I wasn't attentive
to the seriousness of it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

