Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Tom Lane
BTW, before I forget, this little project turned up a couple of
small improvements for the current buildfarm infrastructure:

1.  There are half a dozen entries with obviously bogus timestamps:

bfarm=# select sysname,snapshot,branch from mfailures where snapshot < '2004-01-01';
  sysname   |      snapshot       | branch 
------------+---------------------+--------
 corgi      | 1997-10-14 14:20:10 | HEAD
 kookaburra | 1970-01-01 01:23:00 | HEAD
 corgi      | 1997-09-30 11:47:08 | HEAD
 corgi      | 1997-10-17 14:20:11 | HEAD
 corgi      | 1997-12-21 15:20:11 | HEAD
 corgi      | 1997-10-15 14:20:10 | HEAD
 corgi      | 1997-09-28 11:47:09 | HEAD
 corgi      | 1997-09-28 11:47:08 | HEAD
(8 rows)

indicating wrong system clock settings on these buildfarm machines.
(Indeed, IIRC these failures were actually caused by the ridiculous
clock settings --- we have at least one regression test that checks
century >= 21 ...)  Perhaps the buildfarm server should bounce
reports with timestamps more than a day in the past or a few minutes in
the future.  I think though that a more useful answer would be to
include "time of receipt of report" in the permanent record, and then
subsequent analysis could make its own decisions about whether to
believe the snapshot timestamp --- plus we could track elapsed times for
builds, which could be interesting in itself.
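For illustration, a minimal sketch of the receipt-time idea; the table and
column names here (build_status, report_received) are assumptions, not the
buildfarm server's actual schema:

-- Record when the server received each report, independent of the member's clock.
ALTER TABLE build_status ADD COLUMN report_received timestamp DEFAULT now();

-- Analysis can then flag snapshots whose claimed time is implausible
-- (more than a day in the past or a few minutes in the future) ...
SELECT sysname, snapshot, report_received
  FROM build_status
 WHERE snapshot < report_received - interval '1 day'
    OR snapshot > report_received + interval '5 minutes';

-- ... and approximate elapsed build time as receipt time minus snapshot time.
SELECT sysname, avg(report_received - snapshot) AS approx_build_time
  FROM build_status
 GROUP BY sysname;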

2. I was annoyed repeatedly that some buildfarm members weren't
reporting log_archive_filenames entries, which forced going the long
way round in the process I was using.  Seems like we need some more
proactive means for getting buildfarm owners to keep their script
versions up-to-date.  Not sure what that should look like exactly,
as long as it's not "you can run an ancient version as long as you
please".

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes:
> These on mongoose are most likely a result of flaky hardware.

Yeah, I saw a pretty fair number of irreproducible issues that are
probably hardware flake-outs.  Of course you can't tell which are those
and which are low-probability software bugs for many moons...

I believe that a large fraction of the buildfarm consists of
semi-retired equipment that is probably more prone to this sort of
problem than newer stuff would be.  But that's the price we must pay
for building such a large test farm on a shoestring.  What we need to do
to deal with it, I think, is institutionalize some kind of long-term
tracking so that we can tell the recurrent from the non-recurrent
issues.  I don't quite know how to do that; what I did over this past
weekend was labor-intensive and not scalable.
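For what it's worth, a rough sketch of the kind of query such tracking would
enable, assuming failures keep getting classified into a table like the
mreasons one shown later in this thread:

-- Anything seen more than once, across several days or several members,
-- is probably a recurrent issue rather than a one-off hardware flake-out.
SELECT reason,
       count(*)                AS failures,
       count(DISTINCT sysname) AS members,
       min(snapshot)           AS first_seen,
       max(snapshot)           AS last_seen
  FROM mreasons
 GROUP BY reason
HAVING count(*) > 1
 ORDER BY failures DESC;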

SoC project perhaps?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Tom Lane
"Joshua D. Drake" <[EMAIL PROTECTED]> writes:
>> Some of these might possibly be interesting to other people ...

> If you provide the various greps, etc... I will put it into the website 
> proper...

Unfortunately I didn't keep notes on exactly what I searched for in each
case.  Some of them were not based on grep at all, but rather "this
failure looks similar to those others and happened in the period between
a known bad patch commit and its fix".  The goal was essentially to
group together failures that probably arose from the same cause --- I
may have made a mistake or two along the way ...

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Jeremy Drake
On Sun, 18 Mar 2007, Tom Lane wrote:

>  another icc crash    | 2007-02-03 10:50:01 |     1
>  icc "internal error" | 2007-03-16 16:30:01 |    29

These on mongoose are most likely a result of flaky hardware.  They tend
to occur most often when either
a) I am doing something else on the box when the build runs, or
b) the ambient temperature in the room is > ~72degF

I need to bring down this box at some point and try to figure out if it is
bad memory or what.

Anyway, ICC seems to be one of the few things that are really susceptible
to hardware issues (on this box at least, it is mostly ICC and firefox),
and I apologize for the noise this caused in the buildfarm logs...

-- 
American business long ago gave up on demanding that prospective
employees be honest and hardworking.  It has even stopped hoping for
employees who are educated enough that they can tell the difference
between the men's room and the women's room without having little
pictures on the doors.
-- Dave Barry, "Urine Trouble, Mister"

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread ITAGAKI Takahiro

Jeff Davis <[EMAIL PROTECTED]> wrote:

> Some people think it's a bug, some people don't. It is technically
> documented behavior, but I don't think the documentation is clear
> enough. I think it is a bug that should be fixed, and here's another
> message in the thread that expresses my opinion:

Agreed. I think it is a bug, too.
Inserting invalid characters breaks subsequent read queries.

$ initdb --encoding=utf8
# CREATE TABLE test (t text);
# INSERT INTO test VALUES('A');
# SELECT * FROM test;
 t
---
 A
(1 row)

# INSERT INTO test VALUES(E'\200');
# SELECT * FROM test;
ERROR:  invalid byte sequence for encoding "UTF8": 0x80
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".


Could it lead to DoS?
http://www.postgresql.org/support/security
| [D] A vulnerability that is exploitable for denial-of-service,
| but requiring a valid prior login. 

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Joshua D. Drake

                              | 2007-01-31 17:30:01 |    16
 use of // comment            | 2007-02-16 09:23:02 |     1
 xml code teething problems   | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...


If you provide the various greps, etc... I will put it into the website 
proper...


Joshua D. Drake






---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] modifying the table function

2007-03-18 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Returning control to the backend for every row returned would likely be 
> excessive, but you could return once every k rows and get most of the 
> benefits of both approaches (k might be on the order of 1000).

However, this still leaves us with no idea how to persuade perl, tcl,
python, et al to cooperate.

I think you are underestimating the cost of suspending/resuming any of
those interpreters, and overestimating the cost of a tuplestore, which
on a per-tuple basis is really pretty cheap.  It's quite likely that the
proposed project would produce piddling or negative gains, after
expending a huge amount of work.  (A tenth of the effort on optimizing
tuplestore some more would probably be a better investment.)

A cross-check on this theory could be made without a lot of effort: hack
SQL functions to use a tuplestore (fed via the tuplestore destreceiver,
so as not to exit the executor) instead of return-after-every-tuple.
Compare performance.  I kinda suspect you'll find it a loss even there.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Ok, good point. Now, which of those need to have a check for valid encoding?

The vast majority will barf on any non-ASCII character anyway ... only
the ones that don't will need a check.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] ILIKE and indexes

2007-03-18 Thread Tom Lane
"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> The usual trick recommended in the doc is to use lower() and LIKE but
> it leads to bad row estimates (it's constant whatever the search
> pattern is)

Not if you have an index on lower(col) which one supposes you'd have
anyway for such an application.  Or are you running an ancient PG
release?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> OK, for anyone that wants to play, I have created an extract that 
> contains a summary of every non-CVS-related failure we've had. It's a 
> single table looking like this:

I did some analysis on this data.  Attached is a text dump of a table
declared as

CREATE TABLE mreasons (
sysname text,
snapshot timestamp without time zone,
branch text,
reason text,
known boolean
);

where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort
of evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December.  I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason
codes, 81 failures (with 18 reasons) that seem worthy of closer
investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons
where not known group by 1,2 order by 1,2 ;
                              reason                              |    branch     |       latest        | count 
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these maybe need a look.
Any volunteers?

Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where
known group by 1 order by 1 ;
            reason            |       latest        | count 
------------------------------+---------------------+-------
 DST transition test failure

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Andrew Dunstan



Gregory Stark wrote:

"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

  

Below is a list of the input routines in the adt directory, courtesy of grep.



Grep isn't a good way to get these, your list missed a bunch.

postgres=# select distinct prosrc from pg_proc where oid in (select typinput 
from pg_type);

  

[snip]

(64 rows)

  


Ok, good point. Now, which of those need to have a check for valid encoding?

cheers

andrew

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Gregory Stark

"Andrew Dunstan" <[EMAIL PROTECTED]> writes:

> Below is a list of the input routines in the adt directory, courtesy of grep.

Grep isn't a good way to get these, your list missed a bunch.

postgres=# select distinct prosrc from pg_proc where oid in (select typinput 
from pg_type);
   prosrc
-
 abstimein
 aclitemin
 any_in
 anyarray_in
 anyelement_in
 array_in
 bit_in
 boolin
 box_in
 bpcharin
 byteain
 cash_in
 charin
 cidin
 cidr_in
 circle_in
 cstring_in
 date_in
 domain_in
 float4in
 float8in
 inet_in
 int2in
 int2vectorin
 int4in
 int8in
 internal_in
 interval_in
 language_handler_in
 line_in
 lseg_in
 macaddr_in
 namein
 numeric_in
 oidin
 oidvectorin
 opaque_in
 path_in
 point_in
 poly_in
 record_in
 regclassin
 regoperatorin
 regoperin
 regprocedurein
 regprocin
 regtypein
 reltimein
 smgrin
 textin
 tidin
 time_in
 timestamp_in
 timestamptz_in
 timetz_in
 tintervalin
 trigger_in
 unknownin
 uuid_in
 varbit_in
 varcharin
 void_in
 xidin
 xml_in
(64 rows)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] modifying the table function

2007-03-18 Thread Islam Hegazy
Returning k rows at a time would be a reasonable solution, but which
functions need to be modified to achieve this?



- Original Message - 
From: "Neil Conway" <[EMAIL PROTECTED]>

To: "Andrew Dunstan" <[EMAIL PROTECTED]>
Cc: "Martijn van Oosterhout" ; "Islam Hegazy" 
<[EMAIL PROTECTED]>; 

Sent: Sunday, March 18, 2007 4:57 PM
Subject: Re: [HACKERS] modifying the table function



Andrew Dunstan wrote:
I'm not convinced it would be a huge gain anyway. Switching madly in and 
out of the perl interpreter at least is a known performance problem, IIRC


Returning control to the backend for every row returned would likely be 
excessive, but you could return once every k rows and get most of the 
benefits of both approaches (k might be on the order of 1000). The problem 
with the current approach is that it makes returning large result sets 
from PL functions very expensive, since they need to be spooled to disk.


As for using threads, that's pretty much a non-starter: we can't safely 
allow calls into the backend from multiple concurrent threads, and I doubt 
that will change any time soon.


-Neil


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] modifying the table function

2007-03-18 Thread Neil Conway

Andrew Dunstan wrote:
I'm not convinced it would be a huge gain anyway. Switching madly in 
and out of the perl interpreter at least is a known performance 
problem, IIRC


Returning control to the backend for every row returned would likely be 
excessive, but you could return once every k rows and get most of the 
benefits of both approaches (k might be on the order of 1000). The 
problem with the current approach is that it makes returning large 
result sets from PL functions very expensive, since they need to be 
spooled to disk.
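
A small illustration of the current behaviour (the function below is just a
made-up example): the entire result set is materialized into a tuplestore
before the caller sees a single row, even when only one row is wanted.

CREATE FUNCTION gen_rows(n int) RETURNS SETOF int AS $$
BEGIN
    FOR i IN 1 .. n LOOP
        RETURN NEXT i;   -- appended to a tuplestore, not streamed to the caller
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- The full million-row result is built (and possibly spooled to disk)
-- before the LIMIT can apply.
SELECT * FROM gen_rows(1000000) LIMIT 1;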


As for using threads, that's pretty much a non-starter: we can't safely 
allow calls into the backend from multiple concurrent threads, and I 
doubt that will change any time soon.


-Neil


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] modifying the table function

2007-03-18 Thread Andrew Dunstan



Martijn van Oosterhout wrote:

What you want is that when you call a Perl table function, each row the
Perl function returns is handed straight back to the caller. That
means the Perl interpreter has to be able to save all its state,
return to the caller and, when next called, resume where it left off.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).


  


We haven't even worked out how to do that cleanly for plpgsql, which we 
control, let alone for any third party interpreter.


I'm not convinced it would be a huge gain anyway. Switching madly in and 
out of the perl interpreter at least is a known performance problem, 
IIRC - Perl XML objects have, or used to have, problems with that (and 
they still don't perform terribly well).


cheers

andrew

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Andrew Dunstan



I wrote: 


The escape processing is actually done in the lexer in the case of 
literals. We have to allow for bytea literals there too, regardless of 
encoding. The lexer naturally has no notion of the intended 
destination of the literal, so we need to defer the validity check to 
the *in functions for encoding-aware types. And, as Tom has noted, 
COPY does its own escape processing but does it before the transcoding.


So ISTM that any solution other than something like I have proposed 
will probably involve substantial surgery.


Below is a list of the input routines in the adt directory, courtesy of 
grep.


I'm thinking we will need to put checks in:

   varcharin
   bpcharin
   textin
   unknownin (?)
   namein (?)

Any others?
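
(As a quick cross-check, not from the original mail: which types route
through those candidate functions can be read out of the catalogs.)

SELECT typname, typinput
  FROM pg_type
 WHERE typinput IN ('varcharin'::regproc, 'bpcharin'::regproc,
                    'textin'::regproc, 'unknownin'::regproc,
                    'namein'::regproc);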

cheers

andrew


acl.c:aclitemin
bool.c:boolin
char.c:charin
date.c:timetypmodin
date.c:timetztypmodin
float.c:dasin
float.c:dsin
nabstime.c:abstimein
nabstime.c:reltimein
nabstime.c:tintervalin
name.c:namein
not_in.c:oidnotin
numeric.c:numerictypmodin
oid.c:oidin
oid.c:oidvectorin
regproc.c:regprocin
regproc.c:regprocedurein
regproc.c:regoperin
regproc.c:regoperatorin
regproc.c:regclassin
regproc.c:regtypein
tid.c:tidin
timestamp.c:timestamptypmodin
timestamp.c:timestamptztypmodin
timestamp.c:intervaltypmodin
varbit.c:bittypmodin
varbit.c:varbittypmodin
varchar.c:bpcharin
varchar.c:bpchartypmodin
varchar.c:varcharin
varchar.c:varchartypmodin
varlena.c:byteain
varlena.c:textin
varlena.c:unknownin
xid.c:xidin
xid.c:cidin


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] modifying the table function

2007-03-18 Thread Martijn van Oosterhout
On Sun, Mar 18, 2007 at 01:54:55PM -0600, Islam Hegazy wrote:
> I am trying to modify the execution of the table function to work in
> iterator fashion instead of materializing the output. I have been
> digging the PostgreSQL source code for about a month now and I can't
> figure out where the execution of the table function happens. I would be
> very grateful if anyone could tell me where to begin, as my project is
> due in only 10 days.

I've been thinking recently about why it's so difficult. It occurs to
me that the problem is because the language interpreters don't lend
themselves to being an iterator.

What you want is that when you call a Perl table function, each row the
Perl function returns is handed straight back to the caller. That
means the Perl interpreter has to be able to save all its state,
return to the caller and, when next called, resume where it left off.
I don't know if it can do that, but it would have to be implemented for
each language (or use threads).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




[HACKERS] modifying the table function

2007-03-18 Thread Islam Hegazy
Hi there

I am trying to modify the execution of the table function to work in iterator 
fashion instead of materializing the output. I have been digging the PostgreSQL 
source code for about a month now and I can't figure out where the execution of 
the table function happens. I would be very grateful if anyone could tell me 
where to begin, as my project is due in only 10 days.

Regards
Islam Hegazy

Re: [HACKERS] ILIKE and indexes

2007-03-18 Thread Guillaume Smet

On 3/18/07, Martijn van Oosterhout  wrote:

Er, it's the link between LIKE and ~=~ that's hard coded


Yes.


So I think it's easier than you think: just build the operator class
and make sure you use the right operator so the planner uses it. ILIKE
already maps to an operator...


Yeah, I know. The fact is that not every pattern can use an index,
and it depends on the database encoding too. The code in the planner
checks that the pattern and the database encoding make the index
usable and rewrites the LIKE clause (and the ILIKE clause if the
pattern begins with a non-alpha character) so that it can use the
index.
So I'm pretty sure I have to change this behaviour in the planner, or
did I miss something?

--
Guillaume

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] ILIKE and indexes

2007-03-18 Thread Martijn van Oosterhout
On Sun, Mar 18, 2007 at 07:30:35PM +0100, Guillaume Smet wrote:
> I have planned to write the operator class as a "contrib" module but I
> couldn't find the link between LIKE operator and text_pattern_ops
> opclass which uses ~=~ and all its variants. Andrew from Supernews
> told me it was hardcoded in the planner so the introduction of this
> new opclass requires a few changes to the planner to take it into
> account for ILIKE.

Er, it's the link between LIKE and ~=~ that's hard coded; however, the
link between the operator class and the operator is nothing special,
that's why it's an operator class.

So I think it's easier than you think: just build the operator class
and make sure you use the right operator so the planner uses it. ILIKE
already maps to an operator...

Hope this helps,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] [COMMITTERS] pgsql: Add GUC temp_tablespaces to provide a default location for

2007-03-18 Thread Jaime Casanova

On 3/17/07, Tom Lane <[EMAIL PROTECTED]> wrote:

"Jaime Casanova" <[EMAIL PROTECTED]> writes:
> On 3/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
>> In the second place, it's a serious violation of what little modularity
>> and layering we have for fd.c to be calling into commands/tablespace.c.
>> This is not merely cosmetic but has real consequences: one being that
>> it's now unsafe to call OpenTemporaryFile outside a transaction.

> ok, you are right... what do you suggest?
> maybe move the GetTempTablespace function to somewhere in src/backend/utils?

You missed the point entirely.  Relocating the code to some other file
wouldn't change the objection: the problem is that fd.c mustn't invoke
any transactional facilities such as catalog lookups.  It's too low
level for that.



oh, i see...


You could perhaps do it the other way around: some transactional
code (eg the assign hook for a GUC variable) tells fd.c to save
some private state controlling future temp file creations.



the problem with the assign hook function is that it can't read
catalogs either if we are in a non-interactive command...

so, we need a list with the oids of the tablespaces, we can initialize
this list the first time we need a temp file (i haven't seen exactly
where we can do this, yet), and if we set the GUC via a SET command
then we can let the assign hook do the job.
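
(For context, the user-visible shape of the feature being discussed would
presumably be something like the sketch below; the exact GUC syntax is an
assumption.)

CREATE TABLESPACE fasttemp LOCATION '/mnt/fast/pgsql_temp';
SET temp_tablespaces = 'fasttemp';   -- per-session, handled by the assign hook
-- or a default list in postgresql.conf:  temp_tablespaces = 'fasttemp1, fasttemp2'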


BTW, if we are now thinking of temp files as being directed to
particular tablespaces, is there any reason still to have per-database
subdirectories for them?  It might simplify life if there were just
one default temp directory, $PGDATA/base/pgsql_tmp/, plus one per
configured temp tablespace, $PGDATA/pg_tblspc//pgsql_tmp/.



what the  directory should be, i understand your idea as just
$PGDATA/pg_tblspc/pgsql_tmp/...

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] ILIKE and indexes

2007-03-18 Thread Guillaume Smet

Hi all,

I'm currently facing a common problem with queries using ILIKE: they
can't use an index except if the pattern begins with non-alpha
characters.

The usual trick recommended in the doc is to use lower() and LIKE but
it leads to bad row estimates (it's constant whatever the search
pattern is) and in several use cases we have, it's a real problem
because the rows are far from being equally distributed.
To take a real life example, if I look for 'c%' or 'l%' patterns in
one of my tables, it returns a lot of rows and the nested loop chosen
by the planner for every pattern is a very poor choice for these
particular patterns.
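
For readers following along, the trick in question looks roughly like this
(table and column names are made up, and the column is assumed to be text):

-- Expression index so that LIKE on lower(lastname) can use an index scan:
CREATE INDEX people_lower_lastname_idx
    ON people (lower(lastname) text_pattern_ops);

-- The query is then written against the same expression:
SELECT * FROM people WHERE lower(lastname) LIKE 'c%';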

I'd like to see an opclass similar to
(text|bpchar|varchar|name)_pattern_ops to deal with ILIKE.
I found this post of Jan
http://archives.postgresql.org/pgsql-hackers/2003-10/msg01550.php but
I'd really like not to introduce a new set of non standard operators
to deal with this feature.

I have planned to write the operator class as a "contrib" module but I
couldn't find the link between LIKE operator and text_pattern_ops
opclass which uses ~=~ and all its variants. Andrew from Supernews
told me it was hardcoded in the planner so the introduction of this
new opclass requires a few changes to the planner to take it into
account for ILIKE.

What I'd like to do:
* introduce 4 new opclasses called
(text|bpchar|varchar|name)_icpattern_ops with ~=~* operator and
variants
* change the planner to make it use these operators for ILIKE in the
same way it is done for LIKE (mostly remove the non-alpha limitation
and point the planner to the new operators)

Is there any fundamental problem in this approach? I mostly wonder if
there are any significant problems which prevented us from doing it
before and I've missed in my analysis. Is there anything I should
particularly take care of?

Thanks for any advice or comment.

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] msvc build broken for ecpg

2007-03-18 Thread Magnus Hagander
>> It's intended to parse as much as possible out of the existing files.
>> Unfortunately, the makefiles aren't consistent enough to make this
>> always work, which is why each project in /bin/ for example is added
>> manually. In /contrib/ they're a lot more consistent, so they're
>> auto-parsed.
>>
>> But - what specifically do you want to know? I'll be happy to give the
>> details if I know what you're after :-)
>>
>>
>>   
> 
> What I want to know is how resistant it is to breakage. Will it break if
> we add or remove a source file?

No.

> A header?

No, unless it requires changing the global include path.

> A directory?

As a part of the existing makefile structure? No.
As a new makefile target (say a new directory in /bin for a new .exe)? Yes.


> A library?

If the library is required to build, yes, it will need to be added.
If it's optional, it will still work but not use the library, until added.
If you mean a library that we *build* (like libpq), yes (unless it's in
contrib where it's picked up automatically).


//Magnus


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] msvc build broken for ecpg

2007-03-18 Thread Andrew Dunstan



Magnus Hagander wrote:

Perhaps Magnus could give some precise details about the requirements of
the buildsystem he's constructed.



It's intended to parse as much as possible out of the existing files.
Unfortunately, the makefiles aren't consistent enough to make this
always work, which is why each project in /bin/ for example is added
manually. In /contrib/ they're a lot more consistent, so they're
auto-parsed.

But - what specifically do you want to know? I'll be happy to give the
details if I know what you're after :-)


  


What I want to know is how resistant it is to breakage. Will it break if 
we add or remove a source file? A header? A directory? A library?


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] msvc build broken for ecpg

2007-03-18 Thread Magnus Hagander
Andrew Dunstan wrote:
> 
> The recent ecpg commit seems to have broken MSVC builds:
> 
> 
> descriptor.obj : error LNK2019: unresolved external symbol _pg_snprintf
> referenced in function _ECPGget_desc
> error.obj : error LNK2001: unresolved external symbol _pg_snprintf
> misc.obj : error LNK2001: unresolved external symbol _pg_snprintf
> path.obj : error LNK2001: unresolved external symbol _pg_snprintf
> error.obj : error LNK2019: unresolved external symbol _pg_fprintf
> referenced in function _sqlprint
> misc.obj : error LNK2001: unresolved external symbol _pg_fprintf
> path.obj : error LNK2001: unresolved external symbol _pg_fprintf
> execute.obj : error LNK2019: unresolved external symbol _pg_sprintf
> referenced in function _ECPGis_type_an_array
> execute.obj : error LNK2019: unresolved external symbol _pg_printf
> referenced in function _ECPGexecute
> misc.obj : error LNK2019: unresolved external symbol _pg_vfprintf
> referenced in function _ECPGlog
> path.obj : error LNK2019: unresolved external symbol _strlcpy referenced
> in function _join_path_components
> thread.obj : error LNK2001: unresolved external symbol _strlcpy
> .\Debug\libecpg\libecpg.dll : fatal error LNK1120: 6 unresolved externals
>10 Warning(s)
>13 Error(s)
> 

ecpg needs to link with libpgport. Instead of always linking with it, I
went with adding it only to those projects that actually need it. Seems
I missed ecpg on that - probably because one of my dev boxes builds
without ecpg since I don't have the pthreads lib on it.

So it's not actually ecpg's fault...


> This makes me worry about how fragile the support for MSVC is. If it
> can't handle changes like this then we'll forever be breaking things.
> That was one of the main reasons we went with MinGW/Msys in the first
> place - namely that we'd be using a single build system across all
> platforms.
> 
> This is not to say that the current build infrastructure is ideal - in
> fact I think it's pretty creaky. But I don't think we can accept an
> additional system which is always in danger of being out of sync.
> 
> Perhaps Magnus could give some precise details about the requirements of
> the buildsystem he's constructed.

It's intended to parse as much as possible out of the existing files.
Unfortunately, the makefiles aren't consistent enough to make this
always work, which is why each project in /bin/ for example is added
manually. In /contrib/ they're a lot more consistent, so they're
auto-parsed.

But - what specifically do you want to know? I'll be happy to give the
details if I know what you're after :-)

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] msvc build broken for ecpg

2007-03-18 Thread Andrew Dunstan


The recent ecpg commit seems to have broken MSVC builds:


descriptor.obj : error LNK2019: unresolved external symbol _pg_snprintf 
referenced in function _ECPGget_desc

error.obj : error LNK2001: unresolved external symbol _pg_snprintf
misc.obj : error LNK2001: unresolved external symbol _pg_snprintf
path.obj : error LNK2001: unresolved external symbol _pg_snprintf
error.obj : error LNK2019: unresolved external symbol _pg_fprintf 
referenced in function _sqlprint

misc.obj : error LNK2001: unresolved external symbol _pg_fprintf
path.obj : error LNK2001: unresolved external symbol _pg_fprintf
execute.obj : error LNK2019: unresolved external symbol _pg_sprintf 
referenced in function _ECPGis_type_an_array
execute.obj : error LNK2019: unresolved external symbol _pg_printf 
referenced in function _ECPGexecute
misc.obj : error LNK2019: unresolved external symbol _pg_vfprintf 
referenced in function _ECPGlog
path.obj : error LNK2019: unresolved external symbol _strlcpy referenced 
in function _join_path_components

thread.obj : error LNK2001: unresolved external symbol _strlcpy
.\Debug\libecpg\libecpg.dll : fatal error LNK1120: 6 unresolved externals
   10 Warning(s)
   13 Error(s)


This makes me worry about how fragile the support for MSVC is. If it 
can't handle changes like this then we'll forever be breaking things. 
That was one of the main reasons we went with MinGW/Msys in the first 
place - namely that we'd be using a single build system across all 
platforms.


This is not to say that the current build infrastructure is ideal - in 
fact I think it's pretty creaky. But I don't think we can accept an 
additional system which is always in danger of being out of sync.


Perhaps Magnus could give some precise details about the requirements of 
the buildsystem he's constructed.



cheers

andrew


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] TODO item: Improve speed with indexes

2007-03-18 Thread Tom Lane
"Aleksis Petrov" <[EMAIL PROTECTED]> writes:
> The other solution could be to not update the indexes at all (just REINDEX
> it). But this doesn't seem to fit in with the current implementation of
> repair_frag() function.

Considerably more to the point: what happens if VACUUM fails partway
through (ie, before it can complete the reindexing)?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PATCHES] xml2 contrib patch supporting default XML namespaces

2007-03-18 Thread Mike Rylander

On 3/6/07, Mike Rylander <[EMAIL PROTECTED]> wrote:

On 3/6/07, Peter Eisentraut <[EMAIL PROTECTED]> wrote:
> Mike Rylander wrote:
> > The patch adds support for default XML namespaces in xml2 by providing
> > a mechanism for supplying a prefix to a named namespace URI.
>
> How does it support multiple namespaces in one document?

It supports one default (unprefixed) namespace URI per document, which
ISTM is the overwhelmingly common case (and the itch that I must
scratch).


I think there is some confusion about what the current xml2 contrib
module supports and what my patch adds.  The current code, as it
stands today, supports multiple namespaces just fine.  The only
requirement is that each namespace have a prefix, or else one is
forced to use the local-name() construct with every single node for
those nodes in unprefixed ("default") namespaces.  This patch simply
adds support for registering a prefix for an unprefixed namespace,
which is an extremely common case in XML and causes the use of overly
verbose contortions when designing XPath expressions.  To illustrate
this, xml2 currently supports all of these statements:

SELECT xpath_nodeset('foo','/x/y');
SELECT xpath_nodeset('foo','/x/a:y');
SELECT xpath_nodeset('foo','/b:x/a:y');

All number and manner of /prefixed/ namespaces work fine today.
However, in order to match an element or attribute with an unprefixed
namespace, the xpath becomes a study in overly verbose, human error
inducing repetition.  For instance, consider the extremely common case
of an xhtml document that does not use a prefix for the xhtml
namespace.  Using the xml2 contrib module as it stands today, without
my patch, using XPath to get the title of the document might look
something like this:

/*[local-name()="html"]/*[local-name()="head"]/*[local-name()="title"]

Now just imagine the XPath needed to get a portion of the body in a
nested div based on the existence of some other node ... the logic
gets lost in the noise simply because of the overhead of
namespace-qualifying the elements.

Namespaces were introduced in XML to address verbosity issues (among
other things), but as XPath was designed primarily as a language for
use inside XSLT (where namespace support is fully integrated) it
didn't get the treatment needed to handle unprefixed namespaces.  To
address /that/ issue, my patch allows the registration of a supplied
prefix for a supplied URI, which solves the common "default namespace"
problem in a completely backward compatible way.  The above example
XPath can now become:

/x:html/x:head/x:title

simply by supplying 2 more arguments to the _ns version of any of the
xpath_ functions available in xml2.  I challenge anyone to claim that
the [local-name()="foo"] variant is easier to read and less error prone
than the second, namespace-prefixed variant.  They are exactly
equivalent, but the second (quite obviously) is Better(tm).
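
A sketch of the call shape as I read the description above; the function name
and argument order of the _ns variants here are assumptions for illustration,
not confirmed by the patch itself:

-- Hypothetical: register prefix "x" for the (unprefixed) XHTML namespace,
-- then use it in the path expression.
SELECT xpath_nodeset_ns(doc, '/x:html/x:head/x:title',
                        'x', 'http://www.w3.org/1999/xhtml')
  FROM xhtml_docs;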

I understand that XML support is planned and at least partially
implemented for 8.3, but many production instances will be unable (or,
in fact, unwilling) to upgrade to 8.3 for quite some time.  Because
this patch is completely backward compatible it can (theoretically) be
included in future 8.1 and 8.2 releases, and for those of us that need
more full XML support in the short term the upgrade of a contrib
module is probably a very viable option -- it is for me, anyway.

So, to sum up, please let me know what I can do to increase the
chances of getting this patch included.  Alternatively, if my patch is
being vetoed, please let me know that too so that I can create a local
maintenance plan for this.

Thanks in advance.  I've attached the patch again for reference.

--
Mike Rylander
[EMAIL PROTECTED]
GPLS -- PINES Development
Database Developer
http://open-ils.org


xml2-namespaces.patch
Description: Binary data

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Martijn van Oosterhout
On Sun, Mar 18, 2007 at 08:25:56AM -0400, Andrew Dunstan wrote:
> It does also seem from my test results that transcoding to MB charsets 
> (or at least to utf-8) is surprisingly expensive, and that this would be 
> a good place to look at optimisation possibilities. The validity tests 
> can also be somewhat expensive.

Hmm, I just noticed that the verify string works one character at a
time, at least that part could be dramatically optimised.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




[HACKERS] New Project: PostGIS application to Missing People

2007-03-18 Thread Luis Gustavo Lira



Dear PostgreSQL community

Concept for a PostGIS application to Missing People:

This is a proposal to design a new concept for an integrated PostGIS application
and to implement features that improve the tracking of information about missing
people. This application would be useful in disaster scenarios, looking for
missing kids, rescuing kidnapped people, human rights monitoring, etc. This task
surely is not an easy one, but I think it would be a great service to the global
community and I would really enjoy mentoring this project. That is my
motivation.

I am heavily involved with many open source projects. I am a member of the Linux
User Group at Catholic University in Peru (TUXPUC) and the president of the
Peruvian Chapter of FreeCulture.org (Cultura Libre). 


I am a qualified and experienced engineer. I am quite familiar with Java, C++,
Python, the Eclipse IDE, and several SQL dialects, which I use very often in my
programming. I hold a degree in Physics and a graduate diploma in Telecommunications.

BSc. Luis Gustavo Lira
   



___
If you would like to receive the weekly PUCP electronic bulletin, go to:
http://www.pucp.edu.pe/boletin/


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] xpath_array with namespaces support

2007-03-18 Thread Nikolay Samokhvalov

On 3/17/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:

In principle I am in favor of the patch.

Would it be better to use some more unlikely name for the dummy root
element used to process fragments than  ?

Perhaps even something in a special namespace?



I did think about it, but I didn't find any difficulties with a simple
one. The thing is that regardless of the element name we have a
corresponding shift in the XPath expression -- so there cannot be any
problem from my point of view... But maybe I don't see something and
it's better to avoid a _possible_ problem. It depends on PostgreSQL code
style itself -- what is the best approach in such cases? To avoid
unknown possible difficulties or to be clear?

--
Best regards,
Nikolay

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[HACKERS] TODO item: Improve speed with indexes

2007-03-18 Thread Aleksis Petrov

Hi everyone,

Regarding the mentioned TODO item, the past discussion as I see from the
following thread:

http://archives.postgresql.org/pgsql-general/2007-01/msg00879.php


concludes that during vacuum full, the relations' indexes should be
recreated via REINDEX, rather than updating them.

One solution which I see to this is to simply invoke reindex_relation(), or
the lower-level reindex_index() function, somewhere after the full_vacuum_rel()
routine is done with its work.

I prototyped this and it does solve the problem of index bloating during
vacuum full. However, this may not be the best solution, since the
full_vacuum_rel function is still updating the relation's indexes.

The other solution could be to not update the indexes at all (just REINDEX
them). But this doesn't seem to fit in with the current implementation of
the repair_frag() function.
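
(For context, the manual workaround in current releases is the familiar
pairing below, with a made-up table name; the proposals above aim to fold the
second step into the first.)

VACUUM FULL mytable;     -- compacts the heap but tends to bloat the indexes
REINDEX TABLE mytable;   -- rebuilds the indexes from scratch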

What do you all think would be the correct approach to solve this?

Thanks


Re: [HACKERS] initdb fails - postgresql does not support leap seconds

2007-03-18 Thread Eric
Sorry, here's the first post, just now included hackers (more info in
previous post)...

The following error occurred when issuing a vanilla initdb command on
my Windows system.

FATAL:  could not select a suitable default timezone
DETAIL:  It appears that your GMT time zone uses leap seconds.
PostgreSQL does not support leap seconds.

I built this from 8.1.8 source (mingw/msys/gcc).  Configure, make and
make install were fine.

I previously ran just fine on version 8.1.5 (which I did not compile)
on the same system, no problems.

Didn't select anything unusual in configure either.

Thanks!


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] initdb fails - postgresql does not support leap seconds

2007-03-18 Thread Eric
A little more info that might help: I had to put a hack into two .h
files (libpq.h, libpq-int.h) and add

typedef int ssize_t;

to get things to compile.  I did notice that libpq-int.h had this
typedef inside an #ifdef WIN32_ONLY_COMPILER which apparently wasn't
triggered.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] initdb fails - postgresql does not support leap seconds

2007-03-18 Thread Eric
Have now also tried:

downloading 8.2.2 and compiling it - no luck when running initdb (same
error)

rerunning configure with and without --with-includes=/mingw/include
--with-libraries=/mingw/lib (always running --without-zlib) - same error

I have been searching newsgroups, archives, etc. in hope someone has
seen this.  No luck there and no responses yet so I am including
hackers, I hope this is ok but am close to throwing in the towel (have
been through the Windows FAQ too).

I am a newcomer to PG development and wondering if there is light not
too far in the distance or if I can expect to wade through a lot more
of these problems before writing a line of code.  I was also hoping to
use MSVC++ but first things first, I would settle for a build under
msys/mingw that I can launch.

Windows does seem like the ugly stepchild for PG, would I be
significantly better off under Mac OS X (have it)?  Other Unix (would
have to buy it)?

Thanks again for any help...



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] xpath_array with namespaces support

2007-03-18 Thread Nikolay Samokhvalov

On 3/5/07, Nikolay Samokhvalov <[EMAIL PROTECTED]> wrote:

On 3/4/07, Nikolay Samokhvalov <[EMAIL PROTECTED]> wrote:
> I'll fix these issues and extend the patch with resgression tests and
> docs for xpath_array(). I'll resubmit it very soon.

Here is a new version of the patch. I didn't change any part of docs yet.
Since there were no objections I've changed the name of the function
to xmlpath().


The updated version of the patch contains a bugfix: there was a problem
with path queries that pointed to elements (cases when a set of
document parts that correspond to subtrees should be returned).
Example is (included in regression test):

xmltest=# SELECT xmlpath('//b', 'one two three etc');
xmlpath
-
{two,etc}
(1 row)

Waiting for more feedback, please check it.

--
Best regards,
Nikolay
Index: src/backend/utils/adt/xml.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/xml.c,v
retrieving revision 1.35
diff -u -r1.35 xml.c
--- src/backend/utils/adt/xml.c	15 Mar 2007 23:12:06 -	1.35
+++ src/backend/utils/adt/xml.c	18 Mar 2007 13:32:21 -
@@ -47,6 +47,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #endif /* USE_LIBXML */
 
 #include "catalog/namespace.h"
@@ -67,6 +69,7 @@
 #include "utils/datetime.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "access/tupmacs.h"
 #include "utils/xml.h"
 
 
@@ -88,6 +91,7 @@
 static int		parse_xml_decl(const xmlChar *str, size_t *lenp, xmlChar **version, xmlChar **encoding, int *standalone);
 static bool		print_xml_decl(StringInfo buf, const xmlChar *version, pg_enc encoding, int standalone);
 static xmlDocPtr xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace, xmlChar *encoding);
+static text		*xml_xmlnodetoxmltype(xmlNodePtr cur);
 
 #endif /* USE_LIBXML */
 
@@ -1463,7 +1467,6 @@
 	return buf.data;
 }
 
-
 /*
  * Map SQL value to XML value; see SQL/XML:2003 section 9.16.
  */
@@ -2403,3 +2406,258 @@
 	else
 		appendStringInfoString(result, "\n\n");
 }
+
+
+/*
+ * XPath related functions
+ */
+
+#ifdef USE_LIBXML
+/* 
+ * Convert XML node to text (dump subtree in case of element, return value otherwise)
+ */
+text *
+xml_xmlnodetoxmltype(xmlNodePtr cur)
+{
+	xmlChar			*str;
+	xmltype*result;
+	size_tlen;
+	xmlBufferPtr 		buf;
+	
+	if (cur->type == XML_ELEMENT_NODE)
+	{
+		buf = xmlBufferCreate();
+		xmlNodeDump(buf, NULL, cur, 0, 1);
+		result = xmlBuffer_to_xmltype(buf);
+		xmlBufferFree(buf);
+	}
+	else
+	{
+		str = xmlXPathCastNodeToString(cur);
+		len = strlen((char *) str);
+		result = (text *) palloc(len + VARHDRSZ);
+		SET_VARSIZE(result, len + VARHDRSZ);
+		memcpy(VARDATA(result), str, len);
+	}
+	
+	return result;
+}
+#endif
+
+/*
+ * Evaluate XPath expression and return array of XML values.
+ * As we have no support for XQuery sequences yet, this function seems
+ * to be the most useful one (an array of XML values plays the role of
+ * a kind of substitute for XQuery sequences).
+ *
+ * Workaround here: we parse the XML data in a different way to allow XPath on
+ * fragments (see "XPath for fragment" TODO comment inside).
+ */
+Datum
+xmlpath(PG_FUNCTION_ARGS)
+{
+#ifdef USE_LIBXML
+	ArrayBuildState		*astate = NULL;
+	xmlParserCtxtPtr	ctxt = NULL;
+	xmlDocPtr			doc = NULL;
+	xmlXPathContextPtr	xpathctx = NULL;
+	xmlXPathCompExprPtr	xpathcomp = NULL;
+	xmlXPathObjectPtr	xpathobj = NULL;
+	int32len, xpath_len;
+	xmlChar*string, *xpath_expr;
+	boolres_is_null = FALSE;
+	int	i;
+	xmltype*data;
+	text*xpath_expr_text;
+	ArrayType			*namespaces;
+	int	*dims, ndims, ns_count = 0, bitmask = 1;
+	char*ptr;
+	bits8*bitmap;
+	char**ns_names = NULL, **ns_uris = NULL;
+	int16typlen;
+	booltypbyval;
+	chartypalign;
+	
+	/* the function is not strict, we must check first two args */
+	if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
+		PG_RETURN_NULL();
+	
+	xpath_expr_text = PG_GETARG_TEXT_P(0);
+	data  = PG_GETARG_XML_P(1);
+	
+	/* Namespace mappings passed as text[].
+	 * Assume that 2-dimensional array has been passed, 
+	 * the 1st subarray is array of names, the 2nd -- array of URIs,
+	 * example: ARRAY[ARRAY['myns', 'myns2'], ARRAY['http://example.com', 'http://example2.com']]. 
+	 */
+	if (!PG_ARGISNULL(2))
+	{
+		namespaces = PG_GETARG_ARRAYTYPE_P(2);
+		ndims = ARR_NDIM(namespaces);
+		dims = ARR_DIMS(namespaces);
+		
+		/* Sanity check */
+		if (ndims != 2)
+			ereport(ERROR, (errmsg("invalid array passed for namespace mappings"),
+			errdetail("Only 2-dimensional array may be used for namespace mappings.")));
+		
+		Assert(ARR_ELEMTYPE(namespaces) == TEXTOID);
+		
+		ns_count = ArrayGetNItems(ndims, dims) / 2;
+		get_typlenbyvalalign(ARR_ELEMTYPE(namespaces),
+			 &typlen, &typbyval, &typalign);
+		ns_names = (char **) palloc(ns_count * sizeof(char *));
+		ns_uris = (char **) palloc(ns_count * sizeof(char *));
+		ptr = ARR_DATA_PTR(names

Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Andrew Dunstan

Martijn van Oosterhout wrote:

On Sat, Mar 17, 2007 at 11:46:01AM -0400, Andrew Dunstan wrote:
  
How can we fix this? Frankly, the statement in the docs warning about 
making sure that escaped sequences are valid in the server encoding is a 
cop-out. We don't accept invalid data elsewhere, and this should be no 
different IMNSHO. I don't see why this should be any different from, 
say, date or numeric data. For years people have sneered at MySQL 
because it accepted dates like Feb 31st, and rightly so. But this seems 
to me to be like our own version of the same problem.



It seems to me that the easiest solution would be to forbid \x?? escape
sequences where it's greater than \x7F for UTF-8 server encodings.
Instead introduce a \u escape for specifying the unicode character
directly. Under the basic principle that any escape sequence still has
to represent a single character. The result can be multiple bytes, but
you don't have to check for consistency anymore. 


Have a nice day,
  


The escape processing is actually done in the lexer in the case of 
literals. We have to allow for bytea literals there too, regardless of 
encoding. The lexer naturally has no notion of the intended destination 
of the literal, so we need to defer the validity check to the *in 
functions for encoding-aware types. And, as Tom has noted, COPY does 
its own escape processing but does it before the transcoding.


So ISTM that any solution other than something like I have proposed will 
probably involve substantial surgery.


It does also seem from my test results that transcoding to MB charsets 
(or at least to utf-8) is surprisingly expensive, and that this would be 
a good place to look at optimisation possibilities. The validity tests 
can also be somewhat expensive.


But correctness matters most, IMNSHO.

cheers

andrew




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Buildfarm feature request: some way to track/classify failures

2007-03-18 Thread Josh Berkus

Andrew,

Lastly, note that some buildfarm enhancements are on the SOC project 
list. I have no idea if anyone will express any interest in that, of 
course. It's not very glamorous work.


On the other hand, I think there are a lot more student perl hackers and 
web people than there are folks with the potential to do backend stuff. 
 So who knows?


--Josh


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Project suggestion: benchmark utility for PostgreSQL

2007-03-18 Thread Josh Berkus

Mickael,


I am a student and I would like to have your opinion on a project I
plan to submit for GSoC. At school, when I work with relational
databases I have problems testing table structures and queries
because I need to insert test data manually, which is very unpleasant.
Therefore, I suggest creating a benchmark utility for PostgreSQL.


I think your project sounds really cool, but also not doable in 3 months 
from scratch.  You need to build on the work of others.  Simply 
designing a viable benchmark schema and data set would be a 
more-than-3-month process; developing them *and* the tools to use them 
would likely take you more than a year.  I know whereof I speak.


Therefore, I think you should attach your proposal to one of the 
following projects:


PGBuildfarm: orient your tools more towards being "performance unit 
tests" than part of a benchmark.  Your tool could then become an 
additional component of the Buildfarm, and your mentor would be Andrew 
Dunstan.  Note that this would make any GUI components the last thing 
you do.


TPC-E/DBT5: you could work with Rilson on modularizing DBT5 so that 
users could run a smaller version and do "unit tests" of parts of the 
TPCE schema/queryset.  In that case, your mentor would be Mark Wong.


OpenJPA: the JPA project is working on creating OSDB+Java performance 
unit tests as well (database-agnostic).  I know they could use help; if 
you did this, I'd recruit a JPA person to be your mentor.


EAStress: You could use Spec's recently liberalized rules to build your 
tools on top of the EAstress workload.  This would have a couple 
disadvantages, though: EAstress doesn't use database features much, and 
the workload isn't open source, just free.  In that case, your mentor 
would be me.



For a programming language, as it would be for GSoC, it has to be
realized in three months and I believe the utility has to be
cross-platform (anyway I want it to be). So I think Java would be
good. I am very used to Java and Swing programming. What do you think
about that choice? If you feel Java is a bad choice, there is
C++/Boost/wxWidgets (like pgAdmin). But with wxWidgets, I am not sure
that a GUI that works under Windows and Linux will work under MacOS
without hacks.


I don't see any issue with using Java.   As SoC administrator, my main 
concern is that you finish a usable tool, so I'd go with whatever you 
can code the best in.


Anyway, it sounds like a really cool project, and I look forward to your 
application.


--Josh Berkus

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Martijn van Oosterhout
On Sat, Mar 17, 2007 at 11:46:01AM -0400, Andrew Dunstan wrote:
> How can we fix this? Frankly, the statement in the docs warning about 
> making sure that escaped sequences are valid in the server encoding is a 
> cop-out. We don't accept invalid data elsewhere, and this should be no 
> different IMNSHO. I don't see why this should be any different from, 
> say, date or numeric data. For years people have sneered at MySQL 
> because it accepted dates like Feb 31st, and rightly so. But this seems 
> to me to be like our own version of the same problem.

It seems to me that the easiest solution would be to forbid \x?? escape
sequences where it's greater than \x7F for UTF-8 server encodings.
Instead introduce a \u escape for specifying the unicode character
directly. Under the basic principle that any escape sequence still has
to represent a single character. The result can be multiple bytes, but
you don't have to check for consistency anymore.
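
A sketch of what the proposal would mean in practice, using the test table
from earlier in the thread; the \u spelling is the proposed (hypothetical)
escape, not something the server accepts today:

-- Rejected under the proposal: a raw byte escape above \x7F in a UTF-8 database.
INSERT INTO test VALUES (E'\200');

-- Allowed instead: name the code point; the server stores its UTF-8 encoding.
INSERT INTO test VALUES (E'\u00e9');   -- hypothetical escape for U+00E9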

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Bug in UTF8-Validation Code?

2007-03-18 Thread Grzegorz Jaskiewicz


Maybe we should add a resource-intensive check to ASCII encoding(s);
that would even the score ;p




let's test mysql on this, and see how much worse it performs.


--
Grzegorz 'the evil' Jaskiewicz

evil C/evil C++ developer for hire


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly