Re: [HACKERS] Searching for substring with tsearch(1/2)

2003-12-10 Thread Teodor Sigaev
Tsearch was never intended for prefix search, and the index structure doesn't support 
any kind of prefix or suffix matching. You could write an extension to tsearch that will 
search by prefix, but such a solution will not use the index, only a sequential scan.


How efficient would tsearch be for really big expressions, where 'hu%'
would be expanded (using a btree index on a one-column word table) to the
tsearch equivalent of ( human or humanity or humming or huge or
..1000 words here...) before passing the expression to tsearch?
The GiST index of tsearch doesn't support prefix search, so it will work only by 
seqscan, as we know :) The disk is much slower than the processor, so speed will be 
limited by the disk.


Prefix searches are easily realized with an inverted index, but it requires a lot of 
programming.
The simplest way is:
create table invidx (
	lexeme text not null primary key,
	ids    int[]
);

where ids is an array containing the identifiers of the documents that contain this word.
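
A prefix lookup could then be answered from this table with something like the
following (just a sketch; the documents table and its id column are assumed names,
not part of the schema above):

-- collect the id arrays for every indexed word with the given prefix;
-- the btree on lexeme can turn this into an index scan (with a suitable
-- operator class / locale)
select lexeme, ids
from invidx
where lexeme like 'hu%';

The application (or a set-returning helper function) then unions the ids arrays and
fetches the matching documents.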


How hard (or sensible ;) would it be to create such an index using GiST?
As proved by tsearch, GiST can cope well with many-to-many indexes.
Sorry, I don't understand. Do you mean that GiST supports one heap tuple in 
several index tuples? If so, then no :). GiST doesn't support that feature. I 
don't think that GiST can help in this situation.



create table invidx (
	lexeme   text not null,
	textdate date not null,
	ids      int[],
	primary key (lexeme, textdate)
);
which would partition the invidx table on textdate (or some other
suitable datum)

2. If a word is frequent, then a query with 'IN (select * from func())' may work slowly...
If it is often too slow, then creating a temp table and doing a plain
join may be faster.
A table structured like invidx reduces this problem.
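
The temp-table variant could look roughly like this (only a sketch; documents and
its id column are assumed names):

-- materialize the candidate words once
create temp table hits as
    select lexeme, ids from invidx where lexeme like 'hu%';

-- then join; d.id = any (h.ids) matches a document id against an ids array
select distinct d.*
from documents d, hits h
where d.id = any (h.ids);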
--
Teodor Sigaev  E-mail: [EMAIL PROTECTED]


Re: [HACKERS] Searching for substring with tsearch(1/2)

2003-12-10 Thread Hannu Krosing
Teodor Sigaev kirjutas K, 10.12.2003 kell 11:20:
 Tsearch was never intended for prefix search, and the index structure doesn't support 
 any kind of prefix or suffix matching. You could write an extension to tsearch that will 
 search by prefix, but such a solution will not use the index, only a sequential scan.
  
  
  How efficient would tsearch be for really big expressions (where 'hu%'
  would be expanded (using a btree word index on one column word table) to
  tsearch equivalent of ( human or humanity or humming or huge or
  ..1000 words here...) before passing the expression to tsearch?
 
 The GiST index of tsearch doesn't support prefix search, so it will work only by 
 seqscan, as we know :) The disk is much slower than the processor, so speed will be 
 limited by the disk.

I meant that the expansion of 'hu%' is done before and outside of
tsearch, so the question is how efficient tsearch will be when searching
for hundreds or thousands of words in one expression.


  How hard (or sensible ;) would be creating such an index using GiST ?
  As proved by tsearch GiST can cope well with many-to-many indexes.
 
 Sorry, I don't understand. Do you mean that GiST supports one heap tuple in 
 several index tuple? If yes then no :). GiST doesn't support this feature. I 
 don't think that GiST may help in this situation.

but tsearch seems to support this, and tsearch uses GiST. Is this
functionality added entirely by tsearch ?

--
Hannu




Re: [HACKERS] Searching for substring with tsearch(1/2)

2003-12-10 Thread Teodor Sigaev
I meant that the expansion of 'hu%' is done before and outside of
tsearch, so the question is how efficient tsearch will be when searching
for hundreds or thousands of words in one expression.
Ok, I see. The answer: bad. The index structure is a signature tree with a constant 
signature length, by default 2016 bits. A signature is made by hashing each word and 
setting bit number HASHVAL % 2016 to 1. So, if a query has many terms and all terms are 
ORed, then there are a lot of signatures that match the query. This means a lot 
of pages in the index will be read.
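
Just to illustrate the HASHVAL % 2016 idea (hashtext() here is only a stand-in for
demonstration; tsearch's signature hash is an internal function, not this one):

-- which of the 2016 signature bits would these words set?
select w as word, abs(hashtext(w)) % 2016 as bit
from (values ('human'::text), ('humanity'), ('huge'), ('humming')) as t(w);

With many ORed terms, a signature only needs the right bit set for any one of the
terms to require a visit, so large parts of the tree match.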


How hard (or sensible ;) would be creating such an index using GiST ?
As proved by tsearch GiST can cope well with many-to-many indexes.
Sorry, I don't understand. Do you mean that GiST supports one heap tuple in 
several index tuple? If yes then no :). GiST doesn't support this feature. I 
don't think that GiST may help in this situation.


but tsearch seems to support this, and tsearch uses GiST. Is this
functionality added entirely by tsearch ?
No, one heap tuple - one index tuple.

I'll try to explain the index structure used by tsearch (three levels, just for example):
Root page
 internal tuple 1  - second level page 1
   internal tuple 1.1 -
   internal tuple 1.2 -
 internal tuple 2  - second level page 2
   internal tuple 2.1 -
   internal tuple 2.2 - third level (leaf) page 2.2
  leaf tuple 2.2.1 - heap tuple
  leaf tuple 2.2.2 - heap tuple
A leaf tuple contains one of two types of predicates:
  1. just the lexemes (without position information)
  2. if the stored size of the first type is too big, then the tuple
     stores a signature as described above.
An internal tuple contains the ORed (superimposed) signatures of its children.



--
Teodor Sigaev  E-mail: [EMAIL PROTECTED]


Re: [HACKERS] ERROR: Index pg_toast_8443892_index is not a btree

2003-12-10 Thread strk
I agree about keeping it simple for the users. Anyway, if that 
turns out to be a bad problem with either the implementation or the
users' operating system, it would be nice to know how 
to inspect it further. In my case this could also help in
debugging a postgres extension (postgis) which is involved in
text-internal conversion and is showing heap corruption problems.

The question now is: what does that message mean? Did a routine
try to create an index and abandon its work before finishing?

--strk;

Jan Wieck wrote:
 Christopher Kings-Lynne wrote:
 
  I couldn't agree more. Look at this very instance. He now found the 
  right reindex command and the corrupted file is gone. We don't have the 
  slightest clue what happened to that file. Was it truncated? Did some 
  other process scribble around in the shared memory? How do you tell now?
  
  The end user just could not care less.  They want their machine running 
  again as soon as is humanly possible without going through a back and 
  forth process of subscribing to some lists they don't care about, etc.
 
 I know, that's (unfortunately) true. Although it's not very farsighted 
 because better bug reports usually lead to better software in the next 
 release.
 
 
 Jan
 
 -- 
 #==#
 # It's easier to get forgiveness for being wrong than for being right. #
 # Let's break this rule - forgive me.  #
 #== [EMAIL PROTECTED] #



Re: [HACKERS] 73.5 and uw 713

2003-12-10 Thread ohp
Hi Tom,

At last I have a much better trace for the vacuum full bug.

Can someone help me on this one?

Memory image of postmaster (process p1) created
MEMORY IMAGE FILE [swapn in qsort.c]
11 (segv code[SEGV_MAPERR] address[0x842]) SIGNALED in p1
0xbffae03f (swapn+47:)  movl   (%esi),%eax
debug Stack trace corresponding to p1, Program postmaster
*[0] swapn(0x2, 0x831b758, 0x831b770)   [0xbffae03f]
 [1] qst(0x80448cc, 0x831b758, 0x831b788)   [0xbffadca2]
 [2] qsort(0x831b758, 0x18, 0x2, 0x80eb9f8) [0xbffae17f]
 [3] repair_frag(vacrelstats=0x83122bc, onerel=0x82cf56c, vacuum_pages=0x8046a64, 
fraged_pages=0x8046a54, nindexes=1, Irel=0x83672e0)   [EMAIL PROTECTED]
 [4] full_vacuum_rel(onerel=0x82cf56c, vacstmt=0x83104b4)   [EMAIL PROTECTED]
 [5] vacuum_rel(relid=16408, vacstmt=0x83104b4, expected_relkind=114 (or 'r'))  [EMAIL 
PROTECTED]
 [6] vacuum(vacstmt=0x83104b4)  [EMAIL PROTECTED]
 [7] ProcessUtility(parsetree=0x83104b4, dest=Remote, completionTag=) [EMAIL 
PROTECTED]@713]
 [8] pg_exec_query_string(query_string=0x831020c, dest=Remote, 
parse_context=0x830e204) [EMAIL PROTECTED]@789]
 [9] PostgresMain(argc=4, argv=0x8046d78, username=ohp)   [EMAIL PROTECTED]@2013]
 [10] DoBackend(port=0x829e500) [EMAIL PROTECTED]
 [11] BackendStartup(port=0x829e500)[EMAIL PROTECTED]
 [12] ServerLoop( presumed: 0x1, 0x8297af8, 0x1) [EMAIL PROTECTED]
 [13] PostmasterMain(argc=1, argv=0x8297af8)[EMAIL PROTECTED]
 [14] main(argc=1, argv=0x8047c44, 0x8047c4c)   [EMAIL PROTECTED]
 [15] _start()  [0x806ad1c]
debug

On Mon, 8 Dec 2003, Tom Lane wrote:

 Date: Mon, 08 Dec 2003 14:03:42 -0500
 From: Tom Lane [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Cc: pgsql-hackers list [EMAIL PROTECTED]
 Subject: Re: [HACKERS] 73.5 and uw 713

 [EMAIL PROTECTED] writes:
  I've upgraded my system from 7.3.4 to 7.3.5 yesterday and have already
  experienced to crash during vacuum full.
  I have'nt recompiled with debug yet but it's a sigsegv in function
  repair_frag in vacuum.c

 Considering that vacuum.c hasn't changed in that branch since 7.3beta4,
 it's highly unlikely that this represents a regression between 7.3.4 and
 7.3.5.  Pre-existing bug, maybe ...

   regards, tom lane


-- 
Olivier PRENANT Tel: +33-5-61-50-97-00 (Work)
6, Chemin d'Harraud Turrou   +33-5-61-50-97-01 (Fax)
31190 AUTERIVE   +33-6-07-63-80-64 (GSM)
FRANCE  Email: [EMAIL PROTECTED]
--
Make your life a dream, make your dream a reality. (St Exupery)



[HACKERS] pljava revisited

2003-12-10 Thread Thomas Hallgren
Hi,
I'm working on a new pl/java prototype that I hope will become production
quality some time in the future. Before my project gets too far, I'd like to
gather some input from other users. I've taken a slightly different approach
than what seems to be the case for other attempts that I've managed to dig
up. Here are some highlights of my approach:

1. A new Java VM is spawned for each connection. I know that this will give
a performance hit when a new connection is created. The alternative, however,
implies that all calls become inter-process calls, which I think is a much
worse scenario, especially since most modern environments today have some
kind of connection pooling. Another reason is that the connections
represent sessions, and those sessions get a very natural isolation using
separate VMs. A third reason is that the current connection would become
unavailable in a remote process (see #5).

2. There's no actual Java code in the body of a function, simply a reference
to a static method (see the sketch after this list). My reasoning is that when
writing (and debugging) Java, you want to use your favorite IDE. Mixing Java
with SQL just gets messy.

3. As opposed to Tcl, Python, and Perl, which for obvious reasons use
strings, my pl/java will use native types wherever possible. A flag can be
added to the function definition if real objects are preferred instead of
primitives (motivated by the fact that the primitives cannot reflect NULL
values).

4. The code is actually written using JNI and C++, but without any templates,
no -style object references, no operator overloads, no external class
libraries, etc. I use C++ simply to get better quality, readability and
structure in the code.

5. I plan to write a JDBC layer using JNI on top of the SPI calls to enable
JDBC functionality on the current connection. Some things will be limited
(begin/commit etc. will not be possible here, for instance).
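
As a sketch of what item 2 could look like on the SQL side (hypothetical syntax and
names; the exact form of the method reference is still open in the prototype):

-- the body is only a reference to a static Java method, no Java source in SQL
create function get_quantity(integer) returns integer
    as 'com.example.orders.OrderLogic.getQuantity'
    language java;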

Current status is that my first calls from Postgres to Java have been made.
Lots of work remains.

What are your thoughts and ideas?

Thomas Hallgren





Re: [HACKERS] PostgreSQL port to pure Java?

2003-12-10 Thread Merlin Moncure
Robert Treat wrote:
 Someone did it but it didn't catch fire.

I think what will catch fire in a big way is plphp.  Managers will like
an all-PHP platform that is extremely capable and productive.
Developers will enjoy PHP's natural syntax and agnostic approach to
programming.  PHP5, when it becomes production ready, will offer high-level
language features that compete with Java, C#, and Object Pascal.

Merlin



Re: [HACKERS] ERROR: Index pg_toast_8443892_index is not a btree

2003-12-10 Thread Tom Lane
strk [EMAIL PROTECTED] writes:
 The question now is: what does that message mean ?

It means that the magic number that should be on the first page of the
btree index isn't right.  We can deduce that something has clobbered the
first page of the index, but guessing what and how requires much more
information.

regards, tom lane



Re: [HACKERS] pljava revisited

2003-12-10 Thread Andrew Dunstan
Thomas Hallgren wrote:

Hi,
I'm working on a new pl/java prototype that I hope will become production
quality some time in the future. Before my project gets to far, I'd like to
gather some input from other users. I've taken a slightly different approach
than what seems to be the case for other attempts that I've managed to dig
up. Here's some highlights in my approach:
1. A new Java VM is spawned for each connection. I know that this will give
a performance hit when a new connection is created. The alternative however,
implies that all calls becomes inter-process calls which I think is a much
worse scenario. Especially since most modern environments today has some
kind of connection pooling. Another reason is that the connections
represents sessions and those sessions gets a very natural isolation using
separate VM's. A third reason is that the current connection would become
unavailable in a remote process (see #5).
Maybe on-demand might be better - if the particular backend doesn't need 
it why incur the overhead?

2. There's no actual Java code in the body of a function. Simply a reference
to a static method. My reasoning is that when writing (and debugging) java,
you want to use your favorite IDE. Mixing Java with SQL just gets messy.


Perhaps an example or two might help me understand better how this would 
work.

3. As opposed to the Tcl, Python, and Perl, that for obvious reasons uses
strings, my pl/java will use native types wherever possible. A flag can be
added to the function definition if real objects are preferred instead of
primitives (motivated by the fact that the primitives cannot reflect NULL
values).
4. The code is actually written using JNI and C++ but without any templates,
no -style object references, no operator overloads, external class
libraries etc. I use C++ simply to get better quality, readability and
structure on the code.
Other pl* (perl, python, tcl) languages have vanilla C glue code. Might 
be better to stick to this. If you aren't using advanced C++ features 
that shouldn't be too hard - well structured C can be just as readable 
as well structured C++. At the very lowest level, about the only things 
C++ buys you are the ability to declare variables in arbitrary places, 
and // style comments.

5. I plan to write a JDBC layer using JNI on top of the SPI calls to enable
JDBC functionality on the current connection. Some things will be limited
(begin/commit etc. will not be possible to do here for instance).
Again, examples would help me understand better.

Is there a web page for your project?

cheers

andrew



Re: [HACKERS] ERROR: Index pg_toast_8443892_index is not a btree

2003-12-10 Thread Jan Wieck
Tom Lane wrote:

strk [EMAIL PROTECTED] writes:
The question now is: what does that message mean ?
It means that the magic number that should be on the first page of the
btree index isn't right.  We can deduce that something has clobbered the
first page of the index, but guessing what and how requires much more
information.
Clobbered or truncated. A zero size index file causes the same message.

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] pljava revisited

2003-12-10 Thread Andrew Rawnsley
On Dec 10, 2003, at 11:23 AM, Andrew Dunstan wrote:

Thomas Hallgren wrote:

Hi,
I'm working on a new pl/java prototype that I hope will become 
production
quality some time in the future. Before my project gets to far, I'd 
like to
gather some input from other users. I've taken a slightly different 
approach
than what seems to be the case for other attempts that I've managed 
to dig
up. Here's some highlights in my approach:

1. A new Java VM is spawned for each connection. I know that this 
will give
a performance hit when a new connection is created. The alternative 
however,
implies that all calls becomes inter-process calls which I think is a 
much
worse scenario. Especially since most modern environments today has 
some
kind of connection pooling. Another reason is that the connections
represents sessions and those sessions gets a very natural isolation 
using
separate VM's. A third reason is that the current connection would 
become
unavailable in a remote process (see #5).

Maybe on-demand might be better - if the particular backend doesn't 
need it why incur the overhead?

I think a JVM per connection is going to add too much overhead, even if 
it's on-demand. Some platforms handle
multiple JVMs better than others, but still. 25 or so individual JVMs 
is going to be a mess, in terms of resource consumption.

Start time/connect time will be an issue. Saying 'people use pools', 
while generally accurate, kind of sweeps the problem
under the carpet instead of the dust bin.

2. There's no actual Java code in the body of a function. Simply a 
reference
to a static method. My reasoning is that when writing (and debugging) 
java,
you want to use your favorite IDE. Mixing Java with SQL just gets 
messy.



Perhaps an example or two might help me understand better how this 
would work.

3. As opposed to the Tcl, Python, and Perl, that for obvious reasons 
uses
strings, my pl/java will use native types wherever possible. A flag 
can be
added to the function definition if real objects are preferred 
instead of
primitives (motivated by the fact that the primitives cannot reflect 
NULL
values).

4. The code is actually written using JNI and C++ but without any 
templates,
no -style object references, no operator overloads, external class
libraries etc. I use C++ simply to get better quality, readability and
structure on the code.

Other pl* (perl, python, tcl) languages have vanilla C glue code. 
Might be better to stick to this. If you aren't using advanced C++ 
features that shouldn't be too hard - well structured C can be just as 
readable as well structured C++. At the very lowest level, about the 
only things C++ buys you are the ability to declare variables in 
arbitrary places, and // style comments.

Agreed. Given that the rest of the code base is C, I would imagine 
that the Powers that Be would frown a bit on merging
C++ code in, and relegate it to contrib for eternity...

Not knocking the idea, mind you - I think it would be great if it can 
be pulled off. Was thinking about it myself as a way to learn more
of the backend code and scrape the thick layer of rust off of my C 
skills. Would like to see where you are with it.


5. I plan to write a JDBC layer using JNI on top of the SPI calls to 
enable
JDBC functionality on the current connection. Some things will be 
limited
(begin/commit etc. will not be possible to do here for instance).

Again. examples would help me understand better.

Is there a web page for your project?

cheers

andrew




Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com


Re: [HACKERS] pljava revisited

2003-12-10 Thread Thomas Hallgren
The JVM will be started on-demand.
Although I realize that one JVM per connection will consume a fair amount of
resources, I still think it is the best solution. The description of this
system must of course make it very clear that this is what happens and
ultimately provide the means of tuning the JVM's as much as possible.

I advocate this solution because I think that the people with the
primary interest in a pl/java will be those who write enterprise systems
using Java. J2EE systems are always equipped with connection pools.

But I'm of course open to other alternatives. Let's say that there's a JVM
with a thread pool that the Postgres sessions will connect to using some
kind of RPC. This implies that each call will have an overhead of at least 2
OS context switches. Compared to in-process calls, this will severely
cripple the performance. How do you suggest that we circumvent this problem?

Another problem is that we will immediately lose the ability to use the
current connection provided by the SPI interfaces. We can of course
establish a back-channel to the original process, but that will incur even
more performance hits. A third alternative is to establish brand-new
connections in the remote JVM. The problem then is to propagate the transaction
context correctly. Albeit solvable, the performance using distributed
transactions will be much worse than in-process. How do we solve this?

C++ or C is not a big issue. I might rewrite it into pure C. The main reason
for C++ is to be able to use objects with virtual methods. I know how to do
that in C too, but I don't quite agree that it's just as clean :-)

- thomas

 I think a JVM per connection is going to add too much overhead, even if
 its on-demand. Some platforms handle
 multiple JVMs better than others, but still. 25 or so individual JVMs
 is going to be  a mess, in terms of resource consumption.

 Start time/connect time will be an issue. Saying 'people use pools',
 while generally accurate, kind of sweeps the problem
 under the carpet instead of the dust bin.

 
  2. There's no actual Java code in the body of a function. Simply a
  reference
  to a static method. My reasoning is that when writing (and debugging)
  java,
  you want to use your favorite IDE. Mixing Java with SQL just gets
  messy.
 
 
 
  Perhaps an example or two might help me understand better how this
  would work.
 
 
  3. As opposed to the Tcl, Python, and Perl, that for obvious reasons
  uses
  strings, my pl/java will use native types wherever possible. A flag
  can be
  added to the function definition if real objects are preferred
  instead of
  primitives (motivated by the fact that the primitives cannot reflect
  NULL
  values).
 
  4. The code is actually written using JNI and C++ but without any
  templates,
  no -style object references, no operator overloads, external class
  libraries etc. I use C++ simply to get better quality, readability and
  structure on the code.
 
 
  Other pl* (perl, python, tcl) languages have vanilla C glue code.
  Might be better to stick to this. If you aren't using advanced C++
  features that shouldn't be too hard - well structured C can be just as
  readable as well structured C++. At the very lowest level, about the
  only things C++ buys you are the ability to declare variables in
  arbitrary places, and // style comments.
 

 Agreed. Given that the rest of the code base is CI would imagine
 that the Powers that Be would frown a bit on merging
 C++ code in, and relegate it to contrib for eternity...

 Not knocking the idea, mind you - I think it would be great if it can
 be pulled off. Was thinking about it myself as a way to learn more
 of the backend code and scrape the thick layer of rust off of my C
 skills. Would like to see where you are with it.


 
  5. I plan to write a JDBC layer using JNI on top of the SPI calls to
  enable
  JDBC functionality on the current connection. Some things will be
  limited
  (begin/commit etc. will not be possible to do here for instance).
 
 
  Again. examples would help me understand better.
 
  Is there a web page for your project?
 
 
  cheers
 
  andrew
 
 
 
 

 Andrew Rawnsley
 President
 The Ravensfield Digital Resource Group, Ltd.
 (740) 587-0114
 www.ravensfield.com








Re: [HACKERS] pljava revisited

2003-12-10 Thread Andrew Dunstan
Thomas Hallgren wrote:

The JVM will be started on-demand.
Although I realize that one JVM per connection will consume a fair amount of
resources, I still think it is the best solution. The description of this
system must of course make it very clear that this is what happens and
ultimately provide the means of tuning the JVM's as much as possible.
I advocate this solution because I think that the people that has the
primary interest of a pl/java will be those who write enterprise systems
using Java. J2EE systems are always equipped with connection pools.
Yes, but as was pointed out even if I use connection pooling I would 
rather not have, say, 25 JVMs loaded if I can help it.

But, I'm of course open for other alternatives. Let's say that there's a JVM
with a thread-pool that the Postgress sessions will connect to using some
kind of RPC. This implies that each call will have an overhead of at least 2
OS context switches. Compared to in-process calls, this will severely
crippel the performance. How do you suggest that we circumvent this problem?


Context switches are not likely to be more expensive than loading an 
extra JVM, I suspect. Depending on your OS/hw they can be incredibly 
cheap, in fact.

Antother problem is that we will immeditately loose the ability to use the
current connection provided by the SPI interfaces. We can of course
establish a back-channel to the original process but that will incure even
more performance hits. A third alternative is to establish brand new
connections in the remote JVM. Problem then is to propagate the transaction
context correctly. Albeit solvable, the performance using distributed
transactions will be much worse than in-process. How do we solve this?
We are theorising ahead of data, somewhat. My suggestion would be to 
continue in the direction you are going, and later, when you can, stress 
test it. Ideally, if you then need to move to a shared JVM this would be 
transparent to upper levels of the code.

C++ or C is not a big issue. I might rewrite it into pure C. The main reason
for C++ is to be able to use objects with virtual methods. I know how to do
that in C too but I don't quite agree that its just as clean :-)
 

Maybe not, but it's what is used in the core Pg distribution. Go with 
the flow :-)

cheers

andrew



Re: [HACKERS] pljava revisited

2003-12-10 Thread Andrew Rawnsley
On Dec 10, 2003, at 1:51 PM, Andrew Dunstan wrote:

Thomas Hallgren wrote:

The JVM will be started on-demand.
Although I realize that one JVM per connection will consume a fair 
amount of
resources, I still think it is the best solution. The description of 
this
system must of course make it very clear that this is what happens and
ultimately provide the means of tuning the JVM's as much as possible.

I advocate this solution because I think that the people that has the
primary interest of a pl/java will be those who write enterprise 
systems
using Java. J2EE systems are always equipped with connection pools.

Yes, but as was pointed out even if I use connection pooling I would 
rather not have, say, 25 JVMs loaded if I can help it.

It's also a bit of a solution by circumstance, rather than a solution by 
design.

But, I'm of course open for other alternatives. Let's say that 
there's a JVM
with a thread-pool that the Postgress sessions will connect to using 
some
kind of RPC. This implies that each call will have an overhead of at 
least 2
OS context switches. Compared to in-process calls, this will severely
crippel the performance. How do you suggest that we circumvent this 
problem?

My comments here are pretty off the cuff. You've thought about this far 
more than I have.




Context switches are not likely to be more expensive that loading an 
extra JVM, I suspect. Depending on your OS/hw they can be incredibly 
cheap, in fact.

Antother problem is that we will immeditately loose the ability to 
use the
current connection provided by the SPI interfaces. We can of course
establish a back-channel to the original process but that will incure 
even
more performance hits. A third alternative is to establish brand new
connections in the remote JVM. Problem then is to propagate the 
transaction
context correctly. Albeit solvable, the performance using distributed
transactions will be much worse than in-process. How do we solve this?

We are theorising ahead of data, somewhat. My suggestion would be to 
continue in the direction you are going, and later, when you can, 
stress test it. Ideally, if you then need to move to a shared JVM this 
would be transparent to upper levels of the code.

Agreed - sounds like you've done a fair amount of ground work. I at 
least am interested in where you're going with it.


C++ or C is not a big issue. I might rewrite it into pure C. The main 
reason
for C++ is to be able to use objects with virtual methods. I know how 
to do
that in C too but I don't quite agree that its just as clean :-)

Maybe not, but it's what is used in the core Pg distribution. Go with 
the flow :-)

cheers

andrew




Andrew Rawnsley
President
The Ravensfield Digital Resource Group, Ltd.
(740) 587-0114
www.ravensfield.com


Re: [HACKERS] pljava revisited

2003-12-10 Thread Jan Wieck
Andrew Rawnsley wrote:

Other pl* (perl, python, tcl) languages have vanilla C glue code. 
Might be better to stick to this. If you aren't using advanced C++ 
features that shouldn't be too hard - well structured C can be just as 
readable as well structured C++. At the very lowest level, about the 
only things C++ buys you are the ability to declare variables in 
arbitrary places, and // style comments.

Agreed. Given that the rest of the code base is CI would imagine 
that the Powers that Be would frown a bit on merging
C++ code in, and relegate it to contrib for eternity...
It will probably have to live on GBorg right from the beginning anyway, 
so the Powers might not care at all.

Thus far _all_ procedural languages are loadable modules. VM or not, I 
don't see why this one would be any different. That also answers the on 
demand question to some extent, doesn't it?

Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #


Re: [HACKERS] pljava revisited

2003-12-10 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Thomas Hallgren wrote:
 C++ or C is not a big issue. I might rewrite it into pure C. The main reason
 for C++ is to be able to use objects with virtual methods. I know how to do
 that in C too but I don't quite agree that its just as clean :-)

 Maybe not, but it's what is used in the core Pg distribution. Go with 
 the flow :-)

If you have any hope of someday seeing pljava merged into the main
PG distribution, you had better stick to C.  IMHO there would be
essentially no chance of adopting a module that requires C++, simply
because the additional configuration and portability work would be
too much of a pain in the neck.  libpq++ got heaved overboard largely
because the autoconf burden for it was too high, and we're unlikely
to look favorably on something that would make us put that back in.

Of course, if you don't think pljava will ever become mainstream,
this argument won't have much force to you ...

regards, tom lane



Re: [HACKERS] pljava revisited

2003-12-10 Thread Robert Treat
On Wed, 2003-12-10 at 13:04, Jan Wieck wrote:
 Andrew Rawnsley wrote:
 
  Other pl* (perl, python, tcl) languages have vanilla C glue code. 
  Might be better to stick to this. If you aren't using advanced C++ 
  features that shouldn't be too hard - well structured C can be just as 
  readable as well structured C++. At the very lowest level, about the 
  only things C++ buys you are the ability to declare variables in 
  arbitrary places, and // style comments.
 
  
  Agreed. Given that the rest of the code base is CI would imagine 
  that the Powers that Be would frown a bit on merging
  C++ code in, and relegate it to contrib for eternity...
 
 It will probably have to live on GBorg right from the beginning anyway, 
 so the Powers might not care at all.
 
 Thus far _all_ procedural languages are loadable modules. VM or not, I 
 don't see why this one would be any different. That also answers the on 
 demand question to some extent, doesn't it?
 

Maybe I'm mixing concepts here, but didn't Joe Conway create the ability
to do pl module loading on demand or on connection creation via a GUC? 
ISTR he needed this due to R's overhead. If so, it seems this could be
implemented both ways, with a recommendation on which is best to follow.
Speaking of plR, I'd recommend that anyone interested in developing pl's,
whether enhancing old ones or creating new ones, check out the plR
code on GBorg; it was written recently and is pretty advanced.
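
Presumably that is the preload_libraries setting (the exact name and value format
are an assumption here; newer releases spell it shared_preload_libraries), e.g. in
postgresql.conf:

# load the PL/R shared library once at postmaster start instead of per-backend
preload_libraries = '$libdir/plr'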

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] pljava revisited

2003-12-10 Thread ow

--- Thomas Hallgren [EMAIL PROTECTED] wrote:
 The JVM will be started on-demand.
 Although I realize that one JVM per connection will consume a fair amount of
 resources, I still think it is the best solution. The description of this
 system must of course make it very clear that this is what happens and
 ultimately provide the means of tuning the JVM's as much as possible.

I think the new 1.5 JDK Tiger (to be released soon) will feature the shared
VM option, i.e. one JVM could be used to run multiple and independent apps.
Maybe worth looking into this.

 I advocate this solution because I think that the people that has the
 primary interest of a pl/java will be those who write enterprise systems
 using Java. J2EE systems are always equipped with connection pools.

IMHO, pl/java would be a great feature for PostgreSQL to have. It would
increase pgSql's chances of being considered an enterprise RDBMS, since most
of the enterprise apps are written in Java nowadays.

Regards,







Re: [HACKERS] ERROR: Index pg_toast_8443892_index is not a btree

2003-12-10 Thread Robert Treat
On Tue, 2003-12-09 at 20:19, Jan Wieck wrote:
 Christopher Kings-Lynne wrote:
 
  I couldn't agree more. Look at this very instance. He now found the 
  right reindex command and the corrupted file is gone. We don't have the 
  slightest clue what happened to that file. Was it truncated? Did some 
  other process scribble around in the shared memory? How do you tell now?
  
  The end user just could not care less.  They want their machine running 
  again as soon as is humanly possible without going through a back and 
  forth process of subscribing to some lists they don't care about, etc.
 
 I know, that's (unfortunately) true. Although it's not very farsighted 
 because better bug reports usually lead to better software in the next 
 release.
 

HINT::  You might be able to solve this problem by running the REINDEX
command. Of course if you do that you'll destroy all evidence of what
caused the problem, possibly forcing this problem on other users in the
future because you were unwilling to help us to improve the software.
But we understand, it's not like we wrote an entire database system for
you... oh wait we did.

:-)

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




Re: [HACKERS] pljava revisited

2003-12-10 Thread Thomas Hallgren
Two comments.
Context switches are of course much cheaper than loading a JVM. No argument
there. The point is that the JVM is loaded once for each connection (when
the connection makes its first call to a Java function). Millions of calls
may follow that reuse the same JVM. Each of those calls will suffer from
context switches if the JVM is remote. A 1-to-a-million (or more) ratio is
in fact very likely when function calls are used in predicates and/or
projections of selects on larger tables.

Regarding C++, as I said, no big deal. I'll change it for the reasons
mentioned before I release my first cut.

Thanks,

- thomas





[HACKERS] Canonicalization of WHERE clauses considered harmful

2003-12-10 Thread Tom Lane
I've been thinking about Josh's recent complaint about poor planning
of queries like

SELECT t1.a, t2.b
FROM t1, t2
WHERE t1.a = t2.a
AND (
( t1.c = x
  AND t1.f IN (m, n, o)
  AND t2.d = v
  AND t2.e BETWEEN j AND k
)
OR
( t1.c = y
  AND t1.f IN (n, o, p)
  AND t2.d = v
  AND t2.e BETWEEN k AND h
)
OR 
( t1.c = z
  AND t1.f IN (p, q)
  AND t2.d = w
  AND t2.e BETWEEN k AND h
)
)

(which is an abstraction of Q19 in the TPC-R benchmark).  I believe that
most of the blame for this problem can be laid at the feet of 
optimizer/prep/prepqual.c, which attempts to transform the WHERE clause
into a canonical form (either CNF AND-of-ORs, or DNF OR-of-ANDs, according
to some heuristics that are not very compelling).  I am not certain
whether prepqual will actually fire on this query (still waiting for those
EXPLAIN results, Josh), but if it doesn't then it'd really be disastrous,
and if it does then it's still not going to improve matters much, because
forcing this WHERE into pure AND-of-ORs form isn't very helpful.

I've worked on prepqual.c some over the years, but it never occurred to me
to question the fundamental premise of creating a normal-form WHERE clause
before.  Yet as I look at this example, I'm wondering why we should try
to.

Josh lied a little bit about the TPC-R query: the form actually given in
the TPC spec is

WHERE
(
  t1.a = t2.a
  AND t1.c = x
  AND t1.f IN (m, n, o)
  AND t2.d = v
  AND t2.e BETWEEN j AND k
)
OR
(
  t1.a = t2.a
  AND t1.c = y
  AND t1.f IN (n, o, p)
  AND t2.d = v
  AND t2.e BETWEEN k AND h
)
etc

It is absolutely critical that the optimizer recognize that the t1.a =
t2.a clause is common to all the OR branches and can be factored out
to give the form Josh showed.  Without that, you don't have any equijoin
clause and you're going to end up with a horrid nested-loop join.
In the TPC query per spec, some of the other single-table restrictions are
also common to all the OR branches (ie, x, y, etc are not all different
constants) and should be pulled out so that they can be applied as
restriction clauses to the input tables before the join.

In other words, we'd like the optimizer to transform
(a AND b) OR (a AND c)
to
a AND (b OR c)

Currently, this is accomplished by the roundabout method of converting
the WHERE clause to CNF (AND-of-ORs) and then simplifying duplicate
sub-clauses within an OR:
(a AND b) OR (a AND c)
expands by repeated application of the distributive law to
(a OR a) AND (a OR c) AND (b OR a) AND (b OR c)
and then qual_cleanup notices that (a OR a) is redundant, leaving
a AND (a OR c) AND (b OR a) AND (b OR c)
So we manage to pull out a all right, but we've left the query cluttered
with additional, redundant clauses --- there is no logic that will notice
that this could be simplified to
a AND (b OR c)
The extra clauses make for useless work during planning and during
execution; they also screw up selectivity estimates (since the selectivity
estimator doesn't realize they are redundant).  This is bad.

So as a mechanism for pulling out duplicate clauses, canonicalization
sucks.  Is there anything else that it does for us?  After some thought
the only possible benefit I can see is that it can sometimes manufacture
restriction clauses from what otherwise would be join clauses.  In the
TPC-R example, once we've pulled out duplicate clauses we'd still be left
with an OR of ANDed conditions, some on t1 and some on t2.  If we leave
this as-is, there's not much to be done with it except apply it as a join
filter after the t1/t2 join is made.  However, if we force to CNF form
then some of the resulting sub-clauses will refer only to t1, some only
to t2, and some to both tables.  The ones that mention only t1 or only
t2 would be usable as restriction clauses during the table scans, which
is a big help in cutting the size of the required join.  Also they could
possibly be used as indexscan qualifications (I'm not sure whether TPC-R
expects relevant indexes to be in place).

So it seems to me that a reasonable heuristic for deciding whether to
CNF-ify an OR clause is to check whether any of the resulting subclauses
will refer to fewer relations than the original OR clause does.  If not,
we may as well leave it as an OR clause and avoid generating redundant
clauses.  There is no such test at present.

Pulling out duplicate clauses from the OR arms would probably be better
done directly than by relying on CNF-ification.

Another consideration is that the planner already has logic for generating
multi-indexscans by pulling relevant conditions out of OR-of-AND trees.
So, at least for the case where there's a relevant index, we do not need
any CNF conversion to be 

Re: [HACKERS] Canonicalization of WHERE clauses considered harmful

2003-12-10 Thread Bruno Wolff III
On Wed, Dec 10, 2003 at 16:54:54 -0500,
  Tom Lane [EMAIL PROTECTED] wrote:
 In other words, we'd like the optimizer to transform
   (a AND b) OR (a AND c)
 to
   a AND (b OR c)
 
 Currently, this is accomplished by the roundabout method of converting
 the WHERE clause to CNF (AND-of-ORs) and then simplifying duplicate
 sub-clauses within an OR:
   (a AND b) OR (a AND c)
 expands by repeated application of the distributive law to
   (a OR a) AND (a OR c) AND (b OR a) AND (b OR c)
 and then qual_cleanup notices that (a OR a) is redundant, leaving
   a AND (a OR c) AND (b OR a) AND (b OR c)
 So we manage to pull out a all right, but we've left the query cluttered
 with additional, redundant clauses --- there is no logic that will notice
 that this could be simplified to
   a AND (b OR c)
 The extra clauses make for useless work during planning and during
 execution; they also screw up selectivity estimates (since the selectivity
 estimator doesn't realize they are redundant).  This is bad.
 
 
 Comments?

Shouldn't it be possible to simplify
a AND (a OR c) AND (b OR a) AND (b OR c)
to
a AND (b or c)
using
a AND (a OR x) == a
?



Re: [HACKERS] Canonicalization of WHERE clauses considered harmful

2003-12-10 Thread Tom Lane
Bruno Wolff III [EMAIL PROTECTED] writes:
 Shouldn't it be possible to simplify
 a AND (a OR c) AND (b OR a) AND (b OR c)
 to
 a AND (b or c)
 using
 a AND (a OR x) == a

That would be one possible response, but it strikes me as a band-aid fix.
It would add quite a bit of overhead (looking to see if any OR
subclauses match any top-level clauses) on top of the rather expensive
equality checks qual_cleanup() is already making.  Another problem is
that sometimes prepqual decides not to attempt CNF-ification at all,
which means that the ability to pull out duplicate subclauses is lost
altogether.

I think it'd be cleaner to expend the cycles on a direct check for
duplicate subclauses, which we could apply independently of any decision
to convert what remains to CNF form.

We've had lots of problems with prepqual before, which is why those
heuristics about whether to try for CNF or DNF are in there; but I've
never been very happy about them.  What I'm on about now is the idea
that maybe the whole problem should be thrown overboard ...

regards, tom lane



Re: [HACKERS] Canonicalization of WHERE clauses considered harmful

2003-12-10 Thread Kurt Roeckx
On Wed, Dec 10, 2003 at 04:54:54PM -0500, Tom Lane wrote:
 Currently, this is accomplished by the roundabout method of converting
 the WHERE clause to CNF (AND-of-ORs) and then simplifying duplicate
 sub-clauses within an OR:
   (a AND b) OR (a AND c)
 expands by repeated application of the distributive law to
   (a OR a) AND (a OR c) AND (b OR a) AND (b OR c)

This is wrong.

What would be true however is:

(a OR b) AND (b OR c)
= (a AND b) OR (a AND c) OR (b AND b) OR (b AND c)

(Replace AND by * and OR by +)


Kurt




Re: [HACKERS] Canonicalization of WHERE clauses considered harmful

2003-12-10 Thread Kurt Roeckx
On Wed, Dec 10, 2003 at 05:35:11PM -0500, Tom Lane wrote:
 Kurt Roeckx [EMAIL PROTECTED] writes:
  On Wed, Dec 10, 2003 at 04:54:54PM -0500, Tom Lane wrote:
  (a AND b) OR (a AND c)
  expands by repeated application of the distributive law to
  (a OR a) AND (a OR c) AND (b OR a) AND (b OR c)
 
  This is wrong.
 
 I don't think so.  The distributive law is
 
   a OR (b AND c) == (a OR b) AND (a OR c)

Oh, I was reading it wrong.

So basically it makes a + bc = (a+b)*(a+c) = a + ab + ac + bc
And ab + ac = a * (a+b) * (a+c) * (b+c) = ab + abc + ac


(This is so much easier to read than using AND and OR.)


Kurt




[HACKERS] CSV hack

2003-12-10 Thread David Fetter
Kind people,

I've come up with yet another little hack, this time for turning 1-d
arrays into CSV format.  It's very handy in conjunction with the
array_accum aggregate (can this be made a standard aggregate?) in
http://developer.postgresql.org/docs/postgres/xaggr.html.

Here 'tis...

CREATE OR REPLACE FUNCTION csv(anyarray) RETURNS TEXT AS
'DECLARE
    in_array ALIAS FOR $1;
    temp_string TEXT;
    quoted_string TEXT;
    i INTEGER;
BEGIN
    FOR i IN array_lower(in_array, 1)..array_upper(in_array, 1)
    LOOP
        IF in_array[i]::TEXT ~ ''"''
        THEN
            temp_string := ''"'' || replace(in_array[i]::TEXT, ''"'', ''""'') || ''"'';
        ELSE
            temp_string := in_array[i]::TEXT;
        END IF;
        IF i = array_lower(in_array, 1)
        THEN
            quoted_string := temp_string;
        ELSE
            quoted_string := quoted_string || '','' || temp_string;
        END IF;
    END LOOP;
    RETURN quoted_string;
END;
' LANGUAGE 'plpgsql';
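
For example, assuming the quoting rule in the body above (elements containing a
double quote get wrapped in double quotes, with embedded quotes doubled):

SELECT csv(array['one'::text, 'say "cheese"'::text, 'three'::text]);
-- expected output:  one,"say ""cheese""",three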

Those DBD::Pg users among us who'd like to be able to bind_columns to
postgresql arrays may have a leg up with Text::CSV_XS.

Other middleware should be able to handle such things, too. :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100cell: +1 415 235 3778



Re: [HACKERS] [pgadmin-support] Cannot add an column of type serial

2003-12-10 Thread Andreas Pflug
David Sigeti wrote:

At 12:21 PM 12/10/2003 +0100, Andreas Pflug wrote:

David Sigeti wrote:

I am using pgadminIII 1.0.2 with PostgreSQL 7.4 under W2K SP4 and 
Cygwin (current as of 2 or 3 weeks ago).  The server is running 
locally.

If I attempt to add an column of type serial or bigserial to a 
table, the operation fails with a message telling me that adding an 
column with a default value is not implemented.  What is really 
weird is that one can, in fact, add a column with a default value, 
like an int4 with a default value or 0.

pgAdmin breaks this into two steps for you; have a look at the SQL page.

The problem does not arise if the serial (or bigserial) column is 
created when the table is created (as a part of CREATE TABLE).
serial isn't a true datatype but a keyword that can be used only at 
creation time. Still, pgAdmin3 should create the commands necessary 
for this to make life easier; I added this as a TODO item.


Adding this functionality would be great but I suggest that the error 
message be changed right away as it is seriously misleading.

The error message originates from the backend. I agree that this might 
be confusing to newbies, who don't know that serial actually is int4 
default nextval('...')  (currently pgAdmin3 doesn't either :-)

I'm forwarding this to pgsql-hackers.

Regards,
Andreas




Re: [HACKERS] postgres panic error

2003-12-10 Thread Jason Tishler
Yurgis,

On Tue, Dec 09, 2003 at 04:18:06PM -0800, Yurgis Baykshtis wrote:
 I tried to raise the question on pg-hackers forum and cygwin forum
 (regarding readdir() misbehavior) but could not get any help so far :(

If you can produce a minimal test case that reproduces the problem, then
one of the core Cygwin developers might be more willing to attempt to
fix it.

Jason

-- 
PGP/GPG Key: http://www.tishler.net/jason/pubkey.asc or key servers
Fingerprint: 7A73 1405 7F2B E669 C19D  8784 1AFD E4CC ECF4 8EF6



Re: [HACKERS] CSV hack

2003-12-10 Thread Joe Conway
David Fetter wrote:
I've come up with yet another little hack, this time for turning 1-d 
arrays into CSV format.
You mean like this (which is new in 7.4)?

regression=# select array_to_string (array[1,2,3], ',');
 array_to_string
-----------------
 1,2,3
(1 row)
See:
http://www.postgresql.org/docs/current/static/functions-array.html
It's very handy in conjunction with the array_accum aggregate (can
this be made a standard aggregate?) in 
http://developer.postgresql.org/docs/postgres/xaggr.html.
Early in the 7.4 dev cycle array_accum() was actually in cvs as a 
built-in C function (and it still does exist in PL/R as such). But 
toward the end of the cycle an objection was raised and it was removed. 
Search the archives in the May/June 2003 timeframe.

Joe



Re: [Dbdpg-general] Re: [HACKERS] CSV hack

2003-12-10 Thread David Fetter
On Wed, Dec 10, 2003 at 05:08:19PM -0800, Joe Conway wrote:
 David Fetter wrote:
 I've come up with yet another little hack, this time for turning 1-d 
 arrays into CSV format.
 
 You mean like this (which is new in 7.4)?
 
 regression=# select array_to_string (array[1,2,3], ',');
  array_to_string
 -
  1,2,3
 (1 row)

Not quite.  The CSV thing quotes the way you'd see in CSV files, as in

SELECT csv(array['1'::text, '2'::text, '3'::text]);
csv
---
 1,2,3

 See:
 http://www.postgresql.org/docs/current/static/functions-array.html

It's great, and I use it :)

 It's very handy in conjunction with the array_accum aggregate (can
 this be made a standard aggregate?) in 
 http://developer.postgresql.org/docs/postgres/xaggr.html.
 
 Early in the 7.4 dev cycle array_accum() was actually in cvs as a
 built-in C function (and it still does exist in PL/R as such). But
 toward the end of the cycle an objection was raised and it was
 removed.  Search the archives in the May/June 2003 timeframe.

Um, OK.  It would be kinda handy, tho.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100cell: +1 415 235 3778



Re: [HACKERS] CSV hack

2003-12-10 Thread Andrew Dunstan
You also need to quote values containing the separator.
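
For instance (the comment shows what correct CSV output would be; with the function
as posted, the comma-containing value would come through unquoted):

SELECT csv(array['plain'::text, 'has,comma'::text]);
-- correct CSV:  plain,"has,comma"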

cheers

andrew (who used to set creating CSV as a programming exercise - 
students almost never get it right)

David Fetter wrote:

Kind people,

I've come up with yet another little hack, this time for turning 1-d
arrays into CSV format.  It's very handy in conjunction with the
array_accum aggregate (can this be made a standard aggregate?) in
http://developer.postgresql.org/docs/postgres/xaggr.html.
Here 'tis...

CREATE OR REPLACE FUNCTION csv(anyarray) RETURNS TEXT AS
'DECLARE
    in_array ALIAS FOR $1;
    temp_string TEXT;
    quoted_string TEXT;
    i INTEGER;
BEGIN
    FOR i IN array_lower(in_array, 1)..array_upper(in_array, 1)
    LOOP
        IF in_array[i]::TEXT ~ ''"''
        THEN
            temp_string := ''"'' || replace(in_array[i]::TEXT, ''"'', ''""'') || ''"'';
        ELSE
            temp_string := in_array[i]::TEXT;
        END IF;
        IF i = array_lower(in_array, 1)
        THEN
            quoted_string := temp_string;
        ELSE
            quoted_string := quoted_string || '','' || temp_string;
        END IF;
    END LOOP;
    RETURN quoted_string;
END;
' LANGUAGE 'plpgsql';
Those DBD::Pg users among us who'd like to be able to bind_columns to
postgresql arrays may have a leg up with Text::CSV_XS.
Other middleware should be able to handle such things, too. :)

Cheers,
D
 


