Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread Oleg Bartunov
On Fri, 12 Nov 2004, John Hansen wrote:
I miss about UTF-8 :) ltree doesn't support UTF-8 yet.
ok, how about all the 'other' characters from us-ascii:
,[EMAIL PROTECTED]&*()_+-=[]{}\|'"?><`~
these 'should' all be valid for the ltxtquery, ltree, and ltree[] types,
except maybe for . which is used as separator (and maybe . should be
valid too, if prepended with a '\', just as you would with a regex to
make the next character a literal).
I agree with you, but unfortunately it's not easy and we have no time
to work on ltree now.
... John

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Patrick B Kelly <[EMAIL PROTECTED]> writes:
> I may not be explaining myself well or I may fundamentally 
> misunderstand how copy works.

Well, you're definitely ignoring the character-set-conversion issue.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Bruce Momjian

Can I see an example of such a failure line?

---

Andrew Dunstan wrote:
> 
> Darcy Buskermolen has drawn my attention to unfortunate behaviour of 
> COPY CSV with fields containing embedded line end chars if the embedded 
> sequence isn't the same as those of the file containing the CSV data. In 
> that case we error out when reading the data in. This means there are 
> cases where we can produce a CSV data file which we can't read in, which 
> is not at all pleasant.
> 
> Possible approaches to the problem:
> . make it a documented limitation
> . have a "csv read" mode for backend/commands/copy.c:CopyReadLine() that 
> relaxes some of the restrictions on inconsistent line endings
> . escape embedded line end chars
> 
> The last really isn't an option, because the whole point of CSVs is to 
> play with other programs, and my understanding is that those that 
> understand multiline fields (e.g. Excel) expect them not to be escaped, 
> and do not produce them escaped.
> 
> So right now I'm tossing up in my head between the first two options. Or 
> maybe there's another solution I haven't thought of.
> 
> Thoughts?
> 
> cheers
> 
> andrew
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] newbie compile question. please help

2004-11-11 Thread Gevik Babakhani
Problem solved.

I added some printf statements in order to print some debug information.

Tools like pg_ctl always run postmaster -V in order to get the version
number. The printf statements that I added printed extra info which pg_ctl
could not "understand", hence the error.

Regards,
Gevik

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Dave Cramer
Sent: Friday, November 12, 2004 1:01 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Re: [HACKERS] newbie compile question. please help

Check the paths of pg_ctl and postmaster; you must have two versions on 
the machine somewhere.

Dave

[EMAIL PROTECTED] wrote:

>Dear Folks,
>
>Could someone please tell me what I am doing wrong here?
>
>I made some minor changes to
>main.c   then
>make then
>make install
>
>when I want to start the postmaster I get
>
>**
>The program "postmaster" was found by
>"/home/gevik/postgres/build2/bin/pg_ctl"
>but was not the same version as pg_ctl.
>Check your installation.
>*
>
>What am I doing wrong here?
>
>Regards,
>
>Gevik
>
>
>---(end of broadcast)---
>TIP 6: Have you searched our list archives?
>
>   http://archives.postgresql.org
>
>
>  
>

-- 
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT

2004-11-11 Thread Bruce Momjian
Andrew McMillan wrote:
-- Start of PGP signed section.
> On Wed, 2004-11-10 at 11:45 -0500, Tom Lane wrote:
> > Andrew McMillan <[EMAIL PROTECTED]> writes:
> > > When tracking down gnarly problems in heavily multi-user applications
> > > enabling higher log levels at selective points has the potential to help
> > > _a lot_ with diagnostic detail, without smothering you in _every_
> > > detail.
> > 
> > Sure.  As I pointed out in the other thread, if you want to allow an app
> > to do this, you can make available a SECURITY DEFINER function that
> > performs the desired SET on its behalf.  By setting execute permissions
> > on the function and/or including restrictions in the function's code,
> > you can make this as tight or as loose a loophole as you like.  So it's
> > certainly possible to do what you want in any case.  I think the issue
> > at hand is what's appropriate to provide as hard-wired functionality.
> 
> That sounds excellent - I hadn't realised that this workaround would be
> possible, and indeed with this in place that will provide even better
> control over the facility.

OK, here is one vote for the ALTER USER/remove-USERLIMIT crowd, and you
were the person who originally mentioned the problem.  You don't think
the function-creation approach is hard, so perhaps that's the way to go.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] GUC custom variables broken

2004-11-11 Thread Bruce Momjian
Thomas Hallgren wrote:
> Tom Lane wrote:
> > I broke it :-(.  Fix committed.
> > 
> Thanks, that was quick. You fixed it before I managed to find it.
> 
> Not that I'm in an extreme hurry, just curious. Are we using different 
> CVS repositories with some latency in replication or something? I don't 
> seem to get your fix yet.

Anonymous CVS has a lag.  I think it is 1/2 hour or an hour.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 10:07 PM, Andrew Dunstan wrote:

Patrick B Kelly wrote:

My suggestion is to simply have CopyReadLine recognize these two 
states (in-field and out-of-field) and execute the current logic only 
while in the second state. It would not be too hard but as you 
mentioned it is non-trivial.


We don't know what state we expect the end of line to be in until 
after we have actually read the line. To know how to treat the end of 
line under your scheme we would have to parse as we go rather than after 
reading the line as now. Changing this would be not only non-trivial 
but significantly invasive to the code.


Perhaps I am misunderstanding the code. As I read it, the code currently 
goes through the input character by character looking for NL and EOF 
characters. It appears to be very well structured for what I am 
proposing. The section in question is a small and clearly defined loop 
which reads the input one character at a time and decides when it has 
reached the end of the line or file. Each call of CopyReadLine attempts 
to get one more line. I would propose that each time it starts out in 
the out-of-field state and the state is toggled by each un-escaped 
quote that it encounters in the stream. When in the in-field state, it 
would only look for the next un-escaped quote and while in the 
out-of-field state, it would execute the existing logic as well as 
looking for the next un-escaped quote.

I may not be explaining myself well or I may fundamentally 
misunderstand how copy works. I would be happy to code the change and 
send it to you for review, if you would be interested in looking it 
over and it is felt to be a worthwhile capability.


Patrick B. Kelly
--
  http://patrickbkelly.org
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan

Patrick B Kelly wrote:

My suggestion is to simply have CopyReadLine recognize these two 
states (in-field and out-of-field) and execute the current logic only 
while in the second state. It would not be too hard but as you 
mentioned it is non-trivial.


We don't know what state we expect the end of line to be in until after 
we have actually read the line. To know how to treat the end of line under 
your scheme we would have to parse as we go rather than after reading 
the line as now. Changing this would be not only non-trivial but 
significantly invasive to the code.

cheers
andrew
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 6:16 PM, Tom Lane wrote:
Patrick B Kelly <[EMAIL PROTECTED]> writes:
What about just coding a FSM into
backend/commands/copy.c:CopyReadLine() that does not process any flavor
of NL characters when it is inside of a data field?
CopyReadLine has no business tracking that.  One reason why not is that
it is dealing with data not yet converted out of the client's encoding,
which makes matching to user-specified quote/escape characters
difficult.
regards, tom lane
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


I appreciate what you are saying about the encoding and you are, of 
course, right, but CopyReadLine is already processing the NL characters 
and it is doing so without considering the context in which they 
appear. Unfortunately, the same character(s) are used for two different 
purposes in the files in question. Without considering whether they 
appear inside or outside of data fields, CopyReadLine will mistake one 
for the other and cannot correctly do what it is already trying to do, 
which is to break the input file into lines.

My suggestion is to simply have CopyReadLine recognize these two states 
(in-field and out-of-field) and execute the current logic only while in 
the second state. It would not be too hard but as you mentioned it is 
non-trivial.

Patrick B. Kelly
--
  http://patrickbkelly.org
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> Oh?  How is a first() aggregate going to know what sort order you want
> >> within the group?
> 
> > It would look something like
> 
> > select x,first(a),first(b) from (select x,a,b from table order by x,y) 
> > group by x
> 
> > which is equivalent to
> 
> > select DISTINCT ON (x) x,a,b from table ORDER BY x,y
> 
> No, it is not.  The GROUP BY has no commitment to preserve order ---
> consider for example the possibility that we implement the GROUP BY by
> hashing.

It doesn't matter how the group by is implemented. The only thing that matters
is what order the tuples are presented to the aggregate function. Even hashing
presents them to the aggregate function as they're read in from the subquery.

In fact, IIRC, hashing was why the original user who asked for this behaviour
found that it sometimes did work and wondered why it didn't work when the
planner *didn't* use hashing. You added code to allow it to function properly
for that case as well.

> The fact that an optimization is present does not make it part of the
> guaranteed semantics of the language.
>
> Basically, first() is a broken concept in SQL.  Of course DISTINCT ON
> is broken too for the same reasons, but I do not see that first() is
> one whit less of a kluge than DISTINCT ON.

first() is only a whit less of a kludge than DISTINCT ON in that it covers the
more general case of wanting both the first value for each group and other
aggregates. Otherwise it's exactly equivalent.

Depending on ordering was a broken concept in the abstract theoretical world
of SQL as originally envisioned. But (unfortunately) we don't really live in
that world any more. There are tons of extensions that Postgres and other
databases provide that do depend on the ordering of record sets. And there are
tons of examples of real business problems that require those extensions.

It may be a broken concept but it's one that's extremely popular. Postgres
implemented DISTINCT ON independently, but mysql also has an equivalent
feature: you can include any column not included in the GROUP BY clause and
mysql will implicitly do the equivalent of first().

Oracle on the other hand has a complete suite of functions for processing
order-dependent data. They even implement lead()/lag() for allowing you to
access the data from subsequent or previous records in the data set.

-- 
greg


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] GUC custom variables broken

2004-11-11 Thread Thomas Hallgren
Tom Lane wrote:
I broke it :-(.  Fix committed.
Thanks, that was quick. You fixed it before I managed to find it.
Not that I'm in an extreme hurry, just curious. Are we using different 
CVS repositories with some latency in replication or something? I don't 
seem to get your fix yet.

Regards,
Thomas Hallgren
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] newbie compile question. please help

2004-11-11 Thread Dave Cramer
Check the paths of pg_ctl and postmaster; you must have two versions on 
the machine somewhere.

Dave
[EMAIL PROTECTED] wrote:
Dear Folks,
Could someone please tell me what I am doing wrong here?
I made some minor changes to
main.c   then
make then
make install
when I want to start the postmaster I get
**
The program "postmaster" was found by
"/home/gevik/postgres/build2/bin/pg_ctl"
but was not the same version as pg_ctl.
Check your installation.
*
What am I doing wrong here?
Regards,
Gevik
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org
 

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] GUC custom variables broken

2004-11-11 Thread Thomas Hallgren
Thomas Hallgren wrote:
I have the following in my postgresql.conf
custom_variable_classes = 'pljava'
pljava.classpath = 
'C:\\Tada\\Workspace\\org.postgresql.pljava\\build\\pljava.jar'

It worked fine with 8.0.0beta2. The beta4 however, gives me the 
following message:

   FATAL:  unrecognized configuration parameter "pljava.classpath"
What happened?
Ah, I see what's wrong. It's in guc_file.l.
Previously, "custom_variable_classes" was *set* before all other
variables. Now it's merely moved to the front of the linked list and
*tested* first. Subsequent tests of custom variables will of course then
fail, since custom_variable_classes has not been set yet.

Perhaps this change could be reverted? Or perhaps someone has other 
preferences on implementation?

Regards,
Thomas Hallgren
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] GUC custom variables broken

2004-11-11 Thread Tom Lane
Thomas Hallgren <[EMAIL PROTECTED]> writes:
> What happened?

I broke it :-(.  Fix committed.

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] A modest proposal: get rid of GUC's USERLIMIT

2004-11-11 Thread Andrew McMillan
On Wed, 2004-11-10 at 11:45 -0500, Tom Lane wrote:
> Andrew McMillan <[EMAIL PROTECTED]> writes:
> > When tracking down gnarly problems in heavily multi-user applications
> > enabling higher log levels at selective points has the potential to help
> > _a lot_ with diagnostic detail, without smothering you in _every_
> > detail.
> 
> Sure.  As I pointed out in the other thread, if you want to allow an app
> to do this, you can make available a SECURITY DEFINER function that
> performs the desired SET on its behalf.  By setting execute permissions
> on the function and/or including restrictions in the function's code,
> you can make this as tight or as loose a loophole as you like.  So it's
> certainly possible to do what you want in any case.  I think the issue
> at hand is what's appropriate to provide as hard-wired functionality.

That sounds excellent - I hadn't realised that this workaround would be
possible, and indeed with this in place that will provide even better
control over the facility.
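
For concreteness, a minimal sketch of the kind of SECURITY DEFINER wrapper
described above; the function name, the permitted values, the target GUC,
and the role are made up for illustration, and the function would have to
be owned by a role that is allowed to set the variable:

CREATE FUNCTION set_my_log_level(loglevel text) RETURNS void AS $$
BEGIN
    -- only allow the values we are willing to hand out
    IF loglevel NOT IN ('notice', 'debug1', 'debug2') THEN
        RAISE EXCEPTION 'log level % not permitted', loglevel;
    END IF;
    EXECUTE 'SET log_min_messages TO ' || loglevel;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

REVOKE ALL ON FUNCTION set_my_log_level(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION set_my_log_level(text) TO app_user;

The REVOKE/GRANT pair and the check in the body are what make the loophole
"as tight or as loose as you like": only roles granted EXECUTE can raise
their own log level, and only to the values the function accepts.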

Regards,
Andrew.
-
Andrew @ Catalyst .Net .NZ  Ltd,  PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/          PHYS: Level 2, 150-154 Willis St
DDI: +64(4)803-2201  MOB: +64(272)DEBIAN  OFFICE: +64(4)499-2267
   The secret of being a bore is to say everything -- Voltaire
-



signature.asc
Description: This is a digitally signed message part


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Patrick B Kelly <[EMAIL PROTECTED]> writes:
> What about just coding a FSM into 
> backend/commands/copy.c:CopyReadLine() that does not process any flavor 
> of NL characters when it is inside of a data field?

CopyReadLine has no business tracking that.  One reason why not is that
it is dealing with data not yet converted out of the client's encoding,
which makes matching to user-specified quote/escape characters
difficult.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan

Patrick B Kelly wrote:

What about just coding a FSM into 
backend/commands/copy.c:CopyReadLine() that does not process any 
flavor of NL characters when it is inside of a data field?


It would be a major change - the routine doesn't read data a field at a 
time, and has no idea if we are even in CSV mode at all. It would be 
rather late in the dev cycle to be making such changes, I suspect.

cheers
andrew
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[HACKERS] GUC custom variables broken

2004-11-11 Thread Thomas Hallgren
I have the following in my postgresql.conf
custom_variable_classes = 'pljava'
pljava.classpath = 
'C:\\Tada\\Workspace\\org.postgresql.pljava\\build\\pljava.jar'

It worked fine with 8.0.0beta2. The beta4 however, gives me the 
following message:

   FATAL:  unrecognized configuration parameter "pljava.classpath"
What happened?
Regards,
Thomas Hallgren

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 11, 2004, at 2:56 PM, Andrew Dunstan wrote:

Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:
Patrick B Kelly wrote:
Actually, when I try to export a sheet with multi-line cells from 
excel, it tells me that this feature is incompatible with the CSV 
format and will not include them in the CSV file.


It probably depends on the version. I have just tested with Excel 
2000 on a WinXP machine and it both read and wrote these files.

I'd be inclined to define Excel 2000 as broken, honestly, if it's
writing unescaped newlines as data.  To support this would mean throwing
away most of our ability to detect incorrectly formatted CSV files.
A simple error like a missing close quote would look to the machine like
the rest of the file is a single long data line where all the newlines
are embedded in data fields.  How likely is it that you'll get a useful
error message out of that?  Most likely the error message would point to
the end of the file, or at least someplace well removed from the actual
mistake.

I would vote in favor of removing the current code that attempts to
support unquoted newlines, and waiting to see if there are complaints.

This feature was specifically requested when we discussed what sort of 
CSVs we would handle.

And it does in fact work as long as the newline style is the same.
I just had an idea. How about if we add a new CSV option MULTILINE. If 
absent, then on output we would not output unescaped LF/CR characters 
and on input we would not allow fields with embedded unescaped LF/CR 
characters. In both cases we could error out for now, with perhaps an 
8.1 TODO to provide some other behaviour.

Or we could drop the whole multiline "feature" for now and make the 
whole thing an 8.1 item, although it would be a bit of a pity when it 
does work in what will surely be the most common case.

What about just coding a FSM into 
backend/commands/copy.c:CopyReadLine() that does not process any flavor 
of NL characters when it is inside of a data field?

Patrick B. Kelly
--
  http://patrickbkelly.org
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Oh?  How is a first() aggregate going to know what sort order you want
>> within the group?

> It would look something like

> select x,first(a),first(b) from (select x,a,b from table order by x,y) group 
> by x

> which is equivalent to

> select DISTINCT ON (x) x,a,b from table ORDER BY x,y

No, it is not.  The GROUP BY has no commitment to preserve order ---
consider for example the possibility that we implement the GROUP BY by
hashing.

> The group by can see that the subquery is already sorted by x and
> doesn't need to be resorted. In fact I believe you added the smarts to
> detect that condition in response to a user asking about precisely
> this type of scenario.

The fact that an optimization is present does not make it part of the
guaranteed semantics of the language.

Basically, first() is a broken concept in SQL.  Of course DISTINCT ON
is broken too for the same reasons, but I do not see that first() is
one whit less of a kluge than DISTINCT ON.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > It would also make it possible to deprecate DISTINCT ON in favour of GROUP 
> > BY
> > with first() calls.
> 
> Oh?  How is a first() aggregate going to know what sort order you want
> within the group?  AFAICS first() is only useful when you honestly do
> not care which group member you get ... which is certainly not the case
> for applications of DISTINCT ON.

It would look something like

select x,first(a),first(b) from (select x,a,b from table order by x,y) group by 
x

which is equivalent to

select DISTINCT ON (x) x,a,b from table ORDER BY x,y

The group by can see that the subquery is already sorted by x and doesn't need
to be resorted. In fact I believe you added the smarts to detect that
condition in response to a user asking about precisely this type of scenario.

This is actually more general than DISTINCT ON since DISTINCT ON is basically
a degenerate case of the above where the _only_ aggregate allowed is first().
The more general case could have first() as well as other aggregates, though
obviously they would make it unlikely that any optimizations would be
applicable.

I do kind of like the DISTINCT ON syntax, but the inability to use any other
aggregate functions makes me often have to convert queries I originally wrote
to use it to use the more general GROUP BY and first() instead.
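
To make the comparison concrete, here is one way a first() aggregate can be
written today at the user level (a sketch, not a built-in; it relies on a
STRICT transition function, so the first non-null input becomes the state
and then sticks):

CREATE FUNCTION first_agg(anyelement, anyelement) RETURNS anyelement
    AS 'SELECT $1' LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE first (
    basetype = anyelement,
    sfunc    = first_agg,
    stype    = anyelement
);

-- usage as in the queries above; table and column names are placeholders
SELECT x, first(a), first(b)
FROM (SELECT x, a, b FROM tbl ORDER BY x, y) AS sub
GROUP BY x;

Whether the GROUP BY is entitled to preserve the subquery's ordering is, of
course, exactly the point in dispute here.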

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread David Fetter
On Thu, Nov 11, 2004 at 03:38:16PM -0500, Greg Stark wrote:
> 
> Tom Lane <[EMAIL PROTECTED]> writes:
> 
> > I would vote in favor of removing the current code that attempts
> > to support unquoted newlines, and waiting to see if there are
> > complaints.
> 
> Uhm. *raises hand*
> 
> I agree with your argument but one way or another I have to load
> these CSVs I'm given. And like it or not virtually all the CSVs
> people get are going to be coming from Excel. So far with 7.4 I've
> just opened them up in Emacs and removed the newlines, but it's a
> royal pain in the arse.

Meanwhile, check out dbi-link.  It lets you query against DBI data
sources including DBD::Excel :)

http://pgfoundry.org/projects/dbi-link/

Bug reports welcome.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread John Hansen
> I miss about UTF-8 :) ltree doesn't support UTF-8 yet.

ok, how about all the 'other' characters from us-ascii:
,[EMAIL PROTECTED]&*()_+-=[]{}\|'"?><`~

these 'should' all be valid for the ltxtquery, ltree, and ltree[] types,
except maybe for . which is used as separator (and maybe . should be
valid too, if prepended with a '\', just as you would with a regex to
make the next character a literal).

... John



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> I would vote in favor of removing the current code that attempts to
> support unquoted newlines, and waiting to see if there are complaints.

Uhm. *raises hand*

I agree with your argument but one way or another I have to load these CSVs
I'm given. And like it or not virtually all the CSVs people get are going to
be coming from Excel. So far with 7.4 I've just opened them up in Emacs and
removed the newlines, but it's a royal pain in the arse.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan

Tom Lane wrote:
Andrew Dunstan <[EMAIL PROTECTED]> writes:

Patrick B Kelly wrote:

Actually, when I try to export a sheet with multi-line cells from
excel, it tells me that this feature is incompatible with the CSV
format and will not include them in the CSV file.


It probably depends on the version. I have just tested with Excel 2000
on a WinXP machine and it both read and wrote these files.

I'd be inclined to define Excel 2000 as broken, honestly, if it's
writing unescaped newlines as data.  To support this would mean throwing
away most of our ability to detect incorrectly formatted CSV files.
A simple error like a missing close quote would look to the machine like
the rest of the file is a single long data line where all the newlines
are embedded in data fields.  How likely is it that you'll get a useful
error message out of that?  Most likely the error message would point to
the end of the file, or at least someplace well removed from the actual
mistake.
I would vote in favor of removing the current code that attempts to
support unquoted newlines, and waiting to see if there are complaints.

This feature was specifically requested when we discussed what sort of 
CSVs we would handle.

And it does in fact work as long as the newline style is the same.
I just had an idea. How about if we add a new CSV option MULTILINE. If 
absent, then on output we would not output unescaped LF/CR characters 
and on input we would not allow fields with embedded unescaped LF/CR 
characters. In both cases we could error out for now, with perhaps an 
8.1 TODO to provide some other behaviour.

Or we could drop the whole multiline "feature" for now and make the 
whole thing an 8.1 item, although it would be a bit of a pity when it 
does work in what will surely be the most common case.
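
For reference, the case under discussion looks roughly like this (file path
and table are made up; COPY ... CSV is the new 8.0 syntax, while the
MULTILINE option mentioned above is only a proposal):

CREATE TABLE notes (id int, body text);

-- /tmp/notes.csv contains a quoted field with an embedded newline:
--   1,"first line
--   second line"
COPY notes FROM '/tmp/notes.csv' WITH CSV;

Loading this works when the embedded line ending matches the file's own line
endings; the error Darcy hit is the case where the two differ.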

cheers
andrew

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> Re: knowing internal representation, I think this is required anyway;
> else the optimization would only work in a very limited number of
> situations.

The point of my remark is that pushing this knowledge out to a function
is helpful only if you can put that function somehow at arm's length
from the main body of the optimizer.  Otherwise structuring it like that
is useless obscurantism.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> It would also make it possible to deprecate DISTINCT ON in favour of GROUP BY
> with first() calls.

Oh?  How is a first() aggregate going to know what sort order you want
within the group?  AFAICS first() is only useful when you honestly do
not care which group member you get ... which is certainly not the case
for applications of DISTINCT ON.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> Patrick B Kelly wrote:
>> Actually, when I try to export a sheet with multi-line cells from 
>> excel, it tells me that this feature is incompatible with the CSV 
>> format and will not include them in the CSV file.

> It probably depends on the version. I have just tested with Excel 2000 
> on a WinXP machine and it both read and wrote these files.

I'd be inclined to define Excel 2000 as broken, honestly, if it's
writing unescaped newlines as data.  To support this would mean throwing
away most of our ability to detect incorrectly formatted CSV files.
A simple error like a missing close quote would look to the machine like
the rest of the file is a single long data line where all the newlines
are embedded in data fields.  How likely is it that you'll get a useful
error message out of that?  Most likely the error message would point to
the end of the file, or at least someplace well removed from the actual
mistake.

I would vote in favor of removing the current code that attempts to
support unquoted newlines, and waiting to see if there are complaints.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] newbie compile question. please help

2004-11-11 Thread Tom Lane
[EMAIL PROTECTED] writes:
> I made some minor changes to
> main.c   then
> make then
> make install

> when I want to start the postmaster I get

> The program "postmaster" was found by
> "/home/gevik/postgres/build2/bin/pg_ctl"
> but was not the same version as pg_ctl.

Maybe you did something that broke recognition of the "-V" command-line option?
What happens when you do "postmaster -V"?

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query

2004-11-11 Thread Bruno Wolff III
On Thu, Nov 11, 2004 at 09:29:14 +,
  Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Wed, 2004-11-10 at 22:48, Mark Kirkwood wrote:
> > Planning for future note: I would like whatever mechanism that is added 
> > for this MAX/MIN stuff to be amenable to more subtle things like 
> > aggregate navigation (see R.Kimball's article 
> > http://www.dbmsmag.com/9608d54.html).
> > 
> 
> With you on that one...

I disagree. What that author is suggesting is orthogonal to what is being
proposed by Mark. That article is really suggesting a variant of materialized
views where you can use the normal aggregate notation instead of having
to use a special column name to get access to an aggregate in the
materialized view.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Bruno Wolff III
On Thu, Nov 11, 2004 at 10:24:34 -0500,
  Tom Lane <[EMAIL PROTECTED]> wrote:
> 
> We know how to determine that an index matches an ORDER BY clause.
> But what has an aggregate called MAX() got to do with ORDER BY?  Magic
> assumptions about operators named "<" are not acceptable answers; there
> has to be a traceable connection in the catalogs.
> 
> As a real-world example of why I won't hold still for hard-wiring this:
> a complex-number data type might have btree opclasses allowing it to be
> sorted either by real part or by absolute value.  One might then define
> max_real() and max_abs() aggregates on the type.  It should be possible
> to optimize such aggregates the same way as any other max() aggregate.

Wouldn't knowing an opclass and direction associated with an aggregate
function allow you to do this?

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[HACKERS] newbie compile question. please help

2004-11-11 Thread gevik
Dear Folks,

Could someone please tell me what I am doing wrong here?

I made some minor changes to
main.c   then
make then
make install

when I want to start the postmaster I get

**
The program "postmaster" was found by
"/home/gevik/postgres/build2/bin/pg_ctl"
but was not the same version as pg_ctl.
Check your installation.
*

What am I doing wrong here?

Regards,

Gevik


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> As a real-world example of why I won't hold still for hard-wiring this:
> a complex-number data type might have btree opclasses allowing it to be
> sorted either by real part or by absolute value.  One might then define
> max_real() and max_abs() aggregates on the type.  It should be possible
> to optimize such aggregates the same way as any other max() aggregate.

So if the max_real() aggregate had a field indicating that max_real(x)
could be satisfied with only the first record from the dataset, as long as
it's sorted by "real(x)", that would be enough information.

The optimizer would still have a lot of work to do to combine this information
for all aggregates used and check the costs of providing sorted result sets to
the scan. There would also need to be new scans that could handle reading just
one record and then skipping to the next group.

It's a lot of work but it would make a lot of aggregates a lot more useful. It
would also make it possible to deprecate DISTINCT ON in favour of GROUP BY
with first() calls.

-- 
greg


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Tom Lane
Bruno Wolff III <[EMAIL PROTECTED]> writes:
>   Tom Lane <[EMAIL PROTECTED]> wrote:
>> We know how to determine that an index matches an ORDER BY clause.
>> But what has an aggregate called MAX() got to do with ORDER BY?

> Wouldn't knowing an opclass and direction associated with an aggregate
> function allow you to do this?

That's one way you could do it.  Another possibly cleaner way is to just
supply a sort operator (with the implication "the desired value is the
one ordered first by this operator") and then let the optimizer work out
which opclass(es) are relevant.  This would win if the same operator
appears in different opclasses, which is uncommon at the moment but
would not be so if we start offering "reverse sort" opclasses.
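
A sketch of what such a declaration might look like; the sortop clause is
hypothetical syntax here, shown only to illustrate the kind of catalog
linkage being discussed:

CREATE AGGREGATE max (
    basetype = integer,
    sfunc    = int4larger,
    stype    = integer,
    sortop   = >    -- "the desired value is the one ordered first by >"
);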

I think we covered all this ground before, though --- have you checked
the archives from the last time this was discussed?

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query

2004-11-11 Thread Greg Stark
Simon Riggs <[EMAIL PROTECTED]> writes:

> On Wed, 2004-11-10 at 22:48, Mark Kirkwood wrote:
> > Planning for future note: I would like whatever mechanism that is added 
> > for this MAX/MIN stuff to be amenable to more subtle things like 
> > aggregate navigation (see R.Kimball's article 
> > http://www.dbmsmag.com/9608d54.html).
> > 
> 
> With you on that one...

This looks like more of a materialized views application than anything to do
with planning aggregate functions. Materialized views are most useful when the
materialized view is smaller than the original data and therefore usually
involve aggregates, but it's not necessary.

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Bruno Wolff III
On Thu, Nov 11, 2004 at 17:52:19 +1100,
  John Hansen <[EMAIL PROTECTED]> wrote:
> Why not just change the function altogether to 'select $1 from $2
> order by $1 desc limit 1;'
> 
> Is there ANY situation where max(col) as it is, would be faster?

Yes. A couple I can think of are:
When count(col) is also being used.
When a GROUP BY is being used and there isn't an index that can be used both
to do the grouping and to order col within each group.
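
Spelled out, the rewrite under discussion is the following (table and column
names are placeholders; note the two forms treat NULLs differently, since
max() ignores NULLs while ORDER BY ... DESC sorts them first):

SELECT max(col) FROM tbl;

-- index-friendly form
SELECT col FROM tbl
 WHERE col IS NOT NULL
 ORDER BY col DESC
 LIMIT 1;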

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Zeugswetter Andreas DAZ SD

> How are you planning to represent the association between MIN/MAX and
> particular index orderings in the system catalogs?

Don't we already have that info to decide whether an index handles 
an "ORDER BY" without a sort node ?

Andreas

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] [HACKERS] plperl Safe restrictions

2004-11-11 Thread Andrew Dunstan

Bruce Momjian wrote:
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.
---
Andrew Dunstan wrote:
 

...
The patch also does some other inconsequential tidying of overlong 
lines, and removes some unnecessary ops in the unsafe case. These are 
basically cosmetic - the only significant part is replacing this:

   $PLContainer->permit(':base_math');
with this:
   $PLContainer->permit(qw[:base_math !:base_io !srand sort sprintf time]);
   


As per previous discussions, please remove "!srand" and "sprintf" 
if/when applying.

cheers
andrew
---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Increasing the length of

2004-11-11 Thread Andrew Sullivan
On Thu, Nov 11, 2004 at 10:20:43AM -0500, Robert Treat wrote:
> 
> I believe the geeky non-helpful answer is to attach to the process with
> gdb and do p debug_query_string which I believe will show you said long
> running query. 

Yes, this will work, & I've used it.  But of course, you don't
actually _know_ it's a long-running query.  All you know is that the
back end which has that pid has a query now.  So you have to connect
several times and look at the query to see this.  This strategy is
also what pgmonitor and friends do.

> Of course that idea led me to wondering why we couldn't have a function
> that could look at a connection (well, either by way of pid or possibly
> transaction id) and show the current query being executed. 

I think that's what's being requested.  I've no clue how to do it,
though.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Increasing the length of

2004-11-11 Thread Robert Treat
On Wed, 2004-11-10 at 17:57, Andrew Sullivan wrote:
> On Wed, Nov 10, 2004 at 05:51:01PM -0500, Andrew Sullivan wrote:
> > log_statement_after_min_duration (integer) -- which did what Simon
> > wants.  
> 
> Uh, well, not what Simon wants, of course, but which gave us a useful
> capability anyway.  I agree that the full-bore profiling for the DBA
> would be awful nice.  But in its absence, if you could see your
> long-running query in the log after a minute, and then go do an
> EXPLAIN and realise "uh-oh, that's gonna take 3 days to complete" and
> kill it, it would be a big help.  
> 

I believe the geeky non-helpful answer is to attach to the process with
gdb and do p debug_query_string which I believe will show you said long
running query. (Someone who actually hacks C can correct me on that, but
I believe I've done it that way before).  

Of course that idea led me to wondering why we couldn't have a function
that could look at a connection (well, either by way of pid or possibly
transaction id) and show the current query being executed. 


Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] plperl Safe restrictions

2004-11-11 Thread Bruce Momjian
> >Andrew Dunstan wrote:
> >  
> >
> >>...
> >>
> >>The patch also does some other inconsequential tidying of overlong 
> >>lines, and removes some unnecessary ops in the unsafe case. These are 
> >>basically cosmetic - the only significant part is replacing this:
> >>
> >>$PLContainer->permit(':base_math');
> >>
> >>with this:
> >>
> >>$PLContainer->permit(qw[:base_math !:base_io !srand sort sprintf time]);
> >>
> >>
> >>
> >
> 
> As per previous discussions, please remove "!srand" and "sprintf" 
> if/when applying.

OK.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Vacuum info clarification

2004-11-11 Thread Dave Cramer
While we are at it, an explanation of what
DETAIL:  0 dead row versions cannot be removed yet
There were 1 unused item pointers.
mean would be useful.
Dave
Bruce Momjian wrote:
The last two lines of VACUUM VERBOSE are:
 INFO:  free space map: 49 relations, 32 pages stored; 784 total pages needed
 DETAIL:  Allocated FSM size: 1000 relations + 2 pages = 182 kB shared 
memory.
I think "total pages needed" should be "total pages used" or "total
pages requested".
I am confused about the difference between "stored" and "requested".  I
read the comments in freespace.c but am still confused.
 

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread Oleg Bartunov
John,
I miss about UTF-8 :) ltree doesn't support UTF-8 yet.
Oleg
On Thu, 11 Nov 2004, John Hansen wrote:
Oleg,
For example:
The DMOZ topic:
Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales
select
text2ltree(replace('Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales','/','.'));
ERROR:  syntax error at position 14 near "?"
I've also found that topics contain ',', as in the DMOZ Topic:
Top/Arts/Music/Composition/Composers/F/Fasch,_Johann_Friedrich
Kind regards
John
-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 11, 2004 8:48 PM
To: John Hansen
Cc: Teodor Sigaev; Pgsql Hackers
Subject: Re: ltree PostgreSQL Module
John,
On Thu, 11 Nov 2004, John Hansen wrote:
Hello folks,
It seems that the ltree module does not support the data for which it
was (apparently) created.
The DMOZ data is encoded in UTF8, but ISALNUM() does not support
unicode characters...
In fact, it does not support any database encoding.
Is there any plans to change this?
it's very difficult to see your problem without any examples!
Many people are lucky users of the ltree module, so you probably
have some other problem. Any test demonstrating your problem
would be very useful.
Also, posting to mailing lists would help other people help you.
Kind Regards,
John Hansen
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Andrew Dunstan

Patrick B Kelly wrote:
On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote:
The last really isn't an option, because the whole point of CSVs is 
to play with other programs, and my understanding is that those that 
understand multiline fields (e.g. Excel) expect them not to be 
escaped, and do not produce them escaped.

Actually, when I try to export a sheet with multi-line cells from 
excel, it tells me that this feature is incompatible with the CSV 
format and will not include them in the CSV file.


It probably depends on the version. I have just tested with Excel 2000 
on a WinXP machine and it both read and wrote these files.

cheers
andrew
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-11 Thread Bruno Wolff III
On Thu, Nov 11, 2004 at 01:18:05 -0600,
  "Jim C. Nasby" <[EMAIL PROTECTED]> wrote:
> Certainly handling only one case is better than none. I just wanted to
> bring up the multiple aggregate scenario. Also, consider that
> 
> SELECT min(a), max(a), min(b), max(c) FROM table
> 
> could be optimized as well (into 4 index scans, assuming a, b, and c all
> had indexes).
> 
> I don't think any other aggregates are candidates for optimization right
> now, though I guess I could be wrong.

Remember that max and min are really a number of aggregates, as each datatype
which has max and min functions has different ones from those used
by other datatypes.
I think someone added boolean aggregates for AND and OR in version 8.
If so, those can also use indexes in the same way.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Alvaro Herrera
On Thu, Nov 11, 2004 at 01:08:39AM -0500, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > What about having a new column in pg_aggregate which would point to a
> > function that would try to optimize the aggregate's handling?
> 
> I can't get very excited about this, because how would you make a
> reasonably stable/narrow API for such a thing?  The function as you
> propose it would have to know everything about not only the planner's
> data representations but the N specific places it would be called from.

No, the function would discard all calls except the one it knows how
to optimize.  The point in having multiple call places is that some
aggregates will likely be optimized in some place, and others somewhere
else.  Most likely, a first patch would include only the call site that
would help in optimizing min() and max().

Of course, an aggregate could specify no optimizing function, which
would be the current situation, where no aggregate knows how to optimize
itself.

Re: knowing internal representation, I think this is required anyway;
else the optimization would only work in a very limited number of
situations.

-- 
Alvaro Herrera ()
"A wizard is never late, Frodo Baggins, nor is he early.
He arrives precisely when he means to."  (Gandalf, en LoTR FoTR)


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


[HACKERS] newbie compile question. please help

2004-11-11 Thread Gevik Babakhani
Dear Folks,

Could someone please tell me what I am doing wrong?

I made some minor changes to
main.c   then
make then
make install

when I want to start the postmaster I get

The program "postmaster" was found by "/home/gevik/postgres/build2/bin/pg_ctl"
but was not the same version as pg_ctl.
Check your installation.

What am I doing wrong here?

Regards,
Gevik


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query

2004-11-11 Thread Simon Riggs
On Wed, 2004-11-10 at 22:48, Mark Kirkwood wrote:
> Planning for future note: I would like whatever mechanism that is added 
> for this MAX/MIN stuff to be amenable to more subtle things like 
> aggregate navigation (see R.Kimball's article 
> http://www.dbmsmag.com/9608d54.html).
> 

With you on that one...

-- 
Best Regards, Simon Riggs


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread Oleg Bartunov
John,
On Thu, 11 Nov 2004, John Hansen wrote:
Hello folks,
It seems that the ltree module does not support the data for which it
was (apparently) created.
The DMOZ data is encoded in UTF8, but ISALNUM() does not support unicode
characters...
In fact, it does not support any database encoding.
Is there any plans to change this?
it's very difficult to see your problem without any examples!
Many people are lucky users of the ltree module, so you probably have some
other problem. Any test demonstrating your problem would be very useful.
Also, posting to mailing lists would help other people help you.
Kind Regards,
John Hansen
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites generally)

2004-11-11 Thread Tom Lane
"Zeugswetter Andreas DAZ SD" <[EMAIL PROTECTED]> writes:
>> How are you planning to represent the association between MIN/MAX and
>> particular index orderings in the system catalogs?

> Don't we already have that info to decide whether an index handles 
> an "ORDER BY" without a sort node ?

We know how to determine that an index matches an ORDER BY clause.
But what has an aggregate called MAX() got to do with ORDER BY?  Magic
assumptions about operators named "<" are not acceptable answers; there
has to be a traceable connection in the catalogs.

As a real-world example of why I won't hold still for hard-wiring this:
a complex-number data type might have btree opclasses allowing it to be
sorted either by real part or by absolute value.  One might then define
max_real() and max_abs() aggregates on the type.  It should be possible
to optimize such aggregates the same way as any other max() aggregate.
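
A self-contained toy version of that example (all names are invented; a real
implementation would also supply the two btree opclasses so that each
aggregate could be matched to an index ordering):

CREATE TYPE complex AS (re float8, im float8);

CREATE FUNCTION complex_larger_real(complex, complex) RETURNS complex
    AS 'SELECT CASE WHEN ($1).re >= ($2).re THEN $1 ELSE $2 END'
    LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION complex_larger_abs(complex, complex) RETURNS complex
    AS 'SELECT CASE WHEN ($1).re^2 + ($1).im^2 >= ($2).re^2 + ($2).im^2
                    THEN $1 ELSE $2 END'
    LANGUAGE sql IMMUTABLE STRICT;

CREATE AGGREGATE max_real (basetype = complex, sfunc = complex_larger_real, stype = complex);
CREATE AGGREGATE max_abs  (basetype = complex, sfunc = complex_larger_abs,  stype = complex);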

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] plperl Safe restrictions

2004-11-11 Thread Bruce Momjian

Andrew, can you or someone summarize were we left this issue and your
patch?

---

Andrew Dunstan wrote:
> 
> 
> Tom Lane wrote:
> 
> >Andrew Dunstan <[EMAIL PROTECTED]> writes:
> >  
> >
> >>The question in my mind is "What are we protecting against?" ISTM it is 
> >>the use of the pl as a vector to attack the machine and postgres. Does a 
> >>segfault come into that category? After all, isn't it one of postgres's 
> >>strengths that we can survive individual backends crashing?
> >>
> >>
> >
> >Yeah, but a repeatable segfault certainly is an adequate tool for a
> >denial-of-service attack, since it takes out everyone else's sessions
> >along with your own.  A possibly larger objection is how sure can you be
> >that the effects will *only* be a segfault, and not say the ability to
> >execute some user-injected machine code.
> >  
> >
> 
> Ok, the release notes for perl 5.005 (which is now pretty ancient) say this:
> 
> "Perl now contains its own highly optimized qsort() routine. The new 
> qsort() is resistant to inconsistent comparison functions, so Perl's 
> |sort()| will not provoke coredumps any more when given poorly written 
> sort subroutines."
> 
> Also, there were some apparent problems with sort routine reentrancy in 
> perl < 5.6.1 - see 
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=60534.
> 
> I have not found any more recent refs on Google to "sort" causing problems.
> 
> Certainly in my testing it proved totally trivial to crash the backend 
> with sprintf.
> 
> So I suggest a reasonable position w.r.t. the danger of SEGVs would be 
> to allow "sort" but disallow sprintf.
> 
> 
> cheers
> 
> andrew
> 
> 
> 
> ---(end of broadcast)---
> TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread Oleg Bartunov
Hmm,
I don't see the error comes from ltree module, not from 'replace' function.
Also, are you sure your postgresql setup is ok (locale issue).
On Thu, 11 Nov 2004, John Hansen wrote:
Oleg,
For example:
The DMOZ topic:
Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales
select
text2ltree(replace('Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales','/','.'));
ERROR:  syntax error at position 14 near "?"
I've also found that topics contain ',', as in the DMOZ Topic:
Top/Arts/Music/Composition/Composers/F/Fasch,_Johann_Friedrich
Kind regards
John
-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 11, 2004 8:48 PM
To: John Hansen
Cc: Teodor Sigaev; Pgsql Hackers
Subject: Re: ltree PostgreSQL Module
John,
On Thu, 11 Nov 2004, John Hansen wrote:
Hello folks,
It seems that the ltree module does not support the data
for which it was (apparently) created.
The DMOZ data is encoded in UTF8, but ISALNUM() does not support
unicode characters...
In fact, it does not support any database encoding.
Are there any plans to change this?
It's very difficult to see your problem without any examples!
Many people are happy users of the ltree module, so you probably
have some other problem. Any test demonstrating your problem
would be very useful.
Also, posting to the mailing lists would help other people help you.
Kind Regards,
John Hansen
Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: [PATCHES] [HACKERS] plperl Safe restrictions

2004-11-11 Thread Bruce Momjian

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---


Andrew Dunstan wrote:
> 
> 
> David Helgason wrote:
> 
> > On 14. okt 2004, at 21:09, Andrew Dunstan wrote:
> >
> >> It has just been brought to my attention that we are being very 
> >> restrictive about what we allow to be done in trusted plperl. 
> >> Basically we allow the :default and :base_math set of operations (run 
> >> perldoc Opcode or see 
> >> http://www.perldoc.com/perl5.8.0/lib/Opcode.html for details of what 
> >> these mean). In particular, we do not allow calls to perl's builtin 
> >> sort, which is unpleasant, and on reviewing the list it seems to me 
> >> we could quite reasonably allow access to pack and unpack also. bless 
> >> and sprintf are also likely candidates for inclusion - I have not 
> >> finished reviewing the list, and would welcome advice from perl gurus 
> >> on this.
> >
> >
> > pack and unpack are unfortunately not safe. Very useful, but they 
> > allow write/read access to random memory. It's really a shame perl 
> > doesn't have a pragma to make them safe or have safe versions of them.
> >
> > Bless should be OK, unless sensitive objects are provided to the 
> > procedures.
> >
> > A postgres question I don't know the answer to is whether allowing the 
> > user to trigger a segfault is a security problem. If it isn't, several 
> > opcodes may probably be allowed, including sort and sprintf. If it is, 
> > well, you need only follow the perl5-porters list to know that banal 
> > perl structures are continuously being found that will segfault 
> > perl, some at compile time, others at runtime.
> 
> 
> OK, based on this and some further thought, I have prepared the attached 
> patch which does the right thing, I think, both in terms of what we 
> allow and what we don't.
> 
> First, we tighten security by disallowing access to srand and IO 
> functions on existing filehandles (other IO ops are already disallowed).
> 
> Then we relax the restrictions by allowing access to perl's sort, sprintf 
> and time routines. I decided against pack/unpack based on the above, and 
> also decided that I couldn't think of any case where bless would have 
> any practical use - although that might change later. I'm trying to keep 
> changes minimal here. I don't believe that "time" carries any 
> significant security implications, and I think the dangers from "sort" 
> and "sprintf" are not so great as to disallow them. They might cause a 
> SEGV in a pathological case, but that doesn't give the user access to 
> the machine - if they can login to postgres they can probably mount any 
> number of DOS attacks anyway.
> 
> To answer David's question, the manual says this about trusted functions: 
> "the TRUSTED flag should only be given for languages that do not allow 
> access to database server internals or the file system".  I think the 
> changes I propose fit in with that statement.
> 
> The patch also does some other inconsequential tidying of overlong 
> lines, and removes some unnecessary ops in the unsafe case. These are 
> basically cosmetic - the only significant part is replacing this:
> 
> $PLContainer->permit(':base_math');
> 
> with this:
> 
> $PLContainer->permit(qw[:base_math !:base_io !srand sort sprintf time]);
> 
> I have tested and it appears to do the right thing, both for the things 
> excluded and those included.
> 
> cheers
> 
> andrew
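
For illustration only (not part of the patch; the function name and body are made up): with the relaxed opmask above, a trusted plperl function can call sort and sprintf directly, e.g.

CREATE FUNCTION sorted_csv(text) RETURNS text AS '
    my @parts = split /,/, $_[0];
    return sprintf("%s", join(",", sort @parts));
' LANGUAGE plperl;

SELECT sorted_csv('banana,apple,cherry');  -- expected: apple,banana,cherry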
> 
> 
> 


> 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] ltree PostgreSQL Module

2004-11-11 Thread John Hansen
Oleg,

For example:

The DMOZ topic: 
Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales

select 
text2ltree(replace('Top/World/Español/Países/México/Referencia/Bibliotecas/Nacionales','/','.'));
ERROR:  syntax error at position 14 near "Ã"
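
For comparison, a plain-ASCII path converts cleanly (a quick illustration; the expected output is assumed from ltree's documented label set of A-Za-z0-9 and _):

select text2ltree(replace('Top/Arts/Music','/','.'));
-- expected result: Top.Arts.Music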

I've also found that topics can contain ',', as in the DMOZ topic: 
Top/Arts/Music/Composition/Composers/F/Fasch,_Johann_Friedrich

Kind regards

John

> -Original Message-
> From: Oleg Bartunov [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, November 11, 2004 8:48 PM
> To: John Hansen
> Cc: Teodor Sigaev; Pgsql Hackers
> Subject: Re: ltree PostgreSQL Module
> 
> John,
> 
> On Thu, 11 Nov 2004, John Hansen wrote:
> 
> > Hello folks,
> >
> > It seems that the ltree module does not support the data for which it 
> > was (apparently) created.
> >
> > The DMOZ data is encoded in UTF8, but ISALNUM() does not support 
> > unicode characters...
> > In fact, it does not support any database encoding.
> >
> > Are there any plans to change this?
> 
> It's very difficult to see your problem without any examples!
> Many people are happy users of the ltree module, so you probably 
> have some other problem. Any test demonstrating your problem 
> would be very useful.
> Also, posting to the mailing lists would help other people help you.
> 
> >
> > Kind Regards,
> >
> > John Hansen
> >
> 
>   Regards,
>   Oleg
> _
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, 
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
> 
> 



Re: [HACKERS] multiline CSV fields

2004-11-11 Thread Patrick B Kelly
On Nov 10, 2004, at 6:10 PM, Andrew Dunstan wrote:
The last really isn't an option, because the whole point of CSVs is to 
play with other programs, and my understanding is that those that 
understand multiline fields (e.g. Excel) expect them not to be 
escaped, and do not produce them escaped.

Actually, when I try to export a sheet with multi-line cells from 
Excel, it tells me that the feature is incompatible with the CSV 
format, and it does not include those cells in the CSV file.
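
For reference, a minimal sketch of the failure mode being discussed (the file name, table, and contents are all invented for illustration): a CSV file written with CRLF line endings in which one quoted field contains a bare LF.

-- hypothetical /tmp/notes.csv, CRLF line endings, one embedded bare LF:
--   1,"first line<LF>second line"<CR><LF>
--   2,"plain"<CR><LF>
CREATE TABLE notes (id int, note text);
COPY notes FROM '/tmp/notes.csv' WITH CSV;
-- with current behaviour, reading this back errors out because the line
-- ending inside the quoted field doesn't match the file's own line endings.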

Patrick B. Kelly
--
  http://patrickbkelly.org


Re: [HACKERS] MAX/MIN optimization via rewrite (plus query rewrites

2004-11-11 Thread Greg Stark

Bruno Wolff III <[EMAIL PROTECTED]> writes:

> I don't think you should be rewriting queries as much as providing
> alternate plans and letting the rest of the optimizer decided which
> plan to use. If you just rewrite a query you might lock yourself into
> using a poor plan.

Moreover, none of these rewritten queries would work properly for

select min(foo) from tab group by bar

This should still be able to use an index on (bar, foo) and get much
better performance. Well, it can't know that today, but it should be able to.
All it really needs to know is that min() only needs a particular subset of the
dataset -- namely the first record of each group, as long as the records are
provided in a particular order.
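
For comparison, the hand-written workaround people use today leans on an index on (bar, foo) in exactly this way (a sketch, using the table and column names above):

select distinct on (bar) bar, foo as min_foo
from tab
order by bar, foo;
-- per bar, the first row in (bar, foo) order carries min(foo)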

Also, the same code ought to be able to handle

select first(foo) from tab group by bar

Which is exactly equivalent to the min() case except that no particular
ordering is required.

-- 
greg

