Re: [HACKERS] Oracle indemnifies PostgreSQL on its patents

2007-04-01 Thread Jeroen T. Vermeulen
On Sun, April 1, 2007 01:32, Tom Lane wrote:

> The idea of OIN is to have a large patent pool that can be
> counter-asserted against anyone who doesn't want to play nice.
> Mutual assured destruction in the patent sphere, if you will.

And from the participants' point of view, I suppose the big attraction
must be that they do away with a threat to their patents.  If you have a
patent that matches what some open project (not worth suing) has been
doing for the past few years, then anyone else you might want to sue about
the patent could point to that project and say "if you have a valid
patent, why didn't you say something when they infringed it?"


Jeroen





Re: [HACKERS] What is the maximum encoding-conversion growth rate, anyway?

2007-05-29 Thread Jeroen T. Vermeulen
On Tue, May 29, 2007 20:51, Tatsuo Ishii wrote:

> Thinking more, it struck me that users can define an arbitrary growth
> rate by using CREATE CONVERSION. So it seems we need functionality to
> define the growth rate anyway.

Would it make sense to define just the longest and shortest character
lengths for an encoding?  Then for any conversion you'd have a safe
estimate of

  ceil(target_encoding.max_char_len / source_encoding.min_char_len)

...without going through every possible conversion.
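
As a minimal sketch of the idea in C (pg_enc_info, max_char_len and
min_char_len are illustrative names here, not actual backend symbols):

#include <stdio.h>

/* Per-encoding character-length limits, known up front. */
typedef struct pg_enc_info
{
    int max_char_len;           /* longest character, in bytes */
    int min_char_len;           /* shortest character, in bytes */
} pg_enc_info;

/*
 * Safe output-buffer bound for converting src_len bytes between any
 * pair of encodings: at most ceil(src_len / min_char_len) characters
 * come in, and each may grow to max_char_len bytes on the way out.
 */
static int
safe_conversion_bound(const pg_enc_info *src, const pg_enc_info *dst,
                      int src_len)
{
    int max_chars = (src_len + src->min_char_len - 1) / src->min_char_len;

    return max_chars * dst->max_char_len;
}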


Jeroen





Re: [HACKERS] Using the GPU

2007-06-09 Thread Jeroen T. Vermeulen
On Sat, June 9, 2007 07:36, Gregory Stark wrote:
> "Billings, John" <[EMAIL PROTECTED]> writes:
>
>> Does anyone think that PostgreSQL could benefit from using the video
>> card as a parallel computing device?  I'm working on a project using
>> Nvidia's CUDA with an 8800 series video card to handle non-graphical
>> algorithms.  I'm curious if anyone thinks that this technology could be
>> used to speed up a database?  If so which part of the database, and what
>> kind of parallel algorithms would be used?
>
> There has been some interesting research on sorting using the GPU which
> could be very interesting for databases.

> Perhaps this can be done using OpenGL already but I kind of doubt it.

GPUs have been used to great effect for spatial joins.  And yes, using
OpenGL so that it was portable.  I saw a paper about that as an Oracle
plugin a few years back.

It works something like this, IIRC: a spatial join looks for objects that
overlap with the query area.  Normally you go through an R-tree index to
identify objects that are in the same general area (space-filling curves
help there).  Then you filter the objects you get, to see which ones
actually overlap your query area.

The GL trick inserted an intermediate filter that set up the objects found
in the R-tree index, and the query area, as 3D objects.  Then it used GL's
collision detection as an intermediate filter to find apparent matches. 
It has to be slightly conservative because GL doesn't make the sort of
guarantees you'd want for this trick, so there's a final software pass
that only needs to look at cases where there's any doubt.
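
The pattern, reduced to a small self-contained C sketch (the names are
mine, purely illustrative; bounding boxes stand in for the R-tree and GL
stages, which may report false positives but never false negatives):

#include <stdbool.h>
#include <stdio.h>

typedef struct { double x1, y1, x2, y2; } BBox;

/* Conservative filter: may say "yes" to a non-match, never "no" to a
 * real match.  The R-tree and GL stages play this role above. */
static bool may_overlap(BBox a, BBox b)
{
    return a.x1 <= b.x2 && b.x1 <= a.x2 &&
           a.y1 <= b.y2 && b.y1 <= a.y2;
}

/* Final software pass: exact test on the surviving candidates.  For
 * plain boxes it happens to coincide with the conservative test. */
static bool exact_overlap(BBox a, BBox b)
{
    return may_overlap(a, b);
}

int main(void)
{
    BBox query   = { 0, 0, 10, 10 };
    BBox objs[2] = { { 5, 5, 15, 15 }, { 20, 20, 30, 30 } };

    for (int i = 0; i < 2; i++)
        if (may_overlap(objs[i], query) && exact_overlap(objs[i], query))
            printf("object %d overlaps the query area\n", i);
    return 0;
}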


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 04:45, Chris Browne wrote:
> I'm seeing some applications where it appears that there would be
> value in introducing asynchronous messaging, ala "message queueing."
> 
>
> The "granddaddy" of message queuing systems is IBM's MQ-Series, and I
> don't see particular value in replicating its functionality.

I'm quite interested in this.  Maybe I'm thinking of something too
complex, but I do think there are some "oh it'll need to do that too"
pitfalls that are best considered up front.

The big thing about MQ is that it participates as a resource manager in
two-phase commits (and optionally a transaction manager as well).  That
means that you get atomic processing steps: the application takes a
message off a queue, processes it, commits its changes to the database,
and replies to the message.  The queue manager then does a second-phase
commit for all of
those steps, and that's when the reply really goes out.  If the
application fails, none of this will have happened so you get ACID over
the complete cycle.  That's something we should have free software for.

Perhaps the time is right for something new.  A lot of the complexity
inside MQ comes from data representation issues like encodings and
fixed-length strings, as I recall, and things have changed since MQ was
designed.  I agree it could be useful (and probably not hard either) to
have a transactional messaging system inside the database.  It saves you
from having to do two-phase commits.
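
As a minimal sketch over libpq of what such an in-database
consume-process-reply cycle could look like (the inbox and outbox tables
and their layout are hypothetical, purely for illustration):

#include <libpq-fe.h>

/* Dequeue one message, process it, and queue the reply, all inside a
 * single transaction: if anything fails before COMMIT, the message
 * simply stays on the queue. */
static void process_one_message(PGconn *conn)
{
    PGresult *r;

    PQclear(PQexec(conn, "BEGIN"));
    r = PQexec(conn,
               "DELETE FROM inbox WHERE id = (SELECT min(id) FROM inbox) "
               "RETURNING id, body");
    if (PQresultStatus(r) == PGRES_TUPLES_OK && PQntuples(r) == 1)
    {
        /* ...process PQgetvalue(r, 0, 1) here; any application tables
         * updated now commit atomically with the dequeue and reply... */
        PQclear(PQexec(conn, "INSERT INTO outbox(body) VALUES ('reply')"));
    }
    PQclear(r);
    PQclear(PQexec(conn, "COMMIT")); /* all steps become visible at once */
}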

But it does tie everything to postgres to some extent, and you lose the
interesting features—atomicity and assured, single delivery—as soon as
anything in the chain does anything persistent that does not participate
in the postgres transaction.  Perhaps what we really need is more mature
components, with a unified control layer on top.  That's how a lot of
successful free software grows.  See below.


> On the other side, the "big names" these days are:
>
> a) The Java Messaging Service, which seems to implement *way* more
>options than I'm even vaguely interested in having (notably, lots
>that involve data stores or lack thereof that I do not care to use);

Far as I know, JMS is an API, not a product.  You'd still slot some
messaging middleware underneath, such as MQ.  That is why MQSeries was
renamed: it fits into the WebSphere suite as the implementing engine
underneath the JMS API.  From what I understand MQ is one of the
"best-of-breed" products that JMS was designed around.  (Sun's term, bit
hypey for my taste).

In one way, Java is easy: the last thing you want to get into is yet
another marshaling standard.  There are plenty of "standards" to choose
from already, each married to one particular communications mechanism:
RPC, EDI, CORBA, D-Bus, XMLRPC, what have you.  Even postgres has its own.
 I'd say the most successful mechanism is TCP itself, because it isolates
itself from content representation so effectively.

It's hard not to get into marshaling: someone has to do it, and it's often
a drag to do it in the application, but the way things stand now *any*
choice limits the usefulness of what you're building.  That's something
I'd like to see change.

Personally I'd love to see marshaling or low-level data representation
isolated into a mature component that speaks multiple programming
languages on the one hand and multiple data representation formats on the
other.  Something the implementers of some of these messaging standards
would want to use to compose their messages, isolating their format
definitions into plugins.  Something that would make application writers
stop composing messages in finicky ad-hoc code that fails with unexpected
locales or has trouble with different line breaks.
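
One possible shape for such a component, sketched in C (entirely
hypothetical; the names are mine): each format becomes a plugin behind a
single callback table, and applications never touch the wire
representation directly.

#include <stddef.h>

typedef struct Marshaller
{
    const char *format_name;    /* e.g. "xdr", "xml", "csv" */

    /* Encode a typed value into buf; returns bytes written, or -1 on
     * error (buffer too small, unsupported type, bad locale...). */
    int (*encode)(const void *value, const char *type_name,
                  char *buf, size_t buflen);

    /* Decode bytes back into a value; returns 0 on success. */
    int (*decode)(const char *buf, size_t buflen,
                  const char *type_name, void *value_out);
} Marshaller;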

If we had a component like that, combining it with existing transactional
variants of TCP and [S]HTTP might even be enough to build a usable
messaging system.  I haven't looked at them enough to know.  Of course
we'd need implementations of those protocols; see
http://ttcplinux.sourceforge.net/ and http://www.csn.ul.ie/~heathclf/fyp/
for example.

Another box of important tools, and I have no idea where we stand with
this one, is transaction management.  We have 2-phase commit in postgres
now.  But do we have interoperability with existing transaction managers? 
Is there a decent free, portable, everything-agnostic transaction manager?
 With those, the sphere of reliability of a database-driven messaging
package could extend much further.
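
For reference, the sequence an external transaction manager would drive
against postgres looks like this (PREPARE TRANSACTION and COMMIT
PREPARED are the real commands; the helper function and the gid are
illustrative):

#include <libpq-fe.h>

static void two_phase_commit(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));
    /* ...transactional work on this resource manager... */
    PQclear(PQexec(conn, "PREPARE TRANSACTION 'txn-42'"));  /* phase 1 */
    /* The TM collects a successful "prepare" vote from every
     * participant, then tells each of them to finish: */
    PQclear(PQexec(conn, "COMMIT PREPARED 'txn-42'"));      /* phase 2 */
}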

A free XA-capable filesystem would be great too, but I guess I'm daydreaming.


> There tend to be varying semantics out there:
>
> - Some queues may represent "subscriptions" where a whole bunch of
>   listeners want to get all the messages;

The two simplest models that offer something more than TCP/UDP are 1:n
reliable publish-subscribe without persistence, and 1:1 request-reply with
persistent storage.  D-Bus does them both; IIRC MQ does 1:1 and has
add-ons on top for publish-subscribe.

Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 18:18, Heikki Linnakangas wrote:
> Marko Kreen wrote:
>> As I understand, JMS does not have a concept
>> of transactions, probably also other solutions mentioned before,
>> so to use PgQ as backend for them should be much simpler...
>
> JMS certainly does have the concept of transactions. Both distributed
> ones through XA and two-phase commit, and local involving just one JMS
> provider. I don't know about others, but would be surprised if they
> didn't.

Wait...  I thought XA did two-phase commit, and then there was XA+ for
*distributed* two-phase commit, which is much harder?


Jeroen





Re: [HACKERS] PG-MQ?

2007-06-20 Thread Jeroen T. Vermeulen
On Wed, June 20, 2007 19:42, Rob Butler wrote:
> Do you guys need something PG specific or built into PG?
>
> ActiveMQ is very nice, speaks multiple languages, protocols and supports a
> ton of features.  Could you simply use that?
>
> http://activemq.apache.org/

Looks very nice indeed!


Jeroen





[HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
Hi all,

I've run into a case where I get bad performance that doesn't sound too
hard to solve.  Question is: is it worth solving?

The situation is this: I have a table that can grow to a large number of
rows, then shrink to zero over a large number of quick, consecutive
transactions.  The primary key index for the table is getting a lot of use
in the process.

But whenever perhaps one-third or so of the rows have been deleted, the
planner stops using that index and resorts to sequential scans.  I tried
suppressing that by toggling enable_seqscan: works as advertised, but
performance is still terrible until (as far as I can make out) the next
analyze run has completed!

So I suppose the planner has a good reason to ignore the index at that
point.  I'm assuming that this is something to do with the correlation
between the index and the column's statistics degrading in some way.

I also tried doing my own analyze runs on just the primary key index. 
That will complete very quickly, and performance is restored for a while. 
But as far as I can tell, a regular automatic analyze run will block my
own, more limited one on the same table.  So performance is still bad, and
now it's irregular to boot.

This makes me wonder: when the planner finds that an index is no longer
worth using because its corresponding statistics are out of date, and it's
cheap to update those same stats, maybe it should do so?  Even if there's
also going to be a full analyze on the table, it could be worthwhile to do
this quick limited run first.  (Though not if one is already underway, of
course).

All this is based largely on guesswork, so if I've got it all wrong,
please enlighten me!


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 18:15, Gregory Stark wrote:

>> So I suppose the planner has a good reason to ignore the index at that
>> point.  I'm assuming that this is something to do with the correlation
>> between the index and the column's statistics degrading in some way.
>
> Best to post "explain analyze " for when the performance is good and
> bad.  Perhaps also an explain analyze for the query with enable_seqscan
> off when it's bad.

Can't easily do that anymore...  AFAIR the plans were all identical
anyway, except in the "enable_seqscan bad" case which used a sequential
scan instead of using the index.  The queries are very simple, along the
lines of "select * from foo where id >= x and id < y".


> Also, which version of Postgres is this?

It was an 8.2 version.


> It's possible you just need vacuum to run more frequently on this table
> and autovacuum isn't doing it often enough.  In which case you might
> have a cron job run vacuum (or vacuum analyze) on this table more
> frequently.

Actually, come to think of it, I don't think I'd want any vacuums at all
on this particular table.  Just the analyze on the primary key, no
vacuums, no statistics on anything else.  Unfortunately it's not just one
table, but a set of tables that can be created dynamically.  I could
change that, but in this particular case I don't think I should.


Jeroen





Re: [HACKERS] ANALYZE and index/stats degradation

2007-07-02 Thread Jeroen T. Vermeulen
On Mon, July 2, 2007 22:17, Gregory Stark wrote:

> The way you described it there were records being inserted and later
> deleted.  Why wouldn't you need vacuums?
>
> Or are all the records eventually deleted and then the table truncated
> or dropped before the next batch of inserts?

In a nutshell, yes.  The problem is I can't delete them all at once; it
happens in batches, and that means that stats degrade in the meantime.


Jeroen





Re: [HACKERS] postgresql compile problem

2007-07-17 Thread Jeroen T. Vermeulen
On Wed, July 18, 2007 11:07, [EMAIL PROTECTED] wrote:

> Recently, I downloaded postgresql-8.1.9.tar.gz from the official
> website and installed it on my Linux system, whose gcc version is
> 2.9.6.  Although I can install it successfully, the version I then
> check is 7.2.1; how can this happen, can you tell me the reason?

You are on the wrong mailing list.  The pgsql-hackers list is for
discussion related to the development of postgres.  Try pgsql-general.

For the record, gcc 2.9.6 does not exist.  If it did, it would be very old.


Jeroen





Re: [HACKERS] PAM authentication fails for local UNIX users

2007-08-20 Thread Jeroen T. Vermeulen
On Mon, August 20, 2007 19:52, Andrew Dunstan wrote:

> I'd rather see an HBA fallback mechanism, which I suspect might overcome
> most of the  problems being encountered here.

I implemented a form of that once, so on local connections you could do
ident mapping with fallback to PAM or some other password authentication. 
That seemed useful, e.g. for granting non-interactive access to a program
running under a dedicated user and requiring a password from everyone
else.  The implementation also allowed for a bit more flexibility in the
auth mechanism.

The main objections I recall were:

1. The protocol doesn't allow for multiple authentication prompts.  My own
proposal didn't have that problem since it only introduced an "optional
ident" authentication that continued looking for a matching rule if the
current user name was not in the given map, but it's a problem for more
general approaches.

2. For real, fully generalized fallback, you'd also need to overhaul the
HBA config file format completely.

3. It was considered unsafe to add even the most limited of fallback
options, because the HBA config is designed to pick just one auth
mechanism for any connection attempt, based on only the first three
columns of the config file.  An admin who didn't understand the new auth
mechanism could use it to write an unsafe HBA configuration, provided it
also broke the existing "go from specific-permissive to
general-restrictive" design guideline.

Personally I think it'd take some careful aim to shoot yourself in the
foot like that, but IIRC it was enough for an "I don't like it" vote.


Jeroen





Re: [HACKERS] Patch to update libpqxx's homepage in README

2008-03-20 Thread Jeroen T. Vermeulen
On Sun, February 10, 2008 18:35, Gurjeet Singh wrote:
> libpqxx seems to have moved around quite a bit. The attached patch
> corrects
> libpqxx's homepage.

Thanks for that.  However just http://pqxx.org/ would be best.  I'm just
setting up new hosting, and I may not get everything completely
link-compatible.


Jeroen





[HACKERS] Integrating libpqxx

2002-06-12 Thread Jeroen T. Vermeulen

I think libpqxx, the alternative to libpq++, is just about ready for
prime time.  That means integrating it with the main source tree, I
suppose, but I have no idea where to start--particularly because libpqxx
has its own configure setup.

Anyone who can help me with this?


Jeroen

PS: find libpqxx source & description at
http://members.ams.chello.nl/j.vermeulen31/






Re: [HACKERS] Integrating libpqxx

2002-06-12 Thread Jeroen T. Vermeulen

On Wed, Jun 12, 2002 at 04:04:36PM -0400, Neil Conway wrote:
>
> Otherwise, if you put the code into src/interfaces/libpqxx and modify
> the PostgreSQL build system to be aware of it (as well as removing
> libpqxx's autoconf stuff), it shouldn't be too difficult.

One concern I have on this point is that not all platforms are going to
be able to build libpqxx.  Also, there'd have to be a lot of C++ stuff
in the existing config.h which I guess was meant to be C.  

Anyway, I found I'm not much good with automake and so on.  I'm trying
to merge the two configure.ins, but I feel I must be missing a lot of
details.


Jeroen





Re: [HACKERS] Integrating libpqxx

2002-06-12 Thread Jeroen T. Vermeulen

On Wed, Jun 12, 2002 at 05:48:46PM -0400, Bruce Momjian wrote:
> 
> I can add it to CVS as interfaces/libpqxx and we can then let others
> merge your configure tests into our main configure.  Let me know when
> you want it dumped into CVS.

Might as well do it right now, with 0.5.2.  We'll call that 1.0, and 
leave the more radical future plans for 2.0.  

There are some things I'd like to do in future 1.x releases that will 
affect the interface:
 - nonblocking operation, probably as a latency-hiding tuple stream;
 - change the way you select the quality of service for your transactor;
 - allow notice processors to have C++ linkage;
 - additional bits & bobs like field and column iterators.

OTOH there's no point in delaying 1.0 forever I guess.

FWIW, I'm thinking of doing at least one of the following in 2.0:
 - an easy-to-use but intrusive object persistence layer; 
 - offload some of the work to BOOST if possible;
 - adapt the interface to be more database-portable.

But back to 1.0...  Would it be a useful idea to also integrate my own
CVS history into the main tree?  Or should I just keep developing in
my local tree and submit from there?


Jeroen





Re: [HACKERS] Integrating libpqxx

2002-06-13 Thread Jeroen T. Vermeulen

On Wed, Jun 12, 2002 at 10:41:32PM -0400, Tom Lane wrote:
> 
> I'm thinking we should just import the current state of the files
> and not worry about preserving their change history.

Fine with me, if that's easier.  I just thought it might be "nice to have"
but I can't think of any compelling reason to go to any trouble. 


Jeroen





Re: [HACKERS] Integrating libpqxx

2002-06-13 Thread Jeroen T. Vermeulen

On Thu, Jun 13, 2002 at 09:15:05AM -0300, Marc G. Fournier wrote:
> 
> Jeroen ... can you send me a copy of the CVSROOT for this?  Email will
> work ... if we can, I would like to save the development history, and I
> *think* I can ...

I already sent one to Bruce last night, IIRC.


Jeroen





[HACKERS] Case sensitive searches

2002-06-27 Thread Jeroen T. Vermeulen

I've just come across a case in Oracle 8.0.6 where important queries
could have been several orders of magnitude faster if only the optimizer
had realized that it was doing case-insensitive comparisons against a
constant that wasn't affected by case (a string of all digits).

The query was of the general form

SELECT * FROM table
WHERE upper(id) = '001234'

...causing a full index scan (there was a non-unique index on id).  What
the optimizer could perhaps have done was something like

if (upper('001234') == lower('001234'))
    SELECT * FROM table
    WHERE id = '001234';
else
    SELECT * FROM table
    WHERE upper(id) = '001234';

Even without the index I guess that would have saved it a lot of work.
In this case, of course, the user wasn't doing the smartest thing by
giving millions of records a numerical id but storing it as varchar.
OTOH there may also be a lot of cases like

SELECT * FROM table
WHERE upper(name) LIKE '%'

being generated by not-too-bright applications out there.

Does PostgreSQL do this kind of optimization?  If not, how easy and how
useful would it be to build it?  I suppose this sort of thing ought to
be in src/backend/optimizer/prep/ somewhere, but I couldn't find
anything like it.


Jeroen










Re: [HACKERS] Integrating libpqxx

2002-07-03 Thread Jeroen T. Vermeulen

On Tue, Jul 02, 2002 at 02:05:57PM -0400, Bruce Momjian wrote:
> 
> Jeroen, do you have PostgreSQL CVS access yet?  If not, we need to get
> you that.

Don't have it yet, so please do!


Jeroen









Re: Trim the Fat (Was: Re: [HACKERS] Open 7.3 items )

2002-08-02 Thread Jeroen T. Vermeulen

On Thu, Aug 01, 2002 at 09:07:46PM -0300, Marc G. Fournier wrote:
> 
> > Of course my humble but thoroughly biased opinion is that libpq++ be
> > marked "legacy."
> 
> No doubt, but, if we didn't "push" one interface over another, then it
> would be up to the end-users themselves to decide which one to install ...
 
In theory, yes.  In this case, however, I see two arguments for making
the distinction anyway:

1. Some people won't want to go to the trouble of comparing available 
interfaces, so they may default to libpq++ because it's what they found
first, or because they find mentions of it as the official C++ interface.
I think it would be a shame to have new users default to libpq++ "by 
accident."  I think many users would prefer to rely on the PostgreSQL 
team's judgment--as they do by choosing Postgres in the first place.

2. I get the impression that libpq++ sort of got stuck before it was
completed.  For the time being libpqxx appears to have better maintenance
prospects.  Users will want to be aware of this before making a choice.


> Okay, then let's word it another way ... if libpq++ *wasn't* in the
> repository and part of the distribution, would you have a) started working
> on it sooner?  b) would you have made it public earlier?  c) would ppl
> have started to use it and stop'd using libpq++?
 
I'm not sure there's much point to going into a single example in detail,
but for completeness' sake I'll just answer these:

a) Yes.
b) No, because in my case I was encouraged by team members' endorsement of
first my suggestions for libpq++, and later a full replacement.  Without
that, and without an active libpq++ maintainer around, libpqxx might never 
have gotten off the ground.
c) I'd like to think so, yes--but exposure would have been harder.


> Basically, with libpq++ in the distribution, we are endorsing its use, so
> if we didn't put libpqxx into the repository, would ppl switch from the
> 'endorsed' to the 'unendorsed' version?
> 
> By having libpq++ in the repository, we are adding weight to it that it
> wouldn't have if it were outside of the repository, making it more
> difficult for 'alternatives' to pop in ...
 
I definitely agree here.  See above.


> > For the more general case, there's the problem of release management: who's
> > going to be taking care of synchronizing releases?  This may require some
> > new infrastructure, such as a mailing list dedicated to the process, or one
> > restricted to subproject maintainers.  Or something.

This reminds me of another potential complication: how would unbundling
affect the licensing situation?  Mixing and matching components is good
in many ways, but it could complicate the situation for end-users--who
probably like to be able to rely on the team's judgment on this issue as 
well.

I feel compelled at this point to admit that I prefer the GPL.  This is
a personal preference, which I set aside because I wanted and expected 
libpqxx to become the standard C++ interface.  Had these interfaces not
been bundled, I would have had less incentive to conform to Postgres'
licensing conditions.  I think having a different license would have
made everyone's life a little harder.


Jeroen

(and yes, I'm trying to repair my From: lines!)





Re: [HACKERS] AnonCVS woes

2002-08-10 Thread Jeroen T. Vermeulen

On Sat, Aug 10, 2002 at 06:05:27PM -0400, Rod Taylor wrote:
> P src/interfaces/libpqxx/configure.ac
> cvs server: Updating src/interfaces/libpqxx/config
> U src/interfaces/libpqxx/config/.cvsignore
> cvs server: Updating src/interfaces/libpqxx/debian
> cvs server: failed to create lock directory for
> `/projects/cvsroot/interfaces/libpqxx/debian'
> (/projects/cvsroot/interfaces/libpqxx/debian/#cvs.lock): Permission
> denied
> cvs server: failed to obtain dir lock in repository
> `/projects/cvsroot/interfaces/libpqxx/debian'
> cvs [server aborted]: read lock failed - giving up

I only just added that directory, so chances are you were trying to
check it out while I was committing it.

Could you try again?


Jeroen





Re: [HACKERS] libpqxx

2002-08-12 Thread Jeroen T. Vermeulen

On Mon, Aug 12, 2002 at 04:44:12PM -0300, Marc G. Fournier wrote:

> For this, all I've been waiting for is for J to get the standalone to
> build and then going to dive into that ...
 
I added Ray's changes a few days back, which may help.  My handicap is
that I appear to be on a newer version of libtoolize than you are, which
is where Marc's build appears to fail, so obviously it just builds on my 
system like it did all along.

So, any luck with the version currently in CVS?


Jeroen





Re: [HACKERS] anoncvs currently broken

2002-08-12 Thread Jeroen T. Vermeulen

On Mon, Aug 12, 2002 at 09:38:00PM -0300, Marc G. Fournier wrote:
> 
> should be fixed ... looks like just an ownership issue on a new directory


More like I uploaded that directory just as you were rsync'ing to
anonymous CVS and a lock file got copied along, but was never deleted
on a subsequent rsync.  Or so it's been suggested to me.


Jeroen





Re: [HACKERS] Open 7.3 items

2002-08-19 Thread Jeroen T. Vermeulen

On Sat, Aug 17, 2002 at 11:08:45PM -0400, Bruce Momjian wrote:
> 
> integrate or move to gborg libpqxx, Pg:DBD

It's no longer my CVS home tree...  Is there something I can/should 
do for this?


Jeroen





[HACKERS] 7.3: Change in cursor behaviour?

2002-12-02 Thread Jeroen T. Vermeulen
I've been getting reports of one of my test scenarios for libpqxx
failing with postgres 7.3.  At the moment I can't reproduce this (I'm
still on 7.2) and I can't find anything pertinent in CVS commit
messages, mailing lists etc. so I'd really appreciate any lucidity from
this list.

My problem appears to be related to moving a cursor backwards beyond its
point of origin.  I realize this is a tacky thing to do, but I sort of
got the impression that moving backwards by some number larger than the
current position would be equivalent to MOVE BACKWARD ALL, which makes
my code a hell of a lot simpler.  It makes a good portion of my life a
hell of a lot simpler, come to think of it, so I'd really appreciate
having this property in the future.

The scenario boils down to: Create a cursor, fetch n rows, move minus 2
billion or so rows, fetch 1 row.  That last fetch used to give me the
row I was hoping for (the original first row again), but with 7.3 it 
appears to yield nothing.

Is this intentional?  Should I change my code?  Have I been a bad boy?


Jeroen





Re: [HACKERS] 7.3: Change in cursor behaviour?

2002-12-02 Thread Jeroen T. Vermeulen
On Mon, Dec 02, 2002 at 02:29:03PM -0500, Rod Taylor wrote:
> 
> Seems to work fine for me:

Puzzling...

Would you mind, if you have time, downloading libpqxx from GBorg and 
doing a ./configure; make; make check and telling me if tests 19 & 38
succeed?  I expect them to have identical results, so you're likely to
see either two failures or none.  Be aware that the test run will add
two tables, events and orgevents to your default database and delete
any contents found in them.

Download is at

  http://gborg.postgresql.org/project/libpqxx/download/download.php

and direct URL for the source tarball is

  ftp://gborg.postgresql.org/pub/libpqxx/stable/libpqxx-1.1.4.tar.gz


TIA!

Jeroen





Re: [HACKERS] 7.3: Change in cursor behavior?

2002-12-04 Thread Jeroen T. Vermeulen
On Wed, Dec 04, 2002 at 12:22:41AM +, Sigurdur Gunnlaugsson wrote:
>  
> test=# move -10 in test_c;
> MOVE 4
> test=# fetch 1 from test_c;
>  schemaname | tablename | tableowner | hasindexes | hasrules | hastriggers
> +---+++--+-
> (0 rows)
 
> I have not tried 7.3 final so I can't say if this happens there also.
> Seems to be fixed in 7.4devel.

Thanks for the data!  So this might be a bug in 7.3?  Perhaps the cursor
thinks it's at position -6 then and fetches no rows until it gets back
to zero...


Jeroen





[HACKERS] PQnotifies() in 7.3 broken?

2002-12-04 Thread Jeroen T. Vermeulen
Thanks to Rod Taylor's kind help in donating a system account, I've been
able to test libpqxx against postgres 7.3.  Unfortunately, I'm running
into several problems.  One thing that broke libpqxx was a change in 
cursor behaviour that according to Sigurdur Gunnlaugsson seems to be
gone in the 7.4 development tree.  But I'm hitting other snags as well.

When receiving a trigger notification under 7.3, the structure returned
by PQnotifies() appears to be bogus.  In a test I ran, its be_pid was
consistently zero and its relname pointed into never-neverland.

Has anyone else come across this?


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-05 Thread Jeroen T. Vermeulen
On Thu, Dec 05, 2002 at 03:27:23PM -0500, Tom Lane wrote:
> 
> It is not real clear to me whether we need a major version bump, rather
> than a minor one.  We *do* need to signal binary incompatibility.  Who
> can clarify the rules here?

One thing I wonder about: should the rules make any distinction between
API incompatibilities and client protocol incompatibilities?  For the
former I would imagine one would like to have some "minor" version number
increase whenever features are added and a "major" number be incremented
when changes become incompatible.  For the latter, one would probably
want to have a similar rule but with a dichotomy between server-side
upgrades and client-side upgrades.


Jeroen





Re: [HACKERS] more compile warnings

2002-12-06 Thread Jeroen T. Vermeulen
On Fri, Dec 06, 2002 at 11:16:30PM -0500, Bruce Momjian wrote:
> 
> I use:
> 
>   -Wall -Wmissing-prototypes -Wmissing-declarations -Wpointer-arith -Wcast-align
 
Some other useful ones are -pedantic -W -Wfloat-equal -Wshadow
-Wcast-qual -Wwrite-strings -Wconversion -Wsign-compare -Wsign-promo.


> You would think that would catch it.  My problem is that I am compiling
> with -O0, because I compile all day and I don't care about optimization.
> In this case, the -O3 is doing some optimization that catches the
> problem, while -O0 does not.  Interesting.  Even -O catches it.

Last time I checked (which was admittedly some time ago) all the
interesting analysis that could give you new warnings was done by -O;
-O2 mostly involves the back-end, and -O3 adds pretty much nothing
except aggressive inlining, which was more likely to trigger compiler
bugs at the time than to find anything in your code.


Jeroen





[HACKERS] More on cursors in 7.3

2002-12-08 Thread Jeroen T. Vermeulen
Looking at my problem with changed cursor behaviour in 7.3 again, I
noticed something interesting: a cursor in 7.3 apparently does not let 
you scroll back to its first row at all!  Neither a "move backward all"
nor a "move -n", where n is equal to or greater than the cursor's current
position, will let you fetch any more rows from the cursor.  Scrolling
back to the second row does work, though.

I can work around this in libpqxx, without too much trouble, but I'm 
not sure I should have to.


Jeroen





Re: [HACKERS] More on cursors in 7.3

2002-12-08 Thread Jeroen T. Vermeulen
On Sun, Dec 08, 2002 at 04:28:38PM -0500, Tom Lane wrote:
> 
> I believe it is true though that backing up a cursor only works for
> certain plan types (seqscan, indexscan, sort, maybe a couple others).
> That has always been true --- 7.3 is no better nor worse than prior
> releases.

Ah, I didn't know that.  I guess the plan for "select * from pg_tables"
must have changed in 7.3 then.

Is any of this described in the docs somewhere?


Jeroen





Re: [HACKERS] More on cursors in 7.3

2002-12-08 Thread Jeroen T. Vermeulen
On Sun, Dec 08, 2002 at 05:09:09PM -0500, Tom Lane wrote:
> 
> > Is any of this described in the docs somewhere?
> 
> Fraid not.

Damn & blast.  I was rather counting on cursors that could back up for
my nifty CachedResult class (which acts more or less like a normal result 
set but transparently fetches rows on demand).

Now if I understood a bit more of what's going on here, at least I could
document it...


Jeroen





Re: [HACKERS] More on cursors in 7.3

2002-12-08 Thread Jeroen T. Vermeulen
On Sun, Dec 08, 2002 at 05:28:22PM -0500, Tom Lane wrote:
> 
> Well, you could dig through backend/executor/node*.c and see which of
> the node types pay attention to es_direction.  To a first approximation
> it looks like these do:
 
I'll be honest with you: I don't know much about the internals and this
is pure Greek to me...  And I never was much good at Greek in school.


> although I have not thought about which other upper plan nodes might be
> okay (ie, they're safe if their input nodes are).  Also, a Material or
> Sort node will hide any direction-unsafety in its input.

More Greek, I'm afraid.  :-(


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-13 Thread Jeroen T. Vermeulen
On Fri, Dec 13, 2002 at 12:34:58AM -0500, Bruce Momjian wrote:
> 
> I did only minor, which I knew was safe.  Do folks realize this will
> require recompile of applications by 7.3 users moving to 7.3.1?  That
> seems very drastic, and there have been very few problem reports about
> the NOTIFY change.

In my case, I only know it broke some test cases.  But still, isn't it
better to err on the safe side and at least give the user a clue that
the upgrade isn't necessarily supposed to work with his existing code?


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-13 Thread Jeroen T. Vermeulen
On Fri, Dec 13, 2002 at 09:06:47PM +, Oliver Elphick wrote:
> 
> Yes.  You will have libpq.so.3.0 in 7.3.1; and you have libpq.so.2.2
> from 7.3 (and also from 7.2.x, though in fact they are different).  If
> you have installed 7.3.1 on top of 7.3, you will have libpq.so.3
> (symlinked to libpq.so.3.0) from 7.3.1, and libpq.so.2 (symlinked to
> libpq.so.2.2) from an earlier release.

Doesn't that mean that a user upgrading from 7.3 to 7.3.1 will have two
libraries, libpq.so.2.2 and libpq.so.3, that both adhere to the 7.3 ABI?

Perhaps 7.3.1 could include a true, 7.2-style libpq.so.2.2 to overwrite
any 7.3-style version that the original 7.3 may have installed under that
name?


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-15 Thread Jeroen T. Vermeulen
On Sun, Dec 15, 2002 at 11:21:07AM -0500, Tom Lane wrote:
> 
> However, an app linked against libpq++ would also be linked against
> libpq, and so the incompatibility will be flagged by the linker anyway.
> I can see no need to bump libpq++'s own number.

Plus, of course, libpq++ being a C++ library will break compatibility at
some compiler upgrades--which isn't even necessarily visible to anyone
compiling a postgres release.


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-16 Thread Jeroen T. Vermeulen
On Mon, Dec 16, 2002 at 11:01:00AM -0500, Bruce Momjian wrote:
> 
> New question --- didn't we change the externally visible PGNotify
> structure in libpq-fe.h in 7.3, and as returned by PQnotifies:

Speaking of which, what if user relies on sizeof(PGnotify::relname)?
That code will recompile without any problems, but it won't actually
work as expected.  So in a way, the change may require more than a
simple recompile.


Jeroen





Re: [HACKERS] PQnotifies() in 7.3 broken?

2002-12-16 Thread Jeroen T. Vermeulen
On Mon, Dec 16, 2002 at 05:41:06PM +0100, Jeroen T. Vermeulen wrote:
> 
> Speaking of which, what if user relies on sizeof(PGnotify::relname)?
                                 ^
                                 code


Jeroen





Re: [HACKERS] EXECUTE status (was Re: [ODBC] About server side prepare)

2002-12-20 Thread Jeroen T. Vermeulen
On Fri, Dec 20, 2002 at 12:56:55PM -0500, Tom Lane wrote:
> 
> No.  It would break client libraries, which only expect command tags
> INSERT, UPDATE, DELETE to be followed by counts.

And MOVE, right?


Jeroen





[HACKERS] GBorg feature requests

2002-12-25 Thread Jeroen T. Vermeulen
First off, happy holidays to all!

Next, can anyone tell me the proper channel for reporting bugs and
feature requests for GBorg itself?  I've looked around, but I may have
missed something.

One feature I'd really like, is to have its automatic email notification
(for bug reports, status changes etc.) include the name of the project
in the subject line.  That way I can tell procmail to sort the things
into their proper mailboxes.


Jeroen





[HACKERS] MOVE strangeness

2002-12-25 Thread Jeroen T. Vermeulen
Here's something that's been bothering me for a while...  Perhaps this
is correct behaviour, but I can't quite see how.  This does not happen
if I replace the FETCHes by MOVEs.

Here's the reference case:

jtv=> begin;
BEGIN
jtv=> declare c cursor for select * from events;
DECLARE CURSOR
jtv=> fetch 3 in c;
 year |  event   
------+------------------
 2010 | A Space Oddyssey
 2010 | Oddyssey Two
 2038 | time_t overflow
(3 rows)

jtv=> move -3 in c;
MOVE 2
jtv=> fetch 3 in c;
 year |  event   
------+------------------
 2010 | A Space Oddyssey
 2010 | Oddyssey Two
 2038 | time_t overflow
(3 rows)

jtv=> end;
COMMIT


Okay, since that "move -3" claims to have moved only 2 rows backwards, I
tried the same but moving backwards by only 2 rows.  This gives me the
same response for the MOVE, but my cursor evidently doesn't end up in
the same place:


jtv=> begin;
BEGIN
jtv=> declare c cursor for select * from events;
DECLARE CURSOR
jtv=> fetch 3 in c;
 year |  event   
------+------------------
 2010 | A Space Oddyssey
 2010 | Oddyssey Two
 2038 | time_t overflow
(3 rows)

jtv=> move -2 in c;
MOVE 2
jtv=> fetch 3 in c;
 year |  event  
------+-----------------
 2010 | Oddyssey Two
 2038 | time_t overflow
 1971 | jtv
(3 rows)

jtv=> end;
COMMIT


This makes it a bit hard for me to figure out just how far I moved my
cursor backwards!  Moving by BACKWARD ALL will give me the same result
as moving by -3.

Is this behaviour intentional?  If not, can it be fixed?


Jeroen






Re: [HACKERS] MOVE strangeness

2002-12-26 Thread Jeroen T. Vermeulen
On Thu, Dec 26, 2002 at 02:14:40PM -0500, Tom Lane wrote:   
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Sorry, I am not understanding.  If he does: 
> > ... 
> > here, isn't he sitting at the start of the fourth row, no?  
>   
> No.  He is sitting *on* the third row.  If he now does FETCH 1, he will   
> advance to and return the fourth row; on the other hand, if he does   
> FETCH -1, he will back up to and return the second row.   

So we're talking about pre-increment and pre-decrement, and a cursor
starting at position 0 in a 1-based array (or -1 in a 0-based one)? 


> BTW, looking at Date and the SQL spec, I now realize that the recently
> made change to convert FETCH 0 into a no-op is wrong; per spec, FETCH 
> RELATIVE 0 means "re-fetch the current row, if any".  By analogy, MOVE 0  
> should probably return "MOVE 1" if you are on a real row, "MOVE 0" if 
> you are not, corresponding to the number of rows you'd have gotten from   
> FETCH 0.  Ugly, but ...   

Okay, given that, is there really any reason why MOVE should return the 
number of rows that would have been fetched?  Why not report the number 
of rows moved?  Having two different MOVE commands from the same starting   
positions yield indistinguishable results yet bring you to different
states is very, very awkward and possibly quite useless.  Better yet,   
why should FETCH report the number of rows fetched instead of the   
number of rows moved when you're going to extract the former from the   
PQresult's PQntuples() anyway?  The current status string is completely 
redundant in that case. 

If the status string for MOVE is both inconclusive (as my case shows)   
and inconsistent (the MOVE 0 case) now, but it's linked to the status   
string for FETCH which is redundant, it seems logical to give them  
new, consistent semantics that are useful and intuitive.  Just report   
the number of rows _moved_ and we can all go back to leading rich,
meaningful lives.  I for one will also sleep better knowing that the
number of rows reported is based on the same counting system as the 
number of rows requested in the SQL command that the status string is   
supposed to echo.  If FETCH 0 expects to fetch 1 row, why not have a
result set of 1 row and a result string that echoes the command?

Which raises the question: is there anything special about the one-before-
first row, or can a cursor be moved to any arbitrary point outside its
actual set of rows?  Oh, what odious web we weave / when first we
practice to iteratewithoutthebenefitsofzero-basedarithmeticandhalf-openintervals.


Jeroen





Re: [HACKERS] [GENERAL] Cast your vote ...

2003-01-02 Thread Jeroen T. Vermeulen
On Thu, Jan 02, 2003 at 03:07:48PM -0500, Dan Langille wrote:
> 
> > else to do the same.  In the big picture, marketing statements like
> > this survey mean alot more than most technical folks want to
> > acknowledge.  
 
The figures would be a lot more interesting anyway if people bothered
to correlate results with user bases.  A vote from someone who knows
several of the contending products means a lot more to me than one
from someone who hasn't seen anything else.


> Coincidentally, I've just started up a voting script project... see 
> http://polls.unixathome.org/

Does it support hanging chads?


Jeroen





Re: [HACKERS] PostgreSQL libraries - PThread Support, but not use...

2003-01-06 Thread Jeroen T. Vermeulen
On Mon, Jan 06, 2003 at 11:58:17AM -0500, Tom Lane wrote:
> 
> AFAIK, libpq is thread-safe already, it's just not thread-aware.
> What you'd presumably want is a wrapper layer that adds a mutex to
> prevent multiple threads from manipulating a PGconn at the same time.
> Couldn't this be done without touching libpq at all?
 
In fact it'd probably be better to do it without touching libpq at all,
so the application can tune for the level of thread-safety it needs.
There's no point in locking a PGresult every time the application
wants to read a field--it'd be unacceptably slow, yet some applications
may want to do it.  But I'm sure this has been discussed in that other
threading thread...
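
A minimal sketch of the wrapper Tom describes (names are mine,
illustrative only): serialize access to the PGconn itself, and leave
PGresult access unlocked, since each thread owns the results it gets
back.

#include <pthread.h>
#include <libpq-fe.h>

typedef struct
{
    PGconn         *conn;
    pthread_mutex_t lock;
} LockedConn;

static PGresult *locked_exec(LockedConn *lc, const char *sql)
{
    PGresult *r;

    pthread_mutex_lock(&lc->lock);   /* one thread on the PGconn at a time */
    r = PQexec(lc->conn, sql);
    pthread_mutex_unlock(&lc->lock);
    return r;                        /* caller inspects and PQclear()s r
                                      * without holding the lock */
}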

Having a thread-aware wrapper multiplex a PGconn between multiple
client threads using the nonblocking functions probably isn't going to 
wash either, because nonblocking operation isn't quite the same as fully 
asynchronous.  And even if the backend protocol allows for it, there's
no great benefit since the threads would still be waiting on the same
socket and the same backend.  Better to give each thread its own PGconn 
and its own file descriptor to block on, if needed.


Jeroen





Re: [HACKERS] Error using cursors/fetch and execute

2003-01-07 Thread Jeroen T. Vermeulen
On Tue, Jan 07, 2003 at 02:29:30PM +0100, Magnus Naeslund(f) wrote:
> 
> mag=# create table test (id serial unique primary key, txt text);
> mag=# insert into test(txt) values('hoho1');
> mag=# prepare berra (integer) as select * from test where id = $1;
> mag=# declare berra_c cursor for execute berra(1);
> ERROR:  parser: parse error at or near "execute" at character 28
> 
> Is there any other way of fetching less than all rows at once, similar
> to that of using cursors. I don't use it for any other purpose than
> that.

Just

begin;
declare berra cursor for select * from test where id = 1;
fetch 100 from berra;
[...]
end;


Jeroen





Re: [HACKERS] Anyone want to get involved in writing the the driver to connect Star/OpenOffice and PostgreSQL?

2003-01-19 Thread Jeroen T. Vermeulen
On Wed, Jan 15, 2003 at 01:20:45PM +1030, Justin Clift wrote:
> 
> Have been discussing what it would take to write an "SDBC" driver for 
> connecting StarOffice/OpenOffice to PostgreSQL with Frank Schönheit, a 
> senior member of the Sun StarOffice/OpenOffice DBA team, and a few 
> senior members of the OpenOffice project.

I think something like this is already being done based on libpqxx.
See http://gborg.postgresql.org/project/libpqxx/bugs/bugupdate.php?403
for a feature request related to this work.


Jeroen





[HACKERS] Client interfaces documentation

2003-01-24 Thread Jeroen T. Vermeulen
Something that just occurred to me: should the documentation tree still
contain full documentation for the various client interfaces, now that
they have been unbundled?

If so, I'd very much like to see the part about libpq++ being "the" C++ 
interface changed to mention libpqxx as a replacement.


Jeroen





Re: [HACKERS] Contract Programmer Advice.

2003-02-11 Thread Jeroen T. Vermeulen
On Tue, Feb 11, 2003 at 08:56:56PM -0500, Lamar Owen wrote:
> 
> This company doesn't dispute any of my invoices and says they are going to pay 
> me.  But they have not yet done so.  This company is still in business, and 

I wouldn't know about your country, whichever that may be, but in the
Netherlands it only takes two unpaid creditors to ask a judge to declare
a company bankrupt.  Bankruptcy is declared when a company no
longer pays its dues (note that "ability" doesn't come into this), and 
two outstanding debts are the minimum required to establish that.

If all else fails, that kind of procedure gives a very strong incentive
to pay up.  Your country may have a similar arrangement.


Jeroen





[HACKERS] psql source suggestions

2003-02-12 Thread Jeroen T. Vermeulen
I've been playing around with the source for psql, and as a result I
did some cleaning up in common.c.  Would anyone be interested in seeing
patches for that?  The main benefit is some eliminated code duplication,
plus the removal of some warts like 'continue' and 'break' where they
weren't needed.


Jeroen





Re: [HACKERS] regression failure - horology

2003-02-22 Thread Jeroen T. Vermeulen
On Sat, Feb 22, 2003 at 03:09:13AM -0500, Tom Lane wrote:
> 
> Mph.  It fails for me too when I use --enable-integer-datetimes.  Looks
> like that patch still needs some work...

Yeah.  I'm really, really, *really* sorry for submitting it in the state
it was in.  I shouldn't have done that just before moving to another
country.  I found the problem last night, but couldn't get to a Net
connection until now.

The problem is in src/bin/psql/common.c, around line 250-335 somewhere
depending on the version.  The 2nd and 3rd clauses of the "while" loop
condition:

(rstatus == PGRES_COPY_IN) &&
(rstatus == PGRES_COPY_OUT))

should of course be:

(rstatus != PGRES_COPY_IN) &&
(rstatus != PGRES_COPY_OUT))


Jeroen




Re: [HACKERS] I cant find it or I'm just lazy ?

2003-02-25 Thread Jeroen T. Vermeulen
On Tue, Feb 25, 2003 at 02:04:50PM +0100, Christoph Haller wrote:
>
> Anyway, you may MOVE until 0 instead of FETCH, or use the COUNT()
> function on the query to learn about the number of rows to be returned.

Hmm...  Wouldn't the reliability of a count() depend on the isolation
level?

OTOH the problem with MOVE ALL is that not all cursors support backward
scrolling, apparently, and there is no clear documentation (or even
diagnostics!) to determine whether they do.


Jeroen




Re: [HACKERS] I cant find it or I'm just lazy ?

2003-02-25 Thread Jeroen T. Vermeulen
On Tue, Feb 25, 2003 at 05:55:59PM +, Darko Prenosil wrote:
> 
>   I am trying to create client buffer that will show only records that are
> needed by application(visible). Data should be send to client in "pages", not
> all the data at once. The idea is not to query for data that are already in
> the buffer.
 
BTW, if your applications happens to be in C++, libpqxx has a class
called CachedResult that would take a lot of this work out of your
hands.  It transparently fetches rows on-demand and caches them so
they don't get read more than once.  It can also figure out the
size of your result set for you.

You can find libpqxx at http://pqxx.tk/


Jeroen




Re: [HACKERS] I cant find it or I'm just lazy ?

2003-02-25 Thread Jeroen T. Vermeulen
On Tue, Feb 25, 2003 at 07:34:12PM +, Darko Prenosil wrote:
>
> Unfortunately it is application written in QT library that should work on 
> Windows too, but I'll take a look, I'm sure I can learn something from it !

Well, libpqxx also runs on Windows but it takes a decent compiler (e.g.
Visual C++ 6.0 isn't quite good enough) and some manual labour to set up
your own project file.


Jeroen




Re: [HACKERS] I cant find it or I'm just lazy ?

2003-02-26 Thread Jeroen T. Vermeulen
On Wed, Feb 26, 2003 at 09:44:14AM +, Darko Prenosil wrote:
>
> I got the sources yesterday. Thank you !

Let me know whether everything works for you.  There's also a mailing
list on pqxx.tk if you need it.


Jeroen




Re: [HACKERS] talking to postgresql from C/C++

2003-03-07 Thread Jeroen T. Vermeulen
On Fri, Mar 07, 2003 at 12:14:30PM -0500, Merlin Moncure wrote:
> 
> What about libpq++? I have not used the thing, but if he absolutely
> insists on using C++ in his database interface that's at least worth
> checking out.  Same for embedded C.
 
And of course there's libpqxx.  I haven't heard from anyone who's tried
it on Borland C++ Builder yet, but I'd love to.  It is known to work on
several other compilers, so why not?  It's a lot more useful than the
old libpq++ IMHO.

See http://pqxx.tk/


Jeroen




Re: [HACKERS] Cursors and backwards scans and SCROLL

2003-03-10 Thread Jeroen T. Vermeulen
On Sun, Mar 09, 2003 at 03:35:11PM -0500, Tom Lane wrote:
> 
> 2. Error out only if a backwards fetch is actually attempted on a plan
> tree that can't handle it (which could only happen if SCROLL wasn't
> given).  This is efficient and flexible, but it exposes implementation
> details to the user, in that whether an error occurs will depend on
> which plan the system happens to choose.  

I wouldn't worry too much about exposing implementation details at 
present--the existing situation does the same, except it pretends that
not returning data that should be there isn't an error.  Absent any
further documentation, I'd call that a bug that needs to be fixed.


> There are cases where
> identical cursor definitions might allow or not allow backwards fetch
> depending on the planner's choices.  Notice though that errors could
> occur only in cases that would silently fail in the present code; so
> existing applications that work reliably would not see such errors.

Would it be possible to give warnings in a narrow superset of the
problematic cases, something along the lines of "I'm scrolling backwards
for you now, but there's no reason why that should work on this same query 
tomorrow"?


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-12 Thread Jeroen T. Vermeulen
On Thu, Jun 12, 2003 at 08:08:28PM -0400, Bruce Momjian wrote:
> > 
> > I'm unconvinced, because I've only ever heard of the problem affecting
> > Postgres on Linux.
> 
> What I don't understand is why they just don't start failing on
> fork/malloc rather than killing things.

I may be way off the mark here, falling into the middle of this as I am,
but it may be because the kernel overcommits the memory (which is sort of
logical in a way given the way fork() works).  That may mean that malloc()
thinks it gets more memory and returns a pointer, but the kernel hasn't
actually committed that address space yet and waits to see if it's ever
going to be needed.
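If you want to see this in action, a toy program makes it visible.
Whether the malloc() below succeeds is entirely up to the kernel's
overcommit policy, so treat this purely as an illustration; pick a size
larger than your RAM plus swap (and note it assumes a 64-bit size_t):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* On an overcommitting kernel the malloc() may succeed even though
   * nothing can back it; trouble only starts when the pages get
   * touched and the kernel has to commit real memory. */
  int main(void)
  {
      size_t size = (size_t)8 * 1024 * 1024 * 1024;   /* 8 GB, say */
      char *p = malloc(size);

      if (!p)
      {
          puts("malloc() failed up front--no overcommit here");
          return 1;
      }
      puts("malloc() succeeded; now touching the pages...");
      memset(p, 1, size);    /* here somebody may get shot in the head */
      puts("survived!");
      free(p);
      return 0;
  }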

Given the right allocation proportions, this may mean that in the end the
kernel has no way to handle a shortage gracefully by causing fork() or
allocations to fail.  I would assume it then goes through its alternatives
like scaling back its file cache--which it'd probably start to do before
a lot of swapping was needed, so not much to scrape out of that barrel.

After that, where do you go?  Try to find a reasonable process to shoot
in the head.  From what I heard, although I haven't kept current, a lot
of work went into selecting a "reasonable" process, so there will be some
determinism.  And if you have occasion to find out in the first place,
"some determinism" usually means "suspiciously bad luck."


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen
On Thu, Jun 12, 2003 at 07:22:14PM -0700, Ron Mayer wrote:
 
> I'm guessing any database backend (postgres, oracle)
> that wasn't part of a long-lived connection seems like 
> an especially attractive target to this algorithm.  

Yeah, IIRC it tries to pick daemons that can be restarted, or will be
restarted automatically, but may need a lot less memory after that.


Jeroen


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen
On Fri, Jun 13, 2003 at 09:25:49AM -0400, Bruce Momjian wrote:
> 
> malloc() - should fail right away if it can't reserve the requested
> memory;  assuming application request memory they don't use just seems
> dumb --- fix the bad apps.
> 
> fork() - this is the tricky one because you don't know at fork time who
> is going to be sharing the data pages as read-only or doing an exec to
> overlay a new process, and who is going to be modifying them and need a
> private copy.
> 
> I think only the fork case is tricky.

But how do you tell that a malloc() can't get enough memory, once you've
had to overcommit on fork()s?  If a really large program did a regular
fork()/exec() and there wasn't enough free virtual memory to support
the full fork() "just in case the program isn't going to exec()," then
*any* malloc() occurring between the two calls would have to fail.  That
may be better than random killing in theory, but the practical effect
would be close to that.

There are other complications as well, I'm sure.  If this were easy, we
probably wouldn't be discussing this problem now.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] SOLUTION: Insert a Euro symbol as UTF-8 from a latin1 charset.

2003-06-13 Thread Jeroen T. Vermeulen
On Fri, Jun 13, 2003 at 11:28:36AM -0400, Roland Glenn McIntosh wrote:
> 
> The Euro symbol is unicode value 0x20AC.  UTF-8 encoding is a way of representing 
> most unicode characters in two bytes, and most latin characters in one byte.
 
More precisely, UTF-8 encodes ASCII characters in one byte.  All other
latin-1 characters take 2 bytes IIRC, with the rest taking up to 4 bytes.


> I don't know why my "20" byte turned into two bytes of E2 and 82.  

Haven't got the spec handy, but UTF-8 uses the most-significant bit(s) of
each byte as a "continuation" field.  If the upper bit is zero, the char
is a plain 7-bit ASCII value.  Otherwise the byte belongs to a multibyte
sequence: the first byte starts with as many 1 bits as the sequence has
bytes (1110xxxx for a three-byte sequence, for instance), and every
following byte of the sequence starts with the bits 10.

In a nutshell, you can't just take bits away from your Unicode value and
call it UTF-8; it's a variable-length encoding and it needs some extra
room for the length information to go.
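Concretely: the Euro sign is above U+07FF, so it lands in the three-byte
range, and the bit-shuffling comes out as follows--which is exactly where
your E2 82 came from (with an AC following).  A quick sketch:

  #include <stdio.h>

  /* Encode a code point from UTF-8's three-byte range (U+0800-U+FFFF).
   * For the Euro sign U+20AC this prints "E2 82 AC". */
  int main(void)
  {
      unsigned cp = 0x20AC;                          /* Euro sign */
      unsigned char b0 = 0xE0 | (cp >> 12);          /* 1110xxxx */
      unsigned char b1 = 0x80 | ((cp >> 6) & 0x3F);  /* 10xxxxxx */
      unsigned char b2 = 0x80 | (cp & 0x3F);         /* 10xxxxxx */

      printf("%02X %02X %02X\n", b0, b1, b2);
      return 0;
  }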

Furthermore, I don't think the Euro symbol is in latin-1 at all.  It was
added in latin-9 (iso 8859-15) and so it's not likely to have gotten a
retroactive spot in the bottom 256 character values.  Hence it will take
UTF-8 more bytes to encode it.


> Furthermore, I was under the impression that a UTF-8 encoding of the Euro sign only 
> took two bytes.  Corroborating this assumption, upon dumping that table with pg_dump 
> and examining the resultant file in a hex editor, I see this in that character 
> position: AC 20
 
How does that "corroborate the assumption?"  You're looking at the Unicode
value now, in a fixed-length 16-bit encoding.

 
> I did try the '\0x20AC' method, and '\0x20\0xAC' without success.
> It's worth noting that the field I'm inserting into is an SQL_ASCII field, and I'm 
> reading my UTF-8 string out of it like this, via JDBC:

You can't fit UTF-8 into ASCII.  UTF-8 is an eight-bit encoding; ASCII
is a 7-bit character set.


Jeroen


---(end of broadcast)---
TIP 9: most folks find a random_page_cost between 1 or 2 is ideal


Re: [HACKERS] Pre-allocation of shared memory ...

2003-06-13 Thread Jeroen T. Vermeulen
On Fri, Jun 13, 2003 at 12:32:24PM -0400, Lamar Owen wrote:
> 
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least 
> as large as twice RAM size.  In my case on my 512MB RAM notebook, that meant 
> it wanted 1GB swap.  If you upgrade your RAM you could get into trouble.  In 
> that case, you create a swap file on one of your other partitions that the 
> kernel can use.

RedHat's position may be influenced by the fact that, AFAIR, they use
the Rik van Riel virtual memory system, which is inclusive--i.e., you need
at least as much swap as you have physical memory before swap adds anything
to your total virtual memory.  This was fixed by the competing Andrea
Arcangeli system, which became standard for the Linux kernel around
2.4.10 or so.


Jeroen


---(end of broadcast)---
TIP 10: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match


Re: [HACKERS] Access to transaction status

2003-06-19 Thread Jeroen T. Vermeulen
On Thu, Jun 19, 2003 at 05:16:10PM +0200, Christian Plattner wrote:
> 
> Do be able to handle all sorts of failures I needed two functions:
> 
> - A function to get the current XID
> - A function which I can use later to tell if a given XID
> commited/aborted/whatever

I ran into the same need (Bruce, we discussed this at FOSDEM in Brussels
this February) for libpqxx.  My code tries to compensate for the 
possibility that the backend connection is lost while waiting for a reply
to a COMMIT.  The way I worked around it was to create a special record 
at the beginning of the transaction, in a dedicated table that's 
effectively a custom transaction log.  If the record is still there when
I reconnect, the transaction committed.  If not, it didn't.
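In rough code, the client side of the trick looks something like this.
The "txlog" table and its single-column layout are made up for the
example, and error checking is omitted:

  #include <libpq-fe.h>
  #include <stdio.h>

  /* At transaction start, inside the transaction:
   *   INSERT INTO txlog(id) VALUES (...);
   * After losing the connection on COMMIT, reconnect and ask whether
   * the marker row made it in. */
  int transaction_committed(PGconn *conn, long id)
  {
      char query[80];
      PGresult *r;
      int committed;

      sprintf(query, "SELECT 1 FROM txlog WHERE id = %ld", id);
      r = PQexec(conn, query);
      committed = (PQresultStatus(r) == PGRES_TUPLES_OK &&
                   PQntuples(r) > 0);
      PQclear(r);
      return committed;
  }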

Obviously this leaves some garbage collection issues, so I'd be really
happy with a way to go back to the server after my connection is lost 
and find out whether my transaction committed or not.


Jeroen


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Access to transaction status

2003-06-20 Thread Jeroen T. Vermeulen
On Fri, Jun 20, 2003 at 09:35:08AM +0200, Christian Plattner wrote:
> 
> I see a race condition in this approach: if you reconnect too fast, and the
> backend which actually should commit is still in progress (assume it takes a
> while to commit for whatever reasons) you get the impression that it did not
> commit - and a short time later the backend will commit... (before noticing
> that the client connection was lost).
 
Good point.  So far I assumed that a broken connection would take a while
to repair.  OTOH by the time TCP gives up due to a bad network connection,
wouldn't the server reach the same conclusion?


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] Access to transaction status

2003-06-20 Thread Jeroen T. Vermeulen
On Fri, Jun 20, 2003 at 10:20:14AM +0200, Christian Plattner wrote:
> 
> Well, I wouldn't rely solely on TCP when assuring consistency. Also, I don't
> think that the backend will ever inspect its TCP socket while committing.
 
No, but its underlying IP stack would.


> btw: There could be also other reasons for the client to loose the
> connection (i.e. client process crashes).

In that case the client would lose all its state as well, so not really 
a problem that can be handled client-side.


Jeroen


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Access to transaction status

2003-06-20 Thread Jeroen T. Vermeulen
On Fri, Jun 20, 2003 at 02:41:29PM +0200, Christian Plattner wrote:
> 
> Well, my client (actually it is a middleware layer which routes transactions
> to a set of replicas) keeps its own log, because it must be able to handle
> arbitrary failures. So it never loses its state.

In that case perhaps we should see if there's anything we can do for
each other.  At the current rate, libpqxx is growing towards a sort of
middleware product, but obviously it's not the right place to tackle
many of the typical middleware problems.


Jeroen


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] 2-phase commit

2003-08-28 Thread Jeroen T. Vermeulen
On Tue, Aug 26, 2003 at 08:04:13PM -0400, Christopher Browne wrote:
> 
> Interesting/positive news on this front; the XA specification
> documents are now all available in PDF form "freely", from the Open
> Group, where they used to be fairly pricey.
 
A step in the right direction, but AFAIC it's too little, too late.
The impression I get, at least, is that it's as good as dead now: Java
may use it, but it hides the details anyway so it might as well not be
there--the Java way is to standardize the API but nothing that goes "on
the wire".  

Lots of proprietary middleware uses XA, but from what I hear there are
enough subtle differences to make mixing-and-matching of products risky
at best--the proprietary way is to bundle products that will work at
least marginally together, and relegate standards to a bullshit point
in the PowerPoint presentations.  "Based on industry standard" means
about the same as "based on a true story."

Then there's the fact that the necessary followup standards never got 
anywhere, and the fact that XA doesn't cope with threading really well.

Don't get me wrong, XA support may well be a good thing.  But at this
stage, personally I'd go for a good 2PC implementation first and worry 
about supporting XA later.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] C++ and libpq

2003-09-02 Thread Jeroen T. Vermeulen
On Tue, Sep 02, 2003 at 04:09:26PM +0200, Czuczy Gergely wrote:
> 
> libpq++ is not gone, i just don't like it. I've written another C++ API
> for database systems, named SDBI, but i cannot use it for a stored
> procedure :)

I've never looked at writing native stored procedures, but do you think
this is something libpqxx could help with?

If so, let me know what would be necessary or helpful to add to the
library.


Jeroen

PS Find libpqxx at http://pqxx.tk/


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Transaction status in default psql prompt?

2003-09-04 Thread Jeroen T. Vermeulen
On Wed, Sep 03, 2003 at 11:31:55PM -0400, Bruce Momjian wrote:
> 
> Yes, I like the transaction status being the default prompt, but I don't
> like the prompt shifting.  Remember guys who have the current directory
> in their prompt --- the thing bounces around all over the place.

Is that avoidable if you want to be able to indicate nesting level as well?
There may be a simple but difficult choice there.  Personally I'd like to
see something like an opening brace ("{") in my prompt for a transaction;
that'd be a nice, nagging reminder that I'm in the middle of unfinished
business.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] libpq++

2003-09-09 Thread Jeroen T. Vermeulen
On Tue, Sep 09, 2003 at 01:57:47PM -0400, Bruce Momjian wrote:
> 
> Sure libpq++ and libpqpp are on http://gborg.postgresql.org.

Ahem.


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] 2-phase commit

2003-09-10 Thread Jeroen T. Vermeulen
On Tue, Sep 09, 2003 at 08:38:41PM -0400, Bruce Momjian wrote:
> 
> Yep, I assume PITR would be the solution for most failure cases --- very
> ugly of course.

Anything can be broken in some way, if bad luck is willing to work hard
enough.  In at least one, ah, competing company I know of, employees are
allowed by the legal people to say "assured" but not "guaranteed" for
precisely this reason.

First thing is an acceptable failure mode, then you try to narrow its
chances of occurring.  And if worst comes to worst, one example of an
acceptable failure mode is "when in danger or doubt, run in circles,
scream and shout."


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] PQencryptPassword() and encoding

2006-12-19 Thread Jeroen T. Vermeulen
Probably a silly question, but better safe than sorry:

AFAICS there's no way for PQencryptPassword() to see what encoding
applies.  Are we quite sure that that is not a problem?  Or are passwords
ASCII-only?
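
For reference, this is the call I mean--note that it gets nothing but
the raw bytes of its two arguments to work with:

  #include <libpq-fe.h>
  #include <stdio.h>

  int main(void)
  {
      /* Hashes the raw bytes of password and user name; there's no
       * connection argument, hence no encoding information. */
      char *hashed = PQencryptPassword("swordfish", "jeroen");

      printf("%s\n", hashed);    /* "md5" plus 32 hex digits */
      PQfreemem(hashed);
      return 0;
  }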


Jeroen



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] PQencryptPassword() and encoding

2006-12-19 Thread Jeroen T. Vermeulen
On Wed, December 20, 2006 11:08, Tom Lane wrote:
> "Jeroen T. Vermeulen" <[EMAIL PROTECTED]> writes:
>> Probably a silly question, but better safe than sorry:
>> AFAICS there's no way for PQencryptPassword() to see what encoding
>> applies.  Are we quite sure that that is not a problem?
>
> Right offhand it seems that the worst possible consequence is
> authentication failure: you originally entered your password
> as foobar in encoding X, and then when you enter foobar in
> encoding Y, you get the raspberry.  Do you see something else?

That's definitely the first thing that springs to mind.  I don't suppose
the problems we had with escaping could happen here, and there probably
aren't any security implications.

Getting different password hashes depending on your client encoding would
probably not hit a lot of people, but it would be annoying and hard to
debug where it did happen.  If it can happen in the first place, that is,
which is what I'm asking.


Jeroen



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Missing directory when building 8.2.3-base

2007-02-12 Thread Jeroen T. Vermeulen
I built the "-base" version of 8.2.3 today, for installation at a company
I'm helping out.  The build (and later, the installation) gave me an error
about a missing directory "test/regress".  IIRC I downloaded
ftp://ftp.us.postgresql.org/pub/mirrors/postgresql/source/v8.2.3/postgresql-base-8.2.3.tar.bz2

I worked around the problem by creating a directory src/test/regress
containing a Makefile with inert "all" and "install" targets.  That was
enough to get a working installation.

Is this a known problem?  Is there any test procedure that builds the
"base" distribution before release?


Jeroen



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Missing directory when building 8.2.3-base

2007-02-12 Thread Jeroen T. Vermeulen
On Tue, February 13, 2007 01:45, Peter Eisentraut wrote:

> Most of the core team is convinced that the postgresql-foo tarballs are
> useless, but Marc insists on keeping them.  But since they are nearly
> useless, no one tests them, so it is not surprising that they don't
> work.

Well, hurray for Marc!  I'm writing from a country where "broadband" is
still measured in kilobits per second, and the government censors (and
causes the various companies and government monopolies along the way to
censor) Internet traffic, keeping the ICT infrastructure slow and
unreliable.  International bandwidth comes at premium prices for those who
can afford serious connections.  Much hardware on sale here is either
counterfeit or export products that failed quality-control tests or
otherwise "fell of the boat."  Downloads are sometimes quietly corrupted,
without any errors at the TCP level.  Long-lived connections often time
out.

Not having to download half again the size of a "-base" tarball can make a
difference in those situations, as can not having to download it all in
one single large file.


Jeroen



---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] Cosmetic note: hit rates in logs

2007-02-13 Thread Jeroen T. Vermeulen
Just noticed a small cosmetic point in the logs when logging statement
performance data: if a statement accesses 0 blocks, the "hit rate" is
given as 0%.

I can see that that makes sense mathematically--insofar as 0/0 makes
mathematical sense at all--but wouldn't it be more helpful to represent
this as a 100% hit rate?

I guess computing hit rate as the limit of 0/x is as valid as computing
the limit of x/x (with x being the number of accesses that approaches
zero).  But when I look at the logs I find myself going "low hit rate
here--oh wait, that's for zero accesses" all the time.  Or would the
change make other people go "good hit rate here--oh wait, that's for zero
accesses"?


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-26 Thread Jeroen T. Vermeulen
On Tue, February 27, 2007 06:06, Joshua D. Drake wrote:
>
>> Why do we want this?? Because some apps have *lots* of data and many
>> really don't care whether they lose a few records. Honestly, I've met
>> people that want this, even after 2 hours of discussion and
>> understanding. Plus probably lots of MySQLers also.
>
> Most users will take speed over data loss any day. Whether we want to
> admit it or not.

In that case, wouldn't it make just as much sense to have an equivalent
for this special transaction mode on individual statements, without
transaction context?  I'm guessing that people who don't really know about
or want transactions would never start one, instead running lots of loose
statements that get committed individually.


Jeroen



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-27 Thread Jeroen T. Vermeulen
On Wed, February 28, 2007 06:59, Jim C. Nasby wrote:
> On Tue, Feb 27, 2007 at 05:18:28PM -0500, Bruce Momjian wrote:

>> I think we will remove fsync in favor of the new delay, and allow -1 to
>> be the same behavior as fsync off.
>
> Well, presumably we'd still allow fsync for some number of versions...

I'd hate to lose the ability to disable fsync.  I run tons of tests that
don't require any protection against server crashes or hardware failures,
but their speed does matter.  I know it's not the most important
requirement in the world, but speeding up those tests means I can run more
of them, on more hardware, more often.  Test time also affects my
development cycle.

My main worry is where the option is set, though.  For my situation,
selecting a "fast and sloppy" mode when starting the server is clearly the
best choice.  It'd be possible, though awkward, to change my code to use
COMMIT NOWAIT.  But then am I really sure I'm still testing the same
thing?  Plus it introduces a risk of binaries (distributed by others)
accidentally doing COMMIT NOWAIT, as they would for testing, in production
code.


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Time-correlated columns in large tables

2007-03-05 Thread Jeroen T. Vermeulen
I'm a bit embarrassed to bring this up here because I don't know much
about storage layout and indexing.  It's probably a silly notion, but if
so, could someone please tell me how and why?

First I'll describe the situation that leads me to write this.  I'm seeing
some performance problems in an application that keeps a large table of
logged events.  Rows aren't normally ever updated, but new ones are
inserted all the time.  There are several fields containing various kinds
of timestamps, some exact and some sloppy.  These and perhaps other
columns are loosely correlated with the order in which the rows are
inserted.

[Q: Does this happen to a lot of users?  Is it worth bothering with?]

Naturally the table is frequently queried for ranges of timestamps of one
kind or another.  It's probably not atypical.  Some of the timestamp
columns are indexed, but I'm concerned about both the size of the indexes
and the cost they add to inserts.  Both queries using the indexes and
insertions can become unacceptably slow sometimes.

[Q: Are the access and insertion costs of an index really non-negligible
compared to those of the table itself?]

It sounded to me like this might possibly be a job for spatial indexes
(degraded down to a single dimension), but I couldn't find enough
documentation to figure out whether they generalize to this usage.  From
what I found, it didn't sound likely.

[Q: Do spatial indexes work on simple scalar values and operators?  If
not, any reason why not?  Otherwise, would they help?]

Bruce suggested partitioning the table, but that won't work well for
multiple independent criteria and comes with a host of scalability and
management issues.  I'd also want something that was transparent to the
application.  Perhaps I'm asking for too much but AFAIK it's exactly that
ability to separate functionality from optimization that made SQL a
success in the first place.

[Q: Is there some other transparent optimization for values that correlate
with insertion/update order?]

So I was wondering whether it would make sense to have a more compact kind
of index.  One that partitions the value range of a given column into
sub-ranges, and for each of those, merely keeps loose track of where those
value ranges might be found.  "Dates from 2006-07-15 to 2006-08-04: try
blocks 99-126 and 175."  Just a fixed maximum of two or three contiguous
block ranges per value range would probably do the trick.  The risk of
having too few, of course, is that one oddly positioned block could make
the index look as if a particular value range was spread out throughout
most of the table.

[Q: Am I re-inventing the wheel?  If not, is there really a robust, linear
correlation between a row's time of insertion or last update and its
storage block number?]

The index would not need to be exact, just a bit pessimistic.  Any
re-partitioning or re-balancing could be done offline in "vacuum analyze"
style.  AFAICS that would allow O(1) insertion cost at runtime, or more
aggressive tradeoffs to reduce run-time degradation of indexing quality. 
Most maintenance could be done incrementally to smooth out its costs over
time.
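In data-structure terms I'm picturing something about this small (just a
sketch; the names and the limit of three runs are made up):

  /* One entry of the proposed index: a sub-range of the column's
   * values, plus at most a few contiguous runs of table blocks that
   * may contain them--"try blocks 99-126 and 175." */
  #define MAX_RUNS 3

  typedef struct
  {
      unsigned first_block, last_block;  /* inclusive run of blocks */
  } BlockRun;

  typedef struct
  {
      long     low_value, high_value;    /* sub-range of indexed values */
      int      nruns;
      BlockRun runs[MAX_RUNS];
  } RangeEntry;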

With an "index" like this, an index scan would be very cheap indeed but
you'd then also scan a small subset of the table itself for exact matches.
 I would hope that would be a win if the indexed value accounts for more
than an infinitesimal portion of table size, and is reasonably correlated
with insertion/update time.  Also I guess bitmap scans on the indexed
values could in principle use compressed bitmaps, excluding areas that
according to the index contain no matching rows.  There might be some
marginal benefit there.

[Q: Would this fit in at all?  Any particular reason why it doesn't make
sense?]


Ready and grateful for any criticism,

Jeroen



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Time-correlated columns in large tables

2007-03-05 Thread Jeroen T. Vermeulen
On Tue, March 6, 2007 03:17, Heikki Linnakangas wrote:

> I think you've just described a range-encoded bitmap index. The idea is
> to divide the range of valid values into a some smallish number of
> subranges, and for each of these boundary values you store a bitmap
> where you set the bit representing every tuple with value < boundary
> value.

That's pretty cool!  From the looks of it, what you describe would solve
my problem--but using more storage in return for more flexibility.  My
scheme really required a correlation between a value and storage order,
which can be pretty fragile.  These range-encoded bitmap indexes wouldn't
have that problem.

I guess if you did simple run-length compression on these bitmaps you'd
end up more or less where I came in.  But you wouldn't want to flip a bit
somewhere in the middle of a compressed data stream, of course. :-)


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Prepared statements considered harmful

2006-08-31 Thread Jeroen T. Vermeulen
On Thu, August 31, 2006 18:56, Peter Eisentraut wrote:

> With time, it becomes ever clearer to me that prepared SQL statements are
> just
> a really bad idea.  On some days, it seems like half the performance
> problems
> in PostgreSQL-using systems are because a bad plan was cached somewhere.

Is there any kind of pattern at all to this problem?  Anything
recognizable?  A few typical pitfalls?

Without knowing much of the internals, I could imagine [waves hands in
vague gestures] other options--something like recognizing major changes
that upset the cost functions that went into generating a plan, and
invalidating the plan based on those; or noting bad estimates somehow as
they become apparent during execution, and annotating the plan with a
"this assumption was a bad idea" marker so you'll do better next time.

I guess you can't go far wrong if you re-define "prepared" to mean merely
"pre-parsed," but it sounds like such a waste of opportunity...


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Thu, August 31, 2006 21:41, Phil Frost wrote:

>> Is there any kind of pattern at all to this problem?  Anything
>> recognizable?  A few typical pitfalls?
>
> Frequently I have found preplanning will result in a horrible plan
> because it is assumed parameters may be volatile while in practice they
> are literals. Here is a function from my database:

That's a very common thing in processor design as well, and there's a
standard trick for it: the saturating two-bit counter.  It tends to work
pretty well for branch prediction, value prediction etc.  Usually it's the
first thing you reach for, so of course somebody may already have tried it
here and found it didn't work.

In this particular case it might be applied something like this: for each
parameter in a prepared statement you cache a predictor value, plus a
"confidence counter" saying (more or less--see below) how many times in
succession that value has repeated.  Let's say each counter counts
from 0 to 3 inclusive, with its confidence threshold right in the middle,
between 1 and 2.

On every invocation, you check each parameter value against the
corresponding predictor value.  If it's identical, you increment its
counter (unless it's already saturated).  If the value doesn't match, you
decrement its counter, and if the counter ends up below its confidence
threshold, you replace the predictor value with the new parameter value.
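In code, the per-parameter state is tiny.  A sketch, with strings
standing in for parameter values and the confidence test being nothing
more than the counter's top bit:

  #include <stdlib.h>
  #include <string.h>

  /* One predictor per statement parameter: a candidate value plus a
   * two-bit saturating counter.  Counts 2 and 3 are "confident." */
  typedef struct
  {
      char *value;      /* the value we're currently tracking */
      int   counter;    /* 0..3 */
  } Predictor;

  void predictor_update(Predictor *p, const char *seen)
  {
      if (p->value && strcmp(p->value, seen) == 0)
      {
          if (p->counter < 3)
              p->counter++;            /* reinforce */
      }
      else
      {
          if (p->counter > 0)
              p->counter--;            /* lose confidence */
          if (p->counter < 2)          /* below the threshold: */
          {
              free(p->value);          /* track the new value instead */
              p->value = strdup(seen);
          }
      }
  }

  int confident(const Predictor *p)
  {
      return p->counter >= 2;          /* i.e. the top bit is set */
  }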

Then, whenever any new planning needs to be done (I'll get to that in a
moment), you see which counters are above their confidence thresholds.  In
your new planning you assume that all parameters with confident
predictions will remain pseudo-constant for the next few invocations.

Of course there's a problem when parameters do not match predicted values.
 That's where having one or two backup plans could come in handy.  You
could keep your original, fully-generalized plan around.  If plans are
cheap enough to store, you could try to keep a cache of old plans for the
same query.  The great thing about keeping some backup plans around is
that a pseudo-constant parameter can have a different value once in a
while, then flick back to its old habits without invalidating all your
efforts.  Your usually-unused search fields are a good example.  You may
also have two stable parameter patterns with different sets of
pseudo-constants competing for your attention.

It's not perfect, and it clearly has its pathological cases--but if it
works well enough overall, the bad cases could be caught and handled as
exceptions.  Confidence counters can be tweaked to lose confidence more
easily than they gain it, or vice versa.  Some meta-confidence scheme may
catch the worst offenders.  I won't go into that now--first I'll shut up
and wait for others to point out what I've missed.  :)


Jeroen



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 16:53, Martijn van Oosterhout wrote:

> Interesting thought. It might be worth trying. But my big question: is
> all this testing and counting actually going to be faster than just
> replanning? Postgresql's planner is not that slow.

In the best case (which of course would have to be very frequent for any
of this to matter in the first place) it's mainly just a short loop
comparing the call's parameter values to their counterparts stored with
the plan and updating those two-bit confidence counters.  You wouldn't
*believe* how simple you have to keep these things in processor
architecture.  :-)


> The thing is that number of possible plans is going to be proportional
> to factorial(number of tables). Once you have 3 tables you're going to
> have at least a dozen possible plans, probably more. What the best plan
> is depends strongly on what the parameters are.

Of course.  That's the whole point: to end up with a small but effective
subset of all those possible plans.  I'd guess that you could cover even
most of the nasty cases with a maximum of three plans or so per prepared
statement, including the original fully-generalized one.  The plans could
be replaced on an LRU basis, which isn't very costly for three or so
entries.


> Anyway, your plan assumes that you have information to work with. The
> current system plans prepared queries with no information at all about
> parameters and people are advocating to keep it that way. I think a
> good first step would be the plan on first execution, like Oracle does.

Yes, delaying things a bit can help a lot sometimes.  That's also what JIT
compilers in JVMs do, for instance.  FWIW, libpqxx doesn't prepare
statements until they're first called anyway.

But if this choice to discard parameter information is exactly what causes
a lot of the bad plans in the first place, as Peter says, what's wrong
with putting it to use instead?  For those cases, you're pretty much
screwed by definition as long as you fail to do so.  And it's not like
what I'm suggesting is very difficult!

The real question is whether it's worthwhile.  To find that out, we'd need
to estimate four factors: coverage (how often you'd get a useful
prediction), accuracy (how often that prediction would be accurate), cost
of misprediction (near-zero compared to current situation, assuming we
keep the generalized plans handy), and savings for correct prediction (in
our case, benefit of planning for a constant instead of a variable minus
the cost of re-planning which you say isn't very expensive).

Based on what Peter and you tell me about cost, the main worries here are
coverage and accuracy.  Coverage and accuracy can be extracted (and
tweaked!) relatively easily if we have logs of prepared-statement
executions in a wide variety of real-life applications.  Listings of
consecutive prepared-statement invocations (statement name plus parameter
values) are all that's needed.

Do we have any logs like that?  If we do, I'll be more than happy to run
some simulations and see if the idea shows any promise.  Like I said,
there's every chance that it doesn't.  It was just an off-the-cuff
suggestion and if it's no good I'll have no problems saying so.  But
there's not much point sitting around arguing over theoretical merits if
they're that easy to quantify!


Jeroen



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 21:30, Tom Lane wrote:

> Yeah.  One of the reasons the planner is acceptably fast is that it is
> aggressive about discarding candidate plans as soon as they are clearly
> inferior to other plans.  Tracking multiple plans that might be optimal
> under varying assumptions about the query parameters would make things
> exponentially slower.

AFAICS the planner shouldn't be affected at all--it'd just be invoked more
often as and when the need for new plans became apparent.  Not
"exponentially" (that's an overused word anyway) but "proportionally" to
that.

I've been assuming that once you have a plan, storing it is not very
expensive.  If, say, doubling the number of plans stored with a session's
prepared statements incurs some serious cost (apart from the planning
itself, of course) then that changes things.


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Prepared statements considered harmful

2006-09-01 Thread Jeroen T. Vermeulen
On Fri, September 1, 2006 22:14, Gregory Stark wrote:

> I think the slow part is trying to figure out whether to count the current
> call as a hit or a miss. How do you determine whether the plan you're
> running
> is the best plan without replanning the query?

The question of knowing which plan is best _based on what's in the actual
tables_ would be unsolved just as it always was.  The scheme addresses
only the opportunity to optimize for pseudo-constant parameters.  It
treats the existing planner as a black box.  If you find a solution to the
problem of inaccurate statistics, it'll probably be more or less
orthogonal to what I'm describing: you could have one or the other, but
combining them shouldn't be much harder.

I don't think telling hits from misses would be all that hard.  Let's say
you're having a prepared statement called, and you're evaluating a
candidate plan.  Each parameter is in one of two sets: those "predicted"
by the plan to have certain values (let's call them P), and those "not
predicted" by the plan because their confidence counters were below the
threshold (I'm tempted to call this set NP, but let's make it Q instead). 
Whether a parameter is in P or in Q can be derived from its confidence
counter.  In my previous example, you just take its most-significant bit.

 * For any parameter in P, if the actual value does not match the plan's
prediction, you have a miss.  Can't use this plan.  Use another if you
have one that applies (such as your regular old non-specialized
plan--that always applies), or if not, write a new one!

If you get through this without finding a mismatch, congratulations: you
have a hit.  The plan you're looking at is applicable to your call.  But
now we see if we can do better:

 * For any parameter in Q, if its value would have been predicted
correctly but its counter was below the confidence threshold, you
increment the counter.  If that lifts the counter above the threshold,
you have room for improving on this plan.  It means there's a good chance
you can re-plan for the case that this parameter is also a
pseudo-constant, without the effort being wasted.  Of course you could
also set a minimum number of invocations between re-plannings to get a
more long-term view (e.g. different parameters being recognized as
pseudo-constants in subsequent calls--you may not want to re-plan for
each of those calls).
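The matching itself stays trivial.  A sketch, assuming each cached plan
carries the values it assumed, with a null meaning "this parameter is in
Q":

  #include <string.h>

  /* Does a cached plan apply to an incoming call?  assumed[i] is the
   * pseudo-constant baked into the plan for parameter i, or NULL if
   * the plan left that parameter variable. */
  typedef struct
  {
      int    nparams;
      char **assumed;
  } CachedPlan;

  int plan_applies(const CachedPlan *plan, char *const *args)
  {
      int i;

      for (i = 0; i < plan->nparams; i++)
          if (plan->assumed[i] && strcmp(plan->assumed[i], args[i]) != 0)
              return 0;    /* a P parameter mismatched: miss */
      return 1;            /* hit: every assumption checks out */
  }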

So which plan do you execute if you have more than one applicable
candidate?  We can see what works well.  As a starter I would definitely
pick the one with the larger P (smaller Q), breaking ties in favour of the
most recently generated plan.  I'm assuming we only want one plan for a
given P.

We'd probably want to limit the number of candidate plans per statement to
some very small, fixed number--somewhere between one and four, I'd say; or
maybe one generalized plan plus up to two specialized ones.  With numbers
like that, none of this should be very expensive.  A simple linear match
against 1-4 candidates may be more effective than any attempt to be
clever.

I must admit I haven't thought through all of the consequences of caching
more than one specialized plan per statement.  For example, we could give
every cached plan its own set of confidence counters, and match an
incoming invocation against each of those; or we could keep just one "most
likely" plan with its associated predictor state, and only consider
previously generated plans if we either miss or find room for improvement
in the predictor.


Jeroen



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
I've rigged up a simple simulator for the scheme I described for detecting
pseudo-constant parameters to prepared statements.  It withstands simple
tests, and it neatly picks up cases where some parameters are
pseudo-constants and others aren't--even if some of them are more "pseudo"
while others are more "constant."

What I need now is some realistic test data.  Does anyone have realistic
SQL logs that use prepared statements?

For now, I'll summarize some results I got from randomized input data.  I
used very simple traces, with 11 prepared statements, each taking a
different number of parameters (0 through 10, inclusive).  All calls were
uniformly randomized.  I used LRU replacement of cached plans, with up to
4 retained plans per statement.  Confidence counters ran from 0 to 3
inclusive, with the confidence threshold lying between 1 and 2.

Per simulation run, 20,000 statement invocations were processed.  Runs of
20,000 took about 3.5 seconds of wall-clock time each, or 0.175
milliseconds per statement, on a lightly-loaded 1.8 GHz 32-bit Athlon XP. 
That's for simulation in Python 2.4, with no effort to optimize and no
precompilation, and several lines of information composed and printed to
/dev/null for every invocation.  So I think the overhead of the matching
and comparing that the algorithm does is not a performance worry.

In my first test, parameters were uniformly-distributed integers in the
range [0, 999].  Over this test, 104 plans were generated for the 11
statements, for an average of 192 calls per generated plan.  Only 133 calls
out of
20,000 used optimized plans, in this case optimizing out only one
pseudo-constant each.

When parameters were made to follow the normal distribution with mean 500
and standard deviation 100 (rounded to integers), the number of generated
plans went up to 305 as more patterns were recognized, and of course the
number of calls per generated plan dropped to 65.  Of the 20,000
invocations here, 770 used plans with one parameter optimized away, and 2
used plans with two.

These results don't look very good, but bear in mind this is for
randomized data.  Can't expect to exploit many patterns in random inputs! 
Real-life use is probably going to be much more favourable.  If we want to
guard against fluke patterns in highly-variable parameters, we can always
increase the range of the confidence counters.  That would make the
predictor more skeptical when it comes to accepting repetitions as
patterns.  Just how we tune the counters would be a tradeoff between the
cost of additional planning and the benefits of optimizing out more
parameters.

So once again, does anyone know of any realistic logs that I can use for
more useful simulations?


Jeroen



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 18:41, Gregory Stark wrote:

> I'm confused, what exactly are you trying to predict? Whether each
> parameter
> will be some cached value? Or whether the cached plan was correct?

That's described in more detail in a separate thread ("prepared statements
considered harmful").  In a nutshell, the algorithm detects pseudoconstant
parameters to prepared statements, and keeps a small set of different
plans optimized for recurring combinations of constants.


>> So once again, does anyone know of any realistic logs that I can use for
>> more useful simulations?
>
> You might look at the DBT test suite, iirc the TPCC spec it implements
> intentionally mixes random queries with predictable queries.

I considered the TPC benchmarks, but they're still benchmarks.  When seen
from one angle they try to look like real applications, but I'm worried
that in testing this algorithm, I may be looking at them from a very
different angle.  I'd still need actual application logs to validate them!


Jeroen



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 21:52, Gregory Stark wrote:

> I read that but apparently I misunderstood it since it would not have been
> doable the way I understood it. I thought you wanted the predictor bits to
> correspond to particular plans. If a plan was "wrong" then you marked it
> as a
> bad guess. I don't think that can be made to work though for the reasons I
> stated then.

Oh, sorry--I guess I haven't been too systematic about it.  In the
algorithm's current incarnation, the confidence counters don't mark
anything as bad ideas, as such.  Instead, whenever a new invocation has a
set of parameters that are (1) correctly and (2) confidently predicted by
the counters, the predictor decides that it's probably a good idea to plan
for calls with that particular set of parameters built in as constants.

Assuming we didn't already have a plan for the particular combination of
values at hand, the algorithm generates a new one.  The new plan is
optimized on the assumption that those predicted parameters are constants.

We keep a small cache of recently-used plans, possibly including the
original plan where all parameters are truly variable.  Every plan also
remembers a list of the predicted parameter values, so on any next call,
we can check whether a particular cached plan actually applies to the
call.  If it doesn't (because of a mismatch between the incoming
parameters and the plan's assumed pseudo-constants), we just pick another
plan.

If multiple cached plans can be applied to a given call, we prefer the one
that optimizes away the most parameters.  Age is used as a tiebreaker, on
the assumption that more recent planning information is likely to be more
accurate.

The tricky part is deciding when to generate a new, more specialized plan
when we already have a matching one that may not be optimal.  Without this
step we'd never get beyond that first, completely generic plan--it applies
to every call.  The way I've approached it is this: when the predictor's
current state correctly and confidently predicts more of the invocation's
parameter values than any of the cached plans did, then it's time to
generate a new plan.

So let's say your statement has a parameter x that's always the same
value, say x==0, and another parameter y that's a randomly alternating
Boolean value, and another one z that varies randomly between lots of
values.  What order they're in doesn't matter, and they needn't be the
only parameters.  You're probably going to see four plans generated for
this example:

1. Either on the first call or during definition, you get the generic
plan.  This is the same plan you'd get in the existing backend, with
placeholders for variable x, y, and z.

2. Pretty soon, the algorithm is going to detect that x is always zero. 
It will generate a new plan, substituting the constant value 0 for x, and
hopefully getting better optimization because of it.

3. Sooner or later (probably fairly soon) you'll see a run of consecutive
calls where y happens to be "true."  A new plan is generated with the
assumption that y==true.  The new plan will also still assume that x==0.

4. The same is going to happen for y==false.  Yet another specialized plan
is generated.  If we keep up to 3 plans per statement, say, then this new
plan overflows the cache.  The least recently used plan is flushed to make
room for the new one--in this case, the generic one because we haven't
seen any cases where x!=0 recently.

More complex scenarios will also happen, of course, such as "if y==true
then x will usually be 0, but otherwise x will be highly variable" or "if
y==true then x is pseudo-constant and z is highly variable, but if
y==false then it's the other way around" or "if y==false then z is usually
the empty string," and so on.  The predictor as I've simulated it is "too
dumb to be intimidated" by the complexity.  It should work reasonably well
for all those scenarios, assuming of course that its cache is large enough
to remember the most frequent patterns.

Right now I'm using the plans' time since last use as the only eviction
criterion when the cache overflows.  There may be a better policy; the one
Achilles heel of LRU is the "big loop" where every cache entry is used
once, then evicted shortly before it is needed again.  (If the loop is so
big that entries are flushed long before they're needed again, well, then
it's just a big job and you stop blaming LRU :-)


> But if you have something working clearly that's not what you're doing. So
> what are you doing? Storing up a list of arguments seen for each parameter
> when executed and use the predictor bits to determine if any of those
> arguments are constants? Storing up a list of selectivity estimates?

The former.  I'm keeping a single predictor with a single "more or less
last-seen value" per parameter; plus a checklist of pseudoconstants for
every cached plan.  It's pretty simple, really, with no cost functions or
spanning trees or other intelligent logic.

Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 23:28, Jeroen T. Vermeulen wrote:
> On Sun, September 3, 2006 21:52, Gregory Stark wrote:

>> I read that but apparently I misunderstood it since it would not have
>> been
>> doable the way I understood it. I thought you wanted the predictor bits
>> to
>> correspond to particular plans. If a plan was "wrong" then you marked it
>> as a
>> bad guess. I don't think that can be made to work though for the reasons
>> I
>> stated then.

Come to think of it, I still haven't answered your question very clearly...

I keep one predictor for any given statement.  The predictor keeps two
pieces of state *per statement parameter*:

1. The predictor value (usually the last-seen value for that parameter,
though it does take a few mismatches to make it drop a value that was
repeated a lot).

2. A saturating counter measuring confidence in the current predictor
value.  As long as this counter is below the confidence threshold, the
predictor value has no other meaning than "let's see if this value comes
back."

No information flows between these pieces of state.  All logic in the
predictor itself is entirely local to individual parameters.  But any new
plan is generated based on a "snapshot" of all predictor values that (i)
match their respective parameter values in the ongoing call, and (ii) have
their counters above the confidence threshold.  It is that combination of
parameters, for that combination of values, that is taken as a set of
pseudo-constants.

The cache can hold any number of plans optimized for the same combinations
of parameters but with different values; or for different combinations of
parameters but the same values; subsets and supersets of parameters with
either the same or different values; completely disjoint sets; anything. 
The cache neither knows nor cares.  The only thing that *won't* happen is
two plans in the same cache covering the same set of parameters with the
same set of values--because there is never any need to generate that
second plan while the first is still in cache.


Jeroen



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Sun, September 3, 2006 23:52, Tom Lane wrote:

> What exactly do you mean by "optimize away a parameter"?  The way you
> described the mechanism, there are no parameters that are "optimized
> away", you've merely adjusted selectivity predictions using some assumed
> values.

I'm using "optimized away" as shorthand for "replaced with a literal
constant in the statement definition used to generate the plan."  So if a
parameter $n is found to be, say, always equal to the string 'foo', then
we might want to generate a specialized plan as if the statement's
definition contained the literal string 'foo' wherever it really says $n. 
I've been calling that "optimized for $n=='foo'" or "$n is optimized away
in this plan."


>  Actually converting a parameter to a constant is a whole
> 'nother matter --- it allows constant-folding for example.  But then you
> *cannot* use the plan unless there's an exact match to the assumed
> value.  These two approaches provide very different tradeoffs of plan
> quality vs plan specificity, so it makes me uncomfortable that you're
> failing to clarify what you mean.

Right.  When I said "optimized for" a certain parameter value, I meant
actual substitution the whole time.  I'm sorry if I didn't make that
clear; it seemed so basic that I must have forgotten to mention it.  I
guess the principle would also work otherwise, but it's intended to allow
constant folding.

So for any given statement, there would be a cache of frequently-needed
plans for different sets of constant substitutions.  As you point out, a
call could only use a plan if the plan's substitutions were consistent
with the call's parameter values (but within that constraint, the more
substitutions the merrier).  That's why I talked so much about comparing
and matching: that part is for correctness, not optimization.

As I've said before, all this falls down if there is a significant cost to
keeping one or two extra plans per prepared statement.  You mentioned
something about "tracking" plans.  I don't know what that means, but it
sounded like it might impose a runtime cost on keeping plans around. 
Could you elaborate?


Jeroen



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Optimizing prepared statements

2006-09-03 Thread Jeroen T. Vermeulen
On Mon, September 4, 2006 03:56, Gregory Stark wrote:

> Thanks, that cleared things up enormously. I'm trying to figure how it
> would
> react to some of the queries I've written in the past. In particular I'm
> thinking of queries like
>
> WHERE (? OR category = ?)
>   AND (? OR cost < ?)
>   AND (? OR description like ?||'%')
>
> Where I then pass flags in from the application to indicate which search
> constraints to apply. If it notices that most searches are for a
> particular
> set of constraints it would be able to cache plans with the unused
> constraints
> removed.

Right.  That's pretty much the problem as Peter originally described it, I
think.


> It would not however be able to notice that the last parameter never
> contains a % and therefore can use an index scan.

If I understand you correctly, then no.  If the algorithm sees highly
variable values for that last parameter, it will never decide to assume
that that parameter will never contain '%'--and I'm not sure how that
could be done safely.

I do see two glimmers of hope, however:

1. If that last parameter is usually some specific value, then you'll
probably end up using specialized plans with that specific value in the
parameter's place.  If that value is a string without wildcards, you can
use your index on description (assuming you have one).  If it's '%' or
null, the optimizer can decide to ignore the "like" clause.  It's only
when the scheme finds that it cannot predict what the parameter's value is
going to be that you get the generic, poorly-performing code.

2. Once we have a predictor, and assuming it works, it could be tied in
with the planner a bit more.  As I believe Tom said, the planner can't
afford to chase down lots of scenarios just in case they ever happen.  But
when a parameter is used only for simple matches or inserts on non-indexed
columns, for example, the planner might find in the course of its normal
activities that there's nothing useful it can do with that parameter and
deliver this information with its plan, so that the predictor can ignore
the parameter.


> I'm also wondering how this interacts with plan stability. Obviously the
> direct effect is to throw out any chance of it. But in the long run they
> may be two complementary sides of the same thing.

Well, it'll cause some plans to be re-generated, surely.  But the
impression I've gotten from the discussion so far is that some plans were
getting too old anyway.


Jeroen



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Optimizing prepared statements

2006-09-04 Thread Jeroen T. Vermeulen
On Mon, September 4, 2006 23:03, Tom Lane wrote:

> Ah.  I think you're confusing the spectators by using "predict" when you
> should say "match".  You're looking for previously generated plans that
> have assumed parameter values matching the current query --- saying that
> the plan "predicts" a parameter value is just a really poor choice of
> wording.

I guess so...  It's been over a decade since I last busied myself with
database optimization, so I don't know any of the jargon.  Had to use
processor architecture jargon instead.

So with that out of the way, can anyone think of some good real-life
examples of prepared statement usage that I can test against?  Suggestions
I've had include TPC, DBT2 (based on TPC-C), and pgbench, but what I'm
really looking for is traces of invocations by real applications.

If we don't have a body of application traces available, can anyone think
of a convenient, nonintrusive way I could extract them out of applications
I'm running?  If there is, I could write a filter to eliminate irrelevant
information.  The filter would ignore everything other than prepared
statement invocations.  It could randomize statement names, strip out
their definitions, hide the ordering between different statements, and
replace parameter values with meaningless numbers; so it would be
relatively safe for others to volunteer traces of their own applications. 
None of those transformations would affect my simulation results.


Jeroen



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] 7.5 beta version

2004-04-11 Thread Jeroen T. Vermeulen
On Mon, Apr 05, 2004 at 09:38:13PM -0400, Bruce Momjian wrote:
> 
> I don't think you can mix libs/binaries from different compilers.

As long as it's plain old C, and the compilers adhere to the platform's
ABI standards, why not?  Even if you compile the C code using a C++
compiler, as in this case, any C structs will be PODs and so should be
compiled according to the C layout rules.
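
To make that concrete, here's a minimal sketch (struct and function
names made up) of a POD shared across the C/C++ boundary:

  /* A plain C struct: a POD in C++ terms, so both compilers must lay
     it out according to the platform ABI's alignment rules. */
  struct example_rec
  {
      int    id;
      double value;
  };

  /* extern "C" suppresses C++ name mangling, so an object file built
     with a C compiler links cleanly against one built with a C++
     compiler. */
  #ifdef __cplusplus
  extern "C" {
  #endif
  int process_rec(struct example_rec *rec);
  #ifdef __cplusplus
  }
  #endif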


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Sun, Apr 11, 2004 at 10:21:30PM -0400, Bruce Momjian wrote:
 
> I was not sure if Win32 had standard alignment for C.

Good point.  There's standards, and then there's Windows.  It's possible
that separate "tight-packing" and "regular" pragmas are used there, just
for structs that are expected to be used across linkage boundaries.
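
Something like this, to illustrate--the packing value and the struct
are made up, but the mechanism is the compiler's #pragma pack:

  /* "Tight-packing": members follow each other with no padding,
     overriding the regular ABI alignment rules for this struct. */
  #pragma pack(push, 1)
  struct wire_header
  {
      char tag;     /* 1 byte */
      int  length;  /* would normally start at a 4-byte boundary */
  };
  #pragma pack(pop) /* restore the compiler's regular layout rules */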


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Mon, Apr 12, 2004 at 11:55:45AM -0700, Dann Corbit wrote:
 
> 1.
> The C language does not define alignment of structs.
 
Platform ABI standards do, though (hence the "as long as it adheres to..."
clause in my previous post).  Whether it's in the C language or in the
platform's ABI standards is important when writing portable code; it should
not matter when compiling an existing product--provided that the platform
defines the ABI unambiguously.  The latter is what's in question here.


> The C language does not specify that an integer shall be the same size
> on two different platforms.

That is not a problem when linking across compilers, only when linking
across ABIs.  I would expect that doing so would fail at link time.
Besides, as a practical matter, even the 64-bit platforms in use today
normally specify that int is a 32-bit type, so this should not come into
play in this case, even when linking e.g. Itanium code to x86 code.


> The C language does not specify that IEEE arithmetic must be used or
> even that a double shall be able to represent a value larger than a
> float.
 
Again, I doubt this is relevant in this case where the problem is linking
across compilers on the same platform, not across platforms.


> Mingw GCC is both a C and a C++ compiler.  Fortunately, the C compiler
> is used, because there is no way that the code base would compile as
> C++ code.  A trivial example to show why not is the frequent use of the
> C++ keyword 'new' in the code base.  Here is an example:

With gcc at least, the matter is even simpler than that.  It selects an
appropriate front-end (C, C++, ...) based on the names of the source files
being compiled.  This may break down at link time however; g++ links to
the C++ standard library, which gcc by default doesn't.


Jeroen


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] 7.5 beta version

2004-04-12 Thread Jeroen T. Vermeulen
On Mon, Apr 12, 2004 at 12:35:15PM -0700, Dann Corbit wrote:
 
> I do know of important differences in compilers in this regard.  You can
> (for instance) have 80 bit floating point on one compiler using double
> but it is only 64 bits on another.
 
But in the case of x86 (among others) that's the in-register
representation, no?  IIRC they are stored to memory as 64-bit doubles at
best.


> The point being that there is no such thing as a binary interface for
> alignments or data types that is defined by the C or C++ ANSI/ISO
> language standards.  If there is another standard, I would like to hear
> about it.
 
That depends on the platform vendor.  Which depending on the platform may
actually be whoever specified the CPU architecture and/or whoever supplied
the OS.  As you say, compilers may deviate from it although in many cases
it would render them useless.

In this particular case, it would most likely be Microsoft as the dominant
{OS,compiler,everything else} vendor.  I *think* (but I'm not sure) that
Microsoft set an ABI even at the C++ level (as Intel also did with the
Itanium, BTW), although it's more common to specify the C level only.

In C++, ABI compatibility is normally protected through a side effect of
name mangling.  By maintaining different name mangling schemes for
different ABI conventions, compiler vendors ensure that object files will
refuse to link to other object files that adhere to different ABIs.
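
For illustration (the mangled forms below are quoted from memory, so
treat them as approximate):

  // The same C++ declaration...
  int query(const char *sql);
  // ...mangles to roughly _Z5queryPKc under the Itanium C++ ABI used
  // by gcc, and to roughly ?query@@YAHPBD@Z under Microsoft's scheme.
  // An object file using one scheme finds no matching symbol in an
  // object file using the other, so the link fails.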

 
> Here is my puzzlement...
> If I compile a PostgreSQL database on some 64 bit machine, I should be
> able to access it from a 32 bit machine.  For instance, I can access
> DB/2 on our 3090 or Rdb on our Alpha from a 32 bit workstation and I
> have no problems of this nature.  Surely it is an issue with PostgreSQL
> that has been recognized before.
 
I would say yes, definitely!  That part is not in question here, only the
linking-across-compilers part.  But see below.


> If I change compilers or if I even completely change architectures it
> should not matter.  The interface to the database should be architecture
> independent.  Said another way:
> I should have no concerns about what sort of architecture the server is
> on or what compiler was used.
 
Unless you use the binary mode of data transfer perhaps; I think that's been
rationalized in 7.4 and is now portable.  No idea what happens if you
convert tables written in an older version (say, 7.3) to 7.5 and then
read them from a wildly different platform than you wrote them from, but
that may be a bit far-fetched.
 

Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] 7.5 beta version

2004-04-14 Thread Jeroen T. Vermeulen
On Wed, Apr 14, 2004 at 12:22:18AM +0200, Kurt Roeckx wrote:

> > But in the case of x86 (among others) that's the in-register
> > representation, no?  IIRC they are stored to memory as 64-bit doubles at
> > best.
> 
> You also have "long double"s on some compilers which could be 80 bit.
 
Actually, they're a supported feature in both C99 and C++ IIRC.  But I
still suspect they're not *actually* 80 bits, at least not in memory.
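
Easy enough to check, though results vary by platform and compiler; the
sizes in the comments are typical x86 expectations, not guarantees:

  #include <cstdio>

  int main()
  {
      // On 32-bit x86 gcc this typically prints 4 / 8 / 12: the 80-bit
      // extended value gets padded to 12 bytes in memory.  MSVC
      // typically prints 4 / 8 / 8, i.e. long double is just a 64-bit
      // double there.
      std::printf("%u / %u / %u\n",
                  (unsigned) sizeof(float),
                  (unsigned) sizeof(double),
                  (unsigned) sizeof(long double));
      return 0;
  }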


> We gave up trying to make C++ dlls on windows because of ABI/name
> mangling problems, never tried it again though.
> 
> The compilers from Microsoft and Borland atleast aren't
> compatible.

But that shows up as link errors, not at runtime, right?


Jeroen


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] New COPY commands in libpq

2004-04-30 Thread Jeroen T. Vermeulen
On Fri, Apr 30, 2004 at 06:12:35AM -0700, Tony Reina wrote:

> CString cmd, msg;
 
> cmd.Format("1\t\2\t{3,4,5}\n");
> * PQputCopyData(conn, cmd, sizeof(cmd));
> cmd.Format("\\.\n");
> *   PQputCopyData(conn, cmd, sizeof(cmd));
> *   PQputCopyEnd(conn, msg);
 
> Old C++ code works, new stuff doesn't. Only line that have changed are
> *'d.
 
I'm not surprised.  CString is clearly a class, and cmd is an object of
that class.  Apparently that class has an implicit conversion operator to
char * (which is a striking example of a Bad Idea--kids, don't try this
at home!) but nonetheless, sizeof() should still give you the size of the
object, *not* the size of the string it represents!
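
Something like this should do it (sketch assumes MFC's CString; note
also the stray "\2" octal escape in the original format string, which
was presumably meant to be "2"):

  cmd.Format("1\t2\t{3,4,5}\n");
  PQputCopyData(conn, cmd, cmd.GetLength());  // string length, not sizeof
  PQputCopyData(conn, "\\.\n", 3);  // I believe the \. terminator is
                                    // redundant with the 3.0 protocol
  PQputCopyEnd(conn, NULL);         // NULL errormsg means "ended normally"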

You might try porting your code to libpqxx, which is C++-native and should
make large swathes of this sort of code unnecessary.


Jeroen


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] New COPY commands in libpq

2004-05-01 Thread Jeroen T. Vermeulen
On Sat, May 01, 2004 at 02:25:01AM -0700, Tony Reina wrote:
> > You might try porting your code to libpqxx, which is C++-native and should
> > make large swathes of this sort of code unnecessary.
 
> I've seriously considered it (along with the npgsql library), but am
> not really sure as to what the advantage(s) would be considering that
> (with the corrections suggested) my code works now with libpq.
 
It depends.  It takes a lot of debugging out of your hands, because the
boilerplate stuff like error checking goes into the library rather than
being rewritten all over the place in the applications.  Of course if
your current code works just fine, there's no reason to make such a
drastic change.
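
To give an idea of what that looks like--a minimal sketch, assuming a
recent libpqxx (the exact header and API may differ by version; the
connection string and query are made up):

  #include <iostream>
  #include <pqxx/pqxx>

  int main()
  {
      try
      {
          pqxx::connection conn("dbname=test");
          pqxx::work txn(conn);
          pqxx::result r = txn.exec("SELECT id, cost FROM items");
          txn.commit();
          std::cout << r.size() << " rows" << std::endl;
      }
      catch (const std::exception &e)
      {
          // Failed connections, SQL errors, etc. all surface here as
          // exceptions, instead of a return-code check after every
          // libpq call.
          std::cerr << e.what() << std::endl;
          return 1;
      }
      return 0;
  }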


> Would there be any improvement in speed with libpqxx? Does libpqxx
> make use of the SSL encryption? How hard is it to link to the SSL
> connections for Postgres commands?

Don't expect any speed improvements per se; libpqxx is a layer on top of
libpq.  OTOH some performance features like COPY and limited non-blocking
access (no support for select()ing on multiple file descriptors yet) become
much more easily accessible.

Frankly I don't recall ATM just how libpq deals with SSL.  Don't have the
opportunity to look it up just now.


Jeroen


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[HACKERS] PREPARE and transactions

2004-06-23 Thread Jeroen T. Vermeulen
We were discussing prepared statement support for libpqxx just now (Bruce,
Peter Eisentraut & myself are manning the postgres booth at LinuxTag 2004
in Karlsruhe, Germany), when we ran into a problem that came up two months
ago.  That discussion follows:

Post by Alvaro Herrera:
> Hackers,
> 
> Is this expected?  If so, why?  I'd expect the prepared stmt to be
> deallocated.
> 
> alvherre=# begin;
> BEGIN
> alvherre=# prepare tres as select 3;
> PREPARE
> alvherre=# rollback;
> ROLLBACK
> alvherre=# execute tres;
> ?column? 
> --
> 3
> (1 fila)

Followup by Tom Lane:
> prepare.c probably should have provisions for rolling back its state to
> the start of a failed transaction ... but it doesn't.
> 
> Before jumping into doing that, though, I'd want to have some
> discussions about the implications for the V3 protocol's notion of
> prepared statements.  The protocol spec does not say anything that
> would suggest that prepared statements are lost on transaction rollback,
> and offhand it seems like they shouldn't be because the protocol is
> lower-level than transactions.

Now, here's a scenario that has us worried:

BEGIN
  PREPARE foo AS ...
  ...                  [error]
  DEALLOCATE foo       [fails: already aborted by previous error]
ABORT
BEGIN
  PREPARE foo AS ...   [fails: foo is already defined!]
  EXECUTE foo          [fails: already aborted by previous error]
COMMIT                 [fails: already aborted by previous error]
 
You could say that the DEALLOCATE in the first transaction should have
been outside the transaction, i.e. after the ABORT.  But that would mean
that the client is expected to roll back, manually, individual changes
made in an aborted transaction.  If that's what we expect from the client,
what's the point in having transactions in the first place?

Lots of variations of this scenario spring to mind.  Imagine the second
transaction were not a transaction at all: the second PREPARE would fail,
and the EXECUTE might then execute the wrong statement.
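
In libpq terms, that variation would look something like this (the
statement texts are made up; error checking and PQclear calls omitted
for brevity):

  #include <libpq-fe.h>

  void scenario(PGconn *conn)
  {
      /* The transaction aborts, but--as shown above--the rollback does
         not undo the PREPARE, so "foo" stays defined. */
      PQexec(conn, "BEGIN");
      PQexec(conn, "PREPARE foo AS SELECT * FROM old_table");
      PQexec(conn, "ROLLBACK");

      /* Later, outside any transaction: this PREPARE fails because
         "foo" already exists, and the EXECUTE silently runs the old,
         wrong definition. */
      PQexec(conn, "PREPARE foo AS SELECT * FROM new_table");
      PQexec(conn, "EXECUTE foo");
  }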

A partial fix would be to allow identical redefinitions of a prepared
statement, optionally with reference counting to determine when it should
be deallocated.  But different instances of the same transaction may want
to include a pseudo-constant in the fixed part of the query text, which
changes from one instance to the next--so the texts would not be
identical.

Even if the spec doesn't help, I think a statement prepared within a
transaction should definitely be deallocated at the end of the transaction.
If it turns out that this wastes a lot of opportunities for reuse, the
prepared plans can always be cached across definitions.


Jeroen


---(end of broadcast)---
TIP 8: explain analyze is your friend

