Proposal: UPDATE/DELETE ... WHERE OFFSET n OF cursor_name, was: Re: New ECPG idea, was: Re: [HACKERS] ECPG FETCH readahead

2013-09-16 Thread Boszormenyi Zoltan

Hi,

2013-08-17 13:02 keltezéssel, Boszormenyi Zoltan írta:
[snip, discussion of WHERE CURRENT OF in the ECPG client lib]

I had a second thought about it and the client side caching and
parser behind the application's back seems to be an overkill.

Instead, I propose a different solution, which is a logical extension of
FETCH { FORWARD | BACKWARD } N, which is a PostgreSQL extension.

The proposed solution would be:

UPDATE / DELETE ... WHERE OFFSET SignedIconst OF cursor_name

I imagine that FETCH would keep the array of TIDs/ItemPointerDatas
of the last FETCH statement.

The argument to OFFSET would be mostly in negative terms,
with 0 being equivalent of WHERE CURRENT OF.

E.g.:

FETCH 2 FROM mycur; -- fetches two rows
UPDATE mytab SET ... WHERE OFFSET -1 OF mycur; -- updates the first row
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- updates current row

or

FETCH 3 FROM mycur; -- fetches two rows, reaches end of the cursor
UPDATE mytab SET ... WHERE OFFSET -2 OF mycur; -- updates the first row
UPDATE mytab SET ... WHERE OFFSET -1 OF mycur; -- updates the second row
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- throws an error like WHERE 
CURRENT OF

or

FETCH 3 FROM mycur; -- fetches two rows, reaches end of the cursor
MOVE BACKWARD 2 IN mycur;
UPDATE mytab SET ... WHERE OFFSET 0 OF mycur; -- updates the first row (now 
current)
UPDATE mytab SET ... WHERE OFFSET 1 OF mycur; -- updates the second row

The cached array can be kept valid until the next FETCH statement,
even if  moves out of the interval of the array, except in case the
application changes the sign of the cursor position, e.g. previously it used
MOVE ABSOLUTE with positive numbers and suddenly it switches to
backward scanning with MOVE ABSOLUTE negative number or vice-versa.

This would solve the only source of slowdown in the client side cursor caching
in ECPG present in my current ECPG cursor readahead patch, there would be
no more MOVE + UPDATE/DELETE WHERE CURRENT OF. On the other hand,
exploiting this proposed feature in ECPG would make it incompatible with older
servers unless it detects the server version it connects to and uses the current
method.

Comments?

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


New ECPG idea, was: Re: [HACKERS] ECPG FETCH readahead

2013-08-17 Thread Boszormenyi Zoltan

2013-08-17 12:08 keltezéssel, Boszormenyi Zoltan írta:

Hi,

I am restarting this old thread... :-)

2012-04-24 10:17 keltezéssel, Michael Meskes írta:

OK, I will implement #2. Another question popped up: what to do
with FETCH ALL? The current readahead window size or temporarily
bumping it to say some tens of thousands can be used. We may not
know how much is the all records. This, although lowers performance,
saves memory.

I would say doing a large fetch in two or three batches won't cost too much in
terms of performance.


Please, don't apply this patch yet. I discovered a rather big hole
that can confuse the cursor position tracking if you do this:
...
That will also need a new round of review. Sorry for that.

No problem, better to find it now instead of after release.

Anyway, I moved the patch to 2012-next (I hope I did it correctly) so 2012-1
can be closed. Let's try to get this patch done in the next commit fest.

Michael


I had time to look into this patch of mine again after about 1.5 years.
Frankly, this time was too long to remember every detail of the patch
and looking at parts of the patch as a big entity was confusing.

So I started fresh and to make review easier, I broke the patch up
into small pieces that all build on each other. I have also fixed quite
a few bugs, mostly in my code, but some in the ECPG parser and
the regression tests as well.

I have put the broken up patchset into a GIT tree of mine at GitHub:
https://github.com/zboszor/ecpg-readahead/
but the huge compressed patch is also attached for reference.
It was generated with

$ git diff 
221e92f64c6e136e550ec2592aac3ae0d4623209..870922676e6ae0faa4ebbf94b92e0b97ec418e16


ECPG regression tests are now Valgrind-clean except two of them
but both are pre-existing bugs.

1. ecpg/test/compat_informix/rfmtlong.pgc points out a problem in
   ecpg/compatlib/informix.c

==5036== 1 errors in context 1 of 4:
==5036== Invalid read of size 4
==5036==at 0x4E3453C: rfmtlong (informix.c:941)
==5036==by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
==5036==by 0x4006BE: main (rfmtlong.pgc:45)
==5036==  Address 0x60677d8 is 24 bytes inside a block of size 25 alloc'd
==5036==at 0x4C28409: malloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5036==by 0x4E34268: rfmtlong (informix.c:783)
==5036==by 0x4007DA: fmtlong.constprop.0 (rfmtlong.pgc:22)
==5036==by 0x4006BE: main (rfmtlong.pgc:45)

The same error is reported 4 times.

2. ecpg_add_mem() seems to leak memory:

==5463== 256 bytes in 16 blocks are definitely lost in loss record 1 of 1
==5463==at 0x4C2A121: calloc (in 
/usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==5463==by 0x4E3E153: ecpg_alloc (memory.c:21)
==5463==by 0x4E3E212: ecpg_add_mem (memory.c:110)
==5463==by 0x4E3542B: ecpg_store_result (execute.c:409)
==5463==by 0x4E37E5A: ecpg_process_output (execute.c:1777)
==5463==by 0x4E38CCA: ecpg_do (execute.c:2137)
==5463==by 0x4E38D8A: ECPGdo (execute.c:2159)
==5463==by 0x400A82: fn (alloc.pgc:51)
==5463==by 0x5152C52: start_thread (pthread_create.c:308)
==5463==by 0x545C13C: clone (clone.S:113)

The last two issue we talked about in this thread are also implemented:
- permanently raise the readahead window if the application sends a
  bigger FETCH command, and
- temporarily raise the readahead window for FETCH ALL commands

The cursor position tracking was completely rewritten, so the client side
properly follows the cursor position known by the backend and doesn't
skip MOVE statements where it shouldn't. The previously known
bug is completely eliminated this way.

Please, review that patch.


I have another idea to make ECPG building on this huge patch.

Currently, UPDATE/DELETE WHERE CURRENT OF has to issue a MOVE
before the command in case the cursor positions known by the application
and the backend are different.

My idea builds on the fact that UPDATE/DELETE RETURNING is present
in all supported back branches.

A mini-parser only understanding SELECT, UPDATE and DELETE should
be added to ecpglib, so

DECLARE cursor CURSOR FOR SELECT ...

and

PREPARE prepared_stmt FROM :query;
DECLARE cursor CURSOR FOR prepared_stmt;

can be analyzed and tweaked behind the application's back.

This is needed to detect whether a query is a simple updatable
scan of a table, and returning errors early to the application if it's not,
without actually sending the UPDATE/DELETE WHERE CURRENT OF
query to the backend.

For the purpose of WHERE CURRENT OF, I would add a ctid
column at the end of the targelist that is treated like resjunk
in the backend when returning data to the application.

So, SELECTs would return the ctid information of the tuples.
The cursor query was a FETCH N with abs(N)1 because of
the readahead. For this reason, the cursor positions known
by the application and the backend are different.

The extra MOVE can be eliminated by replacing

UPDATE table SET ... WHERE CURRENT