Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Hi Tom,


 Attached is a completed patch, which I've had no time to test yet, but
 I have to leave for the evening right now --- so here it is in case
 anyone is awake and wants to poke at it.


The patch applied cleanly only after I reverted Alvaro's first
patch, so I suppose it was meant as an alternative to it.


Unfortunately it doesn't solve the invalid alloc request issue.

Should I try Alvaro's second patch, the one you said is not going to work?


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Hi,


 Should I try Alvaro's second patch that you said not going to work?


I'll add that this works for me; that is, it prevents the invalid alloc
request errors from showing up.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Ok.  I had hoped to reproduce the problem with pristine sources, in
  order to verify that I was able to show it not appearing with my patch.
  However I have been unable to create a situation in which the problem
  appears.  So I attach the patch that I came up with.  Please test it.
 
 On further reflection, this isn't gonna work :-(.  The problem with the
 waste-a-slot approach is that it creates an ambiguity near the offsets
 wraparound point: if you are looking at an mxid with starting offset
 just under 2^32, and you see the next mxid has start offset 1, did your
 mxid include the xid in offset 0 or not?

This is certainly a problem, but I think we can just assume that it did
and cope later with the possibility that it didn't.  Which means that we
should let GetMultiXactIdMembers() check whether an element is
InvalidTransactionId, and skip it if so.  (AFAICS this should only happen
if the MultiXact's members end just before offset 0.)
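
A toy model may make the idea concrete. This is only a sketch, not
PostgreSQL's code: the dict stands in for the members storage, and the
names read_members, OFFSET_SPACE and INVALID_XID are made up for
illustration. It assumes 32-bit offsets that wrap around, and that a slot
which was never written reads back as 0, the value of InvalidTransactionId.

```python
OFFSET_SPACE = 1 << 32   # offsets are assumed to be 32-bit and to wrap
INVALID_XID = 0          # stands in for InvalidTransactionId


def read_members(members: dict, start: int, next_start: int) -> list:
    """Collect the member xids in [start, next_start), modulo 2**32.

    When the range wraps, offset 0 is assumed to belong to it; any slot
    that holds INVALID_XID (the wasted slot) is skipped, not returned.
    """
    length = (next_start - start) % OFFSET_SPACE
    xids = []
    for i in range(length):
        xid = members.get((start + i) % OFFSET_SPACE, INVALID_XID)
        if xid != INVALID_XID:
            xids.append(xid)
    return xids


# Two members stored just below the wraparound point; slot 0 never written.
store = {OFFSET_SPACE - 2: 700, OFFSET_SPACE - 1: 701}
print(read_members(store, OFFSET_SPACE - 2, 1))   # [700, 701]
```

The reader walks three offsets here (the last two of the space, then 0)
and silently drops the empty one, which is the "assume it did, cope
later" behaviour described above.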

 I'm currently experimenting with an alternative approach, which leaves
 the nextOffset arithmetic as it was and instead special-cases the zero
 offset case this way:

I think I understand your approach, but I wonder why Matteo didn't find
an improvement with your patch.  Maybe there's a bug in it?

Were you able to create a test case?  I tried several things, including
stopping a backend in the middle of creating a MultiXactId, but no luck
yet.

-- 
Alvaro Herrera        http://www.advogato.org/person/alvherre
La Primavera ha venido. Nadie sabe como ha sido (A. Machado)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Alvaro Herrera
Matteo Beccati wrote:
 Hi,
 
 Should I try Alvaro's second patch that you said not going to work?
 
 I'll add that this works for me, that's it prevents invalid alloc 
 requests to show.

Yeah, the problem with that patch is that there's another, different
race condition, of much lower probability.  So your original problem is
fixed, but there's still a bug.

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
Just treat us the way you want to be treated + some extra allowance
 for ignorance.(Michael Brusser)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I think I understand your approach, but I wonder why Matteo didn't find
 an improvement with your patch.  Maybe there's a bug on it?

Yeah, looking at it this morning, I got the retry condition wrong.
It might be fixable but I'm less enthused about it than I was last
night.  Your idea of handling the wraparound ambiguity by ignoring
InvalidTransactionId isn't bad --- I'll take a look at that.

 Were you able to create a test case?  I tried several things, including
 stopping a backend in the middle of creating a MultiXactId, but no luck
 yet.

I've had some success using Tatsuo's new scriptable pgbench:

create table t1(f1 int);
insert into t1 select * from generate_series(1,1000);

create file tscript containing

\setrandom n 1 1000
select * from t1 limit :n for share;

and do, say,

pgbench -c 10 -t 1 -n -f tscript regression

Using CVS tip, this generates failures within a few seconds for me.
If it doesn't for you, try altering the number of processes (-c) and
the setrandom bounds.

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Tom Lane
I wrote:
 Your idea of handling the wraparound ambiguity by ignoring
 InvalidTransactionId isn't bad --- I'll take a look at that.

OK, I think this version may actually work, and get the wraparound
case right too.  It hasn't failed yet on the pgbench test case anyway.
Matteo, could you try it on your test case?

regards, tom lane



bintRKBhTBzqW.bin
Description: multixact-3.patch



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

  Were you able to create a test case?  I tried several things, including
  stopping a backend in the middle of creating a MultiXactId, but no luck
  yet.
 
 I've had some success using Tatsuo's new scriptable pgbench:

Hmm.  I wasn't able to reproduce it with this on my desktop machine, but
maybe that's because it's slow as hell.  I plugged in my notebook, however,
and was able to reproduce it there.

Additionally, I can confirm that the problem doesn't manifest with your
latest patch.  I'm running several instances just to be sure.

Thanks,

-- 
Alvaro Herrera        http://www.advogato.org/person/alvherre
Acepta los honores y aplausos y perderás tu libertad



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Matteo Beccati

Tom Lane wrote:

 OK, I think this version may actually work, and get the wraparound
 case right too.  It hasn't failed yet on the pgbench test case anyway.
 Matteo, could you try it on your test case?


Yes, it's working. The test case ran for several minutes without errors.

Thank you all :)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 creatingOffsetZero will be a bool that gets set before releasing
 MultiXactGenLock if offset 0 is being returned, and then we clear it
 after updating the slru data structures if we had starting offset 0.

If you're going to have a special flag indicating this couldn't you just have
a special flag indicating that the offset isn't ready yet? Loop until that
flag is cleared instead of looking for offset != 0 at all.

-- 
greg




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 If you're going to have a special flag indicating this couldn't you just have
 a special flag indicating that the offset isn't ready yet? Loop until that
 flag is cleared instead of looking for offset != 0 at all.

Well, the whole idea didn't work anyway :-(.  But I think your proposal
is equivalent to holding the lock throughout CreateMultiXactId, which is
exactly what we're trying to avoid doing ...

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-28 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Additionally, I can confirm that the problem doesn't manifest with your
 latest patch.  I'm running several instances just to be sure.

Ok, I tested several runs and the problem didn't manifest.  Additionally
I tested that wraparound also worked in at least some cases, by doing

pg_resetxlog -O 4294967200 $PGDATA 
dd if=/dev/zero of=$PGDATA/pg_multixact/members/ bs=8192 count=32

and retrying the test.  I did this several times, with no problems
detected.
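
For readers checking the numbers, here is a small sketch of why that
setup exercises wraparound quickly. It assumes that -O sets the next
multixact offset and that offsets are 32-bit; the helper name
slots_before_wrap is hypothetical.

```python
OFFSET_SPACE = 1 << 32   # assumed size of the 32-bit offset space


def slots_before_wrap(next_offset: int) -> int:
    """Number of member offsets left before the counter wraps back to 0."""
    return OFFSET_SPACE - next_offset


print(slots_before_wrap(4294967200))   # 96
print(8192 * 32)                       # 262144 bytes zeroed by the dd command
```

With only 96 offsets left, a short pgbench run should be enough to push
the counter past the wraparound point.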

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
Hay quien adquiere la mala costumbre de ser infeliz (M. A. Evans)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
On Thu, Oct 27, 2005 at 10:59:30AM +0200, Matteo Beccati wrote:
 Hi,
 
 I'm using 8.1beta4 on a development server for a rather db-intensive 
 application. This application has a multiprocess daemon which was 
 working fairly well in past. After some recent changes I started having 
 deadlock problems. While investigating to remove what was causing them I 
 removed some FOR UPDATE clauses (added on 8.0 to prevent other deadlock 
 situations), hoping that the newly added FK share locks would better 
 handle the concurrent access. In fact the deadlock errors went away, but 
 I suddenly started getting some of these:

Backtrace would be nice. I don't suppose your backend is compiled with
debugging? If so, try attaching to the backend and do:

break MemoryContextAlloc if size  10

Obviously something is trying to allocate a negative number of
bytes... 4291419108 = -3548188 when read as a signed 32-bit value.
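
That conversion can be checked with a short sketch. It assumes the
request size is a 32-bit quantity, as on a 32-bit build; the helper name
as_signed32 is made up for illustration.

```python
def as_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


print(as_signed32(4291419108))   # -3548188
```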

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Martijn,


 Backtrace would be nice. I don't suppose your backend is compiled with
 debugging? If so, try attaching to the backend and do:
 
 break MemoryContextAlloc if size  10
 
 Obviously something is trying to allocate and negative number of
 bytes... 4291419108 = -3548188


Here is the backtrace, hoping I did it correctly:


Breakpoint 1, MemoryContextAlloc (context=0xbfbfd5d0, size=3217020368)
    at mcxt.c:501
501 {
(gdb) bt
#0  MemoryContextAlloc (context=0xbfbfd5d0, size=3217020368) at mcxt.c:501
#1  0x0812a586 in initStringInfo (str=0xbfbfd5d0) at stringinfo.c:50
#2  0x081303e5 in pq_beginmessage (buf=0xbfbfd5d0, msgtype=84 'T') at
    pqformat.c:92
#3  0x080778b5 in SendRowDescriptionMessage (typeinfo=0x8311420,
    targetlist=0xbfbfd5d0, formats=0x83df088) at printtup.c:170
#4  0x08117200 in ExecutorRun (queryDesc=0x83df040,
    direction=ForwardScanDirection, count=0) at execMain.c:222
#5  0x0818a16f in PortalRunSelect (portal=0x835f018, forward=32 ' ',
    count=0, dest=0x83a7448) at pquery.c:794
#6  0x0818a60e in PortalRun (portal=0x835f018, count=2147483647,
    dest=0x83a7448, altdest=0x83a7448, completionTag=0xbfbfd830 ) at
    pquery.c:646
#7  0x081868cc in exec_simple_query (query_string=0x8310228 SELECT *
    FROM gw_queue_get('7')) at postgres.c:1014
#8  0x08188e4f in PostgresMain (argc=4, argv=0x82dd3d0,
    username=0x82dd3a0 multilevel) at postgres.c:3168
#9  0x08165dbc in ServerLoop () at postmaster.c:2853
#10 0x081672bd in PostmasterMain (argc=3, argv=0xbfbfed3c) at
    postmaster.c:943
#11 0x08131092 in main (argc=3, argv=0xbfbfed3c) at main.c:256


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
On Thu, Oct 27, 2005 at 11:37:09AM +0200, Matteo Beccati wrote:
 Here is the backtrace, hoping I did it correctly:

Dagnammit. I was wondering if that was going to happen. If your
optimisation is up, the values of arguments to the functions don't
display right (look at the rest, they're obviously not correct). While
it's possible there's a bug that early in the output, I wouldn't bet on
it.

The trick (other than turning off optimisation) is to set the
breakpoint a few lines later, like say mcxt.c:504. You can find out by
simply stepping the debugger until p size displays a reasonable
value.

try with: break mcxt.c:504 if size  10

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
Belay that, you should be able to put a breakpoint on errstart or elog
or perhaps errmsg. Much easier...

(I expected to find the answer in the developer FAQ, but it's not
there).

Hope this helps,




-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Bruce Momjian
Martijn van Oosterhout wrote:
 Belay that, you should be able to put a breakpoint on errstart or elog
 or perhaps errmsg. Much easier...
 
 (I expected the find the answer in the developer FAQ, but it's not
 there).

I removed it because it used to be in the main FAQ, and wasn't asked
frequently.  Should it be readded?


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
On Thu, Oct 27, 2005 at 08:54:57AM -0400, Bruce Momjian wrote:
 Martijn van Oosterhout wrote:
 -- Start of PGP signed section.
  Belay that, you should be able to put a breakpoint on errstart or elog
  or perhaps errmsg. Much easier...
  
  (I expected the find the answer in the developer FAQ, but it's not
  there).
 
 I removed it because it used to be in the main FAQ, and wasn't asked
 frequently.  Should it be readded?

Hmm, depends. It's not asked often, that's for sure. Yet every time it
comes up I forget whether I should be breaking on errstart, errmsg
or something else. One of these days I might just write it on a post-it
note next to my computer.

Maybe not as a separate question, but under 2.8) What debugging features
are available? add something like the text below. It'd be easier to point
people to it on the web than to explain it every time.

- Additionally, if you get an unusual error message, it can be useful
- to get a stack trace to see how it got there. One trick is to attach
- gdb and tell it to break on elog and errstart/errmsg/errfinish
- (whichever is the right one) and you can get a stack trace at exactly
- the point it is dying. Note that DEBUG-level messages will trigger
- the breakpoint too, so you might need to cont a few times to get the
- error you want.
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Martijn van Oosterhout wrote:

 Belay that, you should be able to put a breakpoint on errstart or elog
 or perhaps errmsg. Much easier...


After several tries, I finally found a way to produce a reliable 
backtrace :)



Breakpoint 4, errfinish (dummy=0) at elog.c:346
346 ImmediateInterruptOK = false;
(gdb) bt
#0  errfinish (dummy=0) at elog.c:346
#1  0x08265896 in elog_finish (elevel=20,
    fmt=0x831858c invalid memory alloc request size %lu) at elog.c:930
#2  0x0827b5cf in MemoryContextAlloc (context=0x85b2238,
    size=4279476584) at mcxt.c:505
#3  0x080b6a16 in GetMultiXactIdMembers (multi=301994, xids=0xbfbfaba4)
    at multixact.c:935
#4  0x080b6271 in MultiXactIdIsRunning (multi=301994) at multixact.c:373
#5  0x0828347d in HeapTupleSatisfiesUpdate (tuple=0x28ccbb40, curcid=13,
    buffer=756) at tqual.c:620
#6  0x0808f724 in heap_lock_tuple (relation=0x8402988, tuple=0xbfbfad10,
    buffer=0xbfbfad0c, ctid=0xbfbfacf0, update_xmax=0xbfbfacec, cid=13,
    mode=LockTupleShared, nowait=0 '\0') at heapam.c:2055
#7  0x0814153d in ExecutePlan (estate=0x8574018, planstate=0x8574198,
    operation=CMD_SELECT, numberTuples=1, direction=ForwardScanDirection,
    dest=0x831b978) at execMain.c:1188
#8  0x081404fb in ExecutorRun (queryDesc=0x8571088,
    direction=ForwardScanDirection, count=1) at execMain.c:230
#9  0x0815ad7e in _SPI_pquery (queryDesc=0x8571088, tcount=1) at spi.c:1558
#10 0x0815ab98 in _SPI_execute_plan (plan=0x8546c18, Values=0xbfbfaf70,
    Nulls=0xbfbfaf30  , snapshot=0x0, crosscheck_snapshot=0x0,
    read_only=0 '\0', tcount=1) at spi.c:1460
#11 0x08158b3c in SPI_execute_snapshot (plan=0x8546c18,
    Values=0xbfbfaf70, Nulls=0xbfbfaf30  , snapshot=0x0,
    crosscheck_snapshot=0x0, read_only=0 '\0', tcount=1) at spi.c:379
#12 0x08248cd7 in ri_PerformCheck (qkey=0xbfbfcb80, qplan=0x8546c18,
    fk_rel=0x84bdab0, pk_rel=0x8402988, old_tuple=0x0, new_tuple=0xbfbfcef0,
    detectNewRows=0 '\0', expect_OK=5, constrname=0x8547448 $2) at
    ri_triggers.c:3141
#13 0x08244576 in RI_FKey_check (fcinfo=0xbfbfcc90) at ri_triggers.c:424
#14 0x082445e5 in RI_FKey_check_ins (fcinfo=0xbfbfcc90) at ri_triggers.c:449
#15 0x0812d2a5 in ExecCallTriggerFunc (trigdata=0xbfbfcf30, tgindx=2,
    finfo=0x85475d8, instr=0x0, per_tuple_context=0x85b2568) at trigger.c:1288
#16 0x0812e349 in AfterTriggerExecute (event=0x84284e0, rel=0x84bdab0,
    trigdesc=0x85470e8, finfo=0x85475a8, instr=0x0, per_tuple_context=0x85b2568)
    at trigger.c:2134
#17 0x0812e6f7 in afterTriggerInvokeEvents (events=0x8428118,
    firing_id=2, estate=0x8547018, delete_ok=1 '\001') at trigger.c:2376
#18 0x0812e993 in AfterTriggerEndQuery (estate=0x8547018) at trigger.c:2551
#19 0x0815adf9 in _SPI_pquery (queryDesc=0x8541088, tcount=0) at spi.c:1570
#20 0x0815ab98 in _SPI_execute_plan (plan=0x8531818, Values=0x85234a0,
    Nulls=0x8523490  Ï\024, snapshot=0x0, crosscheck_snapshot=0x0,
    read_only=0 '\0', tcount=0) at spi.c:1460
#21 0x08158a82 in SPI_execute_plan (plan=0x8531818, Values=0x85234a0,
    Nulls=0x8523490  Ï\024, read_only=0 '\0', tcount=0) at spi.c:336
#22 0x2dc470a6 in exec_stmt_execsql (estate=0xbfbfd330, stmt=0x85287c8)
    at pl_exec.c:2280
#23 0x2dc45312 in exec_stmt (estate=0xbfbfd330, stmt=0x85287c8) at
    pl_exec.c:1076
#24 0x2dc45115 in exec_stmts (estate=0xbfbfd330, stmts=0x8528738) at
    pl_exec.c:991
#25 0x2dc455d5 in exec_stmt_if (estate=0xbfbfd330, stmt=0x852d828) at
    pl_exec.c:1210
#26 0x2dc45218 in exec_stmt (estate=0xbfbfd330, stmt=0x852d828) at
    pl_exec.c:1036
#27 0x2dc45115 in exec_stmts (estate=0xbfbfd330, stmts=0x852d860) at
    pl_exec.c:991
#28 0x2dc44fbb in exec_stmt_block (estate=0xbfbfd330, block=0x852d8a0)
    at pl_exec.c:936
#29 0x2dc446e1 in plpgsql_exec_trigger (func=0x8527018,
    trigdata=0xbfbfd690) at pl_exec.c:562
#30 0x2dc3fcdf in plpgsql_call_handler (fcinfo=0xbfbfd3f0) at
    pl_handler.c:120
#31 0x0812d2a5 in ExecCallTriggerFunc (trigdata=0xbfbfd690, tgindx=3,
    finfo=0x84794d0, instr=0x0, per_tuple_context=0x854fa68) at trigger.c:1288
#32 0x0812e349 in AfterTriggerExecute (event=0x84283a8, rel=0x84969b0,
    trigdesc=0x84790e8, finfo=0x8479488, instr=0x0, per_tuple_context=0x854fa68)
    at trigger.c:2134
#33 0x0812e6f7 in afterTriggerInvokeEvents (events=0x8428110,
    firing_id=0, estate=0x8479018, delete_ok=1 '\001') at trigger.c:2376
#34 0x0812e993 in AfterTriggerEndQuery (estate=0x8479018) at trigger.c:2551
#35 0x0815adf9 in _SPI_pquery (queryDesc=0x84640c8, tcount=0) at spi.c:1570
#36 0x0815ab98 in _SPI_execute_plan (plan=0x8467418, Values=0x843c248,
    Nulls=0x843c238   \024, snapshot=0x0, crosscheck_snapshot=0x0,
    read_only=0 '\0', tcount=0) at spi.c:1460
#37 0x08158a82 in SPI_execute_plan (plan=0x8467418, Values=0x843c248,
    Nulls=0x843c238   \024, read_only=0 '\0', tcount=0) at spi.c:336
#38 0x2dc470a6 in exec_stmt_execsql (estate=0xbfbfda40, stmt=0x8449088)
    at pl_exec.c:2280
#39 0x2dc45312 in exec_stmt (estate=0xbfbfda40, stmt=0x8449088) at
    pl_exec.c:1076
#40 0x2dc45115 in exec_stmts (estate=0xbfbfda40,

Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 Hmm, depends. It's not asked often, that for sure. Yet everytime it
 comes up I keep forgetting if I should be breaking on errstart, errmsg
 or something else. One of these days I might just write it on a post-it
 note next to my computer.

I always break on errfinish myself.  At one time elog didn't go through
errstart (it may still not, I forget) so errfinish was the only
certainly common point for catching both elog and ereport.  Another
advantage is that by that point, all the error info is set up and you
can inspect it if you want to.

 - Note that DEBUG level message will trigger
 - also so you might need to cont a few times to get the error you
 - want.

Also, control doesn't come to errfinish at all unless the message is
going to be printed, so the DEBUG-message problem goes away.

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Matteo Beccati [EMAIL PROTECTED] writes:
 (gdb) bt
 #0  errfinish (dummy=0) at elog.c:346
 #1  0x08265896 in elog_finish (elevel=20, fmt=0x831858c invalid memory 
 alloc request size %lu) at elog.c:930
 #2  0x0827b5cf in MemoryContextAlloc (context=0x85b2238, 
 size=4279476584) at mcxt.c:505
 #3  0x080b6a16 in GetMultiXactIdMembers (multi=301994, xids=0xbfbfaba4) 
 at multixact.c:935
 #4  0x080b6271 in MultiXactIdIsRunning (multi=301994) at multixact.c:373
 #5  0x0828347d in HeapTupleSatisfiesUpdate (tuple=0x28ccbb40, curcid=13, 
 buffer=756) at tqual.c:620

Well, this apparently indicates a bug in the new multixact code, but
there's not enough info here to figure out what went wrong.  Can you
create a test case that will let someone else reproduce the problem?

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Tom,


 Well, this apparently indicates a bug in the new multixact code, but
 there's not enough info here to figure out what went wrong.  Can you
 create a test case that will let someone else reproduce the problem?


Unfortunately the error pops up randomly in a very complex app/db and I 
am unable to produce a test case :(


Let me know what else I can do to help fix the bug.


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Bruce Momjian

OK, developer's FAQ updated to mention errfinish.


-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
On Thu, Oct 27, 2005 at 03:45:16PM +0200, Matteo Beccati wrote:
 Hi Tom,
 
 Well, this apparently indicates a bug in the new multixact code, but
 there's not enough info here to figure out what went wrong.  Can you
 create a test case that will let someone else reproduce the problem?
 
 Unfortunately the error pops up randomly in a very complex app/db and I 
 am unable to produce a test case :(

Go up a few levels to GetMultiXactIdMembers and type info locals, to see
if we can get the values of some of the variables there. Also, if you
can turn the optimisation down to -O0, that will make the results in gdb
much more reliable.

It's clear at least that length is negative, but what about the other
variables...

Do you use a lot of subtransactions, functions, savepoints, anything
like that?

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Matteo Beccati wrote:
 Hi Tom,
 
 Well, this apparently indicates a bug in the new multixact code, but
 there's not enough info here to figure out what went wrong.  Can you
 create a test case that will let someone else reproduce the problem?
 
 Unfortunately the error pops up randomly in a very complex app/db and I 
 am unable to produce a test case :(
 
 Lat me know what other I can do to help fixing the bug.

It would be good to see the contents of MultiXactState.  I suspect
there's a race condition in the MultiXact code.

-- 
Alvaro Herrera   Valdivia, Chile   ICBM: S 39º 49' 17.7, W 73º 14' 26.8
El realista sabe lo que quiere; el idealista quiere lo que sabe (Anónimo)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi,


 Go up a few levels to GetMultiXactIdMembers and type info locals, see
 if we can get the values of some of the variables there. Also, if you
 can turn the debugging down to -O0, that will make the results in gdb
 much more reliable.
 
 It's clear at least that length is negative, but what about the other
 variables...


I already recompiled everything with -O0 to be sure that I was able to get a
backtrace. This is the full bt:


#2  0x0827b5cf in MemoryContextAlloc (context=0x856bcc8,
    size=4278026492) at mcxt.c:505
        __func__ = MemoryContextAlloc
#3  0x080b6a16 in GetMultiXactIdMembers (multi=320306, xids=0xbfbfaba4)
    at multixact.c:935
        pageno = 156
        prev_pageno = 156
        entryno = 819
        slotno = 2
        offptr = (MultiXactOffset *) 0x286536ac
        offset = 4235201
        length = -4235201
        i = 138425096
        nextMXact = 320308
        tmpMXact = 320307
        nextOffset = 4235265
        ptr = (TransactionId *) 0xbfbfab78
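
The locals above are consistent with the size in frame #2, which can be
checked with a minimal sketch. It assumes a 4-byte TransactionId and
32-bit size arithmetic (assumptions about this build, not shown in the
trace), and that the start offset read for the following MultiXactId was
0, which is what length = -offset suggests. The helper name
alloc_request_size is hypothetical.

```python
XID_SIZE = 4          # assumed sizeof(TransactionId)
MASK32 = 0xFFFFFFFF   # assumed 32-bit unsigned size arithmetic


def alloc_request_size(next_start: int, offset: int) -> int:
    """Bytes requested for the member array when length = next_start - offset."""
    length = next_start - offset          # negative if next_start reads as 0
    return (length * XID_SIZE) & MASK32   # wraps like unsigned C arithmetic


# Next start offset read as 0, with offset = 4235201 as in frame #3:
print(alloc_request_size(0, 4235201))     # 4278026492, the size in frame #2
```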



 Do you use a lot of subtransactions, function, savepoints, anything
 like that?


I just removed a subtransaction that I put in a function that was used 
to capture the deadlock errors. That subtransaction was actually useless 
because I removed the FOR UPDATE clause that was causing the deadlock, 
but the alloc error is still there. I'll try to search through the code 
to find some other subtransactions.



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Alvaro,


It would be good to see the contents of MultiXactState.  I suspect
there's a race condition in the MultiXact code.


Good, but... where do I find the contents of MultiXactState? ;)


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Matteo Beccati wrote:
 Hi Alvaro,
 
 It would be good to see the contents of MultiXactState.  I suspect
 there's a race condition in the MultiXact code.
 
 Good, but... where do I find the contents of MultiXactState? ;)

Huh, it should be a global variable.  Try

p *MultiXactState

-- 
Alvaro Herrera  http://www.amazon.com/gp/registry/5ZYLFMCVHXC
Aprender sin pensar es inútil; pensar sin aprender, peligroso (Confucio)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Matteo Beccati wrote:

 #2  0x0827b5cf in MemoryContextAlloc (context=0x856bcc8, 
 size=4278026492) at mcxt.c:505
 __func__ = MemoryContextAlloc
 #3  0x080b6a16 in GetMultiXactIdMembers (multi=320306, xids=0xbfbfaba4) 
 at multixact.c:935
 pageno = 156
 prev_pageno = 156
 entryno = 819
 slotno = 2
 offptr = (MultiXactOffset *) 0x286536ac
 offset = 4235201
 length = -4235201
 i = 138425096
 nextMXact = 320308
 tmpMXact = 320307
 nextOffset = 4235265
 ptr = (TransactionId *) 0xbfbfab78

Whoa.  length = *offptr - offset, which means that the datum stored at
offptr is 0.  I think the problem is that CreateMultiXactId calls
GetNewMultiXactId and then RecordNewMultiXact, and the lock is released
between the calls.  So one backend could try to read the offset before
another one had the time to finish writing it.
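
The numbers in the backtrace are consistent with this reading. As a minimal sketch (standalone C, not PostgreSQL code; it assumes 32-bit offsets, a 4-byte TransactionId and a 32-bit size_t, which the 32-bit addresses in the backtrace suggest, and all helper names are made up for illustration), a zero in the next multixact's offset slot turns into exactly the reported allocation size:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: 32-bit offsets and 4-byte transaction ids. */
typedef uint32_t OffsetSketch;
#define XID_SIZE_SKETCH 4u

/*
 * Member count as "next start offset - my start offset", computed modulo
 * 2^32 and then viewed as a signed int, like the `length` local above.
 * (The unsigned-to-signed cast assumes a two's complement platform.)
 */
static int32_t
member_count(OffsetSketch next_start, OffsetSketch my_start)
{
    return (int32_t) (next_start - my_start);
}

/* Allocation size as a 32-bit size_t would see it: a negative length
 * wraps around to a huge unsigned value. */
static uint32_t
alloc_size_32bit(int32_t length)
{
    return (uint32_t) length * XID_SIZE_SKETCH;
}

/* Plug in the values from the backtrace: offset = 4235201, *offptr = 0. */
static void
check_backtrace_values(void)
{
    int32_t length = member_count(0u, 4235201u);

    assert(length == -4235201);
    assert(alloc_size_32bit(length) == 4278026492u);
}
```

Calling check_backtrace_values() passes: -4235201 members times 4 bytes, taken modulo 2^32, is the size=4278026492 that MemoryContextAlloc rejected.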

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
No single strategy is always right (Unless the boss says so)
  (Larry Wall)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Hi Alvaro,


It would be good to see the contents of MultiXactState.  I suspect
there's a race condition in the MultiXact code.

Good, but... where do I find the contents of MultiXactState? ;)


Huh, it should be a global variable.  Try

p *MultiXactState


Done:

(gdb) p *MultiXactState
$1 = {nextMXact = 320308, nextOffset = 4235265, lastTruncationPoint = 
302016, perBackendXactIds = {0}}



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I think the problem is that CreateMultiXactId calls
 GetNewMultiXactId and then RecordNewMultiXact, and the lock is released
 between the calls.  So one backend could try to read the offset before
 another one had the time to finish writing it.

Ugh, yes, that is clearly a hole :-( even if it turns out not to explain
Matteo's observation.

I don't see any easy way to fix this except by introducing a lot more
locking than is there now --- ie, holding the MultiXactGenLock until the
new mxact's starting offset has been written to disk.  Any better ideas?

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Martijn van Oosterhout
On Thu, Oct 27, 2005 at 10:23:07AM -0400, Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I think the problem is that CreateMultiXactId calls
  GetNewMultiXactId and then RecordNewMultiXact, and the lock is released
  between the calls.  So one backend could try to read the offset before
  another one had the time to finish writing it.
 
 Ugh, yes, that is clearly a hole :-( even if it turns out not to explain
 Matteo's observation.
 
 I don't see any easy way to fix this except by introducing a lot more
 locking than is there now --- ie, holding the MultiXactGenLock until the
 new mxact's starting offset has been written to disk.  Any better ideas?

I don't see immediately if it's feasible or not. But another approach is
to detect when it happened, and retry. Parts of the buffer code do this
for example...

Hope this helps,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I don't see any easy way to fix this except by introducing a lot more
 locking than is there now --- ie, holding the MultiXactGenLock until the
 new mxact's starting offset has been written to disk.  Any better ideas?

 Well, it isn't a very good solution because it requires us to retain the
 MultiXactGenLock past a XLogInsert and some I/O on SLRU pages.

Yeah :-(.  If MultiXactGenLock wasn't a bottleneck before, it will soon
become one.

 I confess being attracted to Martijn's idea of looping until the correct
 answer is obtained.  I don't think it's even too difficult to implement.
 But I wonder if there's some hidden pitfall.

I've been looking at that and I think it can work.  The key point is
that GetNewMultiXactId() does guarantee that space has been allocated
for the new mxact's offset before it releases the lock (else we'd risk
trying to read a nonexistent slru page when we fetch the offset in
GetMultiXactIdMembers).  And we are careful to zero out newly allocated
space.  So it should be safe to assume that the offset will be zero if
it hasn't been initialized yet.  So we could loop if we see zero.

We'd have to make sure zero is never the *correct* value of the offset,
but that just means wasting one word, which seems no problem.
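
A rough sketch of such a loop, in standalone C rather than the real multixact.c: the offsets page is reduced to a single word, the sleep is a caller-supplied callback standing in for a short delay, and every name here (read_offset_with_retry, nap_fn, demo_writer) is hypothetical. It treats a zero slot as "not written yet" and gives up after a bounded number of reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*nap_fn)(void *arg);   /* stand-in for a short sleep */

/*
 * Poll *slot until it is nonzero or max_tries reads have been made.
 * Returns 1 and stores the value in *out on success, 0 on giving up.
 * Real code would have to release and re-acquire the page lock around
 * the sleep; that part is left out of this sketch.
 */
static int
read_offset_with_retry(const volatile uint32_t *slot, int max_tries,
                       nap_fn nap, void *nap_arg, uint32_t *out)
{
    int tries;

    assert(slot != NULL && out != NULL && max_tries > 0);
    for (tries = 0; tries < max_tries; tries++)
    {
        uint32_t v = *slot;

        if (v != 0)
        {
            *out = v;           /* the writer has filled in the offset */
            return 1;
        }
        if (nap != NULL)
            nap(nap_arg);       /* zero: assume "not written yet", wait */
    }
    return 0;
}

/* Demo writer: pretends another backend fills in the slot during a nap. */
struct demo_writer
{
    volatile uint32_t *slot;
    uint32_t value;
    int naps_before_write;
};

static void
demo_nap(void *arg)
{
    struct demo_writer *w = arg;

    if (w->naps_before_write > 0 && --w->naps_before_write == 0)
        *w->slot = w->value;
}
```

The bounded try count is a design choice of the sketch, not something proposed at this point in the thread; it only keeps a stuck writer from turning the reader into an infinite loop.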

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I think the problem is that CreateMultiXactId calls
  GetNewMultiXactId and then RecordNewMultiXact, and the lock is released
  between the calls.  So one backend could try to read the offset before
  another one had the time to finish writing it.
 
 Ugh, yes, that is clearly a hole :-( even if it turns out not to explain
 Matteo's observation.
 
 I don't see any easy way to fix this except by introducing a lot more
 locking than is there now --- ie, holding the MultiXactGenLock until the
 new mxact's starting offset has been written to disk.  Any better ideas?

Well, it isn't a very good solution because it requires us to retain the
MultiXactGenLock past a XLogInsert and some I/O on SLRU pages.
Previously the lock was mostly only used in short operations and very
rarely held during I/O.  But I don't see any other solution either.
Patch attached.

I confess being attracted to Martijn's idea of looping until the correct
answer is obtained.  I don't think it's even too difficult to implement.
But I wonder if there's some hidden pitfall.

Thanks to Matteo for finding the bug!

-- 
Alvaro Herrera                http://www.PlanetPostgreSQL.org
El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente (UPM, 1972)
Index: src/backend/access/transam/multixact.c
===
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.9
diff -c -r1.9 multixact.c
*** src/backend/access/transam/multixact.c  15 Oct 2005 02:49:09 -  
1.9
--- src/backend/access/transam/multixact.c  27 Oct 2005 15:45:24 -
***
*** 633,638 
--- 633,639 
/*
 * OK, assign the MXID and offsets range to use
 */
+   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
multi = GetNewMultiXactId(nxids, &offset);
  
debug_elog4(DEBUG2, "Create: assigned id %u offset %u", multi, offset);
***
*** 665,671 
  
(void) XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_CREATE_ID, rdata);
  
!   /* Now enter the information into the OFFSETs and MEMBERs logs */
RecordNewMultiXact(multi, offset, nxids, xids);
  
/* Store the new MultiXactId in the local cache, too */
--- 666,675 
  
(void) XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_CREATE_ID, rdata);
  
!   /*
!* Now enter the information into the OFFSETs and MEMBERs logs.
!* MultiXactGenLock is released here.
!*/
RecordNewMultiXact(multi, offset, nxids, xids);
  
/* Store the new MultiXactId in the local cache, too */
***
*** 681,686 
--- 685,693 
   *Write info about a new multixact into the offsets and members files
   *
   * This is broken out of CreateMultiXactId so that xlog replay can use it.
+  *
+  * The caller is assumed to hold the MultiXactGenLock, which will be
+  * released by this routine.
   */
  static void
  RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
***
*** 713,718 
--- 720,731 
  
MultiXactOffsetCtl->shared->page_status[slotno] = SLRU_PAGE_DIRTY;
  
+   /*
+* Now that the offset has been written, we can release the
+* MultiXactGenLock.
+*/
+   LWLockRelease(MultiXactGenLock);
+ 
/* Exchange our lock */
LWLockRelease(MultiXactOffsetControlLock);
  
***
*** 756,761 
--- 769,777 
   * files.  Unfortunately, we have to do that while holding MultiXactGenLock
   * to avoid race conditions --- the XLOG record for zeroing a page must appear
   * before any backend can possibly try to store data in that page!
+  *
+  * The caller is assumed to hold the MultiXactGenLock, and it will be held
+  * at exit.
   */
  static MultiXactId
  GetNewMultiXactId(int nxids, MultiXactOffset *offset)
***
*** 767,774 
/* MultiXactIdSetOldestMember() must have been called already */
Assert(MultiXactIdIsValid(OldestMemberMXactId[MyBackendId]));
  
-   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
- 
/* Handle wraparound of the nextMXact counter */
if (MultiXactState->nextMXact < FirstMultiXactId)
MultiXactState->nextMXact = FirstMultiXactId;
--- 783,788 
***
*** 800,807 
  
MultiXactState->nextOffset += nxids;
  
-   LWLockRelease(MultiXactGenLock);
- 
debug_elog4(DEBUG2, "GetNew: returning %u offset %u", result, *offset);
return result;
  }
--- 814,819 
***
*** 1777,1782 
--- 1789,1795 
int i;
  
/* Store the data back into the SLRU files */
+   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
RecordNewMultiXact(xlrec->mid, xlrec->moff, xlrec->nxids, xids);
  
/* Make sure 

Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Alvaro Herrera wrote:

I don't see any easy way to fix this except by introducing a lot more
locking than is there now --- ie, holding the MultiXactGenLock until the
new mxact's starting offset has been written to disk.  Any better ideas?


Well, it isn't a very good solution because it requires us to retain the
MultiXactGenLock past a XLogInsert and some I/O on SLRU pages.
Previously the lock was mostly only used in short operations and very
rarely held during I/O.  But I don't see any other solution either.
Patch attached.


The patch works wonderfully. I'm stress-testing the whole app and have seen 
no errors so far.




Thanks to Matteo for finding the bug!


Thanks to you all for helping out and fixing it :)



Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:

  I confess being attracted to Martijn's idea of looping until the correct
  answer is obtained.  I don't think it's even too difficult to implement.
  But I wonder if there's some hidden pitfall.
 
 I've been looking at that and I think it can work.  The key point is
 that GetNewMultiXactId() does guarantee that space has been allocated
 for the new mxact's offset before it releases the lock (else we'd risk
 trying to read a nonexistent slru page when we fetch the offset in
 GetMultiXactIdMembers).

The remaining question for me is, how do we sleep until the correct
offset has been stored?

-- 
Alvaro Herrera                http://www.advogato.org/person/alvherre
Nunca se desea ardientemente lo que solo se desea por razón (F. Alexandre)



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 The remaining question for me is, how do we sleep until the correct
 offset has been stored?

I was thinking of just pg_usleep for some nominal time (1ms maybe)
and try to read the offsets page again.  This is a corner case so
the performance doesn't have to be great.

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Matteo Beccati

Tom, Alvaro


The remaining question for me is, how do we sleep until the correct
offset has been stored?


I was thinking of just pg_usleep for some nominal time (1ms maybe)
and try to read the offsets page again.  This is a corner case so
the performance doesn't have to be great.


Let me know if you need to test some other patches.

Again, thank you


Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

 We'd have to make sure zero is never the *correct* value of the offset,
 but that just means wasting one word, which seems no problem.

In theory it's possible for only half the word to be written or even to have
outright garbage show up. In practice I think there are no actual
architectures where this can really happen though.

-- 
greg




Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 We'd have to make sure zero is never the *correct* value of the offset,
 but that just means wasting one word, which seems no problem.

 In theory it's possible for only half the word to be written or even to have
 outright garbage show up. In practice I think there are no actual
 architectures where this can really happen though.

Not an issue, because we have a lock around the read or write of the
slru buffer page.  The problem is that there's no lock continuously held
through the creation of a multixact entry, and we don't really wish to
add one ...

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Alvaro Herrera
Matteo Beccati wrote:
 Tom, Alvaro
 
 The remaining question for me is, how do we sleep until the correct
 offset has been stored?
 
 I was thinking of just pg_usleep for some nominal time (1ms maybe)
 and try to read the offsets page again.  This is a corner case so
 the performance doesn't have to be great.
 
 Let me know if you need to test some other patches.

Ok.  I had hoped to reproduce the problem with pristine sources, in
order to verify that I was able to show it not appearing with my patch.
However I have been unable to create a situation in which the problem
appears.  So I attach the patch that I came up with.  Please test it.

I added a loop counter, to verify that we don't loop indefinitely.  I'm
not sure that it's the best way to do it, but I'm too much of a coward to
leave it without any check.

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
La soledad es compañía
Index: src/backend/access/transam/multixact.c
===
RCS file: /home/alvherre/cvs/pgsql/src/backend/access/transam/multixact.c,v
retrieving revision 1.9
diff -c -r1.9 multixact.c
*** src/backend/access/transam/multixact.c  15 Oct 2005 02:49:09 -  
1.9
--- src/backend/access/transam/multixact.c  27 Oct 2005 21:39:04 -
***
*** 798,804 
--- 798,807 
  
ExtendMultiXactMember(*offset, nxids);
  
+   /* Advance the offset counter, but don't leave it at 0. */
MultiXactState->nextOffset += nxids;
+   if (MultiXactState->nextOffset == 0)
+   MultiXactState->nextOffset = 1;
  
LWLockRelease(MultiXactGenLock);
  
***
*** 829,834 
--- 832,838 
MultiXactId tmpMXact;
MultiXactOffset nextOffset;
TransactionId *ptr;
+   int j = 0;
  
debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
  
***
*** 922,932 
--- 926,975 
pageno = MultiXactIdToOffsetPage(tmpMXact);
entryno = MultiXactIdToOffsetEntry(tmpMXact);
  
+ retry:
if (pageno != prev_pageno)
slotno = SimpleLruReadPage(MultiXactOffsetCtl, pageno, tmpMXact);
  
offptr = (MultiXactOffset *) MultiXactOffsetCtl->shared->page_buffer[slotno];
offptr += entryno;
+ 
+   /*
+    * Note a possible race condition: when we create a new MultiXact and
+    * store its info in MultiXactState, we release the MultiXactGenLock
+    * before storing the offset in the SLRU area.  It's thus possible that
+    * we just got the offset that some other backend has been assigned,
+    * but hasn't written on the SLRU page yet.
+    *
+    * One way to close this hole is to make the creating backend hold
+    * MultiXactGenLock until the offset is stored.  That would be too bad
+    * a performance hit, however, so instead we choose to check for this
+    * situation here: if we read a zero offset, sleep and retry, until the
+    * other backend has had a chance to write the true offset.
+    *
+    * Because of this, we have to make sure offset 0 is never used.
+    */
+   if (*offptr == 0)
+   {
+       LWLockRelease(MultiXactOffsetControlLock);
+       pg_usleep(1000);
+ 
+       /*
+        * Note that since we released the OffsetControlLock, we cannot be
+        * sure that the page we read is still on the buffer, so we must
+        * force it to be read again.
+        */
+       prev_pageno = -1;
+       /*
+        * We are not sure that there aren't other bugs in this code, so
+        * we refuse to iterate more than a minute's worth.
+        */
+ #define MAX_OFFSET_ITERATIONS (60 * 1000)
+       if (j++ > MAX_OFFSET_ITERATIONS)
+           elog(PANIC, "too many GetMultiXactIdMembers iterations");
+ 
+       LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
+       goto retry;
+   }
length = *offptr - offset;
}
  
***
*** 1200,1205 
--- 1243,1254 
  
/* Make sure we zero out the per-backend state */
MemSet(MultiXactState, 0, SHARED_MULTIXACT_STATE_SIZE);
+ 
+   /*
+* zero is not a valid offset, so skip it.  See notes in 
+* GetMultiXactIdMembers.
+*/
+   MultiXactState->nextOffset = 1;
}
else
Assert(found);


Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Ok.  I had hoped to reproduce the problem with pristine sources, in
 order to verify that I was able to show it not appearing with my patch.
 However I have been unable to create a situation in which the problem
 appears.  So I attach the patch that I came up with.  Please test it.

On further reflection, this isn't gonna work :-(.  The problem with the
waste-a-slot approach is that it creates an ambiguity near the offsets
wraparound point: if you are looking at an mxid with starting offset
just under 2^32, and you see the next mxid has start offset 1, did your
mxid include the xid in offset 0 or not?
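
To make the ambiguity concrete, here is a small standalone sketch (hypothetical helper names; it models only the "never leave nextOffset at 0" rule from the patch above, with 32-bit unsigned offsets). Two multixacts that start at the same offset just below 2^32 but have different member counts leave the next start offset at the same value, so a reader cannot tell them apart from the offsets alone.

```c
#include <assert.h>
#include <stdint.h>

/* Next start offset under the waste-a-slot rule: advance by nxids
 * (modulo 2^32) and bump a result of 0 up to 1. */
static uint32_t
next_start_skip_zero(uint32_t start, uint32_t nxids)
{
    uint32_t next = start + nxids;

    if (next == 0)
        next = 1;
    return next;
}

/* Returns 1 if two different member counts from the same start are
 * indistinguishable to a reader who only sees the next start offset. */
static int
counts_are_ambiguous(uint32_t start, uint32_t n_a, uint32_t n_b)
{
    assert(n_a != n_b);
    return next_start_skip_zero(start, n_a) == next_start_skip_zero(start, n_b);
}
```

With start = 0xFFFFFFFE, a 2-member multixact (offsets 0xFFFFFFFE and 0xFFFFFFFF) and a 3-member one (which also uses offset 0) both leave the next start at 1, so counts_are_ambiguous(0xFFFFFFFE, 2, 3) returns 1; away from the wraparound point, for example at start = 100, it returns 0.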

We could possibly fix that by decreeing that wrapped-around mxids never
use slot 0, but it seems pretty darn messy: that would affect fetching
and storing loops as well as the code that allocates space.

I'm currently experimenting with an alternative approach, which leaves
the nextOffset arithmetic as it was and instead special-cases the zero
offset case this way:

 * 2. The next multixact may still be in process of being filled in:
 * that is, another process may have done GetNewMultiXactId but not yet
 * written the offset entry for that ID.  In that scenario, it is
 * guaranteed that the offset entry for that multixact exists (because
 * GetNewMultiXactId won't release MultiXactGenLock until it does)
 * but contains zero (because we are careful to pre-zero offset pages).
 * So, if we read zero as the next multixact offset, we have to treat
 * it with suspicion.  It could be valid, though.  We deal with this
 * ambiguity by requiring processes that are creating a multixact with
 * starting offset zero to set the creatingOffsetZero flag in the shared
 * data structure; we sleep until we see that cleared before trusting
 * a zero offset.  This is all pretty messy, but the mess occurs only
 * in infrequent corner cases, so it seems better than holding the
 * MultiXactGenLock for a long time on every multixact creation.

creatingOffsetZero will be a bool that gets set before releasing
MultiXactGenLock if offset 0 is being returned, and then we clear it
after updating the slru data structures if we had starting offset 0.

Thoughts?

regards, tom lane



Re: [HACKERS] ERROR: invalid memory alloc request size a_big_number_here

2005-10-27 Thread Tom Lane
I wrote:
 I'm currently experimenting with an alternative approach, which leaves
 the nextOffset arithmetic as it was and instead special-cases the zero
 offset case this way:

Attached is a completed patch, which I've had no time to test yet, but
I have to leave for the evening right now --- so here it is in case
anyone is awake and wants to poke at it.


regards, tom lane



binBrekpOhBzr.bin
Description: mxact-2.patch
