Re: [GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
----------------------+----------------------------
 Reporter:  jeffz     |         Owner:
     Type:  bug       |        Status:  closed
 Priority:  normal    |     Milestone:  6.8.3
Component:  Compiler  |       Version:  6.8.2
 Severity:  normal    |    Resolution:  worksforme
 Keywords:            |    Difficulty:  Unknown
 Testcase:            |  Architecture:  x86
       Os:  Windows   |
----------------------+----------------------------
Changes (by igloo):

  * status:  new => closed
  * resolution:  => worksforme

Comment:

 I can't reproduce this with Debian's wine 0.9.41-1, so it sounds like a
 wine bug to me.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091#comment:5
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler
_______________________________________________
Glasgow-haskell-bugs mailing list
Glasgow-haskell-bugs@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
Re: [GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
----------------------+----------------------------
 Reporter:  jeffz     |         Owner:
     Type:  bug       |        Status:  new
 Priority:  normal    |     Milestone:  6.8.3
Component:  Compiler  |       Version:  6.8.2
 Severity:  normal    |    Resolution:
 Keywords:            |    Difficulty:  Unknown
 Testcase:            |  Architecture:  x86
       Os:  Windows   |
----------------------+----------------------------
Changes (by igloo):

  * milestone:  => 6.8.3

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091#comment:4
Re: [GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
----------------------+----------------------------
 Reporter:  jeffz     |         Owner:
     Type:  bug       |        Status:  new
 Priority:  normal    |     Milestone:
Component:  Compiler  |       Version:  6.8.2
 Severity:  normal    |    Resolution:
 Keywords:            |    Difficulty:  Unknown
 Testcase:            |  Architecture:  x86
       Os:  Windows   |
----------------------+----------------------------
Comment (by simonmar):

 The valgrind error is this:

 {{{
 ==1418== Thread 4:
 ==1418== Invalid read of size 4
 ==1418==    at 0xE3E0D3: ???
 ==1418==    by 0xE22109: ???
 ==1418==    by 0x7BC6AAD1: call_thread_func (thread.c:398)
 ==1418==    by 0x7BC6AD91: start_thread (thread.c:472)
 ==1418==    by 0x491A31A: start_thread (in /lib32/libpthread-2.5.so)
 ==1418==    by 0x4A0179D: clone (in /lib32/libc-2.5.so)
 ==1418==  Address 0xb52299c is not stack'd, malloc'd or (recently) free'd
 }}}

 There are a few others similar to this. The ??? probably indicates that
 the error occurred somewhere within GHC and the symbols aren't available.
 This may well be a real problem, although I can't repeat it with valgrind
 on Linux.

 It doesn't look like Purify is reporting the same thing - the purify error
 is an invalid free, which doesn't appear in the Valgrind log, AFAICS. Also
 it is in `IsValidLocale`, which is something we don't call anywhere in GHC.
 However, we do call `MultiByteToWideChar`, which conceivably might call
 `IsValidLocale` inside Wine. I can't see anything wrong with the way we
 call `MultiByteToWideChar` though - all the alloc/dealloc is supposed to be
 done by the caller.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091#comment:3
Re: [GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
----------------------+----------------------------
 Reporter:  jeffz     |         Owner:
     Type:  bug       |        Status:  new
 Priority:  normal    |     Milestone:
Component:  Compiler  |       Version:  6.8.2
 Severity:  normal    |    Resolution:
 Keywords:            |    Difficulty:  Unknown
 Testcase:            |  Architecture:  x86
       Os:  Windows   |
----------------------+----------------------------
Changes (by simonmar):

  * difficulty:  => Unknown

Comment:

 Could you explain which part(s) of the logs lead you to believe the bug is
 in GHC? The logs are huge, and I couldn't find any clues pointing at GHC
 with a quick scan, but maybe I missed something.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091#comment:1
Re: [GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
----------------------+----------------------------
 Reporter:  jeffz     |         Owner:
     Type:  bug       |        Status:  new
 Priority:  normal    |     Milestone:
Component:  Compiler  |       Version:  6.8.2
 Severity:  normal    |    Resolution:
 Keywords:            |    Difficulty:  Unknown
 Testcase:            |  Architecture:  x86
       Os:  Windows   |
----------------------+----------------------------
Comment (by jeffz):

 Replying to [comment:1 simonmar]:
 > Could you explain which part(s) of the logs lead you to believe the bug
 > is in GHC? The logs are huge, and I couldn't find any clues pointing at
 > GHC with a quick scan, but maybe I missed something.

 Line 2962 of ghc-valgrind4.txt looks suspicious, but the purify log is
 more precise, specifying this exactly:

 [E] FIM: Freeing invalid memory in LocalFree {36 occurrences}
     Address 0x00265650 points into a HeapAlloc'd block in unallocated
     region of the default heap
     Location of free attempt
         LocalFree      [C:\WINDOWS\system32\KERNEL32.dll]
         IsValidLocale  [C:\WINDOWS\system32\kernel32.dll]

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091#comment:2
[GHC] #2091: heap corruption in ghc-6.8.2?
#2091: heap corruption in ghc-6.8.2?
-----------------------+------------------------
    Reporter:  jeffz   |       Owner:
        Type:  bug     |      Status:  new
    Priority:  normal  |   Component:  Compiler
     Version:  6.8.2   |    Severity:  normal
    Keywords:          |    Testcase:
Architecture:  x86     |          Os:  Windows
-----------------------+------------------------
 ghc-6.8.2-i386-unknown-mingw32 appears to suffer from heap corruption.

 I encountered this problem when running ghc on wine-0.9.55. I have run
 runghc on wine through valgrind and also used Purify on Windows XP SP2,
 which confirmed that something was wrong.

 My initial reaction was to file this bug with the wine project first, but
 on later investigation I decided it was appropriate to open the bug here.

 Please see http://bugs.winehq.org/show_bug.cgi?id=11547 for logs and
 further details.

--
Ticket URL: http://hackage.haskell.org/trac/ghc/ticket/2091
Re: heap corruption?
Hi Simon,

I'm sorry I don't have a test case, aside from what I've sent already,
and it's sometimes difficult to reproduce (though note that the
GSLHaskell library can still be downloaded at the provided URL).

However, I've checked and double-checked our code and can't find
anything on our side which could result in the reported bug. Since
it's happening only for allocations of 2^20-8128 bytes or more, I would
recommend creating a ticket and spending some time checking the heap
code for any special cases that happen for allocations near 1MB.

Frederik

On Tue, Aug 08, 2006 at 03:18:34PM +0100, Simon Marlow wrote:
> (cleaning up old mail).
>
> Frederik: did you ever get to the bottom of this? Do you have a test
> case, and should we create a ticket for it?
>
> Cheers,
> Simon
>
> Frederik Eaton wrote:
> > I'm now using 6.4.2. The bug still persists, but I can't immediately
> > reproduce it with basic GSLHaskell operations, nor with raw memory
> > operations. It is even difficult to reproduce as it is; the examples
> > I gave don't trigger it anymore (I was running them under 6.4.1).
> >
> > For instance, this gives a correct result:
> >
> > dot (useFast $ $(dim 1000000) (2*vconst 2)) (vconst 4)
> >
> > and this gives an incorrect result:
> >
> > dot (useFast $ $(dim 1000000) (2.*vconst 2)) (vconst 4)
> >
> > (the first one creates a new matrix of 2's and does element-wise
> > multiplication; the second just scales by 2)
> >
> > It could be a bug in our code, but I'd be surprised. It only triggers
> > for vectors of length 130056 or greater (that's 130056 8-byte
> > doubles, or 2^20-8128 bytes). If something were being prematurely
> > finalized, one would imagine that shorter vectors would also expose
> > the problem. Most of all, it seems very strange that in certain
> > situations two different objects would get the same address when
> > they're large, but not when they're small.
> >
> > But I'll keep looking for the source of the problem.
> >
> > Frederik
> >
> > On Fri, Jun 02, 2006 at 05:11:24PM +0200, Alberto Ruiz wrote:
> > > Hi Frederik,
> > >
> > > I will be out of town this weekend (I really need it!), I will
> > > take a look at the problem on Monday. For now my recommendation is:
> > >
> > > 1) try to reproduce the problem without GSLHaskell or any other
> > > library, using just mallocArray, newArray, etc. with those big
> > > sizes. (Use ghc 6.4.2 or better..)
> > >
> > > 2) If this is not possible, try to reproduce the problem with the
> > > simplest possible example using the unmodified GSLHaskell.
> > >
> > > Alberto
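The size threshold quoted in this thread can be checked with a few lines of C. This is only an illustrative sketch: the helper names `vector_bytes` and `bytes_below_1mib` are hypothetical, and the 8-byte element size is the assumption stated in the message. It says nothing about how GHC's allocator actually treats blocks of this size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption taken from the message: each element is an 8-byte double. */
#define ELEM_BYTES ((size_t)8)

/* Hypothetical helper: payload size in bytes of a vector of n doubles. */
size_t vector_bytes(size_t n) {
    assert(n <= SIZE_MAX / ELEM_BYTES);  /* guard against overflow */
    return n * ELEM_BYTES;
}

/* Hypothetical helper: how far a payload size falls short of 2^20 bytes. */
size_t bytes_below_1mib(size_t payload) {
    const size_t one_mib = (size_t)1 << 20;
    assert(payload <= one_mib);
    return one_mib - payload;
}
```

Calling `bytes_below_1mib(vector_bytes(130056))` relates the smallest failing vector length reported above to the 2^20-8128 figure, since the payload of such a vector sits 8128 bytes below 1MB.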
Re: heap corruption?
Hi Alberto,

No, nothing new. I talked to John Meacham and he said that using
unsafePerformIO with ForeignPtr is asking for trouble; but then when I
showed him the code he was like "oh, that's probably OK".

So I guess you've actually managed to reproduce it then?

Since you use withForeignPtr everywhere, it should be impossible for
any C function to get an unallocated piece of memory, right? I don't
ever allocate my own memory in my code. I imagine it's just exposing a
race condition in GHC.

Frederik

On Mon, Jun 05, 2006 at 09:54:27PM +0200, Alberto Ruiz wrote:
> Hi Frederik, have you discovered something new? Curiously, when I
> print the foreign pointer in createV, the problem disappears (in
> ghc6.4.1, tomorrow I will try this on another machine with 6.4.2)...
>
> Alberto
>
> createV s n f = unsafePerformIO $ do
>     p <- mallocForeignPtrArray n
>     --print p
>     prot s $ withForeignPtr p (f n)
>     --print p
>     return (V n p)
>
> Loading package base-1.0 ... linking ... done.
> Prelude> :m Vector
> Prelude Vector> dot (useFast $ $(dim 1000000) (2*vconst 2)) (vconst 4)
> Loading package bla bla
> ap=0x2202008, bp=0x2202008, rp=0x6446c8
> an=1000000, bn=1000000, rn=1
> res=16000000.000000
> 1.6e7
> Prelude Vector> dot (useFast $ $(dim 1000000) (2.*vconst 2)) (vconst 4)
> ap=0x3602008, bp=0x1502008, rp=0x404770
> an=1000000, bn=1000000, rn=1
> res=15971552.000417
> 1.597155200041718e7
>
> createV s n f = unsafePerformIO $ do
>     p <- mallocForeignPtrArray n
>     print p
>     prot s $ withForeignPtr p (f n)
>     print p
>     return (V n p)
>
> Prelude> :m Vector
> Prelude Vector> dot (useFast $ $(dim 1000000) (2*vconst 2)) (vconst 4)
> Loading package bla bla
> 0x007066c8
> 0x04602008
> 0x04602008
> 0x04e02008
> 0x04e02008
> 0x04602008
> 0x01402008
> 0x01402008
> 0x04602008
> ap=0x4602008, bp=0x4602008, rp=0x7066c8
> an=1000000, bn=1000000, rn=1
> res=16000000.000000
> 0x007066c8
> 1.6e7
> Prelude Vector> dot (useFast $ $(dim 1000000) (2.*vconst 2)) (vconst 4)
> 0x00412770
> 0x02102008
> 0x02102008
> 0x02902008
> 0x02902008
> 0x02102008
> 0x02102008
> ap=0x2102008, bp=0x2102008, rp=0x412770
> an=1000000, bn=1000000, rn=1
> res=16000000.000000
> 0x00412770
> 1.6e7
heap corruption?
Hi Alberto,

I'm experiencing a problem which may be a bug in GHC.

Here I take the dot product of two vectors. I put debugging traces in
the 'dotR' C function (which I call in 'dot'), so one can see the
addresses of the memory blocks. When the vectors are below a certain
size, the function seems to give the correct result. However, when
they are above a certain size, the same pointers are sometimes used
for both arguments (ap and bp), and the result is of course incorrect.
The incorrect result is consistent with both pointers referring to the
first vector (vconst 2). Also, with other arguments I sometimes get
heap corruption. Any idea what might be the problem?

Prelude Vector> print $ (dot $ useFast $ $(dim 1000) $ (vconst 2)) (ones)
ap=0xb1b0a008, bp=0xb1b0c008, rp=0xb1b036c8
an=1000, bn=1000, rn=1
res=2000.000000
2000.0
Prelude Vector> print $ (dot $ useFast $ $(dim 1000000) $ (vconst 2)) (ones)
ap=0xb3402008, bp=0xb3402008, rp=0xb1b0c770
an=1000000, bn=1000000, rn=1
res=4000000.000000
4000000.0
Prelude Vector> print $ (dot $ useFast $ $(dim 1000000) $ (vconst 2)) (vconst 2)
ap=0xb2c02008, bp=0xb2302008, rp=0xb1b08780
an=1000000, bn=1000000, rn=1
ap[130055]=2.000000 bp[130055]=-0.000000
ap[130056]=2.000000 bp[130056]=2.000001
ap[130058]=2.000000 bp[130058]=2.000000
ap[130059]=2.000000 bp[130059]=0.000000
ap[130060]=2.000000 bp[130060]=2.000001
...
res=3992888.004728
3992888.004727577

Here is dotR:

/* DVEC, REQUIRES, DEBUGMSG and OK are macros from the GSLHaskell sources. */
int dotR(DVEC(a), DVEC(b), DVEC(r)) {
    REQUIRES(an == bn && rn == 1, BAD_SIZE);
    DEBUGMSG("dotR");
    fprintf(stderr, "ap=%p, bp=%p, rp=%p\n", ap, bp, rp);
    fprintf(stderr, "an=%d, bn=%d, rn=%d\n", an, bn, rn);
    double res = 0.0;
    int i;
    for (i = 0; i < an; i++) {
        /* Trace any element that is not a whole number. */
        if (ap[i] != floor(ap[i]) || bp[i] != floor(bp[i]))
            fprintf(stderr, "ap[%d]=%lf bp[%d]=%lf\n", i, ap[i], i, bp[i]);
        res += ap[i] * bp[i];
    }
    fprintf(stderr, "res=%lf\n", res);
    rp[0] = res;
    OK
}

I switched to not using gsl here, so that I could be sure the bug
wasn't in gsl. However, when using gsl the same problems occur. So I
think it may be heap corruption in ghc. But I don't know how to debug
that. I tried running in valgrind but it doesn't show anything obvious.

If you want to run what I have, here is my code:

http://ofb.net/~frederik/GSLHaskell2.tar.gz

The problem is exposed by building and running:

print -l ':m Vector' 'print $ (dot $ useFast $ $(dim 1000000) $ (vconst 2)) (vconst 2)' | =ghci -package GSL -fglasgow-exts

Thanks,

Frederik

--
http://ofb.net/~frederik/
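The symptom described in this message, both inputs arriving at the same address, can be flagged on the C side instead of being read off the trace by eye. The following is a minimal sketch, not the GSLHaskell code: `dot_checked`, its plain pointer/length parameters, and the `DOT_*` status codes are hypothetical stand-ins for the `DVEC` arguments and library macros, and the aliasing check is an added diagnostic that the original `dotR` does not have.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes for this sketch (not GSLHaskell's). */
enum { DOT_OK = 0, DOT_BAD_SIZE = 1, DOT_ALIASED = 2 };

/* Hypothetical stand-in for dotR: writes the dot product of a and b to *rp.
   Returns DOT_BAD_SIZE on mismatched lengths or missing inputs, and
   DOT_ALIASED (after still computing the product) when both inputs
   share one address. */
int dot_checked(const double *ap, size_t an,
                const double *bp, size_t bn,
                double *rp) {
    assert(rp != NULL);  /* the result slot must exist */
    if (ap == NULL || bp == NULL || an != bn)
        return DOT_BAD_SIZE;

    double res = 0.0;
    for (size_t i = 0; i < an; i++)
        res += ap[i] * bp[i];
    *rp = res;

    /* Same address for both inputs: expected for dot(v, v),
       suspicious when the caller built two separate vectors. */
    return (ap == bp) ? DOT_ALIASED : DOT_OK;
}
```

A `DOT_ALIASED` return for two vectors that were constructed separately would suggest looking at how the blocks were allocated or kept alive, rather than at the arithmetic in the loop.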