Re: ALERT - canary mismatch on efree() - heap overflow detected

Vasileios Kontorinis Tue, 22 Jun 2010 14:22:40 -0700

Anyways,
   Dropping the .jar file in the faban/lib folder seems to work. I renamed
the patched version to commons-httpclient-3.1 and gave it a shot. I'll let
you know whether it solves the problem after I do a bunch of runs to make
sure the "cave" behavior is gone.
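
One way to double-check that the jar now sitting in faban/lib differs from the stock build is to compare the ChunkedInputStream class entry in the two jars. This is only a sketch in Python: the entry name is taken from the stack trace in this thread, the helper names are hypothetical, and it only inspects files on disk (it says nothing about which jar faban actually loads at run time).

```python
import zipfile

# Class at the top of the stack trace in this thread (assumed jar entry name).
CLASS_ENTRY = "org/apache/commons/httpclient/ChunkedInputStream.class"

def entry_crc(jar_path, entry=CLASS_ENTRY):
    """Return the CRC-32 recorded in the jar's zip directory for one entry."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.getinfo(entry).CRC

def same_class(jar_a, jar_b, entry=CLASS_ENTRY):
    """True when both jars record the same CRC-32 for the entry."""
    return entry_crc(jar_a, entry) == entry_crc(jar_b, entry)
```

Calling same_class() on the stock jar and the renamed jar in faban/lib and getting False would suggest that the patched class is the one in place.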
-------------------------------------------------------------------
Kontorinis Vasileios
Phd student, University of California San Diego
http://cseweb.ucsd.edu/~vkontori/
[email protected]
-------------------------------------------------------------------



2010/6/22 Vasileios Kontorinis <[email protected]>

> Akara and Shanti,
> I am not sure if this was more suited for the user or the development list.
> Anyways I cced the dev.
>
> I have the feeling that I eventually figured this one out. After long
> searches on the web I came across the following links.
> http://old.nabble.com/-jira--Created:-(HTTPCLIENT-796)-IOException-when-server-closes-connection-at-end-of-chunk-td19434908.html<http://old.nabble.com/-jira--Created:-%28HTTPCLIENT-796%29-IOException-when-server-closes-connection-at-end-of-chunk-td19434908.html>
>
> https://issues.apache.org/jira/browse/HTTPCORE-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>
> Both are about the same JIRA issue, which says that commons-httpclient-3.1
> can behave weirdly and kill the connection when the last packet of a
> response is empty (at least that is how I interpreted it). I am not sure if
> this is the fix I am after, but the stack trace I got from Olio and the one
> described in the links above do match. The Olio stack trace I get with
> "chunked stream ended unexpectedly" is below.
>
> I have patched commons-httpclient-3.1 myself (there is a patch for 3.1 in
> the above links, "c.patch"). However, to get faban to use the patched
> commons-httpclient-3.1, will I need to recompile faban? Or can I just drop
> "commons-httpclient-3.1-patched" inside faban/lib and expect it to run?
> Should I rename commons-httpclient-3.1-patched to commons-httpclient-3.1
> for it to work?
>
> Anyways, this thing has been the source of great headaches. I have to run
> all my experiments twice. This way, when I get the "cave" behavior (as
> Shanti called it), I can consider the experiment hosed. Of course this
> doubles the time, plus it is annoying...
>
> Let me know....
>
> Message: UIDriverAgent[0].127.doEventDetail: chunked stream ended
> unexpectedly
> ------------------------------
>
> Exception: Message: java.io.IOException: chunked stream ended unexpectedly
> Stack Trace:
> Class  Method  Line
> org.apache.commons.httpclient.ChunkedInputStream  getChunkSizeFromInputStream  252
> org.apache.commons.httpclient.ChunkedInputStream  nextChunk  221
> org.apache.commons.httpclient.ChunkedInputStream  read  176
> java.io.FilterInputStream  read  116
> org.apache.commons.httpclient.AutoCloseInputStream  read  108
> sun.nio.cs.StreamDecoder  readBytes  264
> sun.nio.cs.StreamDecoder  implRead  306
> sun.nio.cs.StreamDecoder  read  158
> java.io.InputStreamReader  read  167
> com.sun.faban.driver.transport.hc3.ApacheHC3Transport  fetchResponseData  912
> com.sun.faban.driver.transport.hc3.ApacheHC3Transport  fetchResponse  838
> com.sun.faban.driver.transport.hc3.ApacheHC3Transport  fetchURL  543
> com.sun.faban.driver.transport.hc3.ApacheHC3Transport  fetchURL  564
> org.apache.olio.workload.driver.UIDriver  doEventDetail  611
> sun.reflect.GeneratedMethodAccessor6  invoke
> sun.reflect.DelegatingMethodAccessorImpl  invoke  25
> java.lang.reflect.Method  invoke  597
> com.sun.faban.driver.engine.TimeThread  doRun  169
> com.sun.faban.driver.engine.AgentThread  run  202
>
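
To illustrate where a "chunked stream ended unexpectedly" error can come from, here is a minimal chunked-body decoder in Python. It is a sketch of the general technique only: it is not the commons-httpclient code and not the "c.patch" mentioned above, and the lenient_eof switch is a hypothetical option invented for this example. It mirrors the situation named in the linked report title, where the server closes the connection at the end of a chunk.

```python
import io

def read_chunked(stream, lenient_eof=False):
    """Decode an HTTP/1.1 chunked body from a binary stream; return the payload.

    lenient_eof is a hypothetical switch for this sketch: when True, a
    connection that closes exactly where the next chunk-size line should
    start is treated as the end of the body instead of an error.
    """
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:                      # EOF where a chunk size was expected
            if lenient_eof:
                return bytes(body)
            raise IOError("chunked stream ended unexpectedly")
        # Chunk size is hexadecimal; anything after ';' is a chunk extension.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:                          # terminating zero-length chunk
            # Skip optional trailer headers up to the blank line (or EOF).
            while stream.readline() not in (b"\r\n", b"\n", b""):
                pass
            return bytes(body)
        chunk = stream.read(size)
        if len(chunk) < size:
            raise IOError("chunk truncated")
        body += chunk
        stream.readline()                      # consume the CRLF after the chunk data

# Server closed the connection before sending the final zero-length chunk.
truncated = io.BytesIO(b"5\r\nhello\r\n")
print(read_chunked(truncated, lenient_eof=True))   # b'hello'
```

With lenient_eof left at False the same input raises the IOError, which is the strict behavior the stack trace above appears to show.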
> --Cheers
>
>
> -------------------------------------------------------------------
> Kontorinis Vasileios
> Phd student, University of California San Diego
> http://cseweb.ucsd.edu/~vkontori/ <http://cseweb.ucsd.edu/%7Evkontori/>
> [email protected]
> -------------------------------------------------------------------
>
>
> 2010/6/3 Shanti Subramanyam <[email protected]>
>
>
>>
>> On Wed, Jun 2, 2010 at 3:08 PM, Vasileios Kontorinis <
>> [email protected]> wrote:
>>
>>> 4. I am curious about this one. I have increased the number of
>>> StartServers in apache.conf to 1536...
>>>
>>> <IfModule mpm_prefork_module>
>>>     ListenBacklog 32768
>>>     StartServers 1536
>>>     MinSpareServers 20
>>>     MaxSpareServers 128
>>>     ServerLimit 16384
>>>     MaxClients 16384
>>>     MaxRequestsPerChild 0
>>> </IfModule>
>>>
>>> This should take care of it. Nope?
>>>
>>
>> No, it won't. Having worker processes already in place helps, but remember
>> that apache uses a single process to handle all incoming connections, after
>> which it hands off to a worker process.
>>
>>> I went over the code in sun/faban/driver/engine/AgentImpl.java and, if I
>>> interpret it correctly, faban needs to start #_of_users = #_of_threads. If
>>> there is one new thread every 500ms it will take quite some time (400secs).
>>> In the code it shows that it will sleep for less time if we are falling
>>> short -- 1/3 of the actual interval, so it should be around 100secs. Which
>>> means 2 mins should be ok.
>>>
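
The estimate above can be checked with a little arithmetic. This is a sketch in Python with a hypothetical helper name; it assumes 800 users (the scale mentioned elsewhere in this thread, and the number implied by 400 secs at 500 ms per thread) and, for the second case, that every thread start uses the shortened 1/3 sleep, which is an upper bound on how much catching up could help.

```python
def thread_start_seconds(users, delay_ms, sleep_fraction=1.0):
    """Seconds needed to start `users` threads, one every delay_ms * sleep_fraction."""
    return users * delay_ms * sleep_fraction / 1000.0

# One thread every 500 ms for 800 users.
print(thread_start_seconds(800, 500))          # 400.0

# Every start uses the shortened sleep (1/3 of the interval): roughly 133 s.
print(round(thread_start_seconds(800, 500, 1 / 3), 1))
```

Under these assumptions the best case comes out nearer 133 secs than 100 secs, which is slightly over two minutes and in line with the suggestion of a 3-4 minute rampup.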
>>
>> Right. But I suggest a rampup of 3-4 mins at least to get transactions
>> going before you start measuring.
>>
>>
>>> Now regarding the "cave" behavior. Is there a chance that a bunch of
>>> users dropped their connections at the same time and apache could not
>>> handle it? Is there a way to verify that this is the case? What if I
>>> increase the MinSpareServers/MaxSpareServers settings?
>>>
>>>
>> Easiest way to verify is to try the changes I suggested.
>>
>>
>>> Thanks again.
>>>
>>> -------------------------------------------------------------------
>>> Kontorinis Vasileios
>>> Phd student, University of California San Diego
>>> http://cseweb.ucsd.edu/~vkontori/ <http://cseweb.ucsd.edu/%7Evkontori/>
>>> [email protected]
>>> -------------------------------------------------------------------
>>>
>>>
>> Shanti
>>
>>
>>>
>>> 2010/6/1 Shanti Subramanyam <[email protected]>
>>>
>>>> Sorry for the late reply. I just took a look at your data.
>>>> Some comments:
>>>>
>>>> 1. Your rampup time of 30 secs is too small. You may want to increase it
>>>> to a few minutes, considering the load you're targeting.
>>>> 2. I see that you're specifying a variable load - yet the thruput graph
>>>> shows a constant thruput (until it caves in of course). Did you try doing a
>>>> run with a constant 800 scale, i.e. without specifying the variable load?
>>>> 3. The apache error log shows timestamps that are different from the
>>>> driver. Can you please sync the time on the systems? If running as root,
>>>> faban does this automatically. Otherwise, once in a while, you should check
>>>> what 'date' shows on all your systems.
>>>> 4. Your 'threadstart/delay' is very small. For this load, I would
>>>> recommend 300-500 ms. Apache is just not capable of handling lots of
>>>> connections in a very short time. This may be the reason for the 'connect
>>>> timed out' messages in the run log.
>>>>
>>>> Shanti
>>>>
>>>>
>>>> On Tue, May 25, 2010 at 12:12 PM, Vasileios Kontorinis <
>>>> [email protected]> wrote:
>>>>
>>>>> Just to clarify, with "package up" you mean tar/gzip the output dir of
>>>>> the specific run with the errors, right?
>>>>> If yes, I have updated the earlier webpage.
>>>>> http://cseweb.ucsd.edu/~vkontori/olio/olio.html<http://cseweb.ucsd.edu/%7Evkontori/olio/olio.html>
>>>>>
>>>>>
>>>>> There you can find a small description of the issue and the run in
>>>>> http://cseweb.ucsd.edu/~vkontori/olio/run.tgz<http://cseweb.ucsd.edu/%7Evkontori/olio/run.tgz>
>>>>>
>>>>> I am not sure if apache.conf and php.ini are copied to the folder by
>>>>> default, so I just copied them into the .tgz file myself.
>>>>> Let me know if there are any issues.
>>>>>
>>>>> Thanks!!
>>>>> -------------------------------------------------------------------
>>>>> Kontorinis Vasileios
>>>>> Phd student, University of California San Diego
>>>>> http://cseweb.ucsd.edu/~vkontori/<http://cseweb.ucsd.edu/%7Evkontori/>
>>>>> [email protected]
>>>>> -------------------------------------------------------------------
>>>>>
>>>>>
>>>>> 2010/5/25 Shanti Subramanyam <[email protected]>
>>>>>
>>>>>> If you're running things locally, we can't blame nfs any longer :-)
>>>>>> If you don't mind, package up the entire run directory and I'll take a
>>>>>> look at it (I assume you can put it somewhere where I can download it).
>>>>>>
>>>>>> Shanti
>>>>>>
>>>>>>
>>>>>> On Mon, May 24, 2010 at 10:43 PM, Vasileios Kontorinis <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Well,
>>>>>>>    It feels like there is more than one thing wrong in my
>>>>>>> configuration.
>>>>>>> I stopped using the thumper and went back to using the local fs of
>>>>>>> the web-server.
>>>>>>>
>>>>>>> Some of my issues went away but not all of them.
>>>>>>> Now it fails less often, every 5-10 runs.
>>>>>>>
>>>>>>> The error message in the faban logs:
>>>>>>>
>>>>>>> *UIDriverAgent[0].752.do <op> : chunked stream ended unexpectedly
>>>>>>> Note: Error not counted in result.
>>>>>>> Either transaction start or end time is not within steady state.
>>>>>>> *
>>>>>>> (if the error happens in the steady state it is counted...)
>>>>>>>
>>>>>>> I looked into it a little online; it seems like the server is
>>>>>>> getting a response that it is not expecting. I have no clue how to
>>>>>>> debug that.
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Just a sanity check. I am using:
>>>>>>> *php* v.5.2.4 (without Suhosin patch, I removed it)
>>>>>>> *Server* version: Apache/2.2.11 (Ubuntu)
>>>>>>> *memcached* 1.2.2 (from phpinfo() I read *memcache* 2.2.5)
>>>>>>> *mysql*  Ver 14.12 Distrib 5.0.75, for debian-linux-gnu (x86_64)
>>>>>>> *apc* 3.0.19
>>>>>>>
>>>>>>> Is this old? I am considering upgrading my Ubuntu version from Hardy
>>>>>>> to Lucid with all the related package updates.
>>>>>>> As far as php is concerned, I know the current version is 5.2.12, so
>>>>>>> mine is pretty old.
>>>>>>>
>>>>>>> Let me know
>>>>>>> Thanks again.
>>>>>>>
>>>>>>> -------------------------------------------------------------------
>>>>>>> Kontorinis Vasileios
>>>>>>> Phd student, University of California San Diego
>>>>>>> http://cseweb.ucsd.edu/~vkontori/<http://cseweb.ucsd.edu/%7Evkontori/>
>>>>>>> [email protected]
>>>>>>> -------------------------------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> 2010/5/24 Vasileios Kontorinis <[email protected]>
>>>>>>>
>>>>>>>> I am using v3.
>>>>>>>>
>>>>>>>> What made me suspicious of nfs is a bunch of messages in syslog:
>>>>>>>> May 22 00:22:37 olio-web -- MARK --
>>>>>>>> May 22 00:30:29 olio-web kernel: [2167206.713881] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:03 olio-web kernel: [2167240.993893] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:06 olio-web kernel: [2167243.193896] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:07 olio-web kernel: [2167244.349889] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:07 olio-web kernel: [2167244.357893] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:07 olio-web kernel: [2167244.669885] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:10 olio-web kernel: [2167247.781891] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:10 olio-web kernel: [2167247.785889] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:11 olio-web kernel: [2167248.725885] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:13 olio-web kernel: [2167250.153892] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.173886] nfs: server
>>>>>>>> 67.58.51.149 not responding, still trying
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.410700] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411158] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411236] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411249] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411354] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411678] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411691] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411712] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.411723] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.412047] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:31:14 olio-web kernel: [2167251.414051] nfs: server
>>>>>>>> 67.58.51.149 OK
>>>>>>>> May 22 00:42:37 olio-web -- MARK --
>>>>>>>> May 22 01:02:37 olio-web -- MARK --
>>>>>>>>
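
The syslog excerpt above can be summarized into stall windows with a short script. This is a sketch in Python, not an on-the-fly monitor: it assumes each syslog entry sits on one line (the wrapping above comes from the mail client), the function name is hypothetical, and the year is passed in because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches entries such as:
# May 22 00:30:29 olio-web kernel: [2167206.713881] nfs: server 67.58.51.149 not responding, still trying
LINE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: \[\s*[\d.]+\] "
    r"nfs: server (?P<server>\S+) (?P<state>not responding|OK)"
)

def nfs_stalls(lines, year=2010):
    """Return (server, first_not_responding, first_ok, seconds) for each stall."""
    open_since = {}   # server -> time of the first unanswered "not responding"
    stalls = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue                      # MARK lines and unrelated entries
        when = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        server = m["server"]
        if m["state"] == "not responding":
            open_since.setdefault(server, when)
        elif server in open_since:        # first OK closes the stall window
            start = open_since.pop(server)
            stalls.append((server, start, when, (when - start).total_seconds()))
    return stalls
```

On the excerpt above this would report a single stall for 67.58.51.149, from 00:30:29 to 00:31:14, i.e. 45 seconds.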
>>>>>>>> The olio run log is weird.
>>>>>>>> Usually it says nothing. Sometimes it gives a bunch of
>>>>>>>>
>>>>>>>> UIDriverAgent[0].752.do <op> : chunked stream ended unexpectedly
>>>>>>>> Note: Error not counted in result.
>>>>>>>> Either transaction start or end time is not within steady state.
>>>>>>>>
>>>>>>>> and also
>>>>>>>>
>>>>>>>> UIDriverAgent[0].635.do <op> : connect timed out
>>>>>>>> Note: Error not counted in result.
>>>>>>>> Either transaction start or end time is not within steady state.
>>>>>>>>
>>>>>>>> where <op> stands for different operations...
>>>>>>>>
>>>>>>>> But it does not log this every time...
>>>>>>>>
>>>>>>>> Is there a way to monitor nfs internals (buffering, throughput etc)
>>>>>>>> on the fly?
>>>>>>>> I am using sar but it does not give much helpful info.
>>>>>>>>
>>>>>>>>
>>>>>>>> -------------------------------------------------------------------
>>>>>>>> Kontorinis Vasileios
>>>>>>>> Phd student, University of California San Diego
>>>>>>>> http://cseweb.ucsd.edu/~vkontori/<http://cseweb.ucsd.edu/%7Evkontori/>
>>>>>>>> [email protected]
>>>>>>>> -------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>> 2010/5/24 Shanti Subramanyam <[email protected]>
>>>>>>>>
>>>>>>>>> Which version of NFS are you using? I suggest you try v3 - it is
>>>>>>>>> more efficient. We've run into issues with v4 causing unacceptable
>>>>>>>>> response times.
>>>>>>>>> For the kind of drop you are seeing, I am surprised that you don't
>>>>>>>>> find any errors anywhere. In the first 400 user run for example, it
>>>>>>>>> looks like all the processes either exited or are stalled. In either
>>>>>>>>> case, I would expect to see errors in the faban run log (either that
>>>>>>>>> the driver got an error or that it timed out). Are you sure you
>>>>>>>>> checked the faban log?
>>>>>>>>>
>>>>>>>>> Have you tried running nfsstat to see if you can spot anything ?
>>>>>>>>>
>>>>>>>>> Shanti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sun, May 23, 2010 at 7:07 PM, Vasileios Kontorinis <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Shanti hi again,
>>>>>>>>>>     I sort of managed to fix that. I tried upgrading my php
>>>>>>>>>> version to 5.2.6 and the alert went away. My problems though are not
>>>>>>>>>> fixed. I even tried completely removing the suhosin patch (it was a
>>>>>>>>>> huge pain in ubuntu, since you need to recompile the php module
>>>>>>>>>> yourself). My problems are still there, though.
>>>>>>>>>>
>>>>>>>>>> Now I get no warnings and the logs are clean, but I get weird behavior.
>>>>>>>>>> I needed to send you guys some pics so I created a related page at:
>>>>>>>>>> http://cseweb.ucsd.edu/~vkontori/olio/olio.html<http://cseweb.ucsd.edu/%7Evkontori/olio/olio.html>
>>>>>>>>>> I have comments describing the problem at the end.
>>>>>>>>>>
>>>>>>>>>> Any help would be most appreciated. I've spent so much time on it
>>>>>>>>>> without figuring it out.
>>>>>>>>>> My configuration is:
>>>>>>>>>>   1 web server on a vm with 6GB of mem, 4 cpus
>>>>>>>>>>   1 db server on a vm with 5GB of mem, 4 cpus
>>>>>>>>>>   1 fs server on a vm with 4GB of mem, 4 cpus (this one just
>>>>>>>>>>     exposes the filestore over NFS)
>>>>>>>>>> All on the same physical machine, a nehalem based server, sitting on
>>>>>>>>>> a Sun's Black box.
>>>>>>>>>> I got similar behavior when I exposed the filestore on the Sun's
>>>>>>>>>> thumper.
>>>>>>>>>>
>>>>>>>>>> Any help would be most appreciated.
>>>>>>>>>>
>>>>>>>>>> Thanks
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>> Kontorinis Vasileios
>>>>>>>>>> Phd student, University of California San Diego
>>>>>>>>>> http://cseweb.ucsd.edu/~vkontori/<http://cseweb.ucsd.edu/%7Evkontori/>
>>>>>>>>>> [email protected]
>>>>>>>>>>
>>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2010/5/19 Shanti Subramanyam <[email protected]>
>>>>>>>>>>
>>>>>>>>>>> It's strange that multiple files seem to be complaining about it.
>>>>>>>>>>> Did you try disabling Suhosin? Are you seeing a perceptible drop in
>>>>>>>>>>> memory after reaching steady state?
>>>>>>>>>>>
>>>>>>>>>>> shanti
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, May 19, 2010 at 4:28 PM, Vasileios Kontorinis <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Lately I get a bunch of these errors in my logs:
>>>>>>>>>>>>
>>>>>>>>>>>> [Wed May 19 22:26:37 2010] [error] [client 10.17.255.250] ALERT
>>>>>>>>>>>> - canary mismatch on efree() - heap overflow detected (attacker
>>>>>>>>>>>> '10.17.255.250', file 
>>>>>>>>>>>> '/var/www/oliophp/public_html/taggedEvents.php')
>>>>>>>>>>>> [Wed May 19 22:26:37 2010] [error] [client 10.17.255.250] ALERT
>>>>>>>>>>>> - canary mismatch on efree() - heap overflow detected (attacker
>>>>>>>>>>>> '10.17.255.250', file 
>>>>>>>>>>>> '/var/www/oliophp/public_html/taggedEvents.php')
>>>>>>>>>>>> [Wed May 19 22:26:37 2010] [error] [client 10.17.255.250] ALERT
>>>>>>>>>>>> - canary mismatch on efree() - heap overflow detected (attacker
>>>>>>>>>>>> '10.17.255.250', file '/var/www/oliophp/public_html/users.php')
>>>>>>>>>>>> [Wed May 19 22:26:37 2010] [error] [client 10.17.255.250] ALERT
>>>>>>>>>>>> - canary mismatch on efree() - heap overflow detected (attacker
>>>>>>>>>>>> '10.17.255.250', file '/var/www/oliophp/public_html/events.php')
>>>>>>>>>>>> [Wed May 19 22:26:37 2010] [error] [client 10.17.255.250] ALERT
>>>>>>>>>>>> - canary mismatch on efree() - heap overflow detected (attacker
>>>>>>>>>>>> '10.17.255.250', file 
>>>>>>>>>>>> '/var/www/oliophp/public_html/taggedEvents.php')
>>>>>>>>>>>>
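
Since the alerts name the script but not a line, one small step is to tally them per file to see which scripts trip the check most often. This is a sketch in Python: the function name is hypothetical, the pattern is built from the log entries quoted above, and it assumes each error-log entry is on a single line (the wrapping above comes from the mail client).

```python
import re
from collections import Counter

# Pattern taken from the Apache error-log entries quoted in this thread.
ALERT = re.compile(
    r"ALERT - canary mismatch on efree\(\) - heap overflow detected "
    r"\(attacker '(?P<client>[^']*)', file '(?P<file>[^']*)'"
)

def count_alerts_by_file(lines):
    """Count Suhosin canary-mismatch alerts per PHP script path."""
    counts = Counter()
    for line in lines:
        m = ALERT.search(line)
        if m:
            counts[m["file"]] += 1
    return counts
```

For the five entries above this gives three alerts for taggedEvents.php and one each for users.php and events.php.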
>>>>>>>>>>>> According to blogs it is a php related issue: the Suhosin patch
>>>>>>>>>>>> detects a memory overflow and complains.
>>>>>>>>>>>> I was just wondering if the Olio php code has any known
>>>>>>>>>>>> memory leaks.
>>>>>>>>>>>>
>>>>>>>>>>>> My php version on ubuntu:
>>>>>>>>>>>> PHP 5.2.4-2ubuntu5 with Suhosin-Patch 0.9.6.2 (cli) (built: Feb
>>>>>>>>>>>> 27 2008 20:46:51)
>>>>>>>>>>>> Copyright (c) 1997-2007 The PHP Group
>>>>>>>>>>>> Zend Engine v2.2.0, Copyright (c) 1998-2007 Zend Technologies
>>>>>>>>>>>>
>>>>>>>>>>>> It's too bad that I do not get a line number in the php files
>>>>>>>>>>>> that cause this.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Has anyone come across this one before?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>>>> Kontorinis Vasileios
>>>>>>>>>>>> Phd student, University of California San Diego
>>>>>>>>>>>> San Diego, CA 92122
>>>>>>>>>>>> Cell. phone: (858) 717 6899
>>>>>>>>>>>> [email protected], [email protected]
>>>>>>>>>>>>
>>>>>>>>>>>> -------------------------------------------------------------------
>>>>>>>>>>>>
