Howdy,
This issue was not fixed in 4.2.176. As far as Cisco revealed to us, this
was not an already identified bug. They tried to blame our controller
crashes on a possible SSH problem (an explanation they've fed us before),
but we were adamant that they find the real reason this time, since the
crashes occurred only on our primary cluster of controllers.
We have standalone controllers that were not members of the main
mobility group and that sustained uptime for months between
maintenance reboots, and other backup controllers with little to no
usage that also did not suffer the same memory leak. At Cisco's request
we also disabled a number of controller features to help narrow down the
issue. The memory leak was most prominent on controllers loaded to
near capacity (100-150 APs) that were members of our large mobility
group, and it can really only be seen by comparing historical memory
usage across weeks, depending on user client load.
I found the following two bug fixes in the release notes for
4.2.205 that may or may not be our fix. Our specific TAC case was SR
610781025. I'm not sure whether you can view that SR without being me,
but you can try. We started troubleshooting this issue in February 2009;
the bugfix was identified and released to us for testing in developer
firmware in May, and shortly thereafter shipped in 4.2.205, also in May 2009.
CSCsm37204: Controller memory usage no longer grows to 44% after seven days of uptime.
CSCsy50654: Controllers no longer undergo memory leaks caused by incrementing of the mmlisten process.
I spent the better part of 3 months tracking and troubleshooting this
bug, and I'm positive it wasn't my imagination. You can monitor your
memory usage via Airwave or WCS 6.x. Besides those two management
consoles, I'm not aware of a simple method to see your controller's
memory usage with historical reference in 4.2.176 or earlier.
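Since the leak only shows up when comparing memory usage across weeks, one rough way to turn exported history into an early warning is a straight-line (least-squares) fit. This is a hypothetical sketch, not anything from Cisco, Airwave, or WCS: it assumes you have already exported periodic samples as (day, percent-used) pairs by whatever means you have, and the function name and 98% default threshold are illustrative choices. Real growth tracks client load, so treat the projection as a rough estimate only.

```python
from typing import List, Optional, Tuple


def days_until_threshold(samples: List[Tuple[float, float]],
                         threshold: float = 98.0) -> Optional[float]:
    """Hypothetical helper: fit a least-squares line to (day, memory_percent)
    samples and return the projected number of days after the last sample
    at which usage reaches `threshold`. Returns None if usage is flat or
    falling (no leak trend detected)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    slope = sxy / sxx  # growth in percentage points per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(x for x, _ in samples)
    crossing_day = (threshold - intercept) / slope
    # Already past the threshold -> 0 days remaining.
    return max(0.0, crossing_day - last_day)
```

For example, made-up samples of 40%, 47% and 54% taken a week apart, `days_until_threshold([(0, 40), (7, 47), (14, 54)])`, project roughly 44 more days before reaching 98%.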
If anyone requires additional information, or would like to see graphs
that show the memory leak "in action", please let me know and I will try
my best to accommodate.
-Justin Hao
Bentley, Douglas wrote:
I believe they fixed the memory issue in 4.2.176 but broke other
functionality. According to Cisco, the memory fix was in 4.2.176.
I could be wrong. I cannot find any memory bugs listed for 4.2.176, only
for 4.2.130.
-----Original Message-----
From: The EDUCAUSE Wireless Issues Constituent Group Listserv
[mailto:[email protected]] On Behalf Of Brandon Pinsky
Sent: Tuesday, June 30, 2009 10:52 AM
To: [email protected]
Subject: Re: [WIRELESS-LAN] 3750G and 1131
Oh, OK. Sounds like Justin has a memory leak issue with 4.2.176
though. Is that correct Justin?
On Jun 30, 2009, at 10:27 AM, Bentley, Douglas wrote:
CSCsq87549/CSCsq63106 (memory leaks): sorry, those are the issues in
4.2.130.
CSCsx07878: Client devices no longer intermittently fail to log into
a WLAN with web authentication (webauth). This one broke us in
4.2.176. We use the internal web redirect for authentication,
and it does not work properly in 4.2.176.
http://www.cisco.com/en/US/docs/wireless/controller/release/notes/crn42205D3MR4.html
The fix for this is in 4.2.205.
Sorry for the confusion.
-Doug
Douglas Bentley | University IT/NC Network Engineering
University of Rochester | 727 Elmwood Ave.| Rochester, NY 14620
T: 585.275.6550 | Email: [email protected]
-----Original Message-----
From: The EDUCAUSE Wireless Issues Constituent Group Listserv
[mailto:[email protected]] On Behalf Of Brandon Pinsky
Sent: Tuesday, June 30, 2009 10:05 AM
To: [email protected]
Subject: Re: [WIRELESS-LAN] 3750G and 1131
Does anyone have a BugID for the memory leak issue in 4.2.176?
Thanks,
--------------
B.J. Pinsky
Manager, Core Resources
New York Presbyterian Hospital
Columbia University Medical Center
(o): 212-305-9021
(m): 917-626-9485
[email protected]
630 W. 168th Street
PH18-126
NY, NY 10032
On Jun 30, 2009, at 8:14 AM, Bentley, Douglas wrote:
I completely agree with the 4.2.176 statement. We are having the
same issue with the memory leak on our 14 controllers. After a
month of testing, we are upgrading to 4.2.205 at 5am tomorrow.
-Doug
Douglas Bentley | University IT/NC Network Engineering
University of Rochester | 727 Elmwood Ave.| Rochester, NY 14620
T: 585.275.6550 | Email: [email protected]
From: The EDUCAUSE Wireless Issues Constituent Group Listserv
[mailto:[email protected]] On Behalf Of Justin Hao
Sent: Monday, June 29, 2009 11:50 PM
To: [email protected]
Subject: Re: [WIRELESS-LAN] 3750G and 1131
Howdy,
Regarding 4.2.176, I can absolutely, positively NOT recommend it as a
stable code base. 4.2.205 is a direct release containing a critical
bugfix that we had a direct hand in "motivating" Cisco to fix
promptly.
4.2.176 contains a memory leak involving large mobility groups
(our setup is approximately 1700 APs with 18 controllers). This
memory leak is fatal. The general gist is that stateful information
about mobility clients isn't released from memory properly as
clients come and go, which drives memory usage on the controllers
steadily up until they crash ungracefully with little or no crash/log
information. The controllers also take an inordinate amount of time
(20+ minutes) to recover from the crash, I presume because they're
attempting to log crash data and there isn't any memory available
for that. We observed controllers climb to 98-99% memory every 8
weeks until they crashed. Working with us, the development engineer
tested and monitored two of our controllers for several weeks to
identify the issue. I don't know the "critical" size or number of
APs/controllers required for this to be a noticeable issue, but
ungraceful controller reboots occurred every 8 weeks with our setup.
We are currently running 4.2.205 and haven't experienced an "event"
since, so I'd recommend 4.2.205 if you need to run 4.2.x code. We
haven't had any experience with 5.x code at this time,
since Cisco pulled support for the 10xx series APs from anything
newer than 4.x.
--------
Justin Hao
Texas A&M University
Network Group
(979)862-2162
[email protected]
--------
On Jun 29, 2009, at 7:50 PM, Mike King wrote:
Both 4.2.205 and 4.2.176 have been designated as:
AssureWave The Cisco(r) AssureWave program focuses on satisfying
customer quality requirements in critical vertical markets in the
wireless space. This program links and expands on product testing
conducted within development engineering, regression testing, and
systems test groups within Cisco. AssureWave certification marks the
successful completion of extensive integrity testing that validates
targeted releases. In addition, Cisco's AssureWave program ensures
compatibility with major device and application vendors through
established partnerships. Cisco's partners perform extensive
testing at their own facilities to ensure broader interoperability
with our ongoing new releases.
So I'd feel safe with them. No higher releases have been certified
under this program.
(AssureWave replaces Safe Harbor testing for wireless.)
On Mon, Jun 29, 2009 at 4:31 PM, Pham, Loc <[email protected]>
wrote:
Guys: I am ready to configure and test the 3750G on 4.2.205 code.
Besides this bug, is there anything else I should look out for?
CSCsu02630
I am also (tempted) to go with 5.2 for the CWAPP functionality if
time permits...
Regards,
Loc Pham, CCIE # 17030 - Sr. Network Staff,
IT Enterprise Security & Services, UCSF Medical Center.
Office 415-353-4492 Pager 415-443-9014
********** Participation and subscription information for this
EDUCAUSE Constituent Group discussion list can be found at
http://www.educause.edu/groups/.
--------------------
This electronic message is intended to be for the use only of the
named recipient, and may contain information that is confidential or
privileged. If you are not the intended recipient, you are hereby
notified that any disclosure, copying, distribution or use of the
contents of this message is strictly prohibited. If you have
received this message in error or are not the named recipient,
please notify us immediately by contacting the sender at the
electronic mail address noted above, and delete and destroy all
copies of this message. Thank you.
--
Justin Hao
Network Engineer
Texas A&M University
Networking and Information Security
[email protected]
(979)862-2162