Re: Strange ARS Timeout Problem
***I apologize if you receive this posting twice. I tried to send it with two pstack output attachments (less than 1 MB in size) but couldn't.***

First of all, thanks to all who responded and provided valuable suggestions on our issue.

Rajesh, it happens randomly, not at a fixed time. Sometimes it happened early in the morning or at night when there were just a few users, while other times it happened during peak business hours. When it happens, it affects both web users and native client users.

Fred, I did steal the script (from one of your earlier postings) and, after tweaking it a little, have had it in place since we encountered the problem. Thanks, Fred. What BMC support is really looking for at this time is the pstack output for arplugin during the hang, because they think arplugin might be causing the problem (see the comments from BMC support below). I will also try to get truss and dtrace output (recommended by Axton) for arserverd and the plugin server when it happens again.

Bob, it's interesting that you mentioned the dispatcher thread, because BMC tech support has recommended turning on dispatcher logging. I'm going to look at the RPC-Non-Blocking-IO setting. I am attaching a couple of the pstack outputs for anyone who is familiar with pstack to take a look.
I have received updates from BMC support and implemented some changes:

Added to ar.conf:
External-Authentication-Return-Data-Capabilities: 31
Plugin-Filter-API-Threads: 4 20
Approval-RPC-Socket: 390631
Private-RPC-Socket: 390631 2 4

Updated in ar.conf:
Next-ID-Block-Size from 10 to 40
Delay-Recache-Time from 5 to 120

Adjusted thread numbers:
- CAI Plugin threads: Private-RPC-Socket: 390680 24 24 to: Private-RPC-Socket: 390680 16 24
- RPC Plugin Loopback threads: Private-RPC-Socket: 390626 8 16 to: Private-RPC-Socket: 390626 4 10

Here are some comments from BMC support:

It looks like the cause of the entire problem could be arplugin not responding during that time. In the logs we saw at least two threads that were making a call to the plugin server and waiting for a response from it, one for authentication and the other for getting information via the vendor form. If the other users are ITSM users, they would be using the overview console, which again uses a plugin. It showed that it might be waiting on the database, and in one of the other calls, on Thread 9, it showed that a plugin call is being made and is taking time. That is the reason I suggested adding the External Authentication parameter, so that it doesn't have to authenticate for everything. One of the calls also showed that an escalation is triggering, which calls a filter, and some filter operations are performed that create a db entry, and that is taking time. In the plugin log, we see the last successful CAI plugin call, and after that the plugin server stopped responding for some reason. Can you please check how many records you have in the CAI:Events form? Is that too many? Do you see any old or errored records as well? If you see old records, would it be possible to remove them from the CAI:Events form (or you can take a backup by exporting and then delete), just in case those are bad or very old records?
- I did clean up old records in CAI:Events and CAI:EventParams.

Thanks,
Eric
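To review thread changes like the ones listed above, a small sketch such as the following can print the minimum and maximum thread counts for each private queue. It assumes the `Private-RPC-Socket: <RPC program number> <min> <max>` layout shown in this thread; the function name `rpc_threads` is hypothetical, and on Solaris 10 `nawk` or `/usr/xpg4/bin/awk` may need to be substituted for `awk`.

```shell
#!/bin/sh
# Sketch: list private queue thread settings from an ar.conf file.
# Assumes lines of the form "Private-RPC-Socket: <rpc> <min> <max>".
rpc_threads() {
    # $1 = path to ar.conf (or a copy of it)
    awk '$1 == "Private-RPC-Socket:" && NF >= 4 {
        # Flag entries whose minimum is larger than their maximum.
        note = ($3 + 0 > $4 + 0) ? "  <-- min exceeds max" : ""
        printf "%s min=%s max=%s%s\n", $2, $3, $4, note
    }' "$1"
}
```

Running `rpc_threads` against a copy of ar.conf taken before and after a change, and comparing the two listings with `diff`, gives a quick record of what was adjusted.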
Re: Strange ARS Timeout Problem
The combination of symptoms:
- exactly 5 minutes of inactivity from incoming clients (time of general LIST operation)
- SQL activity still occurring (since escalation work is time-based)

Sure smells of a clogged arserverd RPC dispatcher thread. Be sure to look into setting RPC-Non-Blocking-IO. I saw this in a previous ARSList post: http://www.mail-archive.com/arslist@arslist.org/msg34499.html

Get used to what thread #1 looks like under normal operation in the 'pstack' output of your arserverd process. Then, when you're encountering the hanging/unresponsiveness, take a look at the pstack of arserverd again, especially thread #1, the dispatcher. You might see some function calls listed in the stack dealing with endofrecord searching. Send some of the pstack output to the arslist if you'd like some interpretation.

Bob
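A minimal sketch of the thread #1 comparison Bob describes: it pulls one thread's stack out of a saved pstack capture so that a baseline and a capture taken during the hang can be compared. It assumes Solaris-style section headers of the form `lwp# N / thread# N`; the function name `pstack_thread` and the file names in the note below are hypothetical.

```shell
#!/bin/sh
# Sketch: print the stack of a single thread from saved pstack output.
# Assumes each thread section starts with a "lwp# N / thread# N" header.
pstack_thread() {
    # $1 = file holding pstack output, $2 = thread number to extract
    awk -v n="$2" '
    /lwp# *[0-9]+ *\/ *thread# *[0-9]+/ {
        # A new thread section starts here; keep it only if it is thread n.
        keep = ($0 ~ ("thread# *" n " "))
    }
    keep' "$1"
}
```

With one capture saved during normal operation and one during the hang, `pstack_thread baseline.txt 1` and `pstack_thread hang.txt 1` can be written to files and compared with `diff`.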
Re: Strange ARS Timeout Problem
Since you are on Solaris you should be able to run with the logs on all the time. Here is a script template you can use to save log files. Just figure out how often you wish to save the files and add a call to it from cron.

#!/usr/bin/ksh
#
# Name:        save_logs.sh
# Description: Save the log file(s) via a script
#
AR_LOG_DIR={full path where log files are stored}
AR_SAVE_DIR={full path where you want to save logs at}
#
# Time stamp (HHMM) used in the names of the saved files
cur=`date +%H%M`
cd ${AR_LOG_DIR}
#
# Copy each log only if it exists and is readable
[ -r arsql.log ] && cp arsql.log ${AR_SAVE_DIR}/arsql_${cur}.log ;
[ -r arfilter.log ] && cp arfilter.log ${AR_SAVE_DIR}/arfilter_${cur}.log ;
[ -r arapi.log ] && cp arapi.log ${AR_SAVE_DIR}/arapi_${cur}.log ;
#
cd ${AR_SAVE_DIR}
#
# Network connections at the time of the snapshot
netstat -a > netstat_${cur}.log
#
# Process and memory statistics, collected into one file
echo "prstat" > ps_${cur}.log
prstat -n25,10 -a 1 1 >> ps_${cur}.log
echo >> ps_${cur}.log
echo "vmstat" >> ps_${cur}.log
vmstat 1 2 >> ps_${cur}.log
echo >> ps_${cur}.log
echo "ps" >> ps_${cur}.log
ps -ef >> ps_${cur}.log
#
# Replace any earlier archive with the same time stamp, then compress
rm -f *${cur}.log.gz > /dev/null
gzip *${cur}.log
#
exit 0

Fred
Re: Strange ARS Timeout Problem
Ok... I just completely re-read the original post... all indications save one are that during that 5 minute interval the application server lost connectivity with the DB server. The only exception to that appears to be the escalation thread, which continued processing during that 5 minute window... so what I would do would be to set up a cron job to run every 30 seconds or every minute, something along those lines, that issues a tracert between your remedy server and your db server. My primary thought is that you are losing network connectivity... even though the escalation server is still working... it's at least something you can try and report back.

From: Action Request System discussion list(ARSList) [mailto:arslist@ARSLIST.ORG] On Behalf Of ZHANG, ERIC L
Sent: Wednesday, January 26, 2011 7:19 PM
To: arslist@ARSLIST.ORG
Subject: Re: Strange ARS Timeout Problem

Yes, I did initial log analysis. As I said in the original posting, there was a 5-minute gap in the api log, while no gap/waiting/error/long operation was showing in the sql log and escalation log. All the sql queries were for user AR_ESCALATOR in the sql log.

-Original Message-
From: Axton [mailto:axton.gr...@gmail.com]
Sent: Wednesday, January 26, 2011 8:18 AM
Subject: Re: Strange ARS Timeout Problem

What do the logs say? I haven't seen that you've done analysis with the logs. Is there a gap in time in the logs (indicating the server was not doing anything)? Is there a gap in time in the logs (indicating a long operation was running)?

On Tue, Jan 25, 2011 at 5:49 PM, ZHANG, ERIC L ezh...@entergy.com wrote:

We have sent BMC tech support all the logs including api, filter, sql, escalation, thread, plug-in, arfork, even pstack output that were taken during hanging, and so far they haven't been able to identify the cause of the problem.
-Original Message-
From: Axton [mailto:axton.gr...@gmail.com]
Sent: Monday, January 24, 2011 5:45 PM
Subject: Re: Strange ARS Timeout Problem

Try to get the api, filter, and sql logs leading up to the point where it started hanging. Those are your best indicator. Also check the arerror.log for crashes. There are things that can cause behavior like this that the logs will indicate. For example, try creating a computed group during production operations, or importing a deployable application.

On Thu, Jan 20, 2011 at 3:10 PM, ZHANG, ERIC L ezh...@entergy.com wrote:

Hi Listers.

We are experiencing intermittent timeouts with the ARS. Without me doing anything, the AR system becomes normal again after about 5 minutes. All users are getting timeout (or hourglass) but no process is being restarted in armonitor.log.

This is the message showing in arerror.log:

Tue Jan 18 12:09:24 2011 Dispatch : Timeout during data retrieval due to busy server -- retry the operation (server_name) ARERR - 93
Tue Jan 18 12:10:04 2011 Approve : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)

In the API log, it shows a 5-minute gap:

API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99 USER: Remedy Application Service /* Tue Jan 18 2011 12:06:16.2224 */-GLEWF OK
API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99 USER: Remedy Application Service /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address

Our DBA was monitoring the database during the time and found few activities in the database. The activities shown in SQL log during the timeout were all for user AR_ESCALATOR, which means the escalation was still running during the time. This can also be verified from the escalation log.

When this occurs, the CPU and RAM utilizations are dramatically dropping to the lowest levels on both the ARS server and the database server.
There was no application change in the last couple of months. The problem started about two weeks ago. It could occur 3 times a day and sometimes it works fine for days without it occurring.

Our configuration/environment:
ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03
ARS Server: Solaris 10 (16 GB of Physical Memory, 18 GB of SWAP, 8 CPUs) - Dedicated to ARServer, ITSM, SLM, and SRM.
Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - Used only by customers to submit service request.
Database: Oracle: 10gR2 (remote)

The following are threads settings in ar.conf:
Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4

We
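The 5-minute hole in the API log described above can be located mechanically. The sketch below scans a log for timestamps in the `HH:MM:SS.ffff` form shown in the quoted API log lines and reports consecutive entries that are at least a given number of seconds apart. It assumes all entries fall on the same day (no midnight rollover handling); the function name `find_api_gaps` is hypothetical, and on Solaris 10 `nawk` may be needed in place of `awk`.

```shell
#!/bin/sh
# Sketch: report gaps between consecutive timestamps in an API log.
# Assumes timestamps like "12:06:16.2224" and a single calendar day.
find_api_gaps() {
    # $1 = log file, $2 = minimum gap to report, in seconds
    awk -v min="$2" '
    match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+/) {
        ts = substr($0, RSTART, 8)            # keep HH:MM:SS only
        split(ts, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]  # seconds since midnight
        if (seen && now - prev >= min)
            printf "gap of %d s between %s and %s\n", now - prev, prev_ts, ts
        prev = now; prev_ts = ts; seen = 1
    }' "$1"
}
```

For the two API log lines quoted in the original post, `find_api_gaps arapi.log 60` would report a single gap of 300 seconds between 12:06:16 and 12:11:16.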
Re: Strange ARS Timeout Problem
Good idea. I just put a cron job on the ars server that runs traceroute db_server every minute and appends the output to an output file. Waiting for the next timeout.
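A sketch of what such a cron-driven probe could look like: it runs whatever command it is given, and appends a timestamp, the command's output, and its exit status to a file, so the entries can later be lined up against the hang window. The function name `log_probe` and the output path in the note below are hypothetical; only `traceroute db_server` comes from the message above.

```shell
#!/bin/sh
# Sketch: append a timestamped record of one probe command to a log file.
log_probe() {
    # $1 = output file; remaining arguments = command to run
    out=$1; shift
    {
        printf '=== %s: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
        # Capture the exit status without aborting if the probe fails.
        if "$@" 2>&1; then status=0; else status=$?; fi
        printf 'exit status: %d\n' "$status"
    } >> "$out"
}
```

A script called from cron every minute could then contain a line such as `log_probe /var/tmp/db_probe.log traceroute db_server`.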
Re: Strange ARS Timeout Problem
I'm not following how you got to broken db connections. If arerror.log does not show the sql connection dropped, it didn't. Oracle connections are stateful, meaning that if that link drops, that session is dead. If arerror.log doesn't indicate broken sessions to the db, chances are things are good there.
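To check what arerror.log actually recorded around each hang, a sketch like the one below lists the timestamps of the timeout entries (ARERR 93 and 94) quoted in the original post. It assumes each entry starts with a 24-character timestamp such as `Tue Jan 18 12:09:24 2011` and that the error number appears as `ARERR - 93` or `(ARERR 94)`, as in this thread; the function name `list_timeouts` is hypothetical. It does not look for dropped-connection messages, since their exact text is not shown in the thread.

```shell
#!/bin/sh
# Sketch: print the timestamp of every ARERR 93/94 entry in arerror.log.
list_timeouts() {
    # $1 = path to arerror.log
    awk '
    # Remember the most recent timestamp seen at the start of a line.
    /^[A-Z][a-z][a-z] [A-Z][a-z][a-z] [ 0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]/ {
        stamp = substr($0, 1, 24)
    }
    # Print it whenever a timeout error number appears.
    /ARERR[ -]*(93|94)([^0-9]|$)/ { print stamp }
    ' "$1"
}
```

The resulting timestamps can be compared with the gaps in the API log and with the probe output collected from cron.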
Re: Strange ARS Timeout Problem
Eric,

You might add an nslookup command to your cron job to see if a DNS lookup is failing. A DNS failure will give the same ARS symptom as a network outage because it is an operation that the server must complete before communications can happen.

Good luck,
Dennis
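A sketch of a DNS check that could sit alongside the traceroute probe. It prints one timestamped OK/FAIL line per call. As an assumption, it defaults to `getent hosts` rather than the nslookup Dennis mentions, because getent's exit status reflects whether the name resolved; the lookup command can be overridden with a second argument. The function name `dns_status` is hypothetical.

```shell
#!/bin/sh
# Sketch: record whether a host name resolves at this moment.
dns_status() {
    # $1 = host name; $2 (optional) = lookup command, default "getent hosts"
    host=$1
    lookup=${2:-getent hosts}
    # $lookup is left unquoted on purpose so "getent hosts" splits into words.
    if $lookup "$host" >/dev/null 2>&1; then
        echo "$(date '+%H:%M:%S') DNS OK $host"
    else
        echo "$(date '+%H:%M:%S') DNS FAIL $host"
    fi
}
```

Appending the output of `dns_status db_server` to the same file as the traceroute results keeps both checks on one timeline.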
Re: Strange ARS Timeout Problem
Not sure if this can help. We had the same issue: the system used to give intermittent timeouts, and the only similarity was that the timeout always happened at a fixed time. Is this the case with you also?

We found out that some views were running at the specified time and the system used to do a full scan on the major form. No locks were found, but the system performance used to go down.

As Axton said, if there is any issue with the network then you can see that in the error log itself. Without ignoring the network issue, try changing the entry in the ORA file from hostname to IP. We do this to minimize the impact in case there is a DNS issue as well.

One more thing you need to find out is whether the issue is happening through Midtier only, from the User Client, or both.

With Best Regards
Rajesh
-----Original Message-----
From: LJ LongWing [mailto:lj.longw...@gmail.com]
Sent: Thursday, January 27, 2011 9:18 AM
Subject: Re: Strange ARS Timeout Problem

OK, I just completely re-read the original post. All indications save one are that during that 5-minute interval the application server lost connectivity with the DB server. The only exception appears to be the escalation thread, which continued processing during that 5-minute window. So what I would do is set up a cron job to run every 30 seconds or every minute, something along those lines, that issues a tracert between your Remedy server and your DB server. My primary thought is that you are losing network connectivity, even though the escalation server is still working. It's at least something you can try and report back.
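The cron-based checks suggested in this message (traceroute to the database server, plus an nslookup) could also be done by a small script that times a DNS lookup and a TCP connect and appends one line per run. The Python below is a sketch under assumptions: `db_server` is the placeholder host name from the thread, the port and log path are supplied by whoever runs it (1521 is only the usual Oracle listener default and may not match this site), and the function names are made up for illustration.

```python
import socket
import sys
import time
from datetime import datetime


def timed(func, *args):
    """Run func(*args); return (elapsed_seconds, error_text_or_None)."""
    start = time.monotonic()
    try:
        func(*args)
        error = None
    except OSError as exc:  # covers DNS failures, refusals, and timeouts
        error = str(exc)
    return time.monotonic() - start, error


def tcp_connect(host, port, timeout):
    """Open and immediately close a TCP connection to host:port."""
    with socket.create_connection((host, port), timeout=timeout):
        pass


def probe(host, port, timeout=5.0):
    """Return one log line covering DNS resolution and a TCP connect."""
    dns_time, dns_error = timed(socket.getaddrinfo, host, port)
    tcp_time, tcp_error = timed(tcp_connect, host, port, timeout)
    stamp = datetime.now().isoformat(timespec="seconds")
    return (f"{stamp} host={host} port={port} "
            f"dns={dns_time:.3f}s dns_error={dns_error} "
            f"tcp={tcp_time:.3f}s tcp_error={tcp_error}")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python3 probe.py db_probe.log db_server 1521
    log_path, host, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
    with open(log_path, "a") as log:
        log.write(probe(host, port) + "\n")
```

Separating the DNS time from the TCP connect time matters here because the thread raises both possibilities: a slow or failing resolver would show up in the first number, a lost route to the database in the second.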
Re: Strange ARS Timeout Problem
What do the logs say? I haven't seen that you've done analysis with the logs. Is there a gap in time in the logs (indicating the server was not doing anything)? Is there a gap in time in the logs (indicating a long operation was running)?

On Tue, Jan 25, 2011 at 5:49 PM, ZHANG, ERIC L ezh...@entergy.com wrote:

We have sent BMC tech support all the logs, including api, filter, sql, escalation, thread, plug-in, arfork, and even pstack output taken during the hang, and so far they haven't been able to identify the cause of the problem.

-----Original Message-----
From: Axton [mailto:axton.gr...@gmail.com]
Sent: Monday, January 24, 2011 5:45 PM
Subject: Re: Strange ARS Timeout Problem

Try to get the api, filter, and sql logs leading up to the point where it started hanging. Those are your best indicator. Also check the arerror.log for crashes. There are things that can cause behavior like this that the logs will indicate. For example, try creating a computed group during production operations, or importing a deployable application.

On Thu, Jan 20, 2011 at 3:10 PM, ZHANG, ERIC L ezh...@entergy.com wrote:

Hi Listers. We are experiencing intermittent timeouts with the ARS. Without me doing anything, the AR system becomes normal again after about 5 minutes. All users are getting timeouts (or an hourglass), but no process is being restarted in armonitor.log.
This is the message showing in arerror.log:

Tue Jan 18 12:09:24 2011 Dispatch : Timeout during data retrieval due to busy server -- retry the operation (server_name) ARERR - 93
Tue Jan 18 12:10:04 2011 Approve : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)

In the API log, it shows a 5-minute gap:

API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99 USER: Remedy Application Service /* Tue Jan 18 2011 12:06:16.2224 */ -GLEWF OK
API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99 USER: Remedy Application Service /* Tue Jan 18 2011 12:11:16.0001 */ +GLEWF ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address

Our DBA was monitoring the database at the time and found little activity in the database. The activity shown in the SQL log during the timeout was all for user AR_ESCALATOR, which means the escalations were still running at the time. This can also be verified from the escalation log. When this occurs, CPU and RAM utilization drop dramatically to their lowest levels on both the ARS server and the database server. There was no application change in the last couple of months. The problem started about two weeks ago. It could occur 3 times a day and sometimes it works fine for days without it occurring.

Our configuration/environment:

ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03
ARS Server: Solaris 10 (16 GB of physical memory, 18 GB of swap, 8 CPUs) - Dedicated to ARServer, ITSM, SLM, and SRM.
Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - Used only by customers to submit service requests.
Database: Oracle 10gR2 (remote)

The following are the thread settings in ar.conf:

Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4

We have about 300 concurrent Remedy users during peak hours. ARServer is running as a non-root process. The number of open file descriptors for arserverd (~700) was well below the ulimit of 3072. The FAST and LIST threads never reached their maximums.

I have an open ticket with BMC Support but thought I might get a solution quicker from the Arslist here.

Thanks,
Eric

_attend WWRUG11 www.wwrug.com ARSlist: Where the Answers Are_
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
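For anyone comparing these queue settings with the thread counts seen in pstack or the thread log, the Private-RPC-Socket lines can be tabulated with a few lines of Python. This is only a sketch: it assumes each line has the form "Private-RPC-Socket: <RPC program number> <min threads> <max threads>" as listed in the post, treats the (FAST) and (LIST) labels as the poster's annotations rather than ar.conf syntax, and uses function names invented for illustration.

```python
from typing import Dict, List, Tuple

# Thread settings exactly as listed in the post; the (FAST)/(LIST) labels
# are the poster's annotations and are ignored by the parser.
SAMPLE_AR_CONF = """\
Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4
"""


def parse_private_queues(text: str) -> Dict[int, Tuple[int, int]]:
    """Map each RPC program number to its (min, max) thread counts."""
    queues: Dict[int, Tuple[int, int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("Private-RPC-Socket:"):
            continue  # some other ar.conf setting
        fields = line.split(":", 1)[1].split()
        if len(fields) < 3:
            continue  # line gives no explicit min/max thread counts
        rpc, low, high = (int(value) for value in fields[:3])
        queues[rpc] = (low, high)
    return queues


def fixed_size_queues(queues: Dict[int, Tuple[int, int]]) -> List[int]:
    """Return the queues whose min equals max, so they cannot grow under load."""
    return sorted(rpc for rpc, (low, high) in queues.items() if low == high)


if __name__ == "__main__":
    parsed = parse_private_queues(SAMPLE_AR_CONF)
    for rpc, (low, high) in sorted(parsed.items()):
        print(f"{rpc}: min={low} max={high}")
    print("total min threads:", sum(low for low, _ in parsed.values()))
    print("total max threads:", sum(high for _, high in parsed.values()))
    print("fixed-size queues:", fixed_size_queues(parsed))
```

The totals give a rough upper bound on private-queue worker threads to hold against the file descriptor and ulimit figures quoted above.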
Re: Strange ARS Timeout Problem
Yes, I did initial log analysis. As I said in the original posting, there was a 5-minute gap in the api log, while no gap, waiting, error, or long operation showed in the sql log or the escalation log. All the SQL queries in the sql log were for user AR_ESCALATOR.
Re: Strange ARS Timeout Problem
We have sent BMC tech support all the logs, including api, filter, sql, escalation, thread, plug-in, arfork, and even pstack output taken during the hang, and so far they haven't been able to identify the cause of the problem.
Re: Strange ARS Timeout Problem
Interesting that you asked about it. When we first encountered the problem, Cursor Sharing was already set to FORCE both in the Oracle database and in ar.conf. The DBA then changed it to EXACT in the database while we kept it at FORCE in ar.conf. This change improved performance noticeably, especially when users refresh the Assigned Work table on the Incident Management console. But the change didn't eliminate the timeout problem.

-----Original Message-----
From: patchsk [mailto:vamsi...@gmail.com]
Sent: Sunday, January 23, 2011 1:08 PM
Subject: Re: Strange ARS Timeout Problem

What is the value for cursor sharing at the db level and in the ar.conf file? Try setting it to FORCE and see if that fixes the issue.
Re: Strange ARS Timeout Problem
Try to get the api, filter, and sql logs leading up to the point where it started hanging. Those are your best indicator. Also check the arerror.log for crashes. There are things that can cause behavior like this that the logs will indicate. For example, try creating a computed group during production operations, or importing a deployable application.
Re: Strange ARS Timeout Problem
You can use RunMacro with debugging (-d) for all cases (submit, update, and delete) and review the debugging output. Generally speaking, the setup in the application is not always completely represented in the database. You can use the admin tool to see what is defined on the form but missing in the database, such as indexes.
Re: Strange ARS Timeout Problem
What is the value for cursor sharing at the db level and in the ar.conf file? Try setting it to FORCE and see if that fixes the issue.
Re: Strange ARS Timeout Problem
Thanks, Mark. I did go through the escalations that were running during the timeout and couldn't find anything out of the ordinary. The escalation log shows that all the escalations completed in a fraction of a second, and no delays show in the sql log either.

Eric

-----Original Message-----
From: Brittain, Mark [mailto:mbritt...@navisite.com]
Sent: Thursday, January 20, 2011 3:30 PM
Subject: Re: Strange ARS Timeout Problem

Hi Eric,

A couple of things you might check. Have you checked the indexing against the Run If in the escalations? NULL in the Run If ignores indexing and should be avoided. If you have a time calculation, is the field on one side and the calculation on the other: (Create Date $TIMESTAMP$ - 3600) vs. (Create Date +3600 $TIMESTAMP$)? Calculating on the field value is slower. Is there a SQL query to an external table in the Set Field action? There could be a change or cause there. Likewise, is the escalation doing a Set Field using information from another form that your users frequently use? If so, the issue might be the indexing there.

These are small things that you can get away with when there is a relatively limited number of records. Then at some magic number the warts start to show.

Hope this helps and good luck.
Mark
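Mark's point about keeping the calculation off the indexed field can be illustrated with a small experiment. The Python sketch below uses SQLite purely as a stand-in, since it ships with Python; Oracle's optimizer is a different product and should be checked with its own execution plans. The table and index names are hypothetical, and the comparison operator, which appears to have been lost from the expressions in the archived post, is assumed here to be `<`.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return the detail strings of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the plan text


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for a form table with an indexed Create Date field.
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, status TEXT, create_date INTEGER)"
)
conn.execute("CREATE INDEX idx_ticket_create_date ON ticket (create_date)")

now = 1295374164  # arbitrary epoch seconds standing in for $TIMESTAMP$

# Calculation on the constant side: the indexed column is compared as-is.
field_alone = query_plan(
    conn,
    "SELECT id, status FROM ticket WHERE create_date < ? - 3600",
    (now,),
)

# Calculation on the field side: the expression hides the indexed column.
field_computed = query_plan(
    conn,
    "SELECT id, status FROM ticket WHERE create_date + 3600 < ?",
    (now,),
)

print("field alone:   ", field_alone)
print("field computed:", field_computed)
```

In this stand-in the first form can be answered with a range search on the index, while the second has to evaluate the expression for every row. That matches the "warts show at some magic number of records" effect Mark describes, though how strongly it applies to a given escalation qualification still has to be confirmed in the sql log.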
Re: Strange ARS Timeout Problem
Dennis,

I have been trying to get our network guys to set up a sniffer to monitor the network traffic to and from the AR Server. I will pass the DNS info on to them.

Thanks,
Eric

-Original Message-
From: Dennis Ruble [mailto:ddru...@rockwellcollins.com]
Sent: Thursday, January 20, 2011 3:32 PM
Subject: Re: Strange ARS Timeout Problem

Eric,

We had a similar symptom many years back. There were 3 DNS servers configured for our AR System server. Over time the first and second ones were retired and the DNS configuration did not get updated. So, for every DNS call the system had to wait for the first and second servers to time out before trying the third, and if the third was busy everything just went to sleep while waiting for a response. We updated our DNS config and hosts file and everything returned to normal. I suppose there might also be other resources besides DNS servers that could cause the same symptom. Our network guys sniffed the network to see what we were waiting on.

HTH,
Dennis

ZHANG, ERIC L ezh...@entergy.com
Sent by: Action Request System discussion list(ARSList) arslist@ARSLIST.ORG
01/20/2011 03:10 PM
Please respond to arslist@ARSLIST.ORG
To: arslist@ARSLIST.ORG
Subject: Strange ARS Timeout Problem

Hi Listers.

We are experiencing intermittent timeouts with the ARS. Without me doing anything, the AR system becomes normal again after about 5 minutes. All users are getting timeouts (or an hourglass), but no process is being restarted in armonitor.log.

This is the message showing in arerror.log:

Tue Jan 18 12:09:24 2011 Dispatch : Timeout during data retrieval due to busy server -- retry the operation (server_name) ARERR - 93
Tue Jan 18 12:10:04 2011 Approve : Timeout during database query -- consider using more specific search criteria to narrow the results, and retry the operation (ARERR 94)

The API log shows a 5-minute gap:

API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99USER: Remedy Application Service /* Tue Jan 18 2011 12:06:16.2224 */-GLEWFOK
API TID: 04 RPC ID: 00 Queue: Admin Client-RPC: 99USER: Remedy Application Service /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address

Our DBA was monitoring the database at the time and saw very little activity. The activity shown in the SQL log during the timeout was all for user AR_ESCALATOR, which means escalations were still running during that time. This can also be verified from the escalation log.

When this occurs, CPU and RAM utilization drop dramatically to their lowest levels on both the ARS server and the database server.

There was no application change in the last couple of months. The problem started about two weeks ago. It can occur 3 times a day, and sometimes the system works fine for days without it occurring.

Our configuration/environment:

ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03
ARS Server: Solaris 10 (16 GB of physical memory, 18 GB of swap, 8 CPUs) - dedicated to ARServer, ITSM, SLM, and SRM.
Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - used only by customers to submit service requests.
Database: Oracle 10gR2 (remote)

The following are the thread settings in ar.conf:

Private-RPC-Socket: 390601 2 6
Private-RPC-Socket: 390603 2 2
Private-RPC-Socket: 390620 16 24 (FAST)
Private-RPC-Socket: 390626 8 16
Private-RPC-Socket: 390627 2 12
Private-RPC-Socket: 390635 24 30 (LIST)
Private-RPC-Socket: 390680 24 24
Private-RPC-Socket: 390693 2 4
Private-RPC-Socket: 390698 2 4

We have about 300 concurrent Remedy users during peak hours. ARServer is running as a non-root process. The number of open file descriptors for arserverd (~700) was well below the ulimit of 3072. The FAST and LIST threads never reached their maximums.

I have an open ticket with BMC Support but thought I might get a solution quicker from the ARSList here.

Thanks,
Eric

UNSUBSCRIBE or access ARSlist Archives at www.arslist.org attend wwrug11 www.wwrug.com ARSList: Where the Answers Are
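Editor's sketch: the 5-minute gap Eric describes in the API log can be located mechanically rather than by eye. The Python below is only an illustration, assuming the timestamp format of the two quoted log lines (`/* Tue Jan 18 2011 12:06:16.2224 */`), that the four trailing digits are a decimal fraction of a second, and an English locale for day and month names. The names `parse_stamp` and `find_gaps` and the 60-second default threshold are hypothetical and not part of any BMC tool.

```python
import re
from datetime import datetime, timedelta

# Matches the comment-style timestamp seen in the quoted API log lines,
# e.g. "/* Tue Jan 18 2011 12:06:16.2224 */".
STAMP = re.compile(
    r"/\*\s*(\w{3} \w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})\.(\d+)\s*\*/"
)


def parse_stamp(line):
    """Return the timestamp embedded in a log line, or None if there is none."""
    m = STAMP.search(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%a %b %d %Y %H:%M:%S")
    # Assumption: the digits after the dot are a decimal fraction of a second.
    return base + timedelta(seconds=float("0." + m.group(2)))


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current, delta) for consecutive stamps >= threshold apart."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_stamp(line)
        if ts is None:
            continue  # lines without a timestamp are skipped
        if prev is not None and ts - prev >= threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps
```

Feeding it the lines of an API log (for example `find_gaps(open(path))`, with `path` pointing at your own log file) lists every silent interval, which can then be lined up against arerror.log and the escalation log.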
Re: Strange ARS Timeout Problem
Yes. The DBA said there were no locks or blocks during the timeouts.

Thanks,
Eric

-Original Message-
From: LJ LongWing [mailto:lj.longw...@gmail.com]
Sent: Thursday, January 20, 2011 3:40 PM
Subject: Re: Strange ARS Timeout Problem

Eric,

Did your DBA look for any locking? We used to experience this a lot until we figured out what was happening. I could give you SQL Server instructions on how to find it... but you aren't using that... and your DBA should be able to run a query that'll tell you about things that are causing blocking and things that are blocked.
Re: Strange ARS Timeout Problem
Hi Eric,

A couple of things you might check.

Have you checked the indexing against the Run If in the escalations? NULL in the Run If ignores indexing and should be avoided.

If you have a time calculation, is the field on one side and the calculation on the other: (Create Date $TIMESTAMP$ - 3600) vs. (Create Date +3600 $TIMESTAMP$)? Calculating on the field value is slower.

Is there a SQL query to an external table in the Set Field action? There could be a change or cause there.

Likewise, is the escalation doing a Set Field using information from another form that your users frequently use? If so, the issue might be the indexing there.

These are small things that you can get away with when there is a relatively limited number of records. Then at some magic number the warts start to show.

Hope this helps and good luck.

Mark
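Editor's sketch: Mark's point that calculating on the field value is slower comes down to whether the database can still use an index on that field. The Python below illustrates the effect with SQLite from the standard library, which is only a stand-in for the Oracle database in this thread. The table `ticket`, its columns, the index name, and the `<` comparison are assumptions made for the demo, not AR System schema or the exact Run If qualification.

```python
import sqlite3


def query_plans():
    """Return SQLite's plan text for two logically similar date filters."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for a form with an indexed date field.
    conn.execute(
        "CREATE TABLE ticket ("
        "id INTEGER PRIMARY KEY, create_date INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX idx_ticket_create_date ON ticket (create_date)")
    now = 1300000000  # arbitrary epoch-seconds value standing in for $TIMESTAMP$
    queries = {
        # Arithmetic on the constant side: the bare column can be matched to the index.
        "calc_on_constant": "SELECT * FROM ticket WHERE create_date < ? - 3600",
        # Arithmetic on the column: the index on create_date no longer applies.
        "calc_on_field": "SELECT * FROM ticket WHERE create_date + 3600 < ?",
    }
    plans = {}
    for name, sql in queries.items():
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (now,)).fetchall()
        # The last column of each plan row is the human-readable detail.
        plans[name] = " | ".join(row[-1] for row in rows)
    conn.close()
    return plans
```

Printing `query_plans()` shows an index search for the first form and a full table scan for the second. Oracle's optimizer differs in detail, so the equivalent check there would be an execution plan for the SQL that the escalation actually generates, as captured in the SQL log.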
Re: Strange ARS Timeout Problem
Eric,

We had a similar symptom many years back. There were 3 DNS servers configured for our AR System server. Over time the first and second ones were retired and the DNS configuration did not get updated. So, for every DNS call the system had to wait for the first and second servers to time out before trying the third, and if the third was busy everything just went to sleep while waiting for a response. We updated our DNS config and hosts file and everything returned to normal. I suppose there might also be other resources besides DNS servers that could cause the same symptom. Our network guys sniffed the network to see what we were waiting on.

HTH,
Dennis
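Editor's sketch: Dennis's DNS scenario (retired name servers listed ahead of a live one) can be checked from the AR System host before a sniffer is in place. The Python below is a rough illustration using only the standard library. It assumes a resolv.conf-style file with `nameserver` lines and times lookups through the system resolver, so it shows the total wait but not which server caused it. The function names `parse_nameservers` and `time_lookup` are hypothetical.

```python
import socket
import time


def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses, in order, from resolv.conf-style text."""
    servers = []
    for raw in resolv_conf_text.splitlines():
        # Drop trailing comments introduced by '#' or ';'.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Resolve hostname once; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        resolver(hostname, None)
        ok = True
    except OSError:  # socket.gaierror is a subclass of OSError
        ok = False
    return ok, time.monotonic() - start
```

On the server, reading `/etc/resolv.conf` into `parse_nameservers` shows the order in which servers are tried, and calling `time_lookup` on the database host name a few times shows whether resolution takes seconds rather than milliseconds. Consistently slow lookups would point toward the stale-server situation Dennis describes.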
Re: Strange ARS Timeout Problem
Eric,

Did your DBA look for any locking? We used to experience this a lot until we figured out what was happening. I could give you SQL Server instructions on how to find it... but you aren't using that... and your DBA should be able to run a query that'll tell you about things that are causing blocking and things that are blocked.