Re: Strange ARS Timeout Problem

2011-01-31 Thread ZHANG, ERIC L
***I apologize if you receive the posting twice - I tried to send the
posting with two pstack output attachments (less than 1 MB in size) but
couldn't.***

 

First of all, thanks to all who responded and provided valuable
suggestions to our issue.

 

Rajesh,

 

It happens randomly, not at a fixed time. Sometimes it happened early in
the morning or at night when there were just a few users, while other
times it happened during peak business hours. When it happens, it
affects both web users and native client users.

 

Fred,

 

I did steal the script (from one of your earlier postings), tweaked it a
little, and have had it in place since we encountered the problem.
Thanks Fred.

 

What BMC support is really looking for at this time is the pstack output
for arplugin during a hang, because they think arplugin might be
causing the problem (see comments from BMC support below).  I will also
try to get truss and dtrace output (recommended by Axton) for arserverd
and the plugin when it happens again.

 

Bob,

 

It's interesting you mentioned the dispatcher thread, because BMC tech
support has recommended turning on dispatcher logging.  I'm going to
look at the RPC-Non-Blocking-IO setting.  I am attaching a couple of the
pstack outputs for anyone who is familiar with pstack to take a look.

 

 

I have received updates from BMC support and implemented some changes:

 

Added into ar.conf:

 

External-Authentication-Return-Data-Capabilities: 31

Plugin-Filter-API-Threads: 4 20

Approval-RPC-Socket: 390631

Private-RPC-Socket:  390631   2   4

 

Updated in ar.conf:

 

Next-ID-Block-Size from 10 to 40

Delay-Recache-Time from 5 to 120

 

Adjusted thread numbers:

 

- CAI Plugin threads:

Private-RPC-Socket:  390680  24  24   to:
Private-RPC-Socket:  390680  16  24

- RPC Plugin Loopback threads:

Private-RPC-Socket:  390626   8  16   to:
Private-RPC-Socket:  390626   4  10

 

 

Here are some comments from BMC support:

 

It looks like the cause of the entire problem could be arplugin not
responding during that time. In the logs we saw at least two threads
that were making calls to the plugin server and waiting for a
response: one for authentication and the other for getting information
via the vendor form. If the other users are ITSM users, they would be
using the overview console, which again uses a plugin.

 

It showed that it might be waiting on the database, and in one of the
other calls, on Thread 9, it showed that a plugin call is being made and
is taking time. That is the reason I suggested adding the External
Authentication parameter, so that it doesn't have to authenticate for
everything.

 

One of the calls also showed that an escalation is triggering a
filter, and the filter operations it performs create a db entry, which
is taking time.

 

In the plugin log, we see the last successful CAI plugin call, and after
that the plugin server stopped responding for some reason. Can you please
check how many records you have in the CAI:Events form? Are there too
many? Do you see any old or errored records as well?  If you see
old records, would it be possible to remove them from the
CAI:Events form (or export them as a backup and then delete them), just
in case they are bad or very old records?

- I did clean up old records in CAI:Events and CAI:EventParams.

 

 

Thanks,

Eric

 


___
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug11 www.wwrug.com ARSList: Where the Answers Are


Re: Strange ARS Timeout Problem

2011-01-29 Thread Bob Weiman
The combination of symptoms:
- exactly 5 minutes of inactivity from incoming clients (time of general LIST
operation)
- SQL activity still occurring (since escalation work is time-based)

Sure smells of a clogged arserverd RPC dispatcher thread.
Be sure to look into setting RPC-Non-Blocking-IO:

Saw this in a previous ARSList post:
http://www.mail-archive.com/arslist@arslist.org/msg34499.html

Get used to what thread #1 looks like under normal operation in the 'pstack'
output of your arserverd process.
Then, when you're encountering the hanging/unresponsiveness, take a look at
the pstack of arserverd again, especially thread #1 - the dispatcher.
You might see some function calls in the stack dealing with endofrecord
searching.
Send some of the pstack output to the arslist if you'd like some interpretation.

Bob ---
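
A minimal sketch of the thread #1 comparison described above, assuming the saved pstack output uses Solaris-style separator lines of the form "lwp# N / thread# N". The function name thread_stack and the file names in the usage note are hypothetical. On Solaris 10 you may need nawk in place of awk, since the legacy /usr/bin/awk may lack features used here.

```shell
#!/bin/sh
# Print the stack frames of one thread from saved pstack output.
# Assumes separator lines such as
#   -----------------  lwp# 1 / thread# 1  --------------------
# (an assumption; adjust the pattern if your pstack prints something else).
thread_stack() {
    # $1 = thread number, $2 = file holding pstack output
    awk -v want="$1" '
        /thread# [0-9]+/ {
            # A separator line starts a new thread section.
            n = $0
            sub(/.*thread# /, "", n)   # drop everything up to the number
            sub(/[^0-9].*/, "", n)     # drop everything after the number
            show = (n == want)
            next
        }
        show { print }
    ' "$2"
}
```

One possible use: save thread_stack 1 baseline.txt and thread_stack 1 hang.txt to two files and diff them, so that only the dispatcher's frames are compared.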




Re: Strange ARS Timeout Problem

2011-01-28 Thread Grooms, Frederick W
Since you are on Solaris you should be able to run with the logs on all the 
time.  Here is a script template you can use to save log files.  Just figure 
out how often you wish to save the files and add a call to it from cron.

   #! /usr/bin/ksh
   #
   #  Name:        save_logs.sh
   #  Description: Save the log file(s) via a script
   #
   #  Replace both placeholders with real paths before use.
   AR_LOG_DIR="{full path where log files are stored}"
   AR_SAVE_DIR="{full path where you want to save logs at}"
   #
   cur=`date +%H%M`
   cd "${AR_LOG_DIR}"
   #
   #  Copy each log only if it exists and is readable.
   [ -r arsql.log ]    && cp arsql.log    "${AR_SAVE_DIR}/arsql_${cur}.log"
   [ -r arfilter.log ] && cp arfilter.log "${AR_SAVE_DIR}/arfilter_${cur}.log"
   [ -r arapi.log ]    && cp arapi.log    "${AR_SAVE_DIR}/arapi_${cur}.log"
   #
   cd "${AR_SAVE_DIR}"
   #
   netstat -a > netstat_${cur}.log
   #
   #  Collect process and memory snapshots in one file.
   echo prstat > ps_${cur}.log
   prstat -n25,10 -a 1 1 >> ps_${cur}.log
   echo >> ps_${cur}.log
   echo vmstat >> ps_${cur}.log
   vmstat 1 2 >> ps_${cur}.log
   echo >> ps_${cur}.log
   echo ps >> ps_${cur}.log
   ps -ef >> ps_${cur}.log
   #
   #  Remove any archive left from the same time slot, then compress.
   rm -f *${cur}.log.gz 2> /dev/null
   gzip *${cur}.log
   #
   exit 0

Fred


Re: Strange ARS Timeout Problem

2011-01-27 Thread LJ LongWing
Ok...I just completely re-read the original post...all indications save one
are that during that 5-minute interval the application server lost
connectivity with the DB server.  The only exception appears to be
the escalation thread, which continued processing during that 5-minute
window...so, what I would do is set up a cron job to run every 30
seconds or every minute, something along those lines, that issues a tracert
between your remedy server and your db server.  My primary thought is that
you are losing network connectivity...even though the escalation server is
still working...it's at least something you can try and report back.

 

From: Action Request System discussion list(ARSList)
[mailto:arslist@ARSLIST.ORG] On Behalf Of ZHANG, ERIC L
Sent: Wednesday, January 26, 2011 7:19 PM
To: arslist@ARSLIST.ORG
Subject: Re: Strange ARS Timeout Problem

 

Yes, I did initial log analysis. As I said in the original posting, there
was a 5-minute gap in the api log, while no gap/waiting/error/long operation
showed up in the sql log or escalation log. All the sql queries in the sql
log were for user AR_ESCALATOR.

 

 

-Original Message-
From: Axton [mailto:axton.gr...@gmail.com] 
Sent: Wednesday, January 26, 2011 8:18 AM
Subject: Re: Strange ARS Timeout Problem

 

What do the logs say?  I haven't seen that you've done analysis with the
logs.  Is there a gap in time in the logs (indicating the server was not
doing anything)?  Is there a gap in time in the logs (indicating a long
operation was running)?

On Tue, Jan 25, 2011 at 5:49 PM, ZHANG, ERIC L ezh...@entergy.com wrote:

We have sent BMC tech support all the logs, including api, filter, sql,
escalation, thread, plug-in, arfork, and even pstack output taken
during the hang, and so far they haven't been able to identify the cause of
the problem.

 

-Original Message-
From: Axton [mailto:axton.gr...@gmail.com] 
Sent: Monday, January 24, 2011 5:45 PM
Subject: Re: Strange ARS Timeout Problem

 

Try to get the api, filter, and sql logs leading up to the point where it
started hanging.  Those are your best indicators.  Also check the arerror.log
for crashes.

 

There are things that can cause behavior like this that the logs will
indicate.  For example, try creating a computed group during production
operations, or importing a deployable application.

On Thu, Jan 20, 2011 at 3:10 PM, ZHANG, ERIC L ezh...@entergy.com wrote:


Hi Listers.

 

We are experiencing intermittent timeouts with the ARS. Without me doing
anything, the AR system becomes normal again after about 5 minutes. All
users get timeouts (or an hourglass), but armonitor.log shows no process
being restarted.

 

This is the message showing in arerror.log:

 

Tue Jan 18 12:09:24 2011  Dispatch : Timeout during data retrieval due to
busy server -- retry the operation (server_name)  ARERR - 93

Tue Jan 18 12:10:04 2011  Approve : Timeout during database query --
consider using more specific search criteria to narrow the results, and
retry the operation (ARERR 94)

 

In the API log, it shows a 5-minute gap:

 

API  TID: 04 RPC ID: 00 Queue: Admin  Client-RPC: 99    USER: Remedy Application Service    /* Tue Jan 18 2011 12:06:16.2224 */-GLEWF    OK

API  TID: 04 RPC ID: 00 Queue: Admin  Client-RPC: 99    USER: Remedy Application Service    /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF  ARGetListEntryWithFields -- schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address
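
A gap like the one in the two API log entries above can be located mechanically. The following is a minimal sketch, assuming each log entry carries a timestamp of the form "/* Tue Jan 18 2011 12:06:16.2224 */" and that the log being scanned does not cross midnight. The function name find_log_gaps is hypothetical, and on Solaris 10 nawk may be needed in place of awk.

```shell
#!/bin/sh
# Report pauses between consecutive API log timestamps.
# Assumes timestamps look like "/* Tue Jan 18 2011 12:06:16.2224 */"
# and that the log does not cross midnight (both are assumptions).
find_log_gaps() {
    # $1 = minimum gap in seconds, $2 = log file
    awk -v min="$1" '
        match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ \*\//) {
            hms = substr($0, RSTART, 8)            # HH:MM:SS
            split(hms, p, ":")
            now = p[1] * 3600 + p[2] * 60 + p[3]   # seconds since midnight
            if (seen && now - prev >= min)
                printf "gap of %ds between %s and %s\n", now - prev, prev_hms, hms
            prev = now; prev_hms = hms; seen = 1
        }
    ' "$2"
}
```

For example, find_log_gaps 60 arapi.log would list every pause of a minute or more, which can then be lined up against the sql and escalation logs for the same window.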

 

Our DBA was monitoring the database during that time and found little activity
in the database. The activity shown in the SQL log during the timeout was all
for user AR_ESCALATOR, which means escalations were still running during
that time. This can also be verified from the escalation log.

 

When this occurs, CPU and RAM utilization drop dramatically to
their lowest levels on both the ARS server and the database server. There was
no application change in the last couple of months. The problem started
about two weeks ago. It can occur 3 times a day, and sometimes the system works
fine for days without it occurring.

 

Our configuration/environment:

 

ARS: 7.1 patch 7

ITSM: 7.0.03 patch 9

SLM: 7.1 patch 2

SRM: 2.2 patch 4

Midtier: 7.6.03

 

ARS Server: Solaris 10 (16 GB of Physical Memory, 18 GB of SWAP, 8 CPUs) -
Dedicated to ARServer, ITSM, SLM, and SRM.

Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - Used only by
customers to submit service requests.

Database: Oracle: 10gR2 (remote)

 

The following are the thread settings in ar.conf:

 

Private-RPC-Socket:  390601   2   6

Private-RPC-Socket:  390603   2   2

Private-RPC-Socket:  390620  16  24  (FAST)

Private-RPC-Socket:  390626   8  16

Private-RPC-Socket:  390627   2  12

Private-RPC-Socket:  390635  24  30  (LIST)

Private-RPC-Socket:  390680  24  24

Private-RPC-Socket:  390693   2   4

Private-RPC-Socket:  390698   2   4
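
When thread settings are being changed between incidents, it can help to dump them in a uniform shape for before/after comparison. A minimal sketch, assuming each setting sits on its own line as "Private-RPC-Socket: <queue> <min> <max>" as in the list above; the function name list_rpc_queues is hypothetical.

```shell
#!/bin/sh
# List the RPC queues defined in ar.conf with their min/max thread counts.
# Assumes lines of the form "Private-RPC-Socket: <queue> <min> <max>".
list_rpc_queues() {
    # $1 = path to ar.conf
    awk '
        $1 == "Private-RPC-Socket:" && NF >= 4 {
            printf "queue=%s min=%s max=%s\n", $2, $3, $4
        }
    ' "$1"
}
```

Saving the output before and after an edit and running diff on the two files shows exactly which queues changed.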

 

We

Re: Strange ARS Timeout Problem

2011-01-27 Thread ZHANG, ERIC L

Good idea.  I just put a cron job on the ars server that runs traceroute
db_server every minute and appends the output to an output file.
Waiting for the next timeout.
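
A minimal sketch of such a cron-driven sampler, assuming the goal is to frame each traceroute run with timestamps so it can be matched against the gap in the API log. The function name log_sample and the log path in the usage note are hypothetical; this is not the actual job put in place here.

```shell
#!/bin/sh
# Run a command and append its output to a log, framed by timestamps,
# so each sample can later be lined up against the gap in the API log.
log_sample() {
    # $1 = log file, remaining arguments = command to run
    log=$1
    shift
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') start: $*"
        rc=0
        "$@" 2>&1 || rc=$?          # keep the exit status of the command
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') exit status: $rc"
    } >> "$log"
}
```

Called from a script that cron runs every minute, log_sample /var/tmp/traceroute_db.log traceroute db_server would append one framed sample per run; a long distance between a start line and its exit status line is itself a sign that something stalled.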

 


Re: Strange ARS Timeout Problem

2011-01-27 Thread Axton
I'm not following how you got to broken db connections.  If arerror.log does
not show the sql connection dropped, it didn't.  Oracle connections are
stateful, meaning that if that link drops, that session is dead.  If
arerror.log doesn't indicate broken sessions to the db, chances are things
are good there.


Re: Strange ARS Timeout Problem

2011-01-27 Thread Dennis Ruble
Eric,
You might add an nslookup command to your cron job to see if a DNS lookup
is failing.  A DNS failure will give the same ARS symptom as a network
outage because it is an operation that the server must complete before
communications can happen.

Good luck,
Dennis
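
A minimal sketch of a lookup check that could sit next to the traceroute in the cron job. The function name probe and the log path are hypothetical. It relies only on the exit status of whatever command it is given; whether nslookup's exit status reflects a failed lookup depends on the version, so getent hosts (which resolves through the system's configured name services) may be the safer command to pass.

```shell
#!/bin/sh
# Print one line per probe: timestamp, label, and OK or FAIL.
# The verdict is based only on the exit status of the given command.
probe() {
    # $1 = label, remaining arguments = command to run
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        result=OK
    else
        result=FAIL
    fi
    echo "$(date '+%Y-%m-%d %H:%M:%S') $label $result"
}
```

For example, probe dns getent hosts db_server >> /var/tmp/dns_probe.log appends one line per run, and a run of FAIL lines that coincides with a timeout would point at name resolution.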






Re: Strange ARS Timeout Problem

2011-01-27 Thread Nair, Rajesh SISPL
Not sure if this can help.
We had the same issue: the system used to give intermittent timeouts, and the 
only pattern was that the timeout always happened at a fixed time.
Is this the case with you also?

We found out that some views were running at the specified time and the system 
used to do a full scan on the major form. We found no locks, but the system 
performance did go down.

As Axton said, if there is any issue with the network, then you can see that in 
the error log itself.

Without ruling out a network issue, try changing the entry in the ORA file from 
hostname to IP. We do this to minimize the impact of any DNS issue.

One more thing you need to find out is whether the issue happens through Midtier 
only, from the User Client only, or both.

With Best Regards

Rajesh


From: Action Request System discussion list(ARSList) 
[mailto:arslist@arslist.org] On Behalf Of Dennis Ruble
Sent: Friday, January 28, 2011 4:08 AM
To: arslist@arslist.org
Subject: Re: Strange ARS Timeout Problem

**
Eric,
You might add an nslookup command to your cron job to see if a DNS lookup is 
failing.  A DNS failure will give the same ARS symptom as a network outage 
because it is an operation that the server must complete before communications 
can happen.
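
A minimal Python sketch of the DNS check suggested above, as an alternative to calling nslookup from cron. The host name `db_server`, the 2-second "slow" threshold, and the function names are assumptions for illustration; the `resolver` parameter exists only so the function can be exercised without a real DNS server.

```python
import socket
import time


def time_dns_lookup(host, resolver=socket.getaddrinfo, slow_after=2.0):
    """Resolve host once and report how long the lookup took.

    Returns a dict with the elapsed seconds, whether the lookup
    succeeded, and whether it was slower than slow_after seconds.
    """
    start = time.monotonic()
    try:
        resolver(host, None)
        ok, error = True, ""
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        ok, error = False, str(exc)
    elapsed = time.monotonic() - start
    return {"host": host, "ok": ok, "seconds": elapsed,
            "slow": elapsed > slow_after, "error": error}


def format_result(result):
    """Build one log line per lookup, suitable for appending to a file."""
    status = "OK" if result["ok"] else "FAIL " + result["error"]
    flag = " SLOW" if result["slow"] else ""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} dns {result['host']} {result['seconds']:.3f}s {status}{flag}"


# Example call (placeholder host name, not from the thread):
# print(format_result(time_dns_lookup("db_server")))
```

Run once a minute, a line marked SLOW or FAIL that lines up with a timeout window would point toward name resolution rather than the AR System server itself.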

Good luck,
Dennis



ZHANG, ERIC L ezh...@entergy.com
Sent by: Action Request System discussion list(ARSList) arslist@ARSLIST.ORG
01/27/2011 04:26 PM
Please respond to: arslist@ARSLIST.ORG
To: arslist@ARSLIST.ORG
Subject: Re: Strange ARS Timeout Problem

Good idea.  I just put a cron job on the ars server that runs traceroute 
db_server every minute and appends the output to an output file. Waiting for 
the next timeout.

-Original Message-
From: LJ LongWing [mailto:lj.longw...@gmail.com]
Sent: Thursday, January 27, 2011 9:18 AM
Subject: Re: Strange ARS Timeout Problem

OK...I just completely re-read the original post...all indications save one 
are that during that 5-minute interval the application server lost connectivity 
with the DB server.  The only exception to that appears to be the escalation 
thread, which continued processing during that 5-minute window...so, what I 
would do would be to set up a cron job to run every 30 seconds or every minute, 
something along those lines, that issues a tracert between your Remedy server 
and your DB server.  My primary thought is that you are losing network 
connectivity...even though the escalation server is still working...it's at 
least something you can try and report back.
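
As a complement to the traceroute idea above, here is a rough Python sketch that times a single TCP connection to the database listener and formats a log line. The host name `db_server` and port 1521 (Oracle's default listener port) are assumptions, as are the function names; this is not something posted in the thread.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try one TCP connection and return (ok, seconds, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
        return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused connections and timeouts
        return False, time.monotonic() - start, str(exc)


def format_probe(host, port, result):
    """Turn a probe result into one timestamped log line."""
    ok, seconds, error = result
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    status = "OK" if ok else "FAIL " + error
    return f"{stamp} tcp {host}:{port} {seconds:.3f}s {status}"


# Example call (placeholder host and assumed port):
# print(format_probe("db_server", 1521, probe_tcp("db_server", 1521)))
```

A connection that fails or takes seconds to open during the 5-minute window would support the lost-connectivity theory; consistently fast connects would argue against it.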


Re: Strange ARS Timeout Problem

2011-01-26 Thread Axton
What do the logs say?  I haven't seen that you've done analysis with the
logs.  Is there a gap in time in the logs (indicating the server was not
doing anything)?  Is there a gap in time in the logs (indicating a long
operation was running)?


 _attend WWRUG11 www.wwrug.com ARSlist: Where the Answers Are_





___
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug11 www.wwrug.com ARSList: Where the Answers Are


Re: Strange ARS Timeout Problem

2011-01-26 Thread ZHANG, ERIC L
Yes, I did the initial log analysis. As I said in the original posting,
there was a 5-minute gap in the API log, while no gap, wait, error, or long
operation showed in the SQL log or escalation log. All the SQL
queries in the SQL log were for user AR_ESCALATOR.

 

 



Re: Strange ARS Timeout Problem

2011-01-25 Thread ZHANG, ERIC L
We have sent BMC tech support all the logs, including api, filter, sql,
escalation, thread, plug-in, and arfork, and even pstack output taken
during the hang, and so far they haven't been able to identify the cause
of the problem.

 



Re: Strange ARS Timeout Problem

2011-01-24 Thread ZHANG, ERIC L
Interesting you asked about it. When we first encountered the problem,
Cursor Sharing was already set to FORCE both in the Oracle database
and in ar.conf.  The DBA then changed it to EXACT in the database while
we kept it as FORCE in ar.conf.  This change has improved performance
noticeably, especially when users refresh the Assigned Work table on
the Incident Management console. But the change didn't eliminate the
timeout problem.

 



Re: Strange ARS Timeout Problem

2011-01-24 Thread Axton
Try to get the api, filter, and sql logs leading up to the point where it
started hanging.  Those are your best indicators.  Also check the arerror.log
for crashes.

There are things that can cause behavior like this that the logs will
indicate.  For example, try creating a computed group during production
operations, or importing a deployable application.



Re: Strange ARS Timeout Problem

2011-01-24 Thread Ali A. Musa
You can use RunMacro with (-d) debugging for all cases (submit, update, and 
delete) and review the debugging output. Generally speaking, the setup in the 
application may not be completely represented in the database; you can use the 
admin tool to see what is missing from the form or from the database, such as 
indexes, etc.



Re: Strange ARS Timeout Problem

2011-01-23 Thread patchsk
What is the value for cursor sharing at the DB level and in the ar.conf file? 
Try setting it to FORCE and see if that fixes this issue. 


Re: Strange ARS Timeout Problem

2011-01-21 Thread ZHANG, ERIC L
** 

Thanks, Mark.

 

I did go through the escalations that were running during the timeout
and couldn't find anything out of the ordinary.  The escalation log
shows that all the escalations completed in a fraction of a second,
and no delays show in the SQL log either.  

 

Eric

 

 

-Original Message-
From: Brittain, Mark [mailto:mbritt...@navisite.com] 
Sent: Thursday, January 20, 2011 3:30 PM
Subject: Re: Strange ARS Timeout Problem

 

Hi Eric,

 

Couple things you might check. 

Have you checked the indexing against the Run If in the escalations?
NULL in the Run If ignores indexing and should be avoided.

 

If you have a time calculation, is the field on one side and the
calculation on the other: (Create Date  $TIMESTAMP$ - 3600) vs. (Create
Date +3600  $TIMESTAMP$)? Calculating on the field value is slower.
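
To illustrate the point about calculating on the field value, here is a small sketch using Python's built-in SQLite as a stand-in (the thread's database is Oracle, whose optimizer differs, so treat this only as an analogy). The comparison operator was lost from the text above; `<` is assumed here. The table name `ticket`, the column `create_date`, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn, where_clause, params):
    """Return SQLite's plan text for a SELECT using the given WHERE clause."""
    sql = "EXPLAIN QUERY PLAN SELECT summary FROM ticket WHERE " + where_clause
    rows = conn.execute(sql, params).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, create_date INTEGER, summary TEXT)"
)
conn.execute("CREATE INDEX idx_ticket_create_date ON ticket (create_date)")
conn.executemany(
    "INSERT INTO ticket (create_date, summary) VALUES (?, ?)",
    [(1295000000 + i * 60, f"ticket {i}") for i in range(1000)],
)

now = 1295000000 + 1000 * 60

# Calculation on the constant side: the index on create_date can be searched.
plan_constant_side = query_plan(conn, "create_date < ? - 3600", (now,))

# Calculation on the field side: the expression must be evaluated per row.
plan_field_side = query_plan(conn, "create_date + 3600 < ?", (now,))

print("constant side:", plan_constant_side)
print("field side:   ", plan_field_side)
```

Both conditions select the same rows; only the first lets the engine use the index on the date column, which is the difference Mark is describing.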

 

Is there a SQL query to an external table in the Set Field action? Could
be a change/cause there. 

 

Likewise, is the escalation doing a set field using information from
another form that your users frequently use? If so, the issue might be the
indexing there.

 

These are small things that you can get away with when there is a
relatively limited number of records. Then at some magic number the
warts start to show.

 

Hope this helps and good luck. 

 

Mark

 

From: Action Request System discussion list(ARSList)
[mailto:arslist@ARSLIST.ORG] On Behalf Of ZHANG, ERIC L
Sent: Thursday, January 20, 2011 4:11 PM
To: arslist@ARSLIST.ORG
Subject: Strange ARS Timeout Problem

 

** 

Hi Listers.

 

We are experiencing intermittent timeouts with the ARS. Without me doing
anything, the AR system becomes normal again after about 5 minutes. All
users are getting timeout (or hourglass) but no process is being
restarted in armonitor.log. 

 

This is the message showing in arerror.log:

 

Tue Jan 18 12:09:24 2011  Dispatch : Timeout during data retrieval due
to busy server -- retry the operation (server_name)  ARERR - 93

Tue Jan 18 12:10:04 2011  Approve : Timeout during database query --
consider using more specific search criteria to narrow the results, and
retry the operation (ARERR 94)

 

In the API log, it shows a 5-minute gap:

 

API  TID: 04 RPC ID: 00 Queue: Admin 
Client-RPC: 99USER: Remedy Application Service
 /* Tue Jan 18 2011 12:06:16.2224 */-GLEWFOK

API  TID: 04 RPC ID: 00 Queue: Admin 
Client-RPC: 99USER: Remedy Application Service
 /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF  ARGetListEntryWithFields --
schema OBJSTR:Class from Unidentified Client (protocol 12) at IP address
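
A gap like the one above can also be found mechanically. The following is a hedged Python sketch that scans API log text for timestamps in the `/* Tue Jan 18 2011 12:06:16.2224 */` form shown in the excerpt and reports any jump above a threshold. The 60-second threshold and the function name are assumptions, and the pattern expects each timestamp to sit on one line.

```python
import re
from datetime import datetime

# Matches timestamps such as "/* Tue Jan 18 2011 12:06:16.2224 */".
STAMP = re.compile(r"/\* (\w{3} \w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}\.\d+) \*/")


def find_gaps(log_text, min_gap_seconds=60.0):
    """Return (previous, current, gap_seconds) for each gap above the threshold."""
    stamps = [
        datetime.strptime(text, "%a %b %d %Y %H:%M:%S.%f")
        for text in STAMP.findall(log_text)
    ]
    gaps = []
    for previous, current in zip(stamps, stamps[1:]):
        gap = (current - previous).total_seconds()
        if gap >= min_gap_seconds:
            gaps.append((previous, current, gap))
    return gaps


# The two timestamps from the excerpt above.
sample = """\
 /* Tue Jan 18 2011 12:06:16.2224 */-GLEWFOK
 /* Tue Jan 18 2011 12:11:16.0001 */+GLEWF  ARGetListEntryWithFields --
"""

for previous, current, gap in find_gaps(sample):
    print(f"{gap:.1f}s gap between {previous.time()} and {current.time()}")
```

Running the same scan over the SQL and escalation logs for the same window would show whether those logs really have no matching gap.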

 

Our DBA was monitoring the database during the time and found few
activities in the database. The activities shown in SQL log during the
timeout were all for user AR_ESCALATOR, which means the escalation was
still running during the time. This can also be verified from the
escalation log.

 

When this occurs, CPU and RAM utilization drop dramatically
to their lowest levels on both the ARS server and the database server.
There was no application change in the last couple of months. The
problem started about two weeks ago. It can occur 3 times a day, and
sometimes the system works fine for days without it occurring.

 

Our configuration/environment:

 

ARS: 7.1 patch 7
ITSM: 7.0.03 patch 9
SLM: 7.1 patch 2
SRM: 2.2 patch 4
Midtier: 7.6.03

 

ARS Server: Solaris 10 (16 GB of Physical Memory, 18 GB of SWAP, 8 CPUs)
- Dedicated to ARServer, ITSM, SLM, and SRM.

Midtier Server: Windows Server 2003 SP2 (2 CPUs, 2 GB of RAM) - Used
only by customers to submit service requests.

Database: Oracle 10gR2 (remote)

 

The following are the thread settings in ar.conf:

 

Private-RPC-Socket:  390601   2   6

Private-RPC-Socket:  390603   2   2

Private-RPC-Socket:  390620  16  24  (FAST)

Private-RPC-Socket:  390626   8  16

Private-RPC-Socket:  390627   2  12

Private-RPC-Socket:  390635  24  30  (LIST)

Private-RPC-Socket:  390680  24  24

Private-RPC-Socket:  390693   2   4

Private-RPC-Socket:  390698   2   4
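
To compare these settings before and after a change, a small parser can
help. The Python sketch below assumes the usual layout of RPC program
number, minimum threads, maximum threads, and ignores anything after
the third number (the FAST and LIST labels above are my annotations,
not part of ar.conf). The function names are made up for illustration.

```python
import re

# Matches lines such as "Private-RPC-Socket:  390620  16  24".
SOCKET_RE = re.compile(r"^\s*Private-RPC-Socket:\s+(\d+)\s+(\d+)\s+(\d+)")


def parse_private_queues(conf_text):
    """Return {rpc_program_number: (min_threads, max_threads)}.

    If a queue is listed more than once, the last entry wins.
    """
    queues = {}
    for line in conf_text.splitlines():
        match = SOCKET_RE.match(line)
        if match:
            number, low, high = (int(group) for group in match.groups())
            queues[number] = (low, high)
    return queues


def total_threads(queues):
    """Return (sum of minimums, sum of maximums) across all queues."""
    lows = sum(low for low, _ in queues.values())
    highs = sum(high for _, high in queues.values())
    return lows, highs


if __name__ == "__main__":
    sample = """
    Private-RPC-Socket:  390620  16  24
    Private-RPC-Socket:  390635  24  30
    """
    queues = parse_private_queues(sample)
    print(queues, total_threads(queues))
```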

 

We have about 300 concurrent Remedy users during peak hours. ARServer
runs as a non-root process. The number of open file descriptors for
arserverd (~700) was well below the ulimit of 3072.  The FAST and LIST
threads never reached their maximums.
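
One way to keep an eye on the descriptor count over time is sketched
below in Python. It assumes the system exposes one entry per open
descriptor under /proc/<pid>/fd (Solaris and Linux both do) and that
the script runs as a user allowed to read that directory. The function
names are hypothetical, and the limit is passed in by hand because the
limit that matters is the one arserverd was started with.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd, one per open descriptor."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_usage(open_fds, limit):
    """Return the fraction of the descriptor limit that is in use."""
    return open_fds / limit


if __name__ == "__main__":
    # Demonstrated on this Python process; substitute the arserverd pid
    # and the ulimit it was started with (3072 in the post above).
    used = count_open_fds(os.getpid())
    print(f"{used} descriptors open, {fd_usage(used, 3072):.1%} of 3072")
```

With the figures quoted above, fd_usage(700, 3072) comes to a little
under a quarter of the limit.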

 

I have an open ticket with BMC Support but thought I might get a
solution more quickly from the ARSList.

 

Thanks,

Eric

 

_attend WWRUG11 www.wwrug.com ARSlist: Where the Answers Are_ 

 



This e-mail is the property of NaviSite, Inc. It is intended only for
the person or entity to which it is addressed and may contain
information that is privileged, confidential, or otherwise protected
from disclosure. Distribution or copying of this e-mail, or the
information contained herein, to anyone other than the intended
recipient is prohibited.


Re: Strange ARS Timeout Problem

2011-01-21 Thread ZHANG, ERIC L
Dennis,

 

I have been trying to get our network guys to set up a sniffer to
monitor the network traffic to and from ARServer.  I will pass the DNS
info on to them.

 

Thanks,

Eric

 

 

-Original Message-
From: Dennis Ruble [mailto:ddru...@rockwellcollins.com] 
Sent: Thursday, January 20, 2011 3:32 PM
Subject: Re: Strange ARS Timeout Problem

 

** 
Eric, 
We had a similar symptom many years back.  There were 3 DNS servers
configured for our AR System server.  Over time the first and second
ones were retired and the DNS configuration did not get updated.  So,
for every DNS call the system had to wait for the first and second
servers to time out before trying the third, and if the third was busy
everything just went to sleep while waiting for a response.  We updated
our DNS config and hosts file and everything returned to normal.

I suppose there might also be other resources besides DNS servers that
could cause the same symptom.  Our network guys sniffed the network to
see what we were waiting on.

HTH, 
Dennis 







___
UNSUBSCRIBE or access ARSlist Archives at www.arslist.org
attend wwrug11 www.wwrug.com ARSList: Where the Answers Are


Re: Strange ARS Timeout Problem

2011-01-21 Thread ZHANG, ERIC L

Yes.  The DBA said there were no locks or blocks during the timeouts.
-Thanks, Eric

 

 

-Original Message-
From: LJ LongWing [mailto:lj.longw...@gmail.com] 
Sent: Thursday, January 20, 2011 3:40 PM
Subject: Re: Strange ARS Timeout Problem

 

Eric,

Did your DBA look for any locking?  We used to experience this a lot
until we figured out what was happening.  I could give you SQL Server
instructions on how to find it...but you aren't using that...and your
DBA should be able to run a query that will tell you which sessions are
causing blocking and which are blocked.

 



Re: Strange ARS Timeout Problem

2011-01-20 Thread Brittain, Mark
Hi Eric,

A couple of things you might check.
Have you checked the indexing against the Run If in the escalations? A
comparison with NULL in the Run If ignores indexing and should be avoided.

If you have a time calculation, is the field on one side and the calculation on 
the other, as in (Create Date  $TIMESTAMP$ - 3600), rather than on the same 
side, as in (Create Date +3600  $TIMESTAMP$)? Calculating on the field value is 
slower.
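
The effect of the time-calculation point can be seen in a small Python
sketch. It makes several assumptions: it uses SQLite rather than Oracle
so that it runs anywhere, the table and index names are invented, and
the comparison is taken to be "<" because the operator did not survive
in the post. The planner output of your own database is what actually
matters, so treat this only as an illustration of the principle.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with an index on create_date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticket (id INTEGER PRIMARY KEY, "
        "create_date INTEGER, status TEXT)"
    )
    conn.execute("CREATE INDEX idx_ticket_create ON ticket (create_date)")
    conn.executemany(
        "INSERT INTO ticket (create_date, status) VALUES (?, ?)",
        [(1295000000 + i * 60, "Open") for i in range(1000)],
    )
    conn.commit()
    return conn


def query_plan(conn, sql, params=()):
    """Return the detail text of each step in SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


# Field alone on the left: the index on create_date can be used.
FIELD_ALONE = "SELECT * FROM ticket WHERE create_date < ? - 3600"
# Arithmetic applied to the field: every row has to be evaluated.
FIELD_IN_CALC = "SELECT * FROM ticket WHERE create_date + 3600 < ?"

if __name__ == "__main__":
    conn = build_demo_db()
    now = 1295060000  # stand-in for $TIMESTAMP$
    for sql in (FIELD_ALONE, FIELD_IN_CALC):
        print(sql)
        print("   ", query_plan(conn, sql, (now,)))
```

Both statements return the same rows. Only the first lets the planner
seek into the index instead of reading every row.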

Is there a SQL query to an external table in the Set Field action? There could 
be a change, and so a cause, there.

Likewise, is the escalation doing a Set Field using information from another 
form that your users frequently use? If so, the issue might be the indexing there.

These are small things that you can get away with when there is a relatively 
limited number of records. Then at some magic number the warts start to show.

Hope this helps and good luck.

Mark



Re: Strange ARS Timeout Problem

2011-01-20 Thread Dennis Ruble
Eric,
We had a similar symptom many years back.  There were 3 DNS servers 
configured for our AR System server.  Over time the first and second ones 
were retired and the DNS configuration did not get updated.  So, for every 
DNS call the system had to wait for the first and second servers to 
time out before trying the third, and if the third was busy everything just 
went to sleep while waiting for a response.  We updated our DNS config and 
hosts file and everything returned to normal. 

I suppose there might also be other resources besides DNS servers that could 
cause the same symptom.  Our network guys sniffed the network to see what 
we were waiting on.
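
A quick way to check for this kind of resolver delay from the AR System
host, before involving the network team, is to time a few name lookups.
The Python sketch below is only that: the function names and the
one-second threshold are my own choices, and it assumes that
socket.getaddrinfo goes through the same system resolver configuration
that the server processes use.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, error) for a single name lookup."""
    start = time.monotonic()
    error = None
    try:
        resolver(hostname, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        error = exc
    return time.monotonic() - start, error


def slow_lookups(hostnames, threshold=1.0, resolver=socket.getaddrinfo):
    """Return (hostname, elapsed_seconds, error) for every lookup that
    failed or took at least `threshold` seconds."""
    slow = []
    for name in hostnames:
        elapsed, error = time_lookup(name, resolver)
        if error is not None or elapsed >= threshold:
            slow.append((name, elapsed, error))
    return slow


if __name__ == "__main__":
    # Replace with the database host, mid-tier host, and so on.
    for name, elapsed, error in slow_lookups(["localhost"]):
        print(f"{name}: {elapsed:.2f}s {error or ''}")
```

Lookups that regularly take several seconds would point at a dead or
overloaded entry in the resolver list.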

HTH,
Dennis







Re: Strange ARS Timeout Problem

2011-01-20 Thread LJ LongWing
Eric,

Did your DBA look for any locking?  We used to experience this a lot until we
figured out what was happening.  I could give you SQL Server instructions on
how to find it...but you aren't using that...and your DBA should be able to
run a query that will tell you which sessions are causing blocking and which
are blocked.

 
