Re: please help - ANR0918E

2003-09-24 Thread Rushforth, Tim
Hi Chris:

The format of the option is RESOURCETIMEOUT 180, and it goes in
dsmserv.opt (add the line if it isn't already there). You can issue
q opt res* to see your current setting. Note that this didn't really
help us at all.
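q opt res* is the authoritative check, because it shows the setting the server is actually using. For an offline look at the file itself, here is a minimal Python sketch (the function name is hypothetical, not a TSM tool). It assumes one option per line, case-insensitive option names, '*' comment lines, and that the first occurrence wins. If it returns None, the option was never added and the server default applies.

```python
def read_server_option(text, name):
    """Return the value of option `name` from dsmserv.opt-style text, or None
    if it is not set (the server default then applies).

    Assumptions for this sketch: one option per line, option names are
    case-insensitive, lines starting with '*' are comments, and the first
    occurrence wins."""
    wanted = name.upper()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("*"):
            continue  # blank line or comment
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if parts[0].upper() == wanted:
            return parts[1].strip() if len(parts) > 1 else ""
    return None


sample = "* Server options file\nCOMMMETHOD TCPIP\nRESOURCETIMEOUT 180\n"
print(read_server_option(sample, "resourcetimeout"))                # 180
print(read_server_option("COMMMETHOD TCPIP\n", "RESOURCETIMEOUT"))  # None
```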

We eventually split up our TSM server into 2 servers (not just because of
this) and we haven't had any problems since! We also got IBM to open an
APAR on this: IC36769.

I am also sending this mail to the list so other people have this info.

This is the response from IBM in our open PMR (42130):

Action taken: I got an answer from the client developers that they have a
fix for that in 2003 but are still working on a fix for 2000 clients. That
will be done in one of the next releases of code.

From developers:

The problem is locking on the server, the length of time the locks are
held, and the resourcewait setting on the server. This is not a problem
the server is able to resolve. The server is working correctly. The
system files backup is a long-running transaction. If another session
needs to lock the system object filespace while the system files
transaction is being committed, and that transaction takes a very long
time (longer than the resourcewait time) to commit, then this situation
occurs and there is nothing the server can do about it because the
server is doing everything correctly.
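The locking behaviour the developers describe can be modelled in a few lines. This is a hypothetical, scaled-down Python sketch, not TSM code: one thread stands in for the system files commit holding the filespace lock, and another stands in for a second session of the same node that gives up after a timeout playing the role of the resourcewait setting. Seconds stand in for the much longer real-world times, and all names are invented for illustration.

```python
import threading
import time


def simulate(commit_time, resource_timeout):
    """Return what happens to a second session that needs the filespace lock
    while a long commit holds it. Times are in seconds (scaled down)."""
    filespace_lock = threading.Lock()
    commit_started = threading.Event()
    outcome = []

    def commit_system_files():
        # The lock is taken at commit time and held until the commit ends.
        with filespace_lock:
            commit_started.set()
            time.sleep(commit_time)

    def other_session():
        # Wait until the commit really holds the lock, then try to get it.
        commit_started.wait()
        if filespace_lock.acquire(timeout=resource_timeout):
            filespace_lock.release()
            outcome.append("completed")
        else:
            # Per the APAR, this is where ANR0538I and ANR0918E are logged.
            outcome.append("aborted: lock conflict")

    threads = [threading.Thread(target=commit_system_files),
               threading.Thread(target=other_session)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome[0]


print(simulate(commit_time=0.6, resource_timeout=0.2))  # aborted: lock conflict
print(simulate(commit_time=0.1, resource_timeout=0.5))  # completed
```

Raising the timeout only helps while it stays longer than the commit, which is consistent with the experience above that a bigger RESOURCETIMEOUT didn't really help once commits ran for hours.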

The long-term solution is for the client to be updated to process the
system files in multiple transactions rather than in a single
transaction. When that update is made, there will no longer be a single
transaction containing tens or hundreds of thousands of files causing
this problem. At the current time the transaction commit for system
files can take hours because of the number of files involved in the
single transaction. Note that the problem is ONLY with the commit time,
not the length of the entire transaction, since the locks are only
grabbed after the data movement, during end-of-transaction processing,
to limit the length of time locks are held. Until that update can be
made by the client team, the only other possible fix is for the backup
of the SYSTEM OBJECTS filespace to be single threaded.
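To see why splitting the system files into multiple transactions helps, here is a back-of-the-envelope Python sketch. It assumes, hypothetically, that commit time grows linearly with the number of files in a transaction and that the lock is released between transactions; the 50 ms per file figure and the function name are made up for illustration only.

```python
import math


def commit_plan(total_files, files_per_txn, ms_per_file):
    """Return (number_of_transactions, longest_lock_hold_ms).

    Hypothetical model: commit time grows linearly with the number of files
    in a transaction, and the filespace lock is held only during a commit
    and released between transactions."""
    transactions = math.ceil(total_files / files_per_txn)
    largest_txn = min(total_files, files_per_txn)
    return transactions, largest_txn * ms_per_file


# 200,000 system files at an assumed 50 ms of commit work per file.
print(commit_plan(200_000, 200_000, 50))  # (1, 10000000): one lock of ~2.8 hours
print(commit_plan(200_000, 1_000, 50))    # (200, 50000): 50 seconds per lock
```

The total commit work is unchanged (200 commits of 50 seconds is still 10,000 seconds). What changes is the longest single lock hold, which is what a waiting session has to outlast before its timeout expires.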

Again, I want to make it clear that this problem is not caused by the
server improperly handling something. The server is properly handling
the backup, and the server is properly terminating the backup because of
the length of time spent waiting on a lock, which is caused by the
length of time it takes to commit the transaction of the system files.

Jim Smith created a work item (Id:JSMH-5BURL4 Abstract:Cross-txn
grouping for system object) some time ago to address this problem. This
problem is solved for the Windows 2003 VSS work using the new grouping,
and I think the same will be done for Windows 2000/XP.

---

I've just searched on this APAR and it is now closed as a suggestion for
a future release, so don't expect a fix soon!

APAR status

Closed as suggestion for future release.

Error description

The TSM backup of a Windows system object runs as a single
transaction. Because the backup of the system object can
take quite a long time, due to the number of physical
objects that make up the system object, the backup transaction
can hold locks on the TSM server for a very long time. In a
multithreaded client environment, other client threads for this
same node may end up having their transactions time out waiting
for the lock(s) held by the system object transaction. When
this occurs, the following messages are seen:

ANR0538I A resource waiter has been aborted.
ANR0918E Inventory Query Backup for node ABC terminated - lock conflict

While neither client nor server code logic is in error here,
a modification to the transaction processing of system
objects should be made to avoid terminating other client
sessions associated with a multi-threaded (multi-session)
backup.
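To check whether a server is hitting this problem, you could scan activity log output (for example QUERY ACTLOG output saved to a text file) for the two messages quoted in the APAR. This is a minimal Python sketch; the function name is hypothetical, and the patterns assume the message text appears on one line exactly as quoted above, possibly after a timestamp prefix.

```python
import re

# Message texts as quoted in the APAR; real log lines may carry a timestamp
# prefix, so the patterns are searched for anywhere in the line.
WAITER_ABORTED = re.compile(r"\bANR0538I\b")
LOCK_CONFLICT = re.compile(
    r"\bANR0918E\b.*\bfor node (?P<node>\S+) terminated - lock conflict")


def lock_conflict_report(lines):
    """Return (waiter_aborts, nodes): the number of ANR0538I messages and the
    nodes named in ANR0918E lock-conflict messages, in order of first
    appearance."""
    waiter_aborts = 0
    nodes = []
    for line in lines:
        if WAITER_ABORTED.search(line):
            waiter_aborts += 1
        match = LOCK_CONFLICT.search(line)
        if match and match.group("node") not in nodes:
            nodes.append(match.group("node"))
    return waiter_aborts, nodes


sample = [
    "ANR0538I A resource waiter has been aborted.",
    "ANR0918E Inventory Query Backup for node ABC terminated - lock conflict",
]
print(lock_conflict_report(sample))  # (1, ['ABC'])
```

Nodes that show up here are candidates for the local fix that follows: backing up their system objects separately and single threaded.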

Local fix

1 - Do not include system objects in the normal backup.
They can be excluded by:
  Using the domain statement: DOMAIN ALL-LOCAL -SYSTEMOBJECTS
  Or using the exclude statement: EXCLUDE.SYSTEMOBJECT SYSFILES

2 - Back up the system objects later using dsmc -optfile=
where the optfile has resourceutilization set to 1 so that
the backup is single threaded.

Tim Rushforth
City of Winnipeg

-Original Message-
From: Rees, Chris ( Corp ) [mailto:[EMAIL PROTECTED]
Sent: September 24, 2003 3:38 AM
To: [EMAIL PROTECTED]
Subject: please help - ANR0918E

Hi Tim

Hope you don't mind me emailing you directly!

Just wondered if you got this sorted. I found the thread below on adsm.org.
We are having exactly the same problems, i.e. lock conflicts and w2k backup
sessions hanging.

I am willing to change the resource timeout but can't see it in dsmserv.opt.
Where do you change it?

Any help greatly appreciated.

Regards

Chris

Forum: ADSM.ORG - ADSM / TSM Mailing List Archive
Date: May 20, 15:57
From: Rushforth, Tim

Re: please help - ANR0918E

2003-09-24 Thread Bill Boyer
Yes, it is in the Windows 2003 TSM 5.2 client code. It uses the new
shadow copy services. But you need a TSM 5.2 server to take advantage of it.

From a previous post by Andy Raibeck:

In addition, the Windows 2003 system state/service backups use a different
transaction protocol that doesn't pin the server recovery log for
extensive periods of time, as might the system object backup method.
This support required changes on the server side as well, and thus the
requirement for a 5.2 server.



Bill Boyer
DSS, Inc.


-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] Behalf Of
Rushforth, Tim
Sent: Wednesday, September 24, 2003 10:35 AM
To: [EMAIL PROTECTED]
Subject: Re: please help - ANR0918E


Re: please help - ANR0918E

2003-09-24 Thread Rushforth, Tim
Yes, this is fixed for 2003 clients only. It doesn't help with 2000
clients, which is what the APAR was opened for. IBM has said they may fix
this for 2000 clients in a future release, but we'll see!

-Original Message-
From: Bill Boyer [mailto:[EMAIL PROTECTED]
Sent: September 24, 2003 9:51 AM
To: [EMAIL PROTECTED]
Subject: Re: please help - ANR0918E
