Re: please help - ANR0918E
Hi Chris:

The format of the option is RESOURCETIMEOUT 180 and is placed in dsmserv.opt. You can issue q opt res* to see your current setting.

Note that this didn't really help us at all. We eventually split up our TSM server into 2 servers (not just because of this) and we haven't had any problems since!

We eventually got IBM to open an APAR on this - IC36769. I am also sending this mail to the list so other people have this info.

This is the response from IBM in our open PMR (42130):

Action taken: I got answer from client developers that they have fix for that in 2003 but still working on fix for 2000 clients. That will be done in some of next releases of code.

From developers:

The problem is locking on the server, the length of time the locks are held and the resourcewait setting on the server. This is not a problem the server is able to resolve. The server is working correctly. The system files is a long running transaction. If another session needs to lock the system object filespace while the system files transaction is being committed, and that transaction takes a very long time (longer than the resourcewait time) to commit, then this situation occurs and there is nothing the server can do about it because the server is doing everything correctly.

The long term solution is when the client is finally updated to process the system files in multiple transactions rather than in a single transaction. When that update is made then there will no longer be a transaction with tens or hundreds of thousands of files in a single transaction causing this problem. At the current time the transaction commit for system files can take hours because of the number of files involved in the single transaction. Note, the problem is ONLY with the commit time, not the length of the entire transaction since the locks are only grabbed after the data movement, during end transaction processing, to limit the length of time locks are held.
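For readers who want to see the mechanism from the developers' explanation in isolation, here is a minimal Python sketch. It is not TSM code: the function name run_demo, the lock object, and the timings (fractions of a second standing in for a long commit and for the server's resource timeout) are all assumptions made for illustration. One thread holds a lock for the whole of a simulated commit while a second thread waits for the same lock with a timeout.

```python
import threading
import time


def run_demo(commit_seconds: float, resource_timeout: float) -> str:
    """Report what happens to a second session that needs the same lock.

    Hypothetical model only: commit_seconds stands in for the commit time of
    the system object transaction, resource_timeout for the server's wait limit.
    """
    filespace_lock = threading.Lock()
    committing = threading.Event()
    outcome = {}

    def system_object_session():
        # The lock is taken only at end-of-transaction and held for the commit.
        with filespace_lock:
            committing.set()
            time.sleep(commit_seconds)  # simulated commit of many files

    def other_session():
        # Begin waiting once the commit already holds the lock.
        committing.wait()
        if filespace_lock.acquire(timeout=resource_timeout):
            filespace_lock.release()
            outcome["result"] = "completed"
        else:
            outcome["result"] = "terminated - lock conflict"

    threads = [
        threading.Thread(target=system_object_session),
        threading.Thread(target=other_session),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome["result"]


if __name__ == "__main__":
    # Commit outlasts the timeout: the waiting session gives up.
    print(run_demo(commit_seconds=0.5, resource_timeout=0.1))
    # Timeout outlasts the commit: the waiting session gets the lock.
    print(run_demo(commit_seconds=0.1, resource_timeout=0.5))
```

In this toy model, raising the timeout only helps if it exceeds the commit time, which may be why a larger RESOURCETIMEOUT made little difference when the commit runs for hours.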
Until that update can be made by the client team the only other possible fix is for the backup of the SYSTEM OBJECTS filespace to be single threaded.

Again, I want to make it clear that this problem is not caused by the server improperly handling something. The server is properly handling the backup and the server is properly terminating the backup because of the length of time being waited on a lock caused by the length of time it takes to commit the transaction of the system files.

Jim Smith created a work item (Id:JSMH-5BURL4 Abstract:Cross-txn grouping for system object) some time ago to address this problem. This problem is solved for Windows 2003 VSS work using the new grouping and I think the same will be done for Windows 2000/XP.

---

I've just searched on this APAR and it is now closed as a suggestion for future release so don't expect a fix soon!

APAR status
Closed as suggestion for future release.

Error description
The TSM backup of a Windows system object runs as a single transaction. Because the backup of the system object can take quite a long time, due to the number of physical objects that make up the system object, the backup transaction can hold locks on the TSM server for a very long time. In a multithreaded client environment other client threads for this same node may end up having their transaction time out waiting for the lock(s) held by the system object transaction. When this occurs the following messages are seen:

    ANR0538I A resource waiter has been aborted.
    ANR0918E Inventory Query Backup for node ABC terminated - lock conflict

While neither client nor server code logic is in error here, a modification to the transaction processing of system objects should be made to avoid terminating other client sessions associated with a multi-threaded (multi-session) backup.

Local fix
1 - Do not include system objects in the normal backup.
They can be excluded by:

    Using the domain statement:
        DOMAIN ALL-LOCAL -SYSTEMOBJECTS
    Or using the exclude statement:
        EXCLUDE.SYSTEMOBJECT SYSFILES

2 - Backing up the system objects later using dsmc -optfile= where the optfile has a resourceutilization set to 1 so that the backup is single threaded.

Tim Rushforth
City of Winnipeg

-----Original Message-----
From: Rees, Chris ( Corp ) [mailto:[EMAIL PROTECTED]
Sent: September 24, 2003 3:38 AM
To: [EMAIL PROTECTED]
Subject: please help - ANR0918E

Hi Tim

Hope you don't mind me emailing you directly! Just wondered if you got this sorted. I found the thread below on adsm.org. We are having exactly the same problems, i.e. lock conflict and w2k backup sessions hanging. I am willing to change resource timeout but can't see it in dsmserv.opt. Where do you change it?

Any help greatly appreciated

Regards
Chris

Forum: ADSM.ORG - ADSM / TSM Mailing List Archive
Date: May 20, 15:57
From: Rushforth, Tim mailto
Re: please help - ANR0918E
Yes, it is in the Windows 2003 TSM 5.2 client code. It utilizes the new shadow copy services. But you need a TSM 5.2 server to take advantage of it.

From a previous post by Andy Raibeck:

In addition, the Windows 2003 system state/service backups use a different transaction protocol that doesn't pin the server recovery log for extensive periods of time, as might the system object backup method. This support required changes on the server side as well, and thus the requirement for a 5.2 server.

Bill Boyer
DSS, Inc.

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] Behalf Of Rushforth, Tim
Sent: Wednesday, September 24, 2003 10:35 AM
To: [EMAIL PROTECTED]
Subject: Re: please help - ANR0918E
Re: please help - ANR0918E
Yes this is fixed for 2003 clients only - doesn't help with 2000 clients - that is what the APAR was opened for. IBM has said maybe we'll fix this for 2000 clients in a future release - but we'll see!

-----Original Message-----
From: Bill Boyer [mailto:[EMAIL PROTECTED]
Sent: September 24, 2003 9:51 AM
To: [EMAIL PROTECTED]
Subject: Re: please help - ANR0918E